Many APIs implement some form of pagination: they break up large datasets into “pages” of results, and return a single page at a time. To get the full dataset, we need to make multiple requests, and combine the results.
Unfortunately, there isn’t a standard way to document API pagination. Therefore, we cannot automatically generate pagination code. You will need to edit your 010-call.R file to implement pagination.
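To make the idea of fetching pages and combining them concrete, here is a minimal base R sketch. The fetch_page() function is a hypothetical stand-in for a real API call (it serves slices of a small in-memory vector), and the stopping rule (an empty page) is only one of several conventions an API might use.

```r
# Hypothetical stand-in for an API call: returns one "page" of a dataset.
fetch_page <- function(page, per_page = 3) {
  server_rows <- letters[1:8] # pretend this data lives on the server
  start <- (page - 1) * per_page + 1
  if (start > length(server_rows)) {
    return(character())
  }
  end <- min(start + per_page - 1, length(server_rows))
  server_rows[start:end]
}

# Request pages until an empty page comes back, then combine the results.
fetch_all <- function(per_page = 3) {
  pages <- list()
  page <- 1
  repeat {
    rows <- fetch_page(page, per_page)
    if (length(rows) == 0) {
      break
    }
    pages[[page]] <- rows
    page <- page + 1
  }
  unlist(pages)
}

all_rows <- fetch_all()
```

In a real package, the loop is what httr2::req_perform_iterative() takes care of; the sketch only shows the shape of the problem.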
Finding pagination information
Before you can implement pagination in your package, you will need to find out how the API implements pagination. You can usually find this information in the API documentation. Sometimes this information is in a separate “Pagination” section at the top of the documentation. Often it is also described in the individual endpoint documentation (even if it has its own section). If it isn’t clearly described, watch for pagination-related endpoint parameters, such as page, pageSize, perPage, limit, offset, or cursor.
For more tips on finding pagination information, see How can I get a lot of data from an API? in Web APIs with R.
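If an API has many endpoints, scanning their parameter names by hand can get tedious. The following base R sketch shows one way to flag likely pagination parameters. The helper name find_pagination_params() and the normalization rule (ignore case and underscores) are assumptions made for illustration, not part of {beekeeper}.

```r
# Parameter names that commonly signal pagination.
pagination_hints <- c("page", "pageSize", "perPage", "limit", "offset", "cursor")

# Hypothetical helper: return the parameters that look pagination-related,
# ignoring case and underscores so that "per_page" matches "perPage".
find_pagination_params <- function(param_names) {
  normalize <- function(x) tolower(gsub("_", "", x, fixed = TRUE))
  param_names[normalize(param_names) %in% normalize(pagination_hints)]
}

flagged <- find_pagination_params(c("api_key", "per_page", "page", "sort"))
```

Here flagged contains "per_page" and "page". A match is only a hint; the API documentation remains the authority on how those parameters behave.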
Implementing pagination
The req_perform_iterative() function from {httr2} helps to implement pagination. It uses the request and some helper functions to create a new request to fetch the next page. This family of functions is experimental, so be sure to check the latest documentation in case the functions have changed.
To implement pagination in your package, you will need to edit the 010-call.R file generated by {beekeeper}. By default, the perform step is handled by nectar::req_perform_opinionated():
resp <- nectar::req_perform_opinionated(req)
This function calls httr2::req_perform() if you only give it a req object, or httr2::req_perform_iterative() if you supply an iteration helper function in the next_req parameter. For example, if every endpoint of your API uses a page parameter to paginate, you could replace the line above with something like this:
# Iteration is complete when a page comes back with no data.
is_complete <- function(resp) {
  !length(httr2::resp_body_json(resp)$data)
}
resp <- nectar::req_perform_opinionated(
  req,
  next_req = httr2::iterate_with_offset("page", resp_complete = is_complete)
)
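Before hitting a real API, it can help to check a completion helper offline. The sketch below assumes a version of {httr2} that exports response_json() for building mock responses, uses a placeholder URL, and assumes that the API returns its records in a data field. The helper name no_more_data() is hypothetical.

```r
# A completion helper should return TRUE once a page contains no data.
no_more_data <- function(resp) {
  !length(httr2::resp_body_json(resp)$data)
}

# Mock responses (assumption: the API returns its records in a `data` field).
full_page <- httr2::response_json(body = list(data = list(1, 2, 3)))
empty_page <- httr2::response_json(body = list(data = list()))

# iterate_with_offset() returns a function of (resp, req) that builds the
# request for the next page, or returns NULL when iteration should stop.
next_req <- httr2::iterate_with_offset("page", resp_complete = no_more_data)
req <- httr2::request("https://example.com/items") # placeholder URL

after_full <- next_req(full_page, req)
after_empty <- next_req(empty_page, req)
```

With these mocks, after_full is a request whose URL includes page=2, and after_empty is NULL, which is the signal httr2::req_perform_iterative() uses to stop.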
By default, nectar::req_perform_opinionated() returns at most 2 responses (max_reqs = 2). Once you have verified that your pagination strategy works, you will likely want to increase this limit, usually to Inf.
nectar::req_perform_opinionated() also implements a basic httr2::req_retry() to try each request up to 3 times, using the default httr2::req_retry() settings to decide if a failure is transient.
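For reference, a retry policy of this kind can be sketched directly with {httr2}. This is an illustration of httr2::req_retry() under assumed settings, not the code that {nectar} actually runs. The URL is a placeholder, and the custom is_transient rule (also retrying on status 500) is an arbitrary example.

```r
req <- httr2::request("https://example.com/api") # placeholder URL

# Try each request up to 3 times. With no is_transient function supplied,
# httr2 treats HTTP 429 and 503 responses as transient failures.
req_default <- httr2::req_retry(req, max_tries = 3)

# A custom rule, for an API that is assumed to also fail transiently with 500.
req_custom <- httr2::req_retry(
  req,
  max_tries = 3,
  is_transient = function(resp) {
    httr2::resp_status(resp) %in% c(429, 500, 503)
  }
)
```

Neither call performs a request; req_retry() only attaches the policy, which takes effect when the request is performed.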
More complicated pagination
If you would like to implement more complex pagination, or apply other transformations to the req object such as httr2::req_retry() or httr2::req_throttle(), you can create your own perform function. I name these functions {api_abbr}_req_perform(). For example, this is the perform function for the {fecapi} package:
.fec_req_perform <- function(req,
                             pagination,
                             per_page,
                             max_results,
                             max_reqs,
                             call) {
  next_req <- .choose_pagination_fn(pagination, call = call)
  # Never make more requests than are needed to return max_results rows.
  max_reqs <- min(max_reqs, ceiling(max_results / per_page))
  nectar::req_perform_opinionated(
    req,
    next_req = next_req,
    max_reqs = max_reqs
  )
}

# Map the pagination style of an endpoint to an iteration helper.
.choose_pagination_fn <- function(pagination, call = rlang::caller_env()) {
  pagination <- .validate_pagination(pagination, call)
  switch(pagination,
    basic = .iterator_fn_basic(),
    none = NULL
  )
}

.validate_pagination <- function(pagination, call = rlang::caller_env()) {
  rlang::arg_match0(
    pagination,
    c("none", "basic"),
    error_call = call
  )
}

# Page-number pagination; the response reports the total number of pages.
.iterator_fn_basic <- function() {
  httr2::iterate_with_offset(
    "page",
    resp_pages = function(resp) {
      httr2::resp_body_json(resp)$pagination$pages
    }
  )
}
Within 010-call.R, I apply the function like this:
resp <- .fec_req_perform(
  req,
  pagination = pagination,
  per_page = query$per_page,
  max_results = max_results,
  max_reqs = max_reqs
)
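The max_reqs calculation in .fec_req_perform() is easier to follow with numbers. The sketch below repeats the same arithmetic in a standalone function; the name cap_requests() and the example values are hypothetical.

```r
# Same arithmetic as in .fec_req_perform(): stop at whichever limit is
# reached first, the request limit or the number of pages needed.
cap_requests <- function(max_reqs, max_results, per_page) {
  min(max_reqs, ceiling(max_results / per_page))
}

# 250 results at 100 per page need 3 requests, even with no request limit.
needed <- cap_requests(max_reqs = Inf, max_results = 250, per_page = 100)

# A stricter request limit wins when it is smaller.
limited <- cap_requests(max_reqs = 2, max_results = 250, per_page = 100)

# With no limit on results, the request limit is used as-is.
unlimited <- cap_requests(max_reqs = Inf, max_results = Inf, per_page = 100)
```

Note that the last page may still return more rows than requested (300 rows for max_results = 250 in the first case), so trimming to max_results would be a separate step.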
Help us improve
If you find a pattern in pagination implementation from the API description and/or endpoint function parameters, please submit an issue or a pull request to help us improve the output of this package.