Pagination • beekeeper

Many APIs implement some form of pagination: they break up large datasets into “pages” of results, and return a single page at a time. To get the full dataset, we need to make multiple requests, and combine the results.

Unfortunately, there isn’t a standard way to document API pagination, so {beekeeper} cannot generate pagination code automatically. You will need to edit your 010-call.R file to implement pagination yourself.

Finding pagination information

Before you can implement pagination in your package, you need to find out how the API implements it. This information is usually in the API documentation. Sometimes it has its own “Pagination” section near the top of the documentation; often it is described in the documentation for individual endpoints (even when it also has its own section). If it isn’t clearly described, look for pagination-related endpoint parameters, such as page, pageSize, perPage, limit, offset, or cursor.
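If you have a character vector of an endpoint’s parameter names (for example, copied from its documentation), a quick match against the names above can flag likely pagination parameters. This is a hypothetical helper, not part of {beekeeper}, and its name list is just the examples from the previous paragraph.

```r
# Hypothetical helper (not part of {beekeeper}): return the parameter names
# that match common pagination parameter names, ignoring case and underscores.
find_pagination_params <- function(param_names) {
  normalized <- gsub("_", "", tolower(param_names))
  common <- c("page", "pagesize", "perpage", "limit", "offset", "cursor")
  param_names[normalized %in% common]
}

find_pagination_params(c("q", "pageSize", "cursor", "sort", "per_page"))
```

A match only tells you where to look; you still need the documentation to learn whether, for example, offset counts records or pages.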

For more tips on finding pagination information, see “How can I get a lot of data from an API?” in Web APIs with R.

Implementing pagination

The req_perform_iterative() function from {httr2} helps to implement pagination. Starting from an initial request, it calls a next_req helper function after each response to build the request for the next page, and stops when that helper returns NULL or when it reaches max_reqs requests. This family of functions is experimental, so be sure to check the latest {httr2} documentation in case the functions have changed.
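To see what such a helper does, here is a minimal hand-written next_req callback. It assumes a hypothetical API that returns a top-level next_cursor field (null on the last page) and accepts that value back in a cursor query parameter; both names are illustrative, not from any real API.

```r
# Hand-written next_req callback for httr2::req_perform_iterative().
# ASSUMPTIONS (hypothetical API): each JSON response has a top-level
# "next_cursor" field that is null on the last page, and the API accepts
# that value in a "cursor" query parameter.
next_req_cursor <- function(resp, req) {
  cursor <- httr2::resp_body_json(resp)$next_cursor
  if (is.null(cursor)) {
    return(NULL) # NULL tells req_perform_iterative() to stop
  }
  # Reuse the previous request, changing only the cursor parameter
  httr2::req_url_query(req, cursor = cursor)
}
```

You would pass it as httr2::req_perform_iterative(req, next_req = next_req_cursor). For this common pattern, httr2::iterate_with_cursor() builds an equivalent callback for you.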

To implement pagination in your package, you will need to edit the 010-call.R file generated by {beekeeper}. By default, the perform step is handled by nectar::req_perform_opinionated():

resp <- nectar::req_perform_opinionated(req)

This function calls httr2::req_perform() if you only give it a req object, or httr2::req_perform_iterative() if you supply an iteration helper function in the next_req parameter. For example, if every endpoint of your API uses a page parameter to paginate, you could replace the line above with something like this:

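is_complete <- function(resp) {
  # resp_complete must return TRUE when there are no further pages,
  # here assumed to be when the page's "data" array is empty
  length(httr2::resp_body_json(resp)$data) == 0
}
resp <- nectar::req_perform_opinionated(
  req,
  next_req = httr2::iterate_with_offset("page", resp_complete = is_complete)
)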
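A resp_complete callback must return TRUE once there are no further pages, and it is easy to get backwards. You can check one offline by feeding it mock responses built with httr2::response(). This sketch assumes that each page’s records sit in a top-level data array that is empty once you have paged past the end; the JSON bodies are made up.

```r
# Completion callback: TRUE means "no further pages".
# ASSUMPTION: records live in a top-level "data" array.
is_complete <- function(resp) {
  length(httr2::resp_body_json(resp)$data) == 0
}

# Build a mock JSON response without calling any API
json_response <- function(json) {
  httr2::response(
    headers = list("Content-Type" = "application/json"),
    body = charToRaw(json)
  )
}

is_complete(json_response('{"data": [1, 2, 3]}')) # FALSE: keep paging
is_complete(json_response('{"data": []}'))        # TRUE: stop
```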

By default, nectar::req_perform_opinionated() only returns 2 responses (max_reqs = 2). Once you have verified that your pagination strategy works, you will likely want to increase this limit, usually to Inf. nectar::req_perform_opinionated() also applies a basic httr2::req_retry() so that each request is tried up to 3 times, using the default httr2::req_retry() rules to decide whether a failure is transient (by default, HTTP 429 and 503 responses).
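With pagination on, you get one response per page. If you need to combine the pages into a single set of records yourself, httr2::resps_data() applies a function to each response and concatenates the results. This minimal sketch uses mock responses in place of real API calls and assumes each page stores its records in a top-level data array.

```r
# Build a mock JSON response without calling any API
json_response <- function(json) {
  httr2::response(
    headers = list("Content-Type" = "application/json"),
    body = charToRaw(json)
  )
}

# Two "pages" standing in for the output of a paginated request
resps <- list(
  json_response('{"data": [1, 2]}'),
  json_response('{"data": [3]}')
)

# ASSUMPTION: each page's records are in a top-level "data" array
records <- httr2::resps_data(
  resps,
  function(resp) httr2::resp_body_json(resp)$data
)
length(records) # 3
```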

More complicated pagination

If you would like to implement more complex pagination, or apply other transformations to the req object such as httr2::req_retry() or httr2::req_throttle(), you can write your own perform function. I name these functions .{api_abbr}_req_perform(). For example, this is the perform function for the {fecapi} package, along with its helpers:

.fec_req_perform <- function(req,
                             pagination,
                             per_page,
                             max_results,
                             max_reqs,
                             call = rlang::caller_env()) {
  next_req <- .choose_pagination_fn(pagination, call = call)
  max_reqs <- min(max_reqs, ceiling(max_results / per_page))
  nectar::req_perform_opinionated(
    req,
    next_req = next_req,
    max_reqs = max_reqs
  )
}

.choose_pagination_fn <- function(pagination, call = rlang::caller_env()) {
  pagination <- .validate_pagination(pagination, call)
  switch(pagination,
    basic = .iterator_fn_basic(),
    none = NULL
  )
}

.validate_pagination <- function(pagination, call = rlang::caller_env()) {
  rlang::arg_match0(
    pagination,
    c("none", "basic"),
    error_call = call
  )
}

.iterator_fn_basic <- function() {
  httr2::iterate_with_offset(
    "page",
    resp_pages = function(resp) {
      httr2::resp_body_json(resp)$pagination$pages
    }
  )
}
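The max_reqs line in .fec_req_perform() turns a result limit into a request limit. Pulling that expression into a hypothetical standalone function makes the arithmetic easy to check: with 100 results per page, 250 results need ceiling(2.5) = 3 requests.

```r
# Hypothetical standalone copy of the cap in .fec_req_perform():
# never request more pages than max_results can fill at per_page per page.
cap_max_reqs <- function(max_reqs, max_results, per_page) {
  min(max_reqs, ceiling(max_results / per_page))
}

cap_max_reqs(max_reqs = Inf, max_results = 250, per_page = 100) # 3
cap_max_reqs(max_reqs = 2, max_results = 250, per_page = 100)   # 2
cap_max_reqs(max_reqs = Inf, max_results = Inf, per_page = 100) # Inf
cap_max_reqs(max_reqs = 5, max_results = 250, per_page = NULL)  # 5
```

Note the last case: if per_page is NULL (for example, when query$per_page was never set), max_results / NULL is numeric(0), so min() falls back to max_reqs and the cap is silently skipped.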

Within 010-call.R, I apply the function like this:

  resp <- .fec_req_perform(
    req,
    pagination = pagination,
    per_page = query$per_page,
    max_results = max_results,
    max_reqs = max_reqs
  )

Help us improve

If you find a pattern in how pagination is described, in the API description and/or in endpoint function parameters, please submit an issue or a pull request to help us improve the output of this package.