Handling CRAN Requirements for Web API R Packages

programming
r
Author

James Balamuta

Published

November 2, 2024

When developing R packages that interact with web APIs, care is required due to CRAN’s policy on internet connectivity. If you’re not careful, your package could fail CRAN checks on submission or later down the road. If either happens, you have to resubmit the package with the issues fixed, or risk it not being accepted on CRAN or, if it’s already there, being archived (removed) from CRAN. Having been bitten by this policy in the past with the {ucimlrepo} package, I’ve learned some valuable lessons about implementing robust internet-dependent functionality while satisfying CRAN’s policies.

The CRAN Policy

CRAN’s policy states that:

Packages which use Internet resources should fail gracefully with an informative message if the resource is not available or has changed (and not give a check warning nor error).

CRAN Repository Policy - Revision: 6286

This specifically refers to behavior during R CMD check, not necessarily during regular package usage.

Surviving CRAN Checks

For the remainder of the post, we’ll focus on how to handle internet connectivity issues in R packages that interact with web APIs, both to meet CRAN’s requirements and to keep informative error messages during regular usage. To that end, we’ve created a flowchart showing the desired implementation for a Web API function that handles successful and error scenarios gracefully. The blue-colored decision nodes (internet check and API request) represent critical decision points in the flow, while the gray nodes indicate process steps. We’ve also included two subgraphs to show the balance between regular usage and CRAN check environments.

flowchart LR
    A[API Function Call] --> B{Internet Check}
    B -->|Available| C{API Request}
    B -->|Not Available| D[Graceful Failure]
    
    C -->|Success| E[Process Response]
    C -->|Error| F[Handle API Error]
    
    D --> G[User Feedback]
    F --> G
    E --> H[Return Results]
    
    subgraph Regular["Regular Usage"]
        D
        F
        G
    end
    
    subgraph CRAN["CRAN Check"]
        B
        C
        E
        H
    end
    
    style A fill:#f8f9fa,stroke:#6c757d
    style B fill:#4582ec,stroke:#4582ec,color:#ffffff
    style C fill:#4582ec,stroke:#4582ec,color:#ffffff
    style D fill:#f8f9fa,stroke:#6c757d
    style E fill:#f8f9fa,stroke:#6c757d
    style F fill:#f8f9fa,stroke:#6c757d
    style G fill:#f8f9fa,stroke:#6c757d
    style H fill:#f8f9fa,stroke:#6c757d
    style Regular fill:#f8f9fa,stroke:#6c757d
    style CRAN fill:#f8f9fa,stroke:#6c757d
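
The flow above can be sketched in base R as follows. This is only a minimal illustration under stated assumptions, not the implementation used in {ucimlrepo}: `safe_api_call()`, `check_internet`, and `request_fun` are hypothetical names, and the connectivity check and the request are passed in as functions so the sketch runs without a network. In a real package, the check could be `curl::has_internet()` and the request an actual HTTP call.

```r
# Hypothetical sketch of the flow: internet check -> API request -> result.
# `check_internet` and `request_fun` are injected so the example runs offline.
safe_api_call <- function(endpoint, check_internet, request_fun) {
  # Internet check: fail gracefully with an informative message
  if (!isTRUE(check_internet())) {
    message("No internet connection; cannot reach '", endpoint, "'.")
    return(invisible(NULL))
  }

  # API request: turn a request error into a message instead of an error
  tryCatch(
    request_fun(endpoint),
    error = function(e) {
      message("Request to '", endpoint, "' failed: ", conditionMessage(e))
      invisible(NULL)
    }
  )
}

# Offline: returns NULL and emits a message rather than an error
offline <- safe_api_call("users", function() FALSE, function(x) stop("unused"))

# Online with a stubbed request that succeeds
ok <- safe_api_call("users", function() TRUE, function(x) list(endpoint = x))

# Online with a stubbed request that fails
failed <- safe_api_call("users", function() TRUE, function(x) stop("HTTP 503"))
```

Returning `NULL` with a message is one reading of "graceful failure"; whether callers should instead receive a classed condition is a design choice left open here.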

Documentation Examples That Won’t Fail CRAN Checks

When writing documentation examples for functions that require internet connectivity, we need to be cognizant of CRAN’s check environment. In particular, CRAN runs package checks in a non-interactive environment where internet access may be limited or unavailable. For a Web API package, the limitation of internet access is hugely problematic. So, we need to ensure that our examples do not fail during package checking by designing conditions that allow the examples to run only in interactive sessions.

The overview of the process is shown in the flowchart below:

flowchart LR
    A[R Package with API Functions] --> B{CRAN Check Environment}
    B -->|Interactive| C[Run Examples with examplesIf]
    B -->|Not Interactive| D[Skip Examples]
    
    C --> E{Check Results}
    D --> E
    
    E -->|Pass| F[Package Accepted]
    E -->|Fail| G[Package Rejected]
    
    style A fill:#f8f9fa,stroke:#6c757d
    style B fill:#4582ec,stroke:#4582ec,color:#ffffff
    style C fill:#f8f9fa,stroke:#6c757d
    style D fill:#f8f9fa,stroke:#6c757d
    style E fill:#4582ec,stroke:#4582ec,color:#ffffff
    style F fill:#f8f9fa,stroke:#6c757d
    style G fill:#f8f9fa,stroke:#6c757d

Modern Approach

When writing example code that requires internet connectivity, you have a few options to ensure that the examples run smoothly during package checking. The preferred method is conditional execution through the @examplesIf tag from the {roxygen2} package, e.g.

#' @examplesIf some_condition()
#' my_function()
#' another_function()

This will only run the examples if some_condition() is TRUE.

For web API functions, we need to check for interactivity, internet connectivity, and, if necessary, an API key. Since CRAN checks run non-interactively, guarding on interactive() is enough to keep the examples from running there. For greater peace of mind, we can also check for internet connectivity using the curl::has_internet() function.

Option 1: Only run examples in interactive sessions to avoid the examples failing during package checking.

#' @examplesIf interactive()
#' fetch_api_data("some_endpoint")

Option 2: Check if the package is running in an interactive session and for internet connectivity before running the examples.

#' @examplesIf interactive() && curl::has_internet()
#' fetch_api_data("some_endpoint")

Option 3: Check interactivity, internet connectivity, and an API key being set in the environment if the API requires authorization.

#' @examplesIf interactive() && curl::has_internet() && Sys.getenv("API_KEY") != ""
#' fetch_secure_data("premium/endpoint")

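Order matters in the @examplesIf condition. Conditions joined with && are evaluated left to right, and evaluation stops at the first one that is FALSE, so later checks are never run. Thus, if you have multiple conditions, place the most restrictive condition first; this is why interactive() precedes curl::has_internet() in the options above.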
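
To keep long conditions readable, the checks can be bundled into a single helper. The sketch below is hypothetical: `can_run_api_examples()` and its arguments are names made up for illustration, and the checks are passed in as arguments so the short-circuit behavior can be demonstrated without {curl} or a network connection.

```r
# Hypothetical helper; the name and arguments are illustrative only.
# The default `curl::has_internet` is only looked up if `is_interactive`
# is TRUE, because && stops at the first FALSE condition.
can_run_api_examples <- function(is_interactive = interactive(),
                                 has_internet = curl::has_internet,
                                 key_var = "API_KEY") {
  is_interactive && has_internet() && nzchar(Sys.getenv(key_var))
}

# Non-interactive session: the internet check is never called
skipped <- can_run_api_examples(
  is_interactive = FALSE,
  has_internet = function() stop("should not be called")
)

# Interactive, online, and key present (DEMO_KEY is a made-up variable)
Sys.setenv(DEMO_KEY = "abc")
allowed <- can_run_api_examples(TRUE, function() TRUE, "DEMO_KEY")

# Interactive and online, but the key is missing
Sys.unsetenv("DEMO_KEY")
no_key <- can_run_api_examples(TRUE, function() TRUE, "DEMO_KEY")
```

A helper like this would have to be reachable from the examples (for instance, exported from the package) before it could be used inside an @examplesIf tag.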

For more details, refer to Chapter 16 of the R Packages (2e) book or the {roxygen2} Documenting functions vignette.

The Legacy Approach

While less elegant, this approach still works. It comes at the expense of maintainability, since it requires an if statement to be placed directly around the example code, e.g.

#' @examples
#' \donttest{
#' if (interactive()) {
#'   my_api_function("some_endpoint")
#' }
#' }

Web API Testing Strategy

When testing R packages that interact with web APIs, you need to consider how to handle internet connectivity issues, the availability of the API responses, and the need to test with real data. There are several strategies you can use to ensure your package functions correctly under various conditions.

For a high-level overview, we can use a state diagram to illustrate the process of testing web API functionality in R packages. The diagram shows the steps involved in checking for internet connectivity, running tests with real or mocked data, and handling the test results.

stateDiagram-v2
    direction LR
    
    [*] --> TestCheck
    TestCheck --> LiveTests: Internet Available
    TestCheck --> NoInternet: No Internet
    
    NoInternet --> MockedTests: Use Mocks
    NoInternet --> SkipTests: skip_if_offline()
    
    state LiveTests {
        direction LR
        RealAPI: Test with Real API
        VCRRecord: Record Response
        VCRPlayback: Replay Response
        
        RealAPI --> VCRRecord
        VCRRecord --> VCRPlayback
    }
    
    state MockedTests {
        direction LR
        HttpTest: Mocks
        MockBindings: Call Functions
        
        HttpTest --> MockBindings
    }
    
    LiveTests --> Results
    MockedTests --> Results
    SkipTests --> Results: Tests Skipped
    Results --> [*]
    
    state "Test Results" as Results

    classDef critical fill:#4582ec,stroke:#4582ec,color:#ffffff
    classDef container fill:#f8f9fa,stroke:#6c757d
    classDef headerstyle color:#4582ec,font-weight:bold
    class TestCheck,NoInternet,LiveTests,MockedTests critical
    class LiveTests,MockedTests container
    class LiveTests:header headerstyle
    class MockedTests:header headerstyle

Skip Tests When Offline

The quickest way to handle internet connectivity issues is to skip tests that require internet connectivity. So, if the user is offline, the tests will not be run. We can use skip_if_offline() from the {testthat} package in the form of:

test_that("API connection works", {
  skip_if_offline()
  # Your test code here
})

Mock API responses

For APIs that might change or are rate-limited, you can test your package by mocking API responses using local_mocked_bindings() from the {testthat} package. When we mock the API responses, we control the response data and status code. This allows you to test your package without requiring internet connectivity or real API responses, at the cost of keeping the mock in sync with the real API.

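test_that("API checked with mocked data", {
  # Mock the low-level request helper, not the function under test,
  # so that fetch_api_data() itself is still exercised.
  local_mocked_bindings(
    make_api_request = function(...) {
      list(
        status_code = 200,
        content = '{"users": [{"id": 1, "name": "Test"}]}'
      )
    }
  )

  # Assumes fetch_api_data() returns the response from make_api_request()
  result <- fetch_api_data("users")
  expect_equal(result$status_code, 200)

  # Parse the JSON body before checking its contents
  users <- jsonlite::fromJSON(result$content)$users
  expect_equal(users$name, "Test")
})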

Record Real API Responses

vcr

Use the {vcr} package to record real API responses and replay them during testing. This allows you to test your package with real data without requiring internet connectivity on every run.

test_that("API integration works with real data", {
  vcr::use_cassette("user_api_response", {
    result <- fetch_api_data("users")
  })
  
  expect_gt(nrow(result), 0)
})

For more details, see {vcr} Vignette: Introduction to vcr.

httptest2

We could also use the {httptest2} package to record real API responses and replay them during testing. As before, this allows you to test your package with real data without requiring internet connectivity. The syntax differs from {vcr}, but the concept is the same.

with_mock_dir("person", {
  test_that("We can get people", {
    result <- fetch_api_data("users")
    expect_gt(nrow(result), 0)
  })
})

For more details, see httptest2: A Test Environment for HTTP Requests in R.
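
The shared record-and-replay idea can be illustrated with a few lines of base R. This is a conceptual sketch only and does not reflect how {vcr} or {httptest2} work internally: `with_recording()` and `fake_fetch()` are hypothetical names, and the "response" is a stubbed data frame rather than a real HTTP response.

```r
# Conceptual sketch only: NOT how {vcr} or {httptest2} are implemented.
# `with_recording()` and `fetch_fun` are hypothetical names.
with_recording <- function(name, fetch_fun, dir = tempdir()) {
  path <- file.path(dir, paste0(name, ".rds"))

  # Replay: a recording exists, so no request is made
  if (file.exists(path)) {
    return(readRDS(path))
  }

  # Record: make the request once and store the response on disk
  response <- fetch_fun()
  saveRDS(response, path)
  response
}

# Stub standing in for a real API call; counts how often it is invoked
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(id = 1:2, name = c("A", "B"))
}

rec_dir <- file.path(tempdir(), "recordings-demo")
dir.create(rec_dir, showWarnings = FALSE)
unlink(file.path(rec_dir, "users.rds"))  # start from a clean state

first  <- with_recording("users", fake_fetch, rec_dir)  # records
second <- with_recording("users", fake_fetch, rec_dir)  # replays from disk
```

The second call never reaches `fake_fetch()`, which is what lets recorded tests run offline. The trade-off is the same as with mocks: recordings go stale when the real API changes and need to be refreshed.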

Regular Usage

While CRAN requires graceful failures during package checking, your actual package functions should still provide meaningful errors when things go wrong. For example, we can use a tryCatch() block to handle API request failures and provide informative error messages to the user.

fetch_api_data <- function(endpoint) {
  tryCatch(
    make_api_request(endpoint),
    error = function(e) {
      cli::cli_abort(
        c(
          "x" = "API request failed: {endpoint}",
          "i" = "Error message: {conditionMessage(e)}",
          ">" = "Check the API documentation or try again later."
        )
      )
      # Or stop("API request failed: ", endpoint, "\n", conditionMessage(e))
    }
  )
}
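
To see what such an error looks like without a network connection, here is a self-contained base R variant built on the stop() alternative mentioned in the comment above. `make_api_request_stub()` and `fetch_api_data_base()` are hypothetical names used only for this sketch, and the stub always fails so that the error path is exercised.

```r
# Stub standing in for make_api_request(); it always fails so the
# error path can be seen without a network connection.
make_api_request_stub <- function(endpoint) {
  stop("HTTP 503 Service Unavailable")
}

# Base R variant of fetch_api_data() using stop() instead of cli::cli_abort()
fetch_api_data_base <- function(endpoint) {
  tryCatch(
    make_api_request_stub(endpoint),
    error = function(e) {
      # Re-raise with context while keeping the original message
      stop(
        "API request failed: ", endpoint, "\n",
        "Error message: ", conditionMessage(e),
        call. = FALSE
      )
    }
  )
}

# Capture the error message instead of halting the script
msg <- tryCatch(
  fetch_api_data_base("users"),
  error = function(e) conditionMessage(e)
)
cat(msg, "\n")
```

The key point is that the original condition message is carried through, so users can tell which endpoint failed and why.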

Fin

Developing R packages that interact with web APIs requires striking a delicate balance between providing a good user experience and meeting CRAN’s requirements. By writing documentation examples that won’t fail CRAN checks using @examplesIf, implementing solid unit tests, and providing informative error messages, you can create R packages that handle internet connectivity issues gracefully and don’t trigger the ire of CRAN. There are always more hiccups that can occur, but these strategies will help you navigate the waters of web API R package development.

Again, CRAN’s requirements specifically target package checking behavior, not your actual package functions! So, please make sure to provide meaningful errors and feedback during regular usage.

Resources