Automating R package checks across platforms with GitHub Actions and Docker in a portable way

Introduction

Automating the execution, testing and deployment of R code is a powerful way to ensure the reproducibility, quality and overall robustness of the code that we are building. A relatively recent GitHub feature, GitHub Actions, allows us to do just that for repositories hosted on GitHub, without additional tools such as Travis or Jenkins.

In this post, we will examine using GitHub Actions and Docker to test our R packages across platforms in a portable way and show how this setup works for the CRAN package languageserversetup.

Many different tools, many different syntaxes. And low portability

The motivation behind this post stems mostly from my experience with many different automation tools, which we could in simplified terms refer to as CI/CD tools. Some of them, such as Jenkins, Bamboo or Travis, offer a wide variety of features. Others, such as GitLab CI and GitHub Actions, are perhaps less feature-rich but offer simplicity and very good out-of-the-box integration with the repository hosting.

Apart from their useful features, all these tools share one thing that is less appealing for teams trying to build portable CI/CD pipelines: each has its own syntax.

One good example is the amazing work done to integrate R with Travis. Thanks to this integration, we can work with R relatively well with Travis. It would likely require a similar effort to enable such integration on the other CI/CD tools.

What this means for development teams thinking about CI/CD pipelines is that building portable setups using tool-native syntax can quickly become an endeavor of its own - we have written about some examples of Jenkins-based solutions with regard to environments here and with regard to parallelization here. Porting such a setup built using a specific tool to another tool becomes increasingly difficult.

Containerizing and shell scripting our way to portable setups

Because of the experience described above, when setting up CI pipelines for R packages I find it beneficial and efficient to choose a route of portability instead. When setting up with GitLab CI a few years ago, the approach was:

  • create a Docker image in which R-related commands will run
  • write a simple shell script that wraps around it

This process is described in detail in these 2 posts:

Perhaps the biggest advantage of such an approach is that we can simply pick that shell script up and move it to a different tool. Assuming that the new tool supports Docker, everything will run just fine, apart from a few details that still stay tool-specific, such as working with environment variables and authentication secrets.

Continuous integration for R-based applications with GitHub Actions

When creating the languageserversetup package, it was very important to test each change across many platforms automatically. Since I opted to host the open-source code on GitHub instead of GitLab this time, GitHub Actions seemed like a natural choice for a CI/CD setup.

The current GitHub action for CRAN-like checks looks as follows:

name: check_cran
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v1
    - name: Check for CRAN
      env:
        DOCKER_LOGIN_TOKEN: ${{ secrets.DOCKER_LOGIN_TOKEN }}
        LANGSERVERSETUP_RUN_DEPLOY: false
      run: sh ci/docker_stage.sh ci/check_rhub.R "cran"

As we can see, apart from the skeleton that you get for free, the only line that does some work is the very last one. It tells the GitHub Actions executor to run the shell script docker_stage.sh with 2 arguments:

sh ci/docker_stage.sh ci/check_rhub.R "cran"

This setup is very portable. You could take it almost verbatim and use it within Jenkins, GitLab CI and probably most other CI/CD tools.

The Docker-wrapping shell script

What is the docker_stage.sh script used for? In our case, its purpose is 3-fold:

  1. Run CRAN-like checks automatically
  2. Run containerized deployments
  3. Run and report test coverage

What they have in common is that they all happen in a Docker container and are described with an R script that can be executed via Rscript. That means that this shell script is just a helper that will:

  • Pull the needed Docker image
  • Create a container from that image
  • Copy the code checked-out by the (GitHub Actions) runner into the container
  • Execute the R script provided as the first command-line argument (ci/check_rhub.R above) and other arguments if needed
  • Stop and remove the container when done
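The steps above can be sketched roughly as follows. This is a minimal illustration and not the actual ci/docker_stage.sh: the image name, the container name, the /pkg target directory and the DOCKER override variable are all assumptions made for this example. The function only defines the stage; nothing runs until it is called.

```shell
#!/bin/sh
# Hypothetical sketch of a Docker-wrapping stage script.
# IMAGE, CONTAINER and the /pkg path are assumptions, not taken from the real script.
IMAGE="${CI_IMAGE:-rocker/r-ver:latest}"
CONTAINER="${CI_CONTAINER:-r_ci_stage}"

# run_stage SCRIPT [ARGS...]
# Runs SCRIPT with Rscript inside a fresh container and returns its exit status.
# Set DOCKER=echo to print the docker commands instead of executing them.
run_stage() {
  script="$1"
  shift
  docker_cmd="${DOCKER:-docker}"

  # 1. Pull the image and 2. start a long-lived container from it
  "$docker_cmd" pull "$IMAGE" || return 1
  "$docker_cmd" run -d --name "$CONTAINER" "$IMAGE" sleep infinity || return 1

  # 3. Copy the checked-out code in, then 4. run the R script with its arguments
  "$docker_cmd" cp . "$CONTAINER:/pkg" &&
    "$docker_cmd" exec -w /pkg "$CONTAINER" Rscript "$script" "$@"
  status=$?

  # 5. Always remove the container, but report the R script's status
  "$docker_cmd" rm -f "$CONTAINER"
  return "$status"
}
```

A call such as run_stage ci/check_rhub.R "cran" would then mirror the command used in the workflow file, and running it with DOCKER=echo prints the five docker commands without touching Docker at all. Environment variables needed inside the container would additionally have to be forwarded, for example with the -e option of docker exec.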

The R scripts executed within the Docker container

Now the R scripts that are executed within the container can do almost anything you require, from checking the package and running unit tests to executing your data science models.

One fully automated example using this exact approach is how the sparkfromr.com book is deployed. The repositories are open source, and you can read more in this post.

The only important condition is that your Docker container can run that script successfully. In the R world, that mostly entails having R, all the R packages and their system dependencies installed. This is made amazingly easy by the Rocker Project, which provides versioned base R images, but also images with RStudio. For tidyverse fans, they even have an image with the entire tidyverse ready for use.

This is however very easily testable, as the setup using sh ci/docker_stage.sh ci/check_rhub.R "cran" will not only run via the CI/CD tools, but also on your development machine. Note that on Windows, you might need to enable the Windows Subsystem for Linux for that to be fully true.

Setting up the process this way may nudge you to a containerized development process, where you develop the project within a container. In that case, the fact that everything works is just an automatic consequence of the development process and the containerization has no overhead, because we can use that very same image for CI/CD purposes.

The GitHub Actions YAML, environment variables and secrets

Of the few elements of the setup that are not fully portable, environment variables and secrets are the most notable. For GitHub Actions, we can define them with the env: clause, for example:

      env:
        DOCKER_LOGIN_TOKEN: ${{ secrets.DOCKER_LOGIN_TOKEN }}
        LANGSERVERSETUP_RUN_DEPLOY: false

The above will set the LANGSERVERSETUP_RUN_DEPLOY environment variable to false and will expose the encrypted secret named DOCKER_LOGIN_TOKEN as an environment variable of the same name. The secrets can be created via your repository’s Settings -> Secrets menu on GitHub.
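Once the CI tool has placed these values into the environment, a shell script can read them the same way regardless of which tool set them. The sketch below shows one way this could look. How the real scripts use DOCKER_LOGIN_TOKEN is not shown in this post, so the login step, the function names and the username argument are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helpers; the function names and the login flow are assumptions.

# should_deploy: succeeds only if the flag is exactly "true" (defaults to false)
should_deploy() {
  [ "${LANGSERVERSETUP_RUN_DEPLOY:-false}" = "true" ]
}

# docker_login USERNAME: logs in with the token taken from the environment.
# The token is passed on stdin so that it does not appear in the process list.
docker_login() {
  if [ -z "${DOCKER_LOGIN_TOKEN:-}" ]; then
    echo "DOCKER_LOGIN_TOKEN is not set" >&2
    return 1
  fi
  printf '%s' "$DOCKER_LOGIN_TOKEN" |
    "${DOCKER:-docker}" login --username "$1" --password-stdin
}
```

Failing early with a clear message when the secret is missing tends to be easier to debug than a failed image pull later in the pipeline.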

A concrete example - Checking an R package automatically using R Hub in 4 steps

Now with all the information above, let us look at a quick walk-through of a setup that will let us check your R package on multiple platforms using R Hub. We need:

  1. An R script that will run and evaluate the check via R Hub - For the package languageserversetup, this looks as follows: ci/check_rhub.R. Note that this script is years old and quite possibly needlessly long and complicated.
  2. A shell script that will run the R script, such as ci/docker_stage.sh
  3. A Docker container in which the R script can run. We have covered this in some detail in Preparing a private docker image to use with R-hub
  4. A .yaml file in the .github/workflows directory of your repository, for example, .github/workflows/check_cran.yml
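As a quick sanity check before pushing, the file-based parts of this list can be verified with a small helper. This is only an illustrative sketch: the function name is made up, the paths are the example paths named above, and the Docker image from step 3 cannot be checked this way.

```shell
#!/bin/sh
# Hypothetical helper: check_ci_layout [DIR]
# Prints each expected file that is missing under DIR (default: current directory)
# and returns 1 if any of them is missing, 0 otherwise.
check_ci_layout() {
  root="${1:-.}"
  missing=0
  for f in ci/check_rhub.R ci/docker_stage.sh .github/workflows/check_cran.yml; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_ci_layout from the repository root gives a fast answer to whether the three files are where the workflow expects them.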

And that is it. Now we will have our package checked each time we push a commit to our repository:

GitHub Action log for package check via R Hub


Other uses - Test coverage reporting and script-based deployments

Since the languageserversetup repository is completely open, you can also look at the other GitHub Actions set up for that repository. Note that all of them use the very same docker_stage.sh script; the only thing that changes is the R script for each purpose:

  • Test coverage reporting with covr and codecov.io
    • R script running the coverage computation with covr and publishing it to codecov.io
    • GitHub Action definition
  • Debian-based script deployments

TL;DR - just show me the code

An example implementation of package testing with the CRAN package languageserversetup:

An example implementation of publishing a bookdown publication with sparkfromr.com:
