Introduction
Automating the execution, testing and deployment of R code is a powerful way to ensure the reproducibility, quality and overall robustness of the code that we are building. A relatively recent GitHub feature, GitHub Actions, allows us to do just that without using additional tools such as Travis or Jenkins for our repositories stored on GitHub.
In this post, we will examine using GitHub Actions and Docker to test our R packages across platforms in a portable way and show how this setup works for the CRAN package languageserversetup.
Contents
- Many different tools, many different syntaxes. And low portability
- Containerizing and shell scripting our way to portable setups
- Continuous integration for R-based applications with GitHub Actions
- A concrete example - Checking an R package automatically using R Hub in 4 steps
- TL;DR - just show me the code
- References
Many different tools, many different syntaxes. And low portability
The motivation behind this post stems mostly from my experience with many different automation tools, which we could in simplified terms refer to as CI/CD tools. Some of them, such as Jenkins, Bamboo or Travis, offer a wide variety of features; others, such as GitLab CI and GitHub Actions, are perhaps less feature-rich but offer simplicity and very good out-of-the-box integration with the repository hosting.
What all these tools share, apart from the usefulness of their features, is however a bit less appealing for teams trying to build portable CI/CD pipelines: each has its own syntax.
One good example is the amazing work done to integrate R with Travis. Thanks to this integration, we can work with R relatively well on Travis. It would likely require a similar effort to enable such integration on the other CI/CD tools.
What this means for development teams thinking about CI/CD pipelines is that building portable setups using tool-native syntax can quickly become an endeavor of its own. We have written about some examples of Jenkins-based solutions with regard to environments here and with regard to parallelization here. Porting a setup built with one specific tool to another tool becomes increasingly difficult.
Containerizing and shell scripting our way to portable setups
Because of the experience described above, when setting up CI pipelines for R packages I find it beneficial and efficient to choose a route of portability instead. When setting up with GitLab CI a few years ago, the approach was:
- create a Docker image in which R-related commands will run
- write a simple shell script that wraps around it
This process is described in detail in these 2 posts:
- How to easily automate R analysis, modeling and development work using CI/CD, with working examples
- Setting up continuous multi-platform R package building, checking and testing with R-Hub, Docker and GitLab CI/CD for free, with a working example
Perhaps the biggest advantage of such an approach is that we can simply pick that shell script up and move it to a different tool and, assuming that the new tool supports Docker, everything will run just fine, apart from a few details that still stay tool-specific, such as working with environment variables and authentication secrets.
Continuous integration for R-based applications with GitHub Actions
When creating the languageserversetup package, it was very important to test each change across many platforms automatically. Since I opted to host the open-source code on GitHub instead of GitLab this time, GitHub Actions seemed like a natural choice for a CI/CD setup.
The current GitHub Actions workflow for CRAN-like checks looks as follows:
    name: check_cran
    on: [push]
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
        - uses: actions/checkout@v1
        - name: Check for CRAN
          env:
            DOCKER_LOGIN_TOKEN: ${{ secrets.DOCKER_LOGIN_TOKEN }}
            LANGSERVERSETUP_RUN_DEPLOY: false
          run: sh ci/docker_stage.sh ci/check_rhub.R "cran"
As we can see, apart from the skeleton that you get for free, the only line that does some work is the very last one. It tells the GitHub Actions executor to run the shell script docker_stage.sh with 2 arguments:
    sh ci/docker_stage.sh ci/check_rhub.R "cran"
This setup is very portable. You could take it almost verbatim and use it within Jenkins, GitLab CI and probably most other CI/CD tools.
The Docker-wrapping shell script
What is the docker_stage.sh script used for? In our case, it is 3-fold:
- Run CRAN-like checks automatically
- Run containerized deployments
- Run and report test coverage
What they have in common is that they all happen in a Docker container and are described with an R script that can be executed via Rscript. That means that this shell script is just a helper that will:
- Pull the needed Docker image
- Create a container from that image
- Copy the code checked-out by the (GitHub Actions) runner into the container
- Execute the R script provided as the first command-line argument (ci/check_rhub.R above) and other arguments if needed
- Stop and remove the container when done
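The steps above can be sketched as a small POSIX shell function. This is only an illustration under stated assumptions, not the actual ci/docker_stage.sh from the repository: the function name run_stage, the variables DOCKER, CI_IMAGE and CI_CONTAINER, the /pkg path and the default image are all hypothetical choices made for this example.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the real ci/docker_stage.sh.
# DOCKER, CI_IMAGE, CI_CONTAINER and /pkg are hypothetical names.

DOCKER="${DOCKER:-docker}"                    # set DOCKER=echo for a dry run
CI_IMAGE="${CI_IMAGE:-rocker/r-ver:latest}"   # assumed image for the R script
CI_CONTAINER="${CI_CONTAINER:-ci_stage}"      # assumed container name

run_stage() {
  script="$1"   # R script to execute, e.g. ci/check_rhub.R
  shift         # any remaining arguments are passed on to the R script

  # 1. Pull the needed Docker image.
  "$DOCKER" pull "$CI_IMAGE" || return 1

  # 2. Create a container that stays alive so we can copy and exec into it.
  "$DOCKER" run -d --name "$CI_CONTAINER" "$CI_IMAGE" sleep infinity || return 1

  # 3. Copy the checked-out code into the container, then
  # 4. execute the R script with its arguments inside the container.
  "$DOCKER" cp . "${CI_CONTAINER}:/pkg" &&
    "$DOCKER" exec -w /pkg "$CI_CONTAINER" Rscript "$script" "$@"
  rc=$?

  # 5. Stop and remove the container (rm -f does both in one call).
  "$DOCKER" rm -f "$CI_CONTAINER"

  # Report the result of the R script (or of the copy step) to the CI tool.
  return "$rc"
}
```

Saved as a script that ends with a call to run_stage "$@", it could be invoked in the same way as the command shown earlier. Returning the exit status of the R script is what lets the CI tool mark the job as failed when a check fails.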
The R scripts executed within the Docker container
Now the R scripts that are executed within the container can perform almost any action that you require, from checking the package and running unit tests to executing your data science models.
One fully automated example using this exact approach is how the sparkfromr.com book is deployed. The repositories are open-sourced and you can read more in this post.
The only important condition is that your Docker container can run that script successfully. In the R world, that mostly entails having R, all the R packages and their system dependencies installed. This is made amazingly easy by the Rocker Project, which provides versioned base R images, but also images with RStudio. For tidyverse fans, they even have an image with the entire tidyverse ready for use.
This is however very easily testable, as the setup using sh ci/docker_stage.sh ci/check_rhub.R "cran" will run not only via the CI/CD tools, but also on your development machine.
Note that on Windows, you might need to enable the Windows Subsystem for Linux for that to be fully true.
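As a rough way to test that condition on a development machine before running the full script, one could ask R inside the image whether each required package is available. The helper below is a hypothetical sketch: the function name image_has_packages and the DOCKER variable are made up for this example, and it checks only R packages, not their system dependencies.

```shell
#!/bin/sh
# Hypothetical helper, not part of the languageserversetup repository.
# Checks that each named R package can be found inside a Docker image.

DOCKER="${DOCKER:-docker}"   # set DOCKER=echo for a dry run

image_has_packages() {
  image="$1"   # e.g. a Rocker image or your own CI image
  shift        # remaining arguments are R package names

  for pkg in "$@"; do
    # requireNamespace() returns FALSE when the package is missing,
    # which makes stopifnot() fail and Rscript exit with a non-zero status.
    "$DOCKER" run --rm "$image" \
      Rscript -e "stopifnot(requireNamespace(\"$pkg\", quietly = TRUE))" || {
      echo "R package '$pkg' is not usable in image '$image'" >&2
      return 1
    }
  done
}
```

For example, image_has_packages rocker/r-ver:latest covr would start a throwaway container from that image and fail with a message if covr is not installed in it.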
Setting up the process this way may nudge you to a containerized development process, where you develop the project within a container. In that case, the fact that everything works is just an automatic consequence of the development process and the containerization has no overhead, because we can use that very same image for CI/CD purposes.
The GitHub Actions YAML, environment variables and secrets
Of the few elements of the setup that are not fully portable, the most notable are environment variables and secrets. For GitHub Actions, we can set them with the env: clause, for example:
    env:
      DOCKER_LOGIN_TOKEN: ${{ secrets.DOCKER_LOGIN_TOKEN }}
      LANGSERVERSETUP_RUN_DEPLOY: false
The above will set the LANGSERVERSETUP_RUN_DEPLOY environment variable to false and will expose the encrypted secret named DOCKER_LOGIN_TOKEN as an environment variable of the same name. The secrets can be created via your repository’s Settings -> Secrets menu on GitHub.
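On the shell side, such variables can be read in a way that does not depend on which CI tool set them. The sketch below is an assumption about how a script might consume the two variables; it is not taken from the actual docker_stage.sh. The function names, the DOCKER variable and DOCKER_USER (with its placeholder default) are hypothetical, and using the token for a registry login is a guess based on the variable's name.

```shell
#!/bin/sh
# Sketch of tool-agnostic handling of CI-provided environment variables.
# deploy_enabled, registry_login, DOCKER and DOCKER_USER are hypothetical.

DOCKER="${DOCKER:-docker}"   # set DOCKER=echo for a dry run

# True only when the CI tool explicitly set the variable to "true";
# an unset variable is treated the same as "false".
deploy_enabled() {
  [ "${LANGSERVERSETUP_RUN_DEPLOY:-false}" = "true" ]
}

# Log in to a Docker registry with the token, if one was provided.
registry_login() {
  if [ -z "${DOCKER_LOGIN_TOKEN:-}" ]; then
    echo "DOCKER_LOGIN_TOKEN is not set; skipping registry login" >&2
    return 0
  fi
  # Passing the token on stdin keeps it out of the process list.
  printf '%s' "$DOCKER_LOGIN_TOKEN" |
    "$DOCKER" login --username "${DOCKER_USER:-ci-user}" --password-stdin
}
```

Because the script only looks at environment variables, moving to another CI tool means changing how the variables are set in that tool's configuration, not the script itself.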
A concrete example - Checking an R package automatically using R Hub in 4 steps
Now with all the information above, let us look at a quick walk-through of a setup that will let us check your R package on multiple platforms using R Hub. We need:
- An R script that will run and evaluate the check via R Hub. For the languageserversetup package, this looks as follows: ci/check_rhub.R. Note that this script is years old and quite possibly needlessly long and complicated.
- A shell script that will run the R script, such as ci/docker_stage.sh
- A Docker container in which the R script can run. We have covered this in some detail in Preparing a private docker image to use with R-hub
- A .yaml file in the .github/workflows directory of your repository, for example, .github/workflows/check_cran.yml
And that is it. Now we will have our package checked each time we push a commit to our repository.
Other uses - Test coverage reporting and script-based deployments
Since the languageserversetup repository is completely open, you can also look at the other GitHub Actions setups for that repository. Note that all of the GitHub Actions workflows use the very same docker_stage.sh script; the only things that change are the R scripts per purpose:
- Test coverage reporting with covr and codecov.io
  - R script running the coverage computation with covr and publishing it to codecov.io
  - GitHub Action definition
- Debian-based script deployments
  - R script running an example deployment and some tests
  - GitHub Action definition
TL;DR - just show me the code
An example implementation of package testing with the CRAN package languageserversetup:
- GitHub Actions workflows for the languageserversetup package
- Docker-based shell script to execute R scripts
- R script for package checks with R Hub
- R script for reporting test coverage using Codecov.io and covr
An example implementation of bookdown publication publishing with sparkfromr.com:
- GitHub Actions workflows for the book deployment
- Docker-based shell script to deploy the book. Note that there is no need for a separate R script because the action to be done is trivial.
References
- Docker images for R on the Rocker Project
- Get started with Docker official documentation
- GitHub Actions: Creating and storing encrypted secrets
- GitHub Actions: Documentation