Recently I was involved in a task that required reading and writing large amounts of data, more than 1 TB of CSV files in total, without the standard big data infrastructure. After trying multiple approaches, the one that made this possible was data.table’s reading and writing facilities, fread() and fwrite(). This motivated me to benchmark data.table’s fread() against other options for reading tabular data from text files such as CSVs, namely tidyverse’s readr and base R.
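A comparison of this kind can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the data is synthetic and tiny compared to the workload described above, the variable names are made up for the example, and data.table is timed only if it happens to be installed. A single system.time() run is also a rough measurement, not a proper benchmark.

```r
# Write a small synthetic CSV to a temporary file (illustrative data only)
set.seed(1)
n <- 1e5
df <- data.frame(
  id = seq_len(n),
  x  = rnorm(n),
  g  = sample(letters, n, replace = TRUE)
)
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Time base R's reader and keep the elapsed (wall clock) seconds
t_base <- system.time(base_res <- read.csv(path))
timings <- c(read.csv = t_base[["elapsed"]])

# Time data.table::fread() only when the package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  t_fread <- system.time(dt_res <- data.table::fread(path))
  timings <- c(timings, fread = t_fread[["elapsed"]])
}

print(timings)

# Remove the temporary file
unlink(path)
```

For more trustworthy numbers, each reader would need to be run several times on files of realistic size, since a single run is sensitive to disk caching and other system noise.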
In this summertime post in the case4base series, we will look at useful tools in base R that let us profile our code without installing any extra packages. We will cover simple, easy-to-use speed profiling and more complex profiling of performance and memory and, as always, look at alternatives to base R as well, with a special shout-out to the profiling integration in RStudio.
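As a minimal sketch of what profiling with base R alone can look like, the snippet below uses Rprof() and summaryRprof() from the utils package, which ships with R. The function slow_fun() is a hypothetical, deliberately inefficient workload invented for this example, and the sampling interval is an arbitrary choice.

```r
# Hypothetical workload: grows a vector one element at a time,
# which is deliberately inefficient so the profiler has something to sample
slow_fun <- function(n) {
  res <- numeric(0)
  for (i in seq_len(n)) {
    res <- c(res, sqrt(i))
  }
  res
}

# Start the sampling profiler, writing samples to a temporary file
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)

out <- slow_fun(3e4)

# Stop profiling
Rprof(NULL)

# Summarise the samples; note that summaryRprof() raises an error when
# no samples were collected, so the workload has to run long enough
prof <- summaryRprof(prof_file)
print(head(prof$by.self))

# Remove the temporary profile file
unlink(prof_file)
```

The by.self table attributes time to the function that was executing when each sample was taken, while by.total also counts time spent in the functions it called.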
Profiling our code is a very useful way to determine how well it performs on different metrics. The addin we will create in this article will let us use a keyboard shortcut to run profiling on R code selected in RStudio without blocking the session or requiring any external packages. For a very simple overview, it may be beneficial to look at the time needed for a set of expressions to compute, e.