Introduction
It has been more than ten years since I wrote my first R code. And in those years, the R world has changed dramatically, and mostly to the better. I believe that the current time may be one of the best times to start working with R.
In this new year’s post we will look at the R world 10 years ago and today, and provide links to many tools that helped it become a great language to solve and present everyday tasks with a welcoming community of users and developers.
Contents
My first exposure to R, more than ten years ago
The year was 2007 and I was studying probability and mathematical statistics at my faculty when one of the professors introduced us to R - a free programming language that we could use to solve many statistical tasks, from simple matrix operations and fitting models, to data visualization. This sounded great, as many other solutions that were traditionally used such as Stata or SPSS were even not free to use, let alone open source.
Now to get a bit of context, my most recent exposures to programming at that time were using Borland’s Deplhi 7 and C++ Builder, both mature IDEs with very pleasant and advanced user interfaces and features, where you could literally have a Windows application with a nice UI ready, compiled and running in an hour.
Rgui times
When I first opened the RGui it felt, well, slightly underwhelming:
But why did you not use RStudio?
Well, the first beta version of RStudio was released about 3 years later in February 2011. By the wat, those RStudio blog posts from 2011 still have comment sections available below them and I really enjoy reading through them. Anyway, I was stuck with the Rgui and it was not a pleasant experience. At that time, I disliked that experience so much, I still wrote some of the code in Delphi or C++ Builder.
Which currently popular packages existed?
But dplyr syntax makes everything so easy, why not use that?
Looking at the CRAN snapshots from the beginning of 2008, the latest released R version at that time was R-2.6.1 and there were around 1200 packages available on CRAN. At the time of writing of this post the number of packages available on CRAN reached 13600.
Looking at the top 40 most downloaded packages in the past month, only two of those packages existed on CRAN at that time - ggplot2 and digest - no filter
, summarize
or group_by
for me back then.
StackOverflow, GitHub and Twitter communities
Why did you not just ask StackOverflow, Twitter or check GitHub ?
According to Wikipedia, StackOverflow was launched 15th September 2008 and GitHub on 10th of April 2008, so in the beginning of 2008 none of the two today’s giants even existed.
Not that I was using Twitter at that time, but the first #rstats tweet I was able to find is from 4th April 2009:
RT @ChrisAlbon @drewconway #rstats is the official R statistical language hashtag. #rstats (because #R doesn't cut it)
— brendan o'connor (@brendan642) April 4, 2009
For comparison, R itself was first released 29th of February 2000, a date easily remembered.
The growth of R
There are many ways to look at a growth of a programming language and this does not mean to be a comprehensive and objective growth assessment. I rather took a look at 2 metrics I found interesting that show some trends in the R world.
If you are interested in the topic of programming language popularity, there are indices such as PYPL and TIOBE, and of course they have their criticisms.
Downloads of R packages
RStudio’s CRAN mirror provides a REST API from which we can look at and visualize the number of monthly downloads of R packages in the past 5 years. The chart speaks for itself:
Interest on StackOverflow
Another interesting point of view is the statistics on trends on StackOverflow, paraphrasing their blogpost:
When we see a rapid growth in the number of questions about a technology, it usually reflects a real change in what developers are using and learning.
And how does R look within the StackOverflow trends compared to other languages? Looks like the growth of R is so remarkable, even the data scientists at StackOverflow itself noticed and wrote a blogpost about it in 2017:
R now versus then - A much better world
Going back to that story of my first R codes, I think time has made working with R much better than it was before in many ways. I will list just few of the many reasons why with the links to relevant resources to follow:
Availability of free information and support is great
- The amazing amount of free information readily available such as (tidyverse oriented) R for Data Science, or Advanced R books make R more accessible to learn and use
- Communities of R users such as the one on StackOverflow make it easy to ask questions and get answers, the #rstats hashtag on Twitter is a good way to interact with the community
- Many user and developer blogs on r-bloggers.com and curated selections of content on RWeekly.org can serve as an inspiration and overview of the news in the community
Software tools that make working with R efficient
- Tools like RStudio make using R a much more pleasant experience compared to the original RGui, with many useful features and a Server version running in browser
- Well documented R packages that make common data science tasks easier and/or more performant such as the popular tidyverse or data.table make it easier to start
- R packages that support development, testing and documentation such as devtools, testthat and roxygen2 make R code efficient to develop, test and document
- For portability, reproducibility and dependency management, tools such as packrat can make life less painful
- Code repository managers such as GitHub, GitLab or others make it easy to share code, collaborate and even perform CI/CD tasks where necessary
Professionally presenting and publishing R results is simple
- Tools like RMarkdown, Bookdown, Blogdown and others make it easy to publish the results of your work, be it an interactive dashboard, a paper in pdf, a presentation, even a book or a blog (such as this one)
- Many packages for generating interactive charts, maps and animations such as highcharter, leaflet and more help create amazing data visualizations
- Shiny takes it to the next level allowing for advanced interactive web applications
Mature interfaces to programming languages, file formats and more
Guidance on packages per topic on CRAN
- CRAN task views provide guidance on R packages per topic, such as Web Technologies and Services, High-Performance and Parallel Computing, Machine Learning & Statistical Learning and many more
If you really came for that ugly old code
I hope this post motivated you to dive a bit deeper into the R world and check some of the many amazing contributions created by developers and users in the R community mentioned above.
But if you really feel like having a good laugh first, feel free check some of the oldest R scripts I was able to find unedited on GitLab here.
They date somewhere to the end of 2007/beginning of 2008 and, for it’s worth, should still be runnable.
Thank you for reading and
have a happy new yeaR