The {sparklyr} package lets us connect to and use Apache Spark for high-performance, highly parallelized, and distributed computations. We can also use Spark's capabilities to improve and streamline our data processing pipelines, as Spark supports reading and writing from many popular sources such as Parquet and ORC, and from most database systems via JDBC drivers. In this post, we will explore using R to load data from relational database management systems such as MySQL, Oracle, and MS SQL Server into Spark (and optionally into R), and show how such processes can be simplified.
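As a quick illustration, here is a minimal sketch of what such a load could look like, assuming a local Spark installation and a hypothetical MySQL database; the connection details (URL, table name, credentials) are illustrative only:

```r
library(sparklyr)

# Connect to a local Spark instance
sc <- spark_connect(master = "local")

# Read a table from an RDBMS into Spark via JDBC
# (the JDBC driver .jar must be on Spark's classpath;
#  the url, dbtable, and credentials below are hypothetical)
tbl_spark <- spark_read_jdbc(
  sc,
  name = "flights_spark",
  options = list(
    url = "jdbc:mysql://localhost:3306/testdb",
    driver = "com.mysql.cj.jdbc.Driver",
    dbtable = "flights",
    user = "testuser",
    password = "pass"
  )
)

# Optionally collect the data from Spark into an R data frame
flights_r <- dplyr::collect(tbl_spark)

spark_disconnect(sc)
```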
In this part of the primer we discuss creating and using custom .jar archives within our R scripts and packages, handling Java exceptions from R, and take a quick look at a performance comparison between the low- and high-level interfaces provided by rJava. In the first part we talked about using the rJava package to create objects, call methods, and work with arrays; we also examined the various ways to call Java methods and how to call Java code from R directly via execution of shell commands.
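To set the scene, a minimal sketch of these topics follows. The myutils.jar archive is a hypothetical placeholder for a custom .jar; the low- versus high-level calls and the exception handling use standard rJava functions:

```r
library(rJava)

.jinit()                        # initialize the JVM
.jaddClassPath("myutils.jar")   # add a custom .jar to the classpath (hypothetical file)

# Low-level interface: .jcall with an explicit return-type signature
v1 <- .jcall("java/lang/System", "S", "getProperty", "java.version")

# High-level interface: J() resolves the same call via reflection
v2 <- J("java.lang.System")$getProperty("java.version")

# Handling a Java exception from R: rJava surfaces it as an R condition,
# so we can catch it with tryCatch. This constructor throws a
# NumberFormatException on purpose:
res <- tryCatch(
  .jnew("java/lang/Integer", "not-a-number"),
  error = function(e) {
    message("Caught Java exception: ", conditionMessage(e))
    NULL
  }
)
```

The low-level `.jcall` avoids reflection at the cost of spelling out JNI signatures, which is where the performance difference against the convenient `J()` interface comes from.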
This primer consists of two parts, and its goal is to provide a walk-through of using resources developed in Java from R. It is structured more as a "note to future self" than a proper educational article, but I hope that some readers may still find it useful. It will also list a set of references that I found very helpful, for which I thank the respective authors.