Using Spark in RStudio

Although the RStudio IDE cannot be started in a Spark with R environment runtime, you can use Spark in your R scripts and Shiny apps by accessing Spark kernels programmatically.

RStudio uses the sparklyr package to connect to Spark from R. The sparklyr package includes a dplyr interface to Spark data frames as well as an R interface to Spark’s distributed machine learning pipelines.

There are two methods of connecting to Spark from RStudio:

  • By connecting to a Spark kernel that runs locally in the RStudio container in IBM Watson Studio.
  • By connecting to a remote Spark kernel that runs outside of IBM Watson Studio, in an instance of the Analytics Engine powered by Apache Spark service.
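As a minimal sketch of the first method, the following R code opens a local Spark connection with sparklyr and runs a small dplyr query against a Spark data frame. It assumes that the sparklyr and dplyr packages are installed and that a local Spark installation is available in the RStudio container. The table name mtcars_spark is a hypothetical name chosen for illustration, and the exact connection settings in your environment may differ.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally, so master = "local" is sufficient.
sc <- spark_connect(master = "local")

# Copy a small built-in R data set into Spark.
# "mtcars_spark" is an illustrative table name, not one defined by the product.
cars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

# Use dplyr verbs on the Spark data frame; collect() returns the
# aggregated rows to the R session as a local data frame.
result <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

print(result)

# Release the Spark connection when you are done.
spark_disconnect(sc)
```

The aggregation runs in Spark, and only the summarized result is transferred to R. Connecting to a remote Spark kernel uses different connection settings, which are not shown here.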

RStudio includes sample code snippets that show you how to connect