Tutorial: Use the Spark service provided by DSX within RStudio
This tutorial shows you how to set up RStudio to use the Spark service provided with IBM® Data Science Experience Local (DSX Local) on Integrated Analytics System.
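This tutorial shows you how to carry out the following tasks:
- Open a sample R script provided with RStudio in DSX Local
- Configure the Spark service for use in RStudio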
Time required
5 minutes
Scenario
You are a data scientist who wants to use Apache Spark for analytics from within RStudio.
Difficulty
Beginner
Audience
Data scientists or anyone who wants to use Spark in RStudio on Integrated Analytics System.
Prerequisites
You need a user account on Integrated Analytics System. If you don't have one, ask your Integrated Analytics System administrator to create one for you. You also need a user ID and password for DSX Local on that system. Finally, you must start RStudio in DSX Local before you begin.
Open a sample R script provided with RStudio in DSX Local
The RStudio installation provided with DSX Local includes several R scripts that demonstrate how to use the Spark service.
Procedure
- Click the ibm-sparkaas-demos folder in the Files tab in RStudio.
- Click one of the provided R scripts, such as spark_flights.R.
The script opens in the R editor. However, it cannot run successfully yet because there is no connection to the Spark service.
Configure the Spark service for use in RStudio
Procedure
- Click the Spark tab.
- Click New Connection.
- In the Connect to Spark window, click Connect.
- Return to the R source editor tab and run the current script.
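Once a connection exists, an R script can send work to Spark. The following is a minimal sketch of that pattern, not the contents of spark_flights.R. It assumes that the sparklyr and dplyr packages are installed, and it uses a local Spark master (master = "local") as a stand-in, because this tutorial does not give the connection settings of the DSX Local Spark service. The table name mtcars_spark is hypothetical.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark master stands in for the DSX Local Spark service.
# In DSX Local, the connection is created from the Spark tab instead.
sc <- spark_connect(master = "local")

# Copy a small built-in data set to Spark (the table name is hypothetical).
cars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

# Run the aggregation in Spark, then collect the result into an R data frame.
mpg_by_cyl <- cars_tbl %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

print(mpg_by_cyl)

# Close the connection when finished.
spark_disconnect(sc)
```

If the script is run before a connection is available, the spark_connect call is the step that fails, which matches the behavior described for the sample scripts.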