Important:

IBM Cloud Pak® for Data Version 4.7 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.7 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.

Working in RStudio with default Git integration (Watson Studio)

You can create R scripts and R Shiny applications in projects with default Git integration.

R Shiny is an R package that makes it easy to develop interactive web applications straight from R. You can create, develop, and refine Shiny apps in RStudio, for example to build a data visualization dashboard, and then publish the applications to other locations such as deployment spaces.

Creating R scripts and Shiny apps

When RStudio is launched, the directory browser should be open at the bottom right.

The Git repository referenced in the project is initially cloned into the project storage at the time the project is created. You can view the current versions of your files in the Git repository in the Files browser at the bottom right of the RStudio GUI in the folder called userfs. You must make all your changes to your R files in that folder or any subfolders to be able to sync with Git.

Important:

The clone is pulled from the Git repository branch that you see next to the Git icon on the project's action bar.

Note that if a folder or subfolder contains R Shiny app files (that is, files with the names app.R, ui.R, or server.R), all files in that folder are considered to belong to the Shiny app (including .R files). Otherwise, all .R files are considered R script assets.
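
The rule in the note above can be sketched in a few lines of R. This is only an illustration of the described behavior, not the product's actual implementation: the function name classify_r_files and the demo folder names are hypothetical, and the sketch assumes that only files located directly in a folder with app.R, ui.R, or server.R belong to the Shiny app.

```r
# Classify .R files under a root folder:
# a folder that holds app.R, ui.R, or server.R is treated as a Shiny app folder,
# and every .R file directly in it is counted as part of that app.
# All other .R files are counted as R scripts.
classify_r_files <- function(root) {
  files <- list.files(root, pattern = "\\.[Rr]$", recursive = TRUE)
  shiny_markers <- c("app.R", "ui.R", "server.R")
  # Folders that contain at least one Shiny marker file
  shiny_dirs <- unique(dirname(files[basename(files) %in% shiny_markers]))
  kind <- ifelse(dirname(files) %in% shiny_dirs, "shiny_app", "r_script")
  data.frame(file = files, kind = kind, stringsAsFactors = FALSE)
}

# Demo in a temporary folder that stands in for userfs (hypothetical layout)
root <- file.path(tempdir(), "userfs_demo")
dir.create(file.path(root, "dashboard"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "dashboard", c("app.R", "helpers.R")))
file.create(file.path(root, "clean_data.R"))
result <- classify_r_files(root)
```

In this demo, helpers.R is classified as part of the Shiny app because it sits next to app.R, while clean_data.R is classified as an R script.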

Optional: Add collaborators to the project if you want to work on the same file with others. See Collaboration.
Optional: Preinstall any R libraries that you source for your Shiny app from an external network at the global location /cc-home/_global_/R or in a persistent storage volume to avoid installing these libraries every time the Shiny app is deployed. Ensure that you are connected to the storage volume when you deploy the Shiny app.
Start working on R scripts:
1. Select New File > R Script or upload an R file from your local machine.
2. Save your file changes. Don't save the file under userfs/assets as the assets directory is reserved for files related to project assets.
3. You can test your scripts before you commit them to the Git repository by creating a job from your project's Jobs page and running it.
  1. From the project's Jobs page, select New job.
  2. Select your file, enter the job settings and run it.
  3. Validate the run results by clicking the job run.
Or start working on Shiny apps:
1. Click New File > Shiny Web App.... A new Shiny application creation window pops up.
2. Enter a name for your Shiny application and leave userfs as the Create within directory setting. To enable syncing with the Git repository, you must work in this directory or one of its subdirectories, but not in the /assets directory, which is reserved for files related to project assets.
  
  Both app.R and ui.R/server.R contain the instructions needed to build your app and provide a sample app that you can test run.
3. You can choose to create a single file application (app.R) if your application is simple and can be contained within one file.
4. Or, you can choose to create a multiple file application (ui.R/server.R) if your application is more complex and its different facets need to be edited separately.
5. When you are done with configuration, click Create.
6. You can test run your app by clicking Run App in the top right corner of the editing panel. A pop-up window will be launched displaying your application.
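
For orientation, a single file application might look like the following minimal sketch. It is not the sample app that RStudio generates. It uses the built-in faithful data set, the input and output names (bins, hist) are illustrative, and it assumes that the shiny package is installed in your RStudio environment.

```r
# A minimal single file Shiny application (illustrative sketch)
library(shiny)

# User interface: one slider input and one plot output
ui <- fluidPage(
  titlePanel("Histogram demo"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server logic: redraw the histogram whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(faithful$waiting, breaks = input$bins,
         main = "Waiting time between eruptions", xlab = "Minutes")
  })
}

# Combine the UI and the server function into an app object
app <- shinyApp(ui = ui, server = server)
```

Calling runApp(app) from the console starts the application. In an app.R file, the last line is conventionally the bare shinyApp(ui = ui, server = server) call rather than an assignment.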
You can use data from a data set in your scripts or apps. Supported data set formats include text, CSV, SPSS, SAS, and Stata. To use data assets that are already imported into the project, click Import Dataset under the Environment tab, or click File and browse for the file under userfs/assets/data_asset. To add files from your local machine, click Upload in the Data panel on the bottom right. You can preview the data assets in the editing panel.

Note: You can't preview datasets larger than 5 MB in RStudio.
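
If a CSV file is too large to preview, one option is to read only its first rows from a script. The following is a minimal sketch using base R. The helper name peek_csv is hypothetical, and the demo writes a temporary file that stands in for a data asset. In a project, you would pass the path of a file under assets/data_asset instead.

```r
# Read only the first n rows of a CSV file instead of loading the whole file.
peek_csv <- function(path, n = 10) {
  stopifnot(file.exists(path))
  read.csv(path, nrows = n, stringsAsFactors = FALSE)
}

# Demo with a temporary file standing in for a data asset
demo_path <- file.path(tempdir(), "demo_asset.csv")
write.csv(data.frame(id = 1:100, value = (1:100) * 2), demo_path, row.names = FALSE)
preview <- peek_csv(demo_path, n = 5)
print(preview)
```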
When your files are ready, push your changes to the Git repository:
- By clicking the Git version control menu on the top menu bar of the main editing panel:
  1. Click Commit and select all the files that you have made changes to and would like to push to the Git repository.
  2. Add a change description and commit your staged changes to the local clone of your repository in your RStudio session.
  3. Click Push to push your changes to the remote repository where your changes can be seen and accessed by other users.
  4. By clicking Pull in the Git actions panel, you can also pull file changes made by collaborators to your repository clone.
- By selecting the Git icon from the project's action bar:
  1. Click Commit from the menu.
  2. Add a description, select the Git repository branch you chose for the project, select the files and commit the changes.
  3. Click Push from the Git menu to push your commits to the repository.
The R files in the Git repository that are pushed are not added as assets to the project's Assets page. You can however select those files to run as jobs from the project's Jobs page.

Storing intermediate .rda files

You can store any intermediate files, for example .rda and .md files, log or text files in any storage volume installed with Cloud Pak for Data. This storage volume is automatically mounted at the time that an RStudio session is started and hence these files can be accessed by all project collaborators, and in R Shiny applications or jobs that run R scripts. For details on using a storage volume, see Managing storage volumes.
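
As a minimal sketch, an intermediate object can be written to an .rda file with save() and restored later with load(). The mount path of a storage volume depends on your installation, so the example reads it from a hypothetical environment variable, DEMO_VOLUME_DIR, and falls back to a temporary directory. The model and file name are illustrative.

```r
# Hypothetical location: replace with the mount path of your storage volume.
volume_dir <- Sys.getenv("DEMO_VOLUME_DIR", unset = tempdir())

# Create an intermediate result and store it as an .rda file
model <- lm(mpg ~ wt, data = mtcars)
rda_path <- file.path(volume_dir, "mpg_model.rda")
save(model, file = rda_path)

# Later, for example in a job or a Shiny app: load the object into
# a separate environment so that it cannot overwrite existing variables
restored <- new.env()
load(rda_path, envir = restored)
print(coef(restored$model))
```

Loading into a separate environment is a design choice of this sketch. Calling load(rda_path) without envir restores the object under its original name in the current environment.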

Working with data files

In RStudio, you can work with data files from different sources:

From the Files view in the RStudio UI, you can work with:

RStudio files and R scripts

RStudio files and R scripts are stored in the directory called userfs.

Note that if you add data files directly under userfs, these files do not show up as Data assets in the project and can't be opened and previewed in RStudio. Also, to use these data files in a Watson Studio tool, for example Data Refinery, you must add the files as Data assets to the project. See Adding data files as project assets.
Project data assets

Data assets are listed in the directory called assets/data_asset under userfs. You can open, view, and work with these assets in RStudio. Note that if you add a regular file to this directory, the file is not automatically added as a Data asset to the project. To add a file as a Data asset to a project, see Adding project assets. After you have added the file as a Data asset to the project, it can be used in different tools like Data Preview, Data Refinery, or SPSS Modeler in Watson Studio.

Connected data assets cannot be opened and viewed in the assets/data_asset directory. You can only access connected data assets programmatically from an R script in RStudio. Note that you currently cannot use the ibm-watson-studio-lib library in RStudio.
Files in subdirectories under userfs

The files that you create in the local file system of your RStudio session under userfs are persisted. If you stop RStudio, and restart again on another day for example, you will see all your files from previous sessions.

Loading and accessing data

You can't generate code directly in RStudio to load data from files or connections. However, you can use code that you generated in a sample R notebook and then copy this code to use in your scripts in RStudio.

To use generated code in RStudio with Git integration, you need another project without Git integration. In that project, create an R notebook and add the generated code that you want to copy. Alternatively, you can prepare this code in an R script that you reference when you run the code that you developed in RStudio in a job.

Create connections to the data sources you want to work with in your R scripts. See Connecting to data sources.
Add these data sources as assets, for example the data files or connections, to your project. See Adding data to a project.
Open an R notebook in a project with no Git integration in edit mode, click the Code snippets icon, click Read data and then select the data file or database connection from the project.
1. For a database connection, select the schema and choose a table.
2. Select the load option.
3. Click in an empty code cell in your notebook and then click to insert the generated code. Alternatively, click to copy the generated code to the clipboard and then paste the code into your notebook.
  
  Note: Only select data load options that use the Flight service based on Apache Arrow Flight to communicate with a database connection or connected data asset (data accessible through a connection). Do not use the options that are tagged as being deprecated.
Go back to your project with Git integration, open RStudio and copy the R code from the notebook to your R script.

Note:
If the generated code contains lines that use the IRdisplay function, you must delete those lines to avoid errors.
Run the copied code in RStudio. Check that the file, connection or connected asset was accessed correctly and that data was loaded to the RStudio environment.
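
If the copied code is long, the lines that mention IRdisplay can also be filtered out programmatically. The following is a rough sketch. The helper name strip_irdisplay and the sample lines are hypothetical and do not reproduce the code that the notebook actually generates. The filter only catches lines that contain the text IRdisplay, so review the result before you run it.

```r
# Drop every line that mentions IRdisplay from a character vector of code lines.
strip_irdisplay <- function(code_lines) {
  code_lines[!grepl("IRdisplay", code_lines, fixed = TRUE)]
}

# Hypothetical stand-in for code copied from a notebook
generated <- c(
  "library(IRdisplay)",
  "df <- read.csv('example.csv')",
  "IRdisplay::display(head(df))",
  "head(df)"
)
cleaned <- strip_irdisplay(generated)
writeLines(cleaned)
```

To apply the same filter to a script file, you could read it with readLines() and write the cleaned lines back with writeLines().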

Adding data files as project assets

You should upload data files to use in RStudio by clicking the Upload asset to project icon on your project's Assets page because these files are automatically added as Data assets to your project.

However, if you uploaded or created data files in RStudio, you can add these files to your project as project data assets. These files must be in the assets/data_asset folder in RStudio. To add these files as data assets to the project:

On the Assets page of the project, click Import assets.
Select Project files and the file in the project_data_assets folder that you want to add to the project as an asset.

Running an R script in a job

You can run the script as a job in an RStudio environment in Watson Studio or on a remote Hadoop cluster. See:

To create a job to run an R script in an RStudio environment, see Creating code-based jobs.
To create a job to run an R script on a Hadoop cluster, you need a Hadoop cluster that supports R and R scripts, and you'll need to enable the feature on the Hadoop cluster by modifying a configuration file. See Administering Apache Hadoop clusters, sub-section scriptLanguages under Details on the content of the json files for more details. In addition, all the libraries that you need for your R script must be available on the cluster.

To run a job on the Hadoop cluster, you must first create a Hadoop environment. After you have created this Hadoop Yarn environment, you can select it when you create the job for the R script from the Jobs page of the project.

Creating a Hadoop Yarn environment

The Watson Studio administrator needs to add the Hadoop cluster configuration to your platform.
1. Open the drop down menu from the sandwich button on Watson Studio's home page, and click on Configure Platform.
2. Click on Add Registration to add the Hadoop cluster to the project's configuration.
Now go to your project, click on the Environments page. Click on New template to create a custom environment.
After you give the custom environment a name, select Hadoop as the environment type.
Select the Hadoop configuration you would like to use.
A Hadoop cluster that is set up for R scripts must be able to use Yarn because certain R scripts require it. If the cluster is set up correctly, a field called Execution type appears, in which you can select Yarn as the execution type. If you do not see the Execution type option, it is likely that your Hadoop admin has not set up the Hadoop cluster and configuration file to support the R environment. After the setup is done on the Hadoop side, your admin must refresh the Hadoop registration before the Execution type option becomes available. Select Yarn to run R scripts.
Select the language, Yarn size and Yarn container memory. These fields are bounded by the admin's settings.
Click Create to complete the creation of the environment.
You can change the default settings of the custom environment later by clicking on the environment under the Environments page, for example, increase or decrease the memory of the Yarn container.

Deploying scripts in a space

You can move assets from a project with default Git integration to a deployment space by creating a Git archive file (a ZIP file containing the contents of your repository from a particular branch or tag) in your Git provider's user interface and importing this ZIP file to an existing deployment space. A Code Package asset will be created containing all the code files you created using RStudio. See Importing spaces and projects into existing deployment spaces.
