Analyzing data with RStudio (RStudio Server with R 3.6)
R is a popular statistical analysis and machine-learning package that includes tests, models, analyses, and graphics, and enables data management. RStudio provides an IDE for working with R.
Service: The RStudio Server with R 3.6 service is not available by default. An administrator must install this service on the IBM Cloud Pak for Data platform. To determine whether the service is installed, open the Services catalog and check whether the service is enabled.
- Required service: RStudio Server with R 3.6
- Optional service: Watson Machine Learning
- Data format:
  - All data file types in the RStudio server file structure
  - Tables in relational data sources
- Data size
You can use the RStudio IDE in an analytics project with or without Git integration.
RStudio without Git integration
When you work in RStudio from a project that is not integrated with a Git repository, you can create R scripts and Shiny apps and work with the project's data assets. However, you can't add your scripts or Shiny apps to the project as assets to share with other users, and you can't deploy them in a deployment space.
RStudio with Git integration
When you work in RStudio from a project that does have integration with a Git repository, you can share your R scripts and Shiny apps with other users in your project.
If your project is integrated with a Git repository, then you can create Shiny apps and R scripts and pull them into the project as assets. If you have the Watson Machine Learning service installed, you can deploy your applications in a deployment space as URLs that are accessible to users. You can integrate a project with a Git repository only while you’re creating the project. See Git integration.
Collaboration with Git integration
With the Git version control system, added through the Git extension in RStudio, users can share their work on files in RStudio. To share their work, users must be added to the project as collaborators and must have access to the associated project Git repository.
To enable users in a project to collaborate on file changes in RStudio:
- Add users as collaborators to the project and assign them the Admin or Editor role. You can invite only users who have an existing IBM Cloud Pak for Data account. See Adding collaborators.
- Give all collaborators the appropriate access permissions to the project Git repository.
- Instruct all collaborators to create their own personal access token for the associated project repository. See Creating personal access tokens for Git repositories.
When you open RStudio, you will see your personal Git access token in the list. Select it to begin working on the RStudio project.
You access RStudio from within an analytics project. The RStudio IDE runs in an RStudio environment. A default RStudio environment is included with the RStudio Server with R 3.6 service. You can also create custom RStudio environment definitions if you have Execution Engine for Apache Hadoop. See RStudio environments.
To start RStudio in your project:
- Click RStudio from the Launch IDE menu on your project’s action bar.
- Select the environment runtime.
- If the project is integrated with a Git repository, select your token.
- Click Launch.
The environment runtime is initiated and the development environment opens.
If RStudio crashes and the integration with the associated Git repository is broken when you restart it, the RStudio session workspace is in an incorrect state. To restore the session workspace, see Git integration broken when RStudio crashes.
Working with data files
In RStudio, you can work with data files from different sources:
Files in the RStudio server file structure, which you can view by clicking Files in the bottom right section of RStudio. This is where you can create folders, upload files from your local system, and delete files.
To access these files in R by file name alone, set the working directory to the folder that contains them. To do this, navigate to that folder in the Files pane and click More > Set as Working Directory.
Be aware that files stored in the Home directory of your RStudio instance persist only within your instance and cannot be shared across environments or within your project.
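The More > Set as Working Directory action is equivalent to calling `setwd()` in the R console. The following minimal R sketch shows the same step from code. The folder `my_data` and the file `sample.csv` are hypothetical. They are created in a temporary directory only so that the example runs on its own.

```r
# Self-contained setup: create a hypothetical folder holding a small CSV file.
# In RStudio you would use an existing folder from the Files pane instead.
data_dir <- file.path(tempdir(), "my_data")
dir.create(data_dir, showWarnings = FALSE)
write.csv(
  data.frame(id = 1:3, value = c(2.5, 3.1, 4.7)),
  file.path(data_dir, "sample.csv"),
  row.names = FALSE
)

# Console equivalent of More > Set as Working Directory.
# setwd() invisibly returns the previous working directory.
old_wd <- setwd(data_dir)

# Relative file names now resolve against data_dir.
df <- read.csv("sample.csv")
print(df)

# Restore the original working directory when you are done.
setwd(old_wd)
```

In a real session, replace `data_dir` with the path of a folder that you see in the Files pane, for example a folder under `~`.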
Project data assets, which you can view by clicking Files > Home in the bottom-right pane of RStudio. The data assets are in the folder called project_data_asset. Click an asset to view the content of the file or to import the data set.
If you add a data file to this folder, the file is not added as a data asset to the project. To add data files as project data assets, see Adding project assets.
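Because project_data_asset is an ordinary folder in the RStudio file structure, you can also read its files directly from R. The following is a minimal sketch that assumes the data assets you want are CSV files. Other formats need a different reader function. The sketch reads each CSV file in `~/project_data_asset` into a named list of data frames.

```r
# Folder where RStudio shows the project's data assets (Files > Home).
asset_dir <- file.path(path.expand("~"), "project_data_asset")

# List the CSV files in the folder; this returns character(0) if there are none.
asset_files <- list.files(asset_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each CSV file into a data frame, keyed by its file name.
data_sets <- lapply(asset_files, read.csv)
names(data_sets) <- basename(asset_files)

# Show the number of rows in each data set.
print(vapply(data_sets, nrow, integer(1)))
```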
Data stored in a database system.
Adding or deleting project assets
Upload the data files that you want to use in RStudio from the Find and Add Data sidebar on your project's Assets page. Files that you upload there are automatically added to your project as data assets.
However, if you uploaded or created data files in RStudio, you can add these files to your project as project data assets. These files must be in the Home/project_data_asset folder in RStudio. To add these files as data assets to the project:
- On the Assets page of the project, click the Find and Add Data icon and select the Files tab.
- Select the files you want to add to the project as assets.
- From the Actions list, select Add as data asset and apply your changes.
If you delete a data asset from the Home/project_data_asset folder in RStudio, the file is no longer listed under the Files tab of the sidebar that opens when you click the Find and Add Data icon on the project Assets page. However, it is still listed as a data asset on the Assets page of your project. To delete the entry from the data assets list:
- Select the data asset from the list.
- From the Actions menu, select Remove.
When working in a project with Git integration: