Working in RStudio with default Git integration (Watson Studio)
You can create R scripts and R Shiny applications in projects with default Git integration.
R Shiny is an R package that makes it easy to develop interactive web applications straight from R. You can create, develop, and refine Shiny apps in RStudio, for example to build a custom data visualization dashboard, and then publish the applications to other locations such as deployment spaces.
Creating R scripts and Shiny apps
The Git repository that is referenced in the project is initially cloned into the project storage when the project is created. You can view the current versions of your files in the Git repository in the Files browser at the lower right of the RStudio GUI, in the folder called userfs. You must make all your changes to your R files in that folder or any of its subfolders to be able to sync with Git.
The clone is pulled from the Git repository branch that you see next to the Git icon on the project's action bar.
If a folder or subfolder contains R Shiny app files (that is, files with the names app.R, ui.R, or server.R), all files in that folder are considered to belong to the Shiny app (including .R files). Otherwise, all .R files are considered R script assets.
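The folder rule above can be sketched in a few lines of base R. This is only an illustration of the rule, not the logic that the product runs: the function name classify_r_files and the demo folder names are hypothetical, and the sketch assumes that "all files in that folder" means files located directly in the folder that holds app.R, ui.R, or server.R.

```r
# Classify every .R file under `root` as part of a Shiny app or as an R script.
# Assumption: only files directly inside a folder that holds a marker file
# (app.R, ui.R, server.R) count as part of that Shiny app.
classify_r_files <- function(root) {
  r_files <- list.files(root, pattern = "\\.R$", recursive = TRUE,
                        full.names = TRUE)
  marker_names <- c("app.R", "ui.R", "server.R")
  # Folders that directly contain at least one marker file
  shiny_dirs <- unique(dirname(r_files[basename(r_files) %in% marker_names]))
  data.frame(
    file = r_files,
    kind = ifelse(dirname(r_files) %in% shiny_dirs, "shiny_app", "r_script"),
    stringsAsFactors = FALSE
  )
}

# Demo on a throwaway directory tree (hypothetical folder names)
demo_root <- file.path(tempdir(), "userfs_demo")
dir.create(file.path(demo_root, "dashboard"), recursive = TRUE,
           showWarnings = FALSE)
dir.create(file.path(demo_root, "scripts"), showWarnings = FALSE)
file.create(file.path(demo_root, "dashboard", c("app.R", "helpers.R")))
file.create(file.path(demo_root, "scripts", "clean_data.R"))

result <- classify_r_files(demo_root)
print(result)
```

In this demo, helpers.R is reported as part of the Shiny app because it sits next to app.R, while clean_data.R is reported as an R script.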
- Optional: Add collaborators to the project if you want to work on the same file with others. See Collaboration.
- Optional: Preinstall any R libraries that you source for your Shiny app from an external network, either at the global location /cc-home/_global_/R or in a persistent storage volume, to avoid installing these libraries every time the Shiny app is deployed. Make sure that you are connected to the storage volume when you deploy the Shiny app.
- To start working on R scripts:
  - Select New File > R Script or upload an R file from your local machine.
  - Save your file changes. Don't save the file under userfs/assets. The assets directory is reserved for files that are related to project assets.
  - You can test your scripts before you commit them to the Git repository by creating a job from your project's Jobs page and running it:
    - From the project's Jobs page, select New job.
    - Select your file, enter the job settings, and run it.
    - Validate the run results by clicking the job run.
- Or start working on Shiny apps:
  - Click New File > Shiny Web App.... A window for creating a new Shiny application opens.
  - Enter a name for your Shiny application and leave userfs as the Create within directory setting. To enable syncing with the Git repository, you must work in this directory or any of its subdirectories. Do not work in the /assets directory. This directory is reserved for files that are related to project assets. Both app.R and ui.R/server.R contain the instructions that are needed to build your app and provide a sample app that you can test run.
    - You can choose to create a single file application (app.R) if your application is simple and can be contained within one file.
    - Or, you can choose to create an application that uses multiple files (ui.R/server.R) if your application is more complex and its different facets need to be edited separately.
  - When you are done with configuration, click Create.
  - You can test run your app by clicking Run App. When you click Run App, a pop-up window that contains your application opens.
- You can use data from a data set in your scripts or apps. Supported data set formats include text, CSV, SPSS, SAS, and Stata. You can use data assets that are already imported into the project by clicking Import Dataset under the Environment tab, by clicking File and browsing for the file under userfs/assets/data_asset, or by uploading files from your local machine by clicking Upload in the Data panel on the lower right. You can preview the data assets in the editing panel.
  Note: You can't preview data sets larger than 5 MB in RStudio.
- When your files are ready, push your changes to the Git repository in one of these ways:
  - By clicking the Git version control menu from the menu bar of the main editing panel:
    - Click Commit and select all the changed files that you want to push to the Git repository.
    - Add a change description and commit your staged changes to the local clone of your repository in your RStudio session.
    - Click Push to push your changes to the remote repository, where other users can see and access them.
    - By clicking Pull in the Git actions panel, you can also pull file changes that collaborators made into your repository clone.
  - By selecting the Git icon from the project's action bar:
    - Click Commit from the menu.
    - Add a description, select the Git repository branch that you chose for the project, select the files, and commit the changes.
    - Click Push from the Git menu to push your commits to the repository.
  The R files that are pushed to the Git repository are not added as assets to the project's Assets page. However, you can select those files to run as jobs from the project's Jobs page.
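For the optional preinstall step earlier in this list, the following base R sketch shows one way to install packages into the global location /cc-home/_global_/R only when they are missing. The helper names missing_packages and preinstall_packages are hypothetical, and the sketch does not assume that this directory is already on the library search path, so it adds the directory explicitly.

```r
# Library location named in the preinstall step
global_lib <- "/cc-home/_global_/R"

# Return the packages from `pkgs` that are not yet installed in `lib`.
missing_packages <- function(pkgs, lib) {
  installed <- rownames(installed.packages(lib.loc = lib))
  setdiff(pkgs, installed)
}

# Install only what is missing, then put `lib` first on the search path.
preinstall_packages <- function(pkgs, lib = global_lib) {
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  to_install <- missing_packages(pkgs, lib)
  if (length(to_install) > 0) {
    install.packages(to_install, lib = lib)  # needs network access
  }
  .libPaths(c(lib, .libPaths()))
  invisible(to_install)
}

# Example call (commented out because it downloads packages):
# preinstall_packages(c("shiny", "ggplot2"))
```

Running the check before each installation keeps repeated calls cheap, because packages that are already present in the target directory are skipped.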
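For the single file option (app.R) described in the Shiny steps above, a minimal application might look like the following sketch. It assumes that the shiny package is installed. The layout, the input and output IDs (bins, distPlot), and the use of the built-in faithful data set are illustrative choices, not necessarily what the RStudio wizard generates.

```r
library(shiny)

# User interface: one slider and one plot
ui <- fluidPage(
  titlePanel("Histogram sketch"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30)
    ),
    mainPanel(plotOutput("distPlot"))
  )
)

# Server logic: redraw the histogram whenever the slider changes
server <- function(input, output) {
  output$distPlot <- renderPlot({
    x <- faithful$waiting
    breaks <- seq(min(x), max(x), length.out = input$bins + 1)
    hist(x, breaks = breaks, col = "steelblue", border = "white",
         main = "Waiting time between eruptions", xlab = "Minutes")
  })
}

# Build the app object; the app starts when it is run, not when it is created
app <- shinyApp(ui = ui, server = server)
```

With this file saved as app.R in its own folder under userfs, clicking Run App starts the application. For the multiple file option, the ui and server definitions would go into ui.R and server.R instead.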
Storing intermediate .rda files
You can store any intermediate files, for example .rda and .md files, logs, or text files, in any storage volume installed with Cloud Pak for Data. This storage volume is automatically mounted when an RStudio session is started. As a result, these files can be accessed by all project collaborators, and in R Shiny applications or jobs that run R scripts.
For details on using a storage volume, see Managing storage volumes.
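As a rough illustration of storing and reusing an intermediate .rda file, the following base R sketch saves a fitted model and loads it again. The mount path of the storage volume is not given here, so the sketch reads it from a hypothetical environment variable named STORAGE_VOLUME_DIR and falls back to a temporary directory. Replace it with the actual path of your volume.

```r
# Hypothetical location of the mounted storage volume (assumption);
# falls back to a temporary directory so that the sketch runs anywhere.
volume_dir <- Sys.getenv("STORAGE_VOLUME_DIR",
                         unset = file.path(tempdir(), "volume_demo"))
dir.create(volume_dir, recursive = TRUE, showWarnings = FALSE)

# Produce an intermediate result and store it as an .rda file
model_fit <- lm(mpg ~ wt, data = mtcars)
rda_path <- file.path(volume_dir, "model_fit.rda")
save(model_fit, file = rda_path)

# Later, for example in another session, a Shiny app, or a job:
# load into a separate environment to avoid overwriting existing objects
restored <- new.env()
loaded_names <- load(rda_path, envir = restored)
print(loaded_names)
```

Loading into a separate environment is a design choice that makes it explicit which objects came from the file.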
Working with data files
In RStudio, you can work with data files from different sources:
From the Files view in the RStudio UI, you can work with:
- RStudio files and R scripts
  RStudio files and R scripts are stored in the directory called userfs.
  If you add data files directly under userfs, these files do not show up as Data assets in the project and can't be opened and previewed in RStudio. Also, if you want to use these data files in a Watson Studio tool, for example in Data Refinery, you need to add the files as Data assets to the project. See Adding data files as project assets.
- Project data assets
  Data assets are listed in the directory called assets/data_asset under userfs. You can open, view, and work with these assets in RStudio. If you add a regular file to this directory, the file is not automatically added as a Data asset to the project. To add a file as a Data asset to a project, see Adding project assets. After you add the file as a Data asset to the project, it can be used in different tools like Data Preview, Data Refinery, or SPSS Modeler in Watson Studio.
  It is not possible to open and view connected data assets in the assets/data_asset directory. You can access connected data assets only programmatically, from an R script in RStudio. You currently cannot use the ibm-watson-studio-lib library in RStudio.
- Files in subdirectories under userfs
  The files that you create in the local file system of your RStudio session under userfs are persisted. If you stop RStudio and restart it on another day, for example, you see all your files from previous sessions.
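Because project data assets are regular files in assets/data_asset under userfs, a script can read them with standard R functions. The following sketch shows this for a CSV file. The helper name read_data_asset, the file name cars.csv, and the assumption that userfs is reachable as a relative path are all illustrative. The demo builds a throwaway directory tree so that it runs outside of RStudio as well.

```r
# Read a CSV data asset from <userfs_root>/assets/data_asset.
# `userfs_root` defaults to a relative path (assumption); adjust as needed.
read_data_asset <- function(file_name, userfs_root = "userfs") {
  path <- file.path(userfs_root, "assets", "data_asset", file_name)
  if (!file.exists(path)) {
    stop("Data asset not found: ", path)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Demo with a temporary stand-in for the userfs folder
demo_root <- file.path(tempdir(), "userfs_read_demo")
asset_dir <- file.path(demo_root, "assets", "data_asset")
dir.create(asset_dir, recursive = TRUE, showWarnings = FALSE)
write.csv(head(mtcars[, c("mpg", "wt")], 5),
          file.path(asset_dir, "cars.csv"), row.names = FALSE)

cars <- read_data_asset("cars.csv", userfs_root = demo_root)
str(cars)
```

This approach applies only to files that are stored in the directory. Connected data assets are not files there, so they need the programmatic access described above.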
Loading and accessing data
Data loading options per compute engine type
| Data loading options | Anaconda R distribution | R + Spark |
|---|---|---|
| Load data into a sparkSessionDataFrame | ✓ | |
| Load data into an R data frame | ✓ | ✓ |
Generating code that loads data directly to RStudio
Loading data from local files
To generate code that inserts data from local files to RStudio:
- Click the Code snippets icon and then click Read data.
- Select the data source from your project and then select Copy to clipboard.
- Paste the code in the RStudio file editor.
Supported file types:
- CSV/delimited files
- Excel files (.xls, .xlsx, .xlsm)
- JSON files
- SAS files
Loading data from data source connections
Before you can load data from an IBM data service or from an external data source, you must create or add a connection to your project. See Adding connections to projects.
To generate code that inserts data from database connections to RStudio:
- Click the Code snippets icon and then click Read data.
- Select the connection from your project.
- Select the data source from the connection and then select Copy to clipboard.
- Paste the code in the RStudio file editor. The generated code serves as a quick start to begin working with a data set or connection. For production systems, carefully review the inserted code to determine whether you must write your own code that better meets your needs.
- If necessary, enter your personal credentials for locked data connections that are marked with the Key icon. This is a one-time step that permanently unlocks the connection for you. After you unlock the connection, the key icon is no longer displayed. See Adding connections to projects.
- If no code can be generated for the connection, load the credentials and open the database connection that references your credentials. Write code to load the data.
RStudio supports the same database connections as Jupyter notebooks. For details, see Data load support in notebooks.
Adding data files as project assets
You should upload data files to use in RStudio by clicking the Upload asset to project icon on your project's Assets page, because these files are then automatically added as Data assets to your project.
However, if you uploaded or created data files in RStudio, you can add these files to your project as project data assets. These files must be in the assets/data_asset folder in RStudio. To add these files as data assets to the project:
- On the Assets page of the project, click Import assets.
- Select Project files and the file in the project_data_assets folder that you want to add to the project as an asset.
Running an R script as a job
You can run the script as a job in an RStudio environment in Watson Studio or on a remote Hadoop cluster. See:
- To create a job to run an R script in an RStudio environment, see Creating code-based jobs.
- To create a job to run an R script on a Hadoop cluster, you need a Hadoop cluster that supports R and R scripts. Additionally, you must enable the feature on the Hadoop cluster by modifying a configuration file. For more details, see the subsection scriptLanguages under Details on the content of the JSON files in Administering Apache Hadoop clusters. All the libraries that you need for your R script must be available on the cluster.
  To run a job on the Hadoop cluster, you must first create a Hadoop environment. After you create this Hadoop Yarn environment, you can select it when you create the job for the R script from the Jobs page of the project.
Creating a Hadoop Yarn environment
- The Watson Studio administrators must add the Hadoop cluster configuration to your platform.
- Open the drop-down menu from the sandwich button on Watson Studio's home page, and click Configure Platform.
- Click Add Registration to add the Hadoop cluster to the project's configuration.
- Go to your project and open the Environments page. Click New template to create a custom environment.
- After you give the custom environment a name, select Hadoop as the environment type.
- Select the Hadoop configuration that you want to use.
- A Hadoop cluster that is set up for R scripts must be able to use Yarn, because certain R scripts require Yarn. If the cluster is set up correctly, a field called Execution type appears, in which you can select Yarn as the execution type. If you do not see the Execution type option, it is likely that your Hadoop admin did not set up the Hadoop cluster and configuration file to support the R environment. After the setup is done on the Hadoop side, your admin must refresh the Hadoop registration before the Execution type option becomes available. Select Yarn to run R scripts.
- Select the language, Yarn size, and Yarn container memory. These fields are bounded by the admin's settings.
- Click Create to complete the creation of the environment.
- You can change the default settings of the custom environment later (for example, increase or decrease the memory of the Yarn container) by clicking the environment on the Environments page.
Deploying scripts in a space
You can move assets from a project with default Git integration to a deployment space by:
- Creating a Git archive file (a ZIP file that contains the contents of your repository from a particular branch or tag) in your Git provider's user interface.
- Importing this ZIP file into an existing deployment space.
As a result, a Code Package asset is created that contains all the code files that you created using RStudio. See Importing spaces and projects into existing deployment spaces.
Working with prompts
If the watsonx.ai service is installed on your cluster, you can add various sample prompts for specific models into your R code. To add a sample prompt, click the Code snippets icon, select Prompt Engineering, and browse the various categories to find a sample prompt. After you select a prompt, click Copy to clipboard and then paste the code in the RStudio file editor.
Learn more
- Creating a job to run an R script
- RStudio Overview
- Hadoop Environments
- Using Spark in RStudio
- Using libs from Anaconda Repository
- Accessing data in MySQL databases by using the RMariaDB library
- Connecting your Shiny application to a persistent storage volume