Working in RStudio with default Git integration (Watson Studio)

You can create R scripts and R Shiny applications in projects with default Git integration.

R Shiny is an R package that makes it easy to develop interactive web applications straight from R. You can create, develop, and refine Shiny apps in RStudio, whether to build a custom data visualization dashboard or to publish applications to other locations, for example to deployment spaces.

Creating R scripts and Shiny apps

The Git repository that is referenced in the project is initially cloned into the project storage when the project is created. You can view the current versions of your files in the Git repository in the Files browser at the lower right of the RStudio GUI in the folder called userfs. You must make all your changes to your R files in that folder or any subfolders to be able to sync with Git.

Important:

The clone is pulled from the Git repository branch that you see next to the Git icon on the project's action bar.

If a folder or subfolder contains R Shiny app files (that is, files named app.R, ui.R, or server.R), all files in that folder, including other .R files, are considered to belong to the Shiny app. Otherwise, all .R files are considered R script assets.
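
The classification rule above can be sketched in R. This is only an illustration of the rule as stated, not the logic that Watson Studio itself runs: classify_r_files is a hypothetical helper, and the sketch assumes that only files directly inside a folder that holds app.R, ui.R, or server.R count as part of that Shiny app.

```r
# File names that, per the rule above, mark a folder as a Shiny app folder.
shiny_markers <- c("app.R", "ui.R", "server.R")

# classify_r_files() is a hypothetical helper, not part of Watson Studio.
# It returns one row per .R file under `root`, with a `kind` column that is
# either "shiny_app" or "r_script".
classify_r_files <- function(root) {
  r_files <- list.files(root, pattern = "\\.R$", recursive = TRUE,
                        full.names = TRUE)
  if (length(r_files) == 0) {
    return(data.frame(file = character(0), kind = character(0),
                      stringsAsFactors = FALSE))
  }
  dirs <- dirname(r_files)
  # Assumption: a folder is a Shiny app folder only if it directly
  # contains one of the marker files.
  app_dirs <- unique(dirs[basename(r_files) %in% shiny_markers])
  data.frame(
    file = r_files,
    kind = ifelse(dirs %in% app_dirs, "shiny_app", "r_script"),
    stringsAsFactors = FALSE
  )
}
```

For example, calling classify_r_files("userfs") from the directory that contains userfs would list each .R file together with the asset kind that this sketch assumes for it.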

  1. Optional: Add collaborators to the project if you want to work on the same file with others. See Collaboration.

  2. Optional: If your Shiny app uses R libraries that you source from an external network, preinstall them at the global location /cc-home/_global_/R or in a persistent storage volume so that they are not installed every time the Shiny app is deployed. Make sure that you are connected to the storage volume when you deploy the Shiny app.

  3. To start working on R scripts:

    1. Select New File > R Script or upload an R file from your local machine.

    2. Save your file changes. Don't save the file under userfs/assets. The assets directory is reserved for files that are related to project assets.

    3. You can test your scripts before you commit them to the Git repository by creating a job from your project's Jobs page and running it.

      1. From the project's Jobs page, select New job.
      2. Select your file, enter the job settings, and run it.
      3. Validate the run results by clicking the job run.
  4. Or start working on Shiny apps:

    1. Click New File > Shiny Web App.... A new Shiny application creation window pops up.
    2. Enter a name for your Shiny application and leave userfs as the Create within directory setting. To enable syncing with the Git repository, you must work in this directory or any of its subdirectories. Do not work in the /assets directory. This directory is reserved for files that are related to project assets. Both app.R and ui.R/server.R contain instructions that are needed to build your app and provide a sample app that you can test run.
    3. Choose a single-file application (app.R) if your application is simple enough to be contained in one file.
    4. Or choose a multiple-file application (ui.R/server.R) if your application is more complex and its parts need to be edited separately.
    5. When you are done with configuration, click Create.
    6. You can test run your app by clicking Run App. When you click Run App, a pop-up window that contains your application shows on the screen.
  5. You can use data from a data set in your scripts or apps. Supported data set formats include text, CSV, SPSS, SAS, and Stata. To use data assets that are already imported into the project, click Import Dataset under the Environment tab, or click File and browse for the file under userfs/assets/data-asset. To use a file from your local machine, click Upload in the Data panel on the lower right. You can preview the data assets in the editing panel.

    Note: You can't preview data sets larger than 5 MB in RStudio.
  6. When your files are ready, push your changes to the Git repository:

    • By clicking the Git version control menu from the menu bar of the main editing panel:

      1. Click Commit and select all the changed files that you would like to push to the Git repository.
      2. Add a change description and commit your staged changes to the local clone of your repository in your RStudio session.
      3. Click Push to push your changes to the remote repository where your changes can be seen and accessed by other users.
      4. To pull file changes that collaborators made into your repository clone, click Pull in the Git actions panel.
    • By selecting the Git icon from the project's action bar:

      1. Click Commit from the menu.
      2. Add a description, select the Git repository branch that you chose for the project, select the files and commit the changes.
      3. Click Push from the Git menu to push your commits to the repository.

    The R files that you push to the Git repository are not added as assets to the project's Assets page. However, you can select those files to run as jobs from the project's Jobs page.
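
As an illustration of the single-file layout from step 4, the following is a minimal app.R sketch. It is not the sample that RStudio generates. The input and output IDs (bins, hist) and the use of the built-in faithful data set are choices made for this example only, and the shiny package is assumed to be installed.

```r
# app.R: minimal single-file Shiny app (illustrative sketch)
library(shiny)

# UI: one slider and one plot.
ui <- fluidPage(
  titlePanel("Sample histogram"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server: redraw the histogram whenever the slider value changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    x <- faithful$waiting
    breaks <- seq(min(x), max(x), length.out = input$bins + 1)
    hist(x, breaks = breaks, main = NULL, xlab = "Waiting time (minutes)")
  })
}

# Build the app object. Run App in RStudio, or shiny::runApp(), starts it.
app <- shinyApp(ui = ui, server = server)
```

In a multiple-file application, the ui definition would go in ui.R and the server function in server.R instead.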

Using shared certificates

You can use shared, custom, platform-level CA certificates in RStudio IDE in projects with default Git integration. Custom platform-level certificates are installed by admins in a centralized location as Kubernetes secrets (in the cpd-custom-ca-certs secret) so that multiple users can use the same certificate while working with multiple services. The secret is automatically inserted when RStudio IDE starts so you automatically have access to these certificates. For more information, see Creating a secret to store shared custom certificates.

Storing intermediate .rda files

You can store intermediate files, for example .rda and .md files, logs, or text files, in any storage volume that is installed with Cloud Pak for Data. The storage volume is automatically mounted when an RStudio session starts, so these files can be accessed by all project collaborators and by R Shiny applications or jobs that run R scripts.

For details on using a storage volume, see Managing storage volumes.
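
The following is a minimal sketch of saving and reloading an intermediate .rda file with base R. The mount path of a storage volume depends on your installation and is not given here, so volume_dir below is a temporary directory used as a stand-in, and model_summary is made-up example data.

```r
# volume_dir is a stand-in (assumption); replace it with the mount path
# of your storage volume.
volume_dir <- file.path(tempdir(), "demo-volume")
dir.create(volume_dir, recursive = TRUE, showWarnings = FALSE)

# An intermediate result worth keeping between sessions (example data).
model_summary <- list(trained_on = nrow(mtcars), mean_mpg = mean(mtcars$mpg))

# Write the object to an .rda file on the volume.
rda_path <- file.path(volume_dir, "model_summary.rda")
save(model_summary, file = rda_path)

# Later, in another session, job, or Shiny app: load the file into a
# separate environment so that it cannot overwrite existing objects.
restored <- new.env()
load(rda_path, envir = restored)
print(names(restored$model_summary))
```

Loading into a separate environment is a design choice for this sketch; calling load(rda_path) without envir restores the objects directly into the current workspace.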

Working with data files

In RStudio, you can work with data files from different sources:

From the Files view in the RStudio UI, you can work with:

  • RStudio files and R scripts

    RStudio files and R scripts are stored in the directory called userfs.

    If you add data files directly under userfs, these files do not show up as Data assets in the project and can't be opened and previewed in RStudio. To use these data files in another Watson Studio tool, for example Data Refinery, you must add the files as Data assets to the project. See Adding data files as project assets.

  • Project data assets

    Data assets are listed in the directory called assets/data_asset under userfs. You can open, view, and work with these assets in RStudio. If you add a regular file to this directory, the file is not automatically added as a Data asset to the project. To add a file as a Data asset to a project, see Adding project assets. After you add the file as a Data asset to the project, it can be used in different tools like Data Preview, Data Refinery, or SPSS Modeler in Watson Studio.

    Connected data assets cannot be opened and viewed in the assets/data_asset directory. You can access connected data assets only programmatically, from an R script in RStudio. You currently cannot use the ibm-watson-studio-lib library in RStudio.

  • Files in subdirectories under userfs

    The files that you create in the local file system of your RStudio session under userfs are persisted. If you stop RStudio and restart it later, for example on another day, you see all the files from your previous sessions.

Loading and accessing data

Data loading options per compute engine type

Data loading options Anaconda R distribution R + Spark
Load data into a sparkSessionDataFrame
Load data into an R data frame

Generating code that loads data directly to RStudio

Loading data from local files

To generate code that inserts data from local files to RStudio:

  1. Click the Code snippets icon and then click Read data.
  2. Select the data source from your project and then select Copy to clipboard.
  3. Paste the code in the RStudio file editor.

Supported file types:

  • CSV/delimited files
  • Excel files (.xls, .xlsx, .xlsm)
  • JSON files
  • SAS files
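
The generated snippet depends on the selected data source and is not reproduced here. As a hand-written alternative for a CSV file, the following is a minimal base R sketch. It assumes the assets/data_asset folder under userfs that is described in Working with data files; read_csv_asset is a hypothetical helper, and the demonstration uses a temporary directory and a sample of the built-in iris data set in place of a real project asset.

```r
# read_csv_asset() is a hypothetical helper for this sketch.
# asset_dir defaults to the data asset folder under userfs (assumed to be
# reachable relative to the current working directory).
read_csv_asset <- function(file_name,
                           asset_dir = file.path("userfs", "assets", "data_asset")) {
  path <- file.path(asset_dir, file_name)
  if (!file.exists(path)) {
    stop("Data asset not found: ", path)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Demonstration with a temporary directory standing in for the asset folder.
demo_dir <- file.path(tempdir(), "data_asset")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
write.csv(head(iris), file.path(demo_dir, "iris_sample.csv"), row.names = FALSE)

iris_sample <- read_csv_asset("iris_sample.csv", asset_dir = demo_dir)
str(iris_sample)
```

The explicit file.exists check makes a wrong folder or file name fail with a clear message instead of a lower-level read error.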

Loading data from data source connections

Before you can load data from an IBM data service or from an external data source, you must create or add a connection to your project. See Adding connections to projects.

To generate code that inserts data from database connections to RStudio:

  1. Click the Code snippets icon and then click Read data.
  2. Select the connection from your project.
  3. Select the data source from the connection and then select Copy to clipboard.
  4. Paste the code in the RStudio file editor. The generated code serves as a quick start to begin working with a data set or connection. For production systems, carefully review the inserted code to determine whether you must write your own code that better meets your needs.
  5. If necessary, enter your personal credentials for locked data connections that are marked with the Key icon. This is a one-time step that permanently unlocks the connection for you. After you unlock the connection, the key icon is no longer displayed. See Adding connections to projects.
  6. If no code can be generated for the connection, load the credentials and open the database connection that references your credentials. Write code to load the data.

RStudio supports the same database connections as Jupyter notebooks. For details, see Data load support in notebooks.

Adding data files as project assets

To use data files in RStudio, upload them by clicking the Upload asset to project icon on your project's Assets page. Files that are uploaded this way are automatically added as Data assets to your project.

However, if you uploaded or created data files in RStudio, you can add these files to your project as project data assets. These files must be in the assets/data_asset folder in RStudio. To add these files as data assets to the project:

  1. On the Assets page of the project, click Import assets.
  2. Select Project files and the file in the project_data_assets folder that you want to add to the project as an asset.

Running an R script as a job

You can run an R script as a job in an RStudio environment in Watson Studio or on a remote Hadoop cluster:

  • To create a job to run an R script in an RStudio environment, see Creating code-based jobs.

  • To create a job to run an R script on a Hadoop cluster, you need a Hadoop cluster that supports R and R scripts. Additionally, you must enable the feature on the Hadoop cluster by modifying a configuration file. See Administering Apache Hadoop clusters, subsection scriptLanguages under Details on the content of the JSON files for more details. All the libraries that you need for your R script must be available on the cluster.

    To run a job on the Hadoop cluster, you must first create a Hadoop environment. After you create this Hadoop Yarn environment, you can select it when you create the job for the R script from the Jobs page of the project.

Creating a Hadoop Yarn environment

  1. A Watson Studio administrator must add the Hadoop cluster configuration to your platform.
    1. Open the drop-down menu from the sandwich button on Watson Studio's home page, and click Configure Platform.
    2. Click Add Registration to add the Hadoop cluster to the project's configuration.
  2. In your project, open the Environments page and click New template to create a custom environment.
  3. After you give the custom environment a name, select Hadoop as the environment type.
  4. Select the Hadoop configuration that you want to use.
  5. Select Yarn as the execution type. Certain R scripts require Yarn, so a Hadoop cluster that is set up for R scripts must be able to use it. If the cluster is set up correctly, a field called Execution type appears, in which you can select Yarn. If you do not see the Execution type field, your Hadoop admin probably did not set up the Hadoop cluster and configuration file to support the R environment. After the setup is done on the Hadoop side, your admin must refresh the Hadoop registration before the Execution type option becomes available.
  6. Select the language, Yarn size, and Yarn container memory. These fields are bounded by the admin's settings.
  7. Click Create to complete the creation of the environment.
  8. You can change the default settings of the custom environment later (for example, increase or decrease the memory of the Yarn container) by clicking the environment on the Environments page.

Deploying scripts in a space

You can move assets from a project with default Git integration to a deployment space by:

  1. Creating a Git archive file (a ZIP file that contains the contents of your repository from a particular branch or tag) in your Git provider's user interface.
  2. Importing this ZIP file into an existing deployment space.

As a result, a Code Package asset is created that contains all the code files that you created by using RStudio. See Importing spaces and projects into existing deployment spaces.

Working with prompts

If the watsonx.ai service is installed on your cluster, you can add sample prompts for specific models to your R code. To add a sample prompt, click the Code snippets icon, select Prompt Engineering, and browse the categories to find a sample prompt. After you select a prompt, click Copy to clipboard and then paste the code in the RStudio file editor.
