VS Code development environment - Spark labs
The VS Code development environment is a Spark-based development environment in which you can interactively program, debug, submit, and test Spark applications on a Spark cluster that runs on the Spark engine.
It is available as a Visual Studio Code extension that you install on your local system to access the Spark IDE from Visual Studio Code. It reduces development time and improves usability.
Before you begin
- Install a desktop version of Visual Studio Code.
- Install the watsonx.data extension from the VS Code Marketplace.
- Ensure that the Spark engine is started and is in running status.
- Install the Remote - SSH extension from the VS Code Marketplace.
Important: Spark labs are ephemeral. Periodically back up the data stored in a Spark lab to prevent data loss during upgrades or a Spark master crash.
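One way to take such a backup is to archive the working folder into a timestamped file and then copy that file to storage outside the Spark lab. The following is a minimal sketch that uses only the Python standard library. The function name and the folder names in the usage note are hypothetical, and the sketch assumes that the destination folder is not inside the folder being archived.

```python
import tarfile
import time
from pathlib import Path


def backup_directory(source_dir: str, dest_dir: str) -> Path:
    """Archive source_dir into a timestamped .tar.gz file inside dest_dir."""
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a directory")

    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)

    # Example archive name: work-20250101-093000.tar.gz
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_path = dest / f"{source.name}-{stamp}.tar.gz"

    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores entries relative to the folder name, not the full path.
        archive.add(source, arcname=source.name)
    return archive_path
```

For example, `backup_directory("my_project", "backups")` writes an archive of a hypothetical `my_project` folder into `backups`. An archive that stays inside the Spark lab is still ephemeral, so download it or copy it to durable storage afterward.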
About this task
- Setting up the Spark labs
- Open Visual Studio Code. The watsonx.data icon appears in the left navigation pane. Click the icon. The Welcome to IBM watsonx.data extension window opens.
- Click Manage Connection. The Manage Connection watsonx.data window opens.
- Configure the connection by using one of the following input methods:
- JSON Inputs
- Form Inputs
- To configure JSON Inputs, click JSON Inputs and specify the following details:
- API Key : Provide the platform API key. To generate the API key, see Platform API key.
- Connection JSON : Provide the connection details from the watsonx.data user interface. To do that:
- Log in to your watsonx.data page.
- From the navigation menu, click Connection Information.
- Click VS Code. Copy the configuration from the VS Code connection configuration field and use this as the Connection JSON field value. For more information, see Getting connection information.
- To configure Form Inputs, click Form Inputs and specify the following details:
- Host address of watsonx.data console : Provide the host IP address of watsonx.data. To retrieve the host IP address, see Getting connection information.
- Environment Type : Select Software.
- Username : The watsonx.data login username.
- API Key : Provide the platform API key. To generate the API key, see Platform API key.
- Click Test & Save. The Retrieved Spark Clusters message is displayed. The available Spark engines are displayed in the WATSONX.DATA:ENGINES section.
- Create a Spark lab.
- To create a new Spark lab, from the WATSONX.DATA:ENGINES section, select the required Spark cluster and click the + icon (Add cluster) against it. The Create Spark Lab window opens. Specify a unique name for the Spark lab and select the Spark Version. The default Spark version is 3.5. You can modify the other optional fields if required.
Note: The spark.hadoop.wxd.apikey parameter is configured in the Spark configurations field by default while creating a Spark lab.
Note: The Visual Studio Code development environment does not support Apache Spark version 4.0 when using the watsonx.data Spark engine, and it does not support Spark version 3.5 when using the Apache Gluten accelerated Spark engine.
- Click Create. Click Refresh to see the Spark lab in the left window. This is the dedicated Spark cluster for application development.
- Click the Spark lab to open the Spark lab window, where you can access the file system and the terminal.
- In the Explorer menu, you can view the file system, upload files, and view logs.
Note: To delete an already running Spark lab, hover the mouse over the name of the Spark lab in the watsonx.data left navigation pane and click the Delete icon.
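When you use JSON Inputs during setup, a truncated or partially copied configuration is easy to paste by mistake. The following sketch checks that the copied text parses as JSON before you paste it into the Connection JSON field. It assumes that the configuration is a single JSON object. The function name is hypothetical, and the sketch does not check for any particular keys because this topic does not list them.

```python
import json


def check_connection_json(text: str) -> dict:
    """Parse copied configuration text and confirm that it is a JSON object."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"Not valid JSON (line {err.lineno}, column {err.colno}): {err.msg}"
        ) from err
    # Assumption: the connection configuration is an object at the top level.
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level")
    return parsed
```

A `ValueError` points to the line and column where parsing stopped, which usually indicates where the copied text was cut off.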
- Developing a Spark application
- Develop a Spark application in the Spark lab. You can work with a Spark application in one of the following ways:
- Create your own Python file
- From Visual Studio Code, click the Spark lab. A new window opens.
- In the new Spark lab window, click New File. You get a New File prompt with the following file types:
- Text File : Select to create a text file.
- Python File : Select to create a Python application.
- Jupyter Notebook: Select to create a Jupyter Notebook file.
- Select Python File. A new .py file opens. You can start working on the Python file and save it later.
Note: You can also drag the Python application file to the Explorer page. The file opens in the right pane of the Visual Studio Code application.
- Run the following command in the terminal to execute your Python application. This initiates a Python session and you can see the acknowledgment message in the terminal.
python <filename>
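As an illustration of a file that you might run this way, the following sketch builds a small in-memory DataFrame, aggregates it with Spark, and compares the result with a pure-Python calculation. It uses standard PySpark calls (`SparkSession.builder`, `createDataFrame`, `groupBy`, `agg`). The application name, the sample data, and the helper function names are made up for this example. The sketch also assumes that `getOrCreate()` picks up the Spark configuration that the Spark lab provides, which this topic does not state explicitly.

```python
import importlib.util


def sample_rows():
    """Small in-memory data set so that the script needs no input files."""
    return [("books", 12.5), ("games", 30.0), ("books", 7.5)]


def totals_by_category(rows):
    """Pure-Python reference result, used to check what Spark returns."""
    totals = {}
    for category, amount in rows:
        totals[category] = totals.get(category, 0.0) + amount
    return totals


def main():
    # Report a missing pyspark package instead of failing with a traceback.
    if importlib.util.find_spec("pyspark") is None:
        print("pyspark is not importable from this interpreter.")
        return

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Assumption: the Spark lab supplies the cluster configuration.
    spark = SparkSession.builder.appName("spark-lab-smoke-test").getOrCreate()
    try:
        df = spark.createDataFrame(sample_rows(), ["category", "amount"])
        result = df.groupBy("category").agg(F.sum("amount").alias("total"))
        result.show()
        spark_totals = {row["category"]: row["total"] for row in result.collect()}
        print("Matches reference:", spark_totals == totals_by_category(sample_rows()))
    finally:
        # Release the session even if the job fails.
        spark.stop()


if __name__ == "__main__":
    main()
```

Saving this under a name of your choice and running it with the command above is a quick way to confirm that the terminal session can reach Spark before you develop a larger application.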
- Create Jupyter Notebooks
- From Visual Studio Code, click the Spark lab. A new window opens.
- Install the Jupyter extension in the new Spark lab window to work with Jupyter Notebooks. From the Extensions menu in the new Spark lab window, browse for the Jupyter extension (you can also find it in the VS Code Marketplace) and install it.
Note: Make sure that you install the Jupyter extension from the new Spark lab window.
- In the Explorer page, click New File. You get a New File prompt with the following file types:
- Text File : Select to create a text file.
- Python File : Select to create a Python application.
- Jupyter Notebook: Select to create a Jupyter Notebook file.
- Select Jupyter Notebook. A new .ipynb file opens. You can start working on the Jupyter Notebook file and save it later.
- From the Jupyter Notebook file, click the Select Kernel link.
- Select Python Environment to run your file.
- Select the file path that contains conda/envs/python/bin/python.
- The Jupyter Notebook is now ready to use. You can write your code and execute it cell by cell.
Note: When you save the file, the file path is automatically displayed in the Save As prompt. You can modify the path or click OK to save.
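To confirm that the notebook is attached to the intended kernel, you can run a first cell that prints the interpreter path and checks it against the path fragment named in the kernel selection step. The following is a minimal sketch. The function name is hypothetical, and the check is only a substring match on the interpreter path.

```python
import sys

# Path fragment quoted in the kernel selection step.
LAB_INTERPRETER_FRAGMENT = "conda/envs/python/bin/python"


def is_lab_interpreter(executable: str) -> bool:
    """Return True if the interpreter path contains the expected fragment."""
    # Normalize separators so that the comparison does not depend on the OS.
    return LAB_INTERPRETER_FRAGMENT in executable.replace("\\", "/")


current = sys.executable or ""
print("Interpreter:", current)
print("Matches expected kernel path:", is_lab_interpreter(current))
```

If the second line prints `False`, the notebook is probably using a different Python environment, and you can choose the kernel again from the Select Kernel link.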
- Limitations
- If your environment uses an IBM Power system with a ppc64le processor, you cannot run Spark labs with the intended development experience, because Spark labs are designed for use on x86_64-based systems.
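If you are unsure which processor architecture a host uses, you can run a short check on that host. The following sketch uses `platform.machine()` from the Python standard library. The function name and the three labels are hypothetical. Treating `amd64` as equivalent to `x86_64` is an assumption based on how some operating systems report that architecture, and any architecture that this topic does not mention is reported as unknown instead of being guessed.

```python
import platform


def classify_architecture(machine: str) -> str:
    """Map a machine string to the support status described in this topic."""
    name = machine.lower()
    if name == "ppc64le":
        return "unsupported"
    # Assumption: "amd64" is another name for x86_64 on some operating systems.
    if name in {"x86_64", "amd64"}:
        return "supported"
    # Architectures that this topic does not mention.
    return "unknown"


machine = platform.machine()
print(f"{machine or 'undetermined'}: {classify_architecture(machine)}")
```

The result describes only the host where the script runs, so run it on a system in the environment that hosts the Spark engine and not only on your local workstation.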