Working with watsonx.ai Jupyter Notebook

watsonx.data integrates with watsonx.ai to allow a web-based working experience with Jupyter Notebook. You can use the watsonx.ai interface to build your own code in the Jupyter Notebook, and run it by using watsonx.data Spark as the runtime environment.

Applies to:

Spark engine

Apache Gluten accelerated Spark engine

For more information, see Notebooks and scripts (Watson Studio).

Before you begin

Install watsonx.ai on your system and create a watsonx.ai project. For information, see Creating a project.

Required permissions
To work with watsonx.ai Jupyter Notebook, you must have the User role on the project or the User role on the engine.

Running Jupyter Notebook by using the native Spark engine

To run the Jupyter Notebook on your watsonx.data Spark engine, do the following:
  1. Create a watsonx.ai project
    To create a watsonx.ai project, see Creating a project.
  2. Create a Spark engine environment
    To run a Jupyter Notebook, you must create a runtime environment template.
    To do that, access the watsonx.ai project from the UI. Go to the Manage tab and create a template.
    For more information about creating environment templates, see Creating environment templates.
    While creating the template, select Type as Spark and from the Spark engine list, select the native Spark engine that you provisioned in the watsonx.data instance.
  3. Create a Jupyter Notebook asset and access it from the Jupyter Notebook editor tool
    To create a notebook file in the notebook editor, see Creating a notebook file in the notebook editor.
    When you create the notebook, select the runtime environment that you created for the watsonx.data Spark engine.
    The notebook opens in edit mode. You can start working on it. For more information, see Creating a notebook file in the notebook editor.

Accessing the watsonx.data catalog

To access the watsonx.data catalog from the notebook, add the following code snippet to a notebook cell and run it. The snippet includes the configuration that is required to connect to the associated watsonx.data catalog.
# Capture the current Spark configuration, then stop the default session
# so that it can be re-created with the watsonx.data settings.
conf = spark.sparkContext.getConf()
spark.stop()

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_utc_timestamp
import base64, getpass

# Prompt for the credentials instead of hard-coding them in the notebook.
wxd_username = getpass.getpass("Please enter your username with hms access:").strip()
wxd_hms_username = "ibmlhapikey_" + wxd_username
wxd_hms_password = getpass.getpass("Please enter your api key with hms access:").strip()

# Build the Zen API key: "ZenApiKey " followed by base64("<username>:<api key>").
string_to_encode = wxd_username + ":" + wxd_hms_password
wxd_encoded_apikey = "ZenApiKey " + base64.b64encode(string_to_encode.encode("utf-8")).decode("utf-8")

# Add the watsonx.data credentials to the Spark configuration.
conf.setAll([
    ("spark.hive.metastore.client.plain.username", wxd_hms_username),
    ("spark.hive.metastore.client.plain.password", wxd_hms_password),
    ("spark.hadoop.hive.wxd.user.name", wxd_username),
    ("spark.hadoop.wxd.apikey", wxd_encoded_apikey),
])

# Re-create the Spark session with Hive support and the updated configuration.
spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()
When you run the code snippet, it prompts for the username and the API key. The username is the user whose API key is used to access the data bucket. The Zen API key here is the API key of the user who accesses the object store bucket. To generate an API key, log in to the watsonx.data console, go to Profile > Profile and Settings > API Keys, and generate a new API key.
Note: You must have `Metastore Admin` access on Hive Metastore. For more information, see Managing access to the Metadata Service (MDS).
Note: You can add more code snippets based on your use case and continue. For more information, see Creating a notebook file in the notebook editor.
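The credential handling in the snippet can also be factored into a small helper, which makes the derived values easier to inspect before they are passed to Spark. The following is a minimal sketch, not part of the product: the function name `build_wxd_conf` is hypothetical, the sample username and key are placeholders, and the property names and the `ZenApiKey` encoding are copied from the snippet in this topic.

```python
import base64


def build_wxd_conf(username: str, api_key: str) -> list:
    """Return the Spark settings from the snippet as (key, value) pairs.

    Hypothetical helper for illustration; the property names are taken
    from the snippet in this topic.
    """
    username = username.strip()
    api_key = api_key.strip()
    if not username or not api_key:
        raise ValueError("username and API key must both be non-empty")

    # Same encoding as the snippet: "ZenApiKey " + base64("<username>:<api key>")
    raw = f"{username}:{api_key}".encode("utf-8")
    zen_api_key = "ZenApiKey " + base64.b64encode(raw).decode("utf-8")

    return [
        ("spark.hive.metastore.client.plain.username", "ibmlhapikey_" + username),
        ("spark.hive.metastore.client.plain.password", api_key),
        ("spark.hadoop.hive.wxd.user.name", username),
        ("spark.hadoop.wxd.apikey", zen_api_key),
    ]


if __name__ == "__main__":
    # Placeholder values only; never hard-code real credentials in a notebook.
    for key, _ in build_wxd_conf("example-user", "example-api-key"):
        print(key)  # print the property names only, not the secret values
```

In a notebook, the returned list could be passed to `conf.setAll(...)` in place of the inline list, with the username and API key still collected through `getpass` so that they do not appear in the saved notebook.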