Create a notebook
After you set up a project, you can create notebooks to analyze data.
Create the notebook file
To create or get a notebook file in your project:
- From your project, click the add notebook link.
- In the Create Notebook window, specify the method to use to create your
notebook. You can:
- Create a blank notebook.
- Upload a notebook file from your file system. You must select a .ipynb file.
- Upload a notebook file from a URL. The URL must point to a .ipynb file.
- Specify the rest of the details for your notebook.
- Click Create Notebook.
Alternatively, you can copy a sample notebook from the community page. The sample notebooks are
based on real-world scenarios and contain useful examples of computations and visualizations that
you can adapt to your analysis needs. To work with a copy of the sample notebook, click the
Open Notebook icon and specify your project and the Spark service for the notebook.
For information about the notebook interface, see parts of a notebook.
Create the SparkContext
A SparkContext setup is required to connect Jupyter notebooks to the Spark
execution environment. See the sample notebooks for guidance on the setup.
By default, SparkContext is not set up for R notebooks. You can modify one of
the following templates to create a SparkContext setup for R notebooks:
- sparklyr library
- SparkR library

For Python 2.7, use `master="spark://spark-master-svc:7077"`. For Python 3.5, use `master="spark://spark-master221-svc:7077"`.
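As a point of reference for the templates above, the equivalent setup in a Python notebook is a short block of PySpark code. The following is a minimal sketch, not the R template itself: the application name `my-notebook` is a placeholder, and the master URL shown is the Python 2.7 value from the list above (swap in the Python 3.5 URL if that matches your kernel).

```python
# Minimal sketch of creating a SparkContext by hand in a Python notebook.
# The master URL is the Python 2.7 value listed above; use
# spark://spark-master221-svc:7077 for a Python 3.5 kernel.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://spark-master-svc:7077")  # adjust to your environment
        .setAppName("my-notebook"))                  # placeholder application name

# If your environment already provides a SparkContext named sc, stop it first
# with sc.stop() before creating a new one (see the next section).
sc = SparkContext(conf=conf)
print(sc.version)  # confirm the context is connected
```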
Set Spark resources
Based on your use case, you might need to change the resources allocated to the Spark application. The default settings for Spark are as follows:
| Parameter | Default | Meaning |
|---|---|---|
| `spark.cores.max` | 3 | The maximum amount of CPU cores to request for the application from across the cluster (not from each machine). |
| `spark.dynamicAllocation.initialExecutors` | 3 | Initial number of executors to run. |
| `spark.executor.cores` | 1 | The number of cores to use on each executor. |
| `spark.executor.memory` | 4g | Amount of memory to use per executor process. |
- Stop the pre-created `sc` and then create a new SparkContext with the proper resource configuration. Python example:

  ```python
  sc.stop()

  from pyspark import SparkConf, SparkContext

  conf = (SparkConf()
          .set("spark.cores.max", "15")
          .set("spark.dynamicAllocation.initialExecutors", "3")
          .set("spark.executor.cores", "5")
          .set("spark.executor.memory", "6g"))
  sc = SparkContext(conf=conf)
  ```

- Verify the new settings by running the following command in a cell using the new `sc`:

  ```python
  for item in sorted(sc._conf.getAll()):
      print(item)
  ```
Analyze data in the notebook
Now you're ready for the real work to begin!
Typically, you'll install any necessary libraries, load the data, and then start analyzing it. You and your collaborators can prepare the data, visualize data, make predictions, make prescriptive recommendations, and more.
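For example, a typical first cell installs any missing libraries, loads a data set, and takes a quick look at it. The following is a minimal sketch, assuming a Python notebook with the pandas library available (or installable with pip) and a hypothetical file named sales.csv in the notebook's working directory:

```python
# A typical first analysis cell: install a library, load data, inspect it.
# The package and file name below are placeholders for illustration.
!pip install --user pandas   # skip if pandas is already installed

import pandas as pd

df = pd.read_csv("sales.csv")    # hypothetical data file in the working directory
print(df.shape)                  # number of rows and columns
print(df.describe())             # summary statistics for numeric columns
df.head()                        # preview the first few rows
```

From there, you and your collaborators can move on to cleansing, visualization, and modeling with whichever libraries your analysis needs.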
- The default auto-save interval in a Jupyter notebook is 20 seconds. If you want to save the notebook immediately, click the save button. To change the autosave interval for an individual notebook, use the `%autosave` magic command in a cell, for example, `%autosave 5`.
- When a notebook runs the `%%javascript Jupyter.notebook.session.delete();` command to stop the kernel, the preceding cell might still appear to be running (`[*]`) even though it has actually finished.