Organizing your code files

When you work on Python scripts and notebook files in JupyterLab, you might split your code into multiple files for easier maintenance and so that several files can use the same utility functions without copying their definitions. In Python, you do this with modules, which contain Python definitions and statements, and packages, which are collections of modules.

For details, see Modules and packages. The Python code in one module is made available to Python code in another module by using Python imports. For details, see The import system.

If you want to run these files in JupyterLab or as a code package asset in a deployment space in a consistent and reliable manner, it is important that you organize these files correctly.

The following sections describe the recommended best practices to ensure the proper working of Python imports in files in JupyterLab and in jobs that run code packages in deployment spaces.

Structuring files in JupyterLab

When you open JupyterLab, you will see the following 2 folders on the File Browser tab in the left sidebar:

assettypes
assets

These folders are important as they contain project assets like Data Refinery flows and data assets, and should not be used to store any of your code files.

Let's assume you have a notebook file (main.ipynb) or a Python script (main.py) that starts the execution of your code (also referred to as the main module), and you have 2 additional helper files in a folder named utils.

You can store the files as follows:

Sample 1:

assettypes
assets
main.py
main.ipynb
utils
    helper.py
    helper2.py

Or you can store the files to better differentiate between code and non-code files as follows:

Sample 2:

assettypes
assets
src
    main.py
    main.ipynb
    utils
        helper.py
        helper2.py

Importing modules from the main module

To import the helper module from the main module named main.py or main.ipynb, use:

  • For the folder structure in sample 1:

    import utils.helper as helper
    
  • For the folder structure in sample 2:

    import src.utils.helper as helper
    

Some tools allow you to use import utils.helper as helper with the folder structure in sample 2 because they automatically add the folder of the running script or notebook to the Python path. However, because not all tools do this, you should always use absolute imports that start at the top-level folder. See Intra-package References.

If you use a relative import in a module that is used as a main module, and your tool doesn't support relative imports in the main module, you might get an import error message like:

ImportError: attempted relative import with no known parent package
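The following is a minimal sketch of how this error can be reproduced outside of JupyterLab. It assumes a plain Python interpreter started from the command line, and it builds a throwaway copy of the sample 1 layout in a temporary folder. The function name run_main_with_relative_import is hypothetical and used only for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_main_with_relative_import() -> str:
    """Run a throwaway main.py that uses a relative import and return its stderr."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "utils").mkdir()
        (root / "utils" / "helper.py").write_text("VALUE = 1\n")
        # The main module uses a relative import, which is the problematic case.
        (root / "main.py").write_text("from .utils import helper\n")
        result = subprocess.run(
            [sys.executable, "main.py"],
            cwd=root,
            capture_output=True,
            text=True,
        )
        return result.stderr


stderr_text = run_main_with_relative_import()
# Print only the last line of the traceback, which carries the error message.
print(stderr_text.strip().splitlines()[-1])
```

Replacing the relative import in main.py with import utils.helper as helper avoids the error in this setup, because the main module then no longer depends on having a parent package.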

Importing modules from other modules

In modules that are not intended to be a main module (modules that never start the execution but are always imported), you can use relative imports.

For example, if you want helper.py to include code from helper2.py, you can use:

from . import helper2
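As a rough illustration of how the two import styles fit together, the sketch below recreates the sample 1 layout in a temporary folder and runs main.py with a plain Python interpreter. The functions double and quadruple are hypothetical stand-ins for your own helper code; they do not come from any library.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_sample_layout() -> str:
    """Create main.py plus utils/helper.py and utils/helper2.py, then run main.py."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        utils = root / "utils"
        utils.mkdir()
        # helper2.py: a hypothetical utility function.
        (utils / "helper2.py").write_text("def double(x):\n    return 2 * x\n")
        # helper.py is only ever imported, so it can use a relative import.
        (utils / "helper.py").write_text(
            "from . import helper2\n"
            "\n"
            "def quadruple(x):\n"
            "    return helper2.double(helper2.double(x))\n"
        )
        # main.py starts the execution, so it uses an absolute import.
        (root / "main.py").write_text(
            "import utils.helper as helper\n"
            "print(helper.quadruple(3))\n"
        )
        result = subprocess.run(
            [sys.executable, "main.py"],
            cwd=root,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


output = run_sample_layout()
print(output)  # 12
```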

Structuring files in code packages

If you want to deploy your code as a code package in a deployment space, the ZIP archive file that you create must include all the files you need to run your code, in the same folder structure that you used in JupyterLab, so that your imports resolve correctly.

ZIP folder structure for sample 1:

main.py
main.ipynb
utils
    helper.py
    helper2.py

ZIP folder structure for sample 2:

src
    main.py
    main.ipynb
    utils
        helper.py
        helper2.py

If you don't include src in the ZIP file for sample 2, your absolute imports will no longer work.
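One possible way to build such an archive is with the Python standard library zipfile module, as sketched below. The helper build_code_package and the archive name package.zip are hypothetical, and the sketch assumes that you list the top-level entries to include yourself, so the assettypes and assets folders stay out of the archive. The demonstration builds a throwaway copy of the sample 2 layout rather than touching real project files.

```python
import tempfile
import zipfile
from pathlib import Path


def build_code_package(project_root: Path, zip_path: Path, top_level: list[str]) -> list[str]:
    """Zip the named top-level entries, keeping paths relative to project_root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in top_level:
            entry = project_root / name
            if entry.is_file():
                files = [entry]
            else:
                files = sorted(p for p in entry.rglob("*") if p.is_file())
            for path in files:
                # Store "src/utils/helper.py", not an absolute path.
                archive.write(path, path.relative_to(project_root).as_posix())
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


# Demonstration on a temporary copy of the sample 2 layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "utils").mkdir(parents=True)
    (root / "src" / "main.py").write_text("import src.utils.helper as helper\n")
    (root / "src" / "utils" / "helper.py").write_text("from . import helper2\n")
    (root / "src" / "utils" / "helper2.py").write_text("")
    names = build_code_package(root, root / "package.zip", ["src"])

print(names)
```

Every entry in the resulting archive starts with src/, which is what the absolute import src.utils.helper relies on.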

Folder naming convention for packages

When you choose a folder name for a package, make sure that the name is not already used by a module in the libraries that are pre-installed in your runtime environment. Reusing such a name results in a conflict.

For example, if in sample 2, you had used the folder named code instead of src and you ran the following code in a Default Python environment:

import code.utils

you would see an error message stating that there is no module named code.utils and that code is not a package.

This error occurs because a module named code already exists and therefore code can't be a package.
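If you want to see the difference for yourself, the following sketch inspects the existing code module by using importlib.util.find_spec from the standard library. It assumes a standard Python installation in which code is the single-file standard library module. A package has a list of submodule search locations; a plain module has none, which is why nothing can be imported from underneath it.

```python
import importlib.util

# Look up the existing top-level module named "code" without running your own files.
spec = importlib.util.find_spec("code")

# For a package this attribute is a list of folders; for a plain module it is None.
is_package = spec.submodule_search_locations is not None

print(spec.origin)   # location of the module file that already owns the name
print(is_package)
```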

If you are unsure whether a folder name is safe to use for a package, you can test it before you create the folder by running the following:

import myproposedfoldername

You should get the following response:

ModuleNotFoundError: No module named 'myproposedfoldername'

If you don't see that response, the name is already used by a module in the system and should not be used.
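The same check can be written without triggering an exception, for example to test several candidate names at once. This is a sketch only: the function name folder_name_is_free is hypothetical, and the result is meaningful only when the code runs in the same environment that will later run your files and before a folder with the candidate name exists on the Python path.

```python
import importlib.util


def folder_name_is_free(name: str) -> bool:
    """Return True if no importable top-level module or package uses this name."""
    return importlib.util.find_spec(name) is None


for candidate in ["code", "myproposedfoldername"]:
    status = "free" if folder_name_is_free(candidate) else "already in use"
    print(f"{candidate}: {status}")
```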
