Analyzing with Python

You can run Python scripts that retrieve data from and write data to a Db2® database.

Such scripts can be powerful tools to help you analyze your data. For example, you can use them to generate statistical models based on data in your database, and to plot the results of these models. Scripts can be run directly from within a Python execution environment, or from within a Jupyter notebook.

Important: This functionality is provided as-is and leverages an inactive project.

In-database analytics using Python

To process data, most native Python functions require that the data first is extracted from a database to working memory. Such a function is called an in-application function. A different type of function, called an in-database function, operates directly on data in a database, without requiring the data to be extracted. Consequently, you can use an in-database function to analyze large amounts of data that would be impractical or impossible to extract.

In-database functions can use performance-enhancing features of the underlying database management system, such as columnar technology. Using in-database functions also avoids security issues that are associated with extracting data and ensures that the data that is being analyzed is as current as possible. Some in-database functions additionally use lazy loading to load only those parts of the data that are actually required, to further increase efficiency.

Both in-application and in-database functions are equally easy to use. An in-application function operates on a construct called a data frame, which is a container that holds, in memory, a copy of the data to be processed. An in-database function operates on a similar construct called an IDA data frame. An IDA data frame does not hold any data directly. Instead, it holds a reference to a table or view in the database or to a selection of rows and columns within that table or view. When a function or method is applied to an IDA data frame, it is usually not run in the application, but is translated into an SQL query. The query is then run against the database, and the result is translated into a Python object.

The ibmdbPy package

The ibmdbPy package provides methods to read data from, write data to, and sample data from a Db2 database. It also provides access methods for in-database analytic or geospatial functions. To download the ibmdbPy package and for more information, visit the following websites:

Jupyter notebooks

You can connect to a Db2 database from within a Jupyter notebook. The notebook can run Python code, including code that uses ibmdbPy functions. To download sample notebooks and for more information, visit the following websites: