Specifying a database

Specify the database to use for model evaluation. IBM Watson OpenScale for IBM Cloud Pak for Data uses a Db2 database to store model-related data, such as scoring payloads, feedback data, and calculated metrics. Lite Db2 plans are not currently supported. Model evaluation requires a table space with a page size of at least 32 KB (32768 bytes).

Connecting to your database

In addition to setting up a database, you can select a schema for it. A schema is a named collection of tables in the database. For Db2 options that are part of your cluster, see Services, Data Sources, where you can find options such as Db2 Warehouse and Db2 Advanced Enterprise Server Edition. For an external database, you can use IBM Db2 Database.

Click the Configure icon and then click Database.
Select your Db2 database type from the Database drop-down menu, and then provide the following information:
- Hostname or IP address
- SSL Port
- Database (name)
- Username
- Password
After you successfully connect, you can select a schema and save your work. If you provide a Db2 instance with limited access that does not allow the schema name to be generated automatically, you must provide the schema name explicitly.
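
To check the same connection details outside the user interface, the five fields can be assembled into a keyword=value connection string. The following is a minimal sketch, not part of Watson OpenScale: the function name `build_db2_ssl_dsn` and all sample values are hypothetical, and the use of the `ibm_db` Python driver is an assumption about your environment.

```python
def build_db2_ssl_dsn(hostname, ssl_port, database, username, password):
    """Assemble a Db2 keyword=value connection string from the form fields.

    Hypothetical helper for illustration. Values that contain a semicolon
    are not escaped here and would need extra handling.
    """
    fields = [
        ("DATABASE", database),
        ("HOSTNAME", hostname),
        ("PORT", str(ssl_port)),
        ("PROTOCOL", "TCPIP"),
        ("UID", username),
        ("PWD", password),
        ("SECURITY", "SSL"),  # the form asks for the SSL port, so request SSL
    ]
    return "".join(f"{key}={value};" for key, value in fields)


# Placeholder values only; substitute your own connection details.
dsn = build_db2_ssl_dsn("db2.example.com", 50001, "BLUDB", "db2inst1", "secret")
print(dsn)

# If the ibm_db package is installed (an assumption), the string could be
# passed to the driver like this:
#   import ibm_db
#   conn = ibm_db.connect(dsn, "", "")
```

A connection that succeeds with these values from a client machine suggests that the hostname, port, and credentials are also valid for the Database form.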

Supported Db2 databases for Data Mart

If you're using Data Mart, you can specify one of the following database options. Watson OpenScale is tested and verified with the following databases:

- Db2 from the Cloud Pak for Data service catalog
- Db2 Warehouse multinode (MPP) from the Cloud Pak for Data service catalog
- Db2 Warehouse single node (SMP) from the Cloud Pak for Data service catalog
- External Db2 with the following settings:
  - Product name: "DB2 Enterprise Server Edition"
  - Product identifier: "db2ese"
  - Version information: 11.5 or later

Db2 database performance tuning required for large data sets

For large Db2 data sets, such as databases that support batch processing of millions of records, you might need to increase the size of the transaction log. The following symptoms and solution apply to any batch scenario in which Watson OpenScale monitors fail.

Symptoms

The symptoms appear in log files during normal processing of the batch records.

The following error appears in the Spark job logs:

 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 10, 10.254.26.49, executor 0): com.ibm.db2.jcc.am.BatchUpdateException: [jcc][t4][102][10040][4.26.14] Batch failure.  The batch was submitted, but at least one exception occurred on an individual member of the batch. Use getNextException() to retrieve the exceptions for specific batched elements. ERRORCODE=-4229, SQLSTATE=null

The db2diag.log file of the corresponding Db2 instance indicates that the transaction log is full: DIA8309C Log file was full
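
When you triage several failed batch runs, it can help to check for both symptoms at once. The following is a minimal sketch that searches log text for the two markers quoted above. The function names are hypothetical, and reading the log files into strings is left to you.

```python
SPARK_MARKER = "ERRORCODE=-4229"  # batch failure code in the Spark job log
DB2DIAG_MARKER = "DIA8309C"       # "Log file was full" message in db2diag.log


def log_full_symptoms(spark_log_text, db2diag_text):
    """Report which of the two documented symptoms appear in the log text."""
    return {
        "spark_batch_failure": SPARK_MARKER in spark_log_text,
        "db2_log_full": DB2DIAG_MARKER in db2diag_text,
    }


def matches_log_full_scenario(spark_log_text, db2diag_text):
    """Return True only when both symptoms are present."""
    return all(log_full_symptoms(spark_log_text, db2diag_text).values())


# Example with short excerpts of the messages shown in this topic.
spark_excerpt = "Batch failure. ... ERRORCODE=-4229, SQLSTATE=null"
db2diag_excerpt = "DIA8309C Log file was full"
print(matches_log_full_scenario(spark_excerpt, db2diag_excerpt))
```

The batch failure code on its own can have other causes, so the sketch treats the scenario as matched only when the db2diag.log message is also present.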

Solution

To resolve these issues, you must increase the amount of log space that is allocated. You can calculate the total log space that a configuration provides by using the following formula:

(LOGPRIMARY + LOGSECOND) * LOGFILSIZ * 4KB

where

LOGPRIMARY is the number of primary log files, LOGSECOND is the number of secondary log files, and LOGFILSIZ is the size of each log file, expressed in 4 KB pages.

From the command-line interface, run the following db2 update commands, substituting your database name and values, and then restart the database:

db2 update db cfg for <database name> using LOGFILSIZ 10000
db2 update db cfg for <database name> using LOGPRIMARY 80
db2 update db cfg for <database name> using LOGSECOND 40
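
As a worked example of the formula, the following sketch computes the total log space that the sample values in the commands above would provide. The function name is hypothetical, and the sketch assumes that the 4 KB in the formula means 4096 bytes.

```python
PAGE_SIZE_BYTES = 4 * 1024  # LOGFILSIZ is counted in 4 KB pages (assumed 4096 bytes)


def total_log_space_bytes(logprimary, logsecond, logfilsiz):
    """Apply (LOGPRIMARY + LOGSECOND) * LOGFILSIZ * 4KB and return bytes."""
    return (logprimary + logsecond) * logfilsiz * PAGE_SIZE_BYTES


# Values taken from the sample db2 update commands.
total = total_log_space_bytes(logprimary=80, logsecond=40, logfilsiz=10000)
print(f"{total} bytes, about {total / 1024**3:.2f} GiB")
```

With these values, the configuration provides roughly 4.6 GiB of log space. Make sure that the file system that holds the log files has at least that much free space before you apply the settings.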
