Datasets

IBM Spectrum Conductor Deep Learning Impact supports various types of datasets including LMDB and TFRecord. Each dataset can include training data, test data and validation data.

IBM Spectrum Conductor Deep Learning Impact requires that a dataset contains at least training data and test data. If you also plan to use the dataset for validation, include all three data types in the dataset. Data types include:
  • Training data: The sample of data used for learning.
  • Test data: The sample of data used to evaluate the model during the training phase.
  • Validation data: The sample of data used to evaluate the final model.
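
The requirement above can be expressed as a small pre-flight check. The following is a minimal sketch, not part of IBM Spectrum Conductor Deep Learning Impact: the split names "train", "test", and "validation" and the function check_splits are hypothetical names chosen for illustration.

```python
# Hypothetical split names; the product may use different identifiers.
REQUIRED_SPLITS = {"train", "test"}
VALIDATION_SPLIT = "validation"


def check_splits(present, use_for_validation=False):
    """Return a sorted list of the data types that are still missing."""
    required = set(REQUIRED_SPLITS)
    if use_for_validation:
        # Validation needs all three data types.
        required.add(VALIDATION_SPLIT)
    return sorted(required - set(present))


if __name__ == "__main__":
    missing = check_splits(["train", "test"], use_for_validation=True)
    print("Missing data types:", missing)
```

An empty result means that the dataset, as described to the check, has every data type it needs for the intended use.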

IBM Spectrum Conductor Deep Learning Impact assumes that you have already collected your raw data and labeled it, either with a label file or by organizing the data into folders. To create a dataset, put the raw data in a folder on the shared file system that IBM Spectrum Conductor Deep Learning Impact can access. The raw data must be in one of the formats that IBM Spectrum Conductor Deep Learning Impact accepts. Both the egoadmin user and the execute user must have read and write permissions to the folder.
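
Before you create a dataset, it can help to confirm that the raw data folder is laid out and accessible as described. The following is a minimal sketch under stated assumptions: it assumes the folder-based labeling case, where each subfolder name is treated as a label, and the function names summarize_raw_data and has_read_write are hypothetical. It uses only the Python standard library and does not call any IBM Spectrum Conductor Deep Learning Impact API.

```python
import os


def summarize_raw_data(root):
    """Count the files in each label folder directly under root.

    Assumption: each subfolder name is a label and holds that label's files.
    """
    counts = {}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path):
            counts[entry] = sum(
                1
                for name in os.listdir(path)
                if os.path.isfile(os.path.join(path, name))
            )
    return counts


def has_read_write(root):
    """Return True if the user running this script can read and write root."""
    return os.access(root, os.R_OK | os.W_OK)


if __name__ == "__main__":
    import sys

    folder = sys.argv[1]  # path to the raw data folder on the shared file system
    print("Read/write access:", has_read_write(folder))
    for label, count in summarize_raw_data(folder).items():
        print(label, count)
```

Because os.access checks permissions only for the user who runs the script, run the check once as egoadmin and once as the execute user.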