Create an LMDB dataset

Create a dataset from Lightning Memory-Mapped Databases (LMDBs). An LMDB dataset can be used to train Caffe models.

Before you begin

Before creating an LMDB dataset in the cluster management console, make sure that your dataset resides on the shared file system. The dataset must have its own directory, where each data type has its own subdirectory.

For example, if you have a dataset named cifar10, the parent directory named cifar10 includes directories for training, validation (optional), and testing:
/home/egoadmin/shared/datasets/cifar10/train-db
/home/egoadmin/shared/datasets/cifar10/validate-db
/home/egoadmin/shared/datasets/cifar10/test-db
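
The layout above can be verified with a short script before you create the dataset. The following is a minimal sketch; the directory names and the `looks_like_lmdb` check (an LMDB environment on disk normally contains a data.mdb file) are assumptions for illustration, not part of the product.

```python
import os

def looks_like_lmdb(folder):
    """Heuristic: an LMDB environment on disk normally contains a data.mdb file."""
    return os.path.isfile(os.path.join(folder, "data.mdb"))

def check_layout(root, require_validation=False):
    """Return the list of expected subdirectories that are missing or not LMDBs.

    An empty list means the layout looks correct. The validate-db folder is
    checked only when require_validation is True, since validation is optional.
    """
    required = ["train-db", "test-db"]
    if require_validation:
        required.append("validate-db")
    return [d for d in required
            if not looks_like_lmdb(os.path.join(root, d))]
```

For example, `check_layout("/home/egoadmin/shared/datasets/cifar10")` returns an empty list when both train-db and test-db contain an LMDB.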

Procedure

  1. From the cluster management console, select Workload > Spark > Deep Learning.
  2. Select the Datasets tab.
  3. Click New.
  4. Create a dataset from LMDBs.
  5. Provide a dataset name.
  6. Specify a Spark instance group.
  7. Provide a training folder.
    The full absolute path to the training folder must be provided.
    The folder must contain an LMDB.
  8. Provide a validation folder.
    The validation folder is optional; if you provide one, specify its full absolute path. Without a validation folder, this dataset cannot be used to validate a training model.
    The folder must contain an LMDB.
  9. Provide a testing folder.
    The full absolute path to the testing folder must be provided.
    The folder must contain an LMDB.
  10. Provide a mean file.
  11. Provide a label file.
  12. Click Create.
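
If you still need a label file for step 11, one can be generated with a short script. This is a sketch that assumes the common Caffe convention of one class name per line, where the line number corresponds to the numeric label stored in the LMDB; confirm the format your installation expects.

```python
# CIFAR-10 class names in their standard order (label 0 = airplane, ...).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def write_label_file(path, classes):
    """Write one class name per line; the line index is the label ID."""
    with open(path, "w") as f:
        for name in classes:
            f.write(name + "\n")
```

For example, `write_label_file("/home/egoadmin/shared/datasets/cifar10/labels.txt", CIFAR10_CLASSES)` produces a ten-line file whose first line is airplane.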

Results

The dataset is ready to use once it reaches the Created state. If creation fails, check the driver and executor logs in the Spark Applications tab.

What to do next

To view details about the dataset, click the dataset name. To use the dataset in a training run, either create a training model or start a training run.