Create an LMDB dataset
Create a dataset from Lightning Memory-Mapped Databases (LMDBs). An LMDB dataset can be used to train Caffe models.
Before you begin
Before creating an LMDB dataset in the cluster management console, make sure that your dataset resides on the shared file system. The dataset must have its own directory, where each data type has its own subdirectory.
For example, if you have a dataset named cifar10, the parent directory named cifar10 includes a subdirectory for training, validation (optional), and testing data:
/home/egoadmin/shared/datasets/cifar10/train-db
/home/egoadmin/shared/datasets/cifar10/validate-db
/home/egoadmin/shared/datasets/cifar10/test-db
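The layout above can be sketched with a short script. A stand-in temporary directory is used here so the sketch is self-contained; in practice the parent directory would live on the shared file system, as in the example paths above:

```python
import tempfile
from pathlib import Path

# In practice the parent sits on the shared file system, e.g.
# /home/egoadmin/shared/datasets/cifar10; a temp dir stands in here.
base = Path(tempfile.mkdtemp()) / "cifar10"

# One subdirectory per data type; validate-db is optional.
for split in ("train-db", "validate-db", "test-db"):
    (base / split).mkdir(parents=True, exist_ok=True)

layout = sorted(p.name for p in base.iterdir())
print(layout)  # ['test-db', 'train-db', 'validate-db']
```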
Procedure
- From the cluster management console, select .
- Select the Datasets tab.
- Click New.
- Create a dataset from LMDBs.
- Provide a dataset name.
- Specify a Spark instance group.
- Provide a training folder. The full absolute path to the training folder must be provided. The folder must contain an LMDB.
- Provide a validation folder. Optionally, provide the full absolute path to the validation folder. If you do not specify a validation folder, this dataset cannot be used to validate a training model. The folder must contain an LMDB.
- Provide a testing folder. The full absolute path to the testing folder must be provided. The folder must contain an LMDB.
- Provide a mean file.
- Provide a label file.
- Click Create.
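Before clicking Create, it can help to confirm that each folder actually holds an LMDB. A minimal sketch, assuming the standard on-disk layout of an LMDB environment (a data.mdb data file, usually alongside a lock.mdb lock file); the directory names and base path mirror the cifar10 example above:

```python
import tempfile
from pathlib import Path

def looks_like_lmdb(folder: Path) -> bool:
    """Heuristic check: an LMDB environment directory holds a data.mdb
    file (and usually a lock.mdb lock file)."""
    return (folder / "data.mdb").is_file()

# Build a stand-in layout under a temp dir; in practice you would point
# this at the dataset directory on the shared file system, e.g.
# /home/egoadmin/shared/datasets/cifar10.
base = Path(tempfile.mkdtemp()) / "cifar10"
for split in ("train-db", "validate-db", "test-db"):
    db = base / split
    db.mkdir(parents=True)
    (db / "data.mdb").touch()  # placeholder for a real LMDB data file

checks = {split: looks_like_lmdb(base / split)
          for split in ("train-db", "validate-db", "test-db")}
print(checks)
```

A folder that fails this check would cause dataset creation to fail, so it is cheaper to catch the problem before submitting.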
Results
The dataset is created when it reaches the Created state. If creation fails, see the driver and executor logs in the Spark Applications tab.
What to do next
To view details about the dataset, click the dataset name. To use the dataset in a training run, either create a training model or start a training run.