Create an image dataset for object classification

Create a dataset from images for object classification.

About this task

Create an image dataset for object classification. The dataset has one folder each for training, validation, and testing, and the images in each folder are organized into subfolders by class label. For example:
/dir/train
    ├── label1
    │      ├── a.png
    │      └── b.png
    └── label2
           ├── c.png
           └── d.png
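
Before creating the dataset, it can help to confirm that a folder follows this layout. The following is a minimal Python sketch, not part of the product: the function name count_images_per_label and the set of accepted file extensions are assumptions made for illustration. It counts the images found under each label subfolder.

```python
from pathlib import Path

# Assumption: these extensions count as images; adjust to match your data.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images_per_label(split_dir):
    """Return {label: image count} for a folder laid out as split_dir/<label>/<image>."""
    counts = {}
    for label_dir in sorted(Path(split_dir).iterdir()):
        if not label_dir.is_dir():
            continue  # ignore stray files at the top level of the split folder
        counts[label_dir.name] = sum(
            1
            for f in label_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

For the example layout above, calling count_images_per_label("/dir/train") would report two images for each of label1 and label2. A label with a count of zero usually points to a misplaced or empty folder.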

Procedure

  1. From the cluster management console, select Workload > Spark > Deep Learning.
  2. Select the Datasets tab.
  3. Click New.
  4. Create a dataset from Images for Object Classification.
  5. Provide a dataset name.
  6. Specify a Spark instance group.
  7. Specify the image storage format: LMDB for Caffe or TFRecords for TensorFlow.
    • If you selected TFRecords, select how to generate records: by shard or by class. If you select shard, specify the shard number.
  8. Specify how training images are selected.
    • Specifying the location of a folder
    1. Specify the location of the training folder.
    2. Specify how the training images are selected by choosing one of the following options.
      • Specify percentages of training images. For this choice, provide the following.
        1. Provide the percentage of images for validation.
        2. Provide the percentage of images for testing.
        3. Specify a split algorithm.
      • Specify folder locations. For this choice, provide the following.
        1. Provide a validation folder.
        2. Provide a testing folder.
    • Specifying the location of a .txt file that contains image locations

      Each .txt file must list the location of each image and the classifying label that the image is assigned to.

      For example, a train.txt file includes the following image locations and classifiers:
      /dli-fs/dataset/cifar10/train/frog/leptodactylus_pentadactylus_s_000004.png 6
      /dli-fs/dataset/cifar10/train/truck/camion_s_000148.png 9
      /dli-fs/dataset/cifar10/train/truck/tipper_truck_s_001250.png 9
      /dli-fs/dataset/cifar10/train/deer/american_elk_s_001521.png 4
      /dli-fs/dataset/cifar10/train/automobile/station_wagon_s_000293.png 1
      /dli-fs/dataset/cifar10/train/automobile/coupe_s_001735.png 1
      /dli-fs/dataset/cifar10/train/bird/cassowary_s_001300.png 2
      /dli-fs/dataset/cifar10/train/horse/cow_pony_s_001168.png 7
      The corresponding labels.txt file lists the classifiers, one per line. The list numbering starts at 0, so the first line corresponds to label 0.
      
      airplane
      automobile
      bird
      cat
      deer
      dog
      frog
      horse
      ship
      truck
    1. Specify the configuration file for training.
    2. Specify the label file.
    3. Specify the configuration file for validation.
    4. Specify the configuration file for testing.
  9. Select an image output color.
  10. Enable image resize. If enabled, specify the following.
    Note: A mean file is generated only if the width and height specified for the resized images are different from those of the original images.
    1. Output image width.
    2. Output image height.
    3. Image resize transformation.
  11. Click Create.
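
If you use the .txt file option in step 8, the image list files can be generated from a folder that is already organized by label. The following is a minimal Python sketch under stated assumptions: the helper names build_list_file and write_lines are hypothetical, the label order is copied from the labels.txt example in step 8, and image paths are assumed to contain no spaces because each line separates the path and the label index with a single space.

```python
from pathlib import Path

# Label order copied from the labels.txt example; the index is the line position, starting at 0.
CIFAR10_LABELS = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def build_list_file(split_dir, labels):
    """Return lines of the form '<image path> <label index>' for split_dir/<label>/<image>."""
    index = {name: i for i, name in enumerate(labels)}
    lines = []
    for label_dir in sorted(Path(split_dir).iterdir()):
        # Skip anything that is not a folder named after a known label.
        if not label_dir.is_dir() or label_dir.name not in index:
            continue
        for image in sorted(label_dir.iterdir()):
            if image.is_file():
                lines.append(f"{image} {index[label_dir.name]}")
    return lines


def write_lines(path, lines):
    """Write one entry per line, ending the file with a newline."""
    Path(path).write_text("\n".join(lines) + "\n")
```

For example, write_lines("train.txt", build_list_file("/dli-fs/dataset/cifar10/train", CIFAR10_LABELS)) would produce entries such as the frog images with index 6 and the truck images with index 9, and write_lines("labels.txt", CIFAR10_LABELS) would produce the matching label file.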

Results

The dataset is ready to use when its state is Created. If creation fails, see the driver and executor logs in the Spark Applications tab.

What to do next

To view details about the dataset, click the dataset name. To use the dataset in a training run, either create a training model or start a training run.