Create a CSV file dataset

Create a dataset from CSV files. Deep Learning Impact supports a CSV file dataset type, which requires two CSV files: label.csv and data.csv. Both files must contain header information in the first row.

Before you begin

Prepare your CSV file directories for testing, training and validation. The CSV files must be named label.csv and data.csv. For example:
/dir/test_db
    ├── label.csv
    └── data.csv
/dir/train_db
    ├── label.csv
    └── data.csv
/dir/val_db
    ├── label.csv
    └── data.csv
You must also ensure that the first row in label.csv and data.csv specifies header information.
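The layout and header requirements above can be checked before you create the dataset. The following is a minimal sketch in Python and is not part of Deep Learning Impact. The function names check_dataset_dir and looks_numeric are hypothetical, and the header check is only a heuristic: it flags a first row that is empty, has blank cells, or consists entirely of numbers.

```python
import csv
from pathlib import Path

# File names required by the CSV file dataset type.
REQUIRED_FILES = ("label.csv", "data.csv")


def looks_numeric(cell):
    """Return True if the cell parses as a number."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def check_dataset_dir(folder):
    """Return a list of problems found in one dataset folder (empty if none)."""
    problems = []
    folder = Path(folder)
    for name in REQUIRED_FILES:
        path = folder / name
        if not path.is_file():
            problems.append(f"{path}: file is missing")
            continue
        # Read only the first row, which is expected to be the header.
        with path.open(newline="") as handle:
            first_row = next(csv.reader(handle), None)
        if not first_row or any(not cell.strip() for cell in first_row):
            problems.append(f"{path}: first row is empty or has blank cells")
        elif all(looks_numeric(cell) for cell in first_row):
            # Heuristic only: an all-numeric first row is probably data.
            problems.append(f"{path}: first row looks like data, not a header")
    return problems
```

For example, check_dataset_dir("/dir/train_db") returns an empty list when both files exist and each starts with a plausible header row. Run it once for each of the training, validation, and testing folders.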

Procedure

  1. From the cluster management console, select Workload > Deep Learning.
  2. Select the Datasets tab.
  3. Click New.
  4. Create a dataset from CSV Files.
  5. Provide a dataset name.
  6. Specify an instance group.
  7. Provide a training folder.
    Provide the full absolute path to the training folder. For example, /dir/train_db.
  8. Specify how the validation and testing data is selected by choosing one of the following options.
    • Specify percentages of training files. For this choice, provide the following.
      1. Provide the percentage of CSV files for validation.
      2. Provide the percentage of CSV files for testing.
      3. Specify a split algorithm.
    • Specify folder locations. For this choice, provide the following.
      1. Provide a validation folder. For example, /dir/val_db.
      2. Provide a testing folder. For example, /dir/test_db.
  9. Click Create.
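If you choose to specify folder locations in step 8, the validation and testing folders must already exist with their own label.csv and data.csv files. The following is a minimal sketch in Python of one way to prepare them from a single source folder. It is not the split algorithm that Deep Learning Impact uses. It assumes that row N of data.csv corresponds to row N of label.csv, and the function names and the seed parameter are hypothetical.

```python
import csv
import random
from pathlib import Path


def read_csv(path):
    """Return (header, rows) for a CSV file whose first row is a header."""
    with Path(path).open(newline="") as handle:
        rows = list(csv.reader(handle))
    return rows[0], rows[1:]


def write_csv(path, header, rows):
    """Write the header and rows, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def split_dataset(source_dir, out_dirs, val_pct, test_pct, seed=0):
    """Randomly split one dataset folder into train, val, and test folders.

    out_dirs maps "train", "val", and "test" to output folder paths.
    Returns the number of data rows written to each folder.
    """
    source_dir = Path(source_dir)
    data_header, data_rows = read_csv(source_dir / "data.csv")
    label_header, label_rows = read_csv(source_dir / "label.csv")
    # Assumption: data and label rows are paired by position.
    if len(data_rows) != len(label_rows):
        raise ValueError("data.csv and label.csv have different row counts")

    indices = list(range(len(data_rows)))
    random.Random(seed).shuffle(indices)
    n_val = len(indices) * val_pct // 100
    n_test = len(indices) * test_pct // 100
    parts = {
        "val": indices[:n_val],
        "test": indices[n_val:n_val + n_test],
        "train": indices[n_val + n_test:],
    }
    for name, chosen in parts.items():
        out = Path(out_dirs[name])
        # The same indices are used for both files to keep rows aligned.
        write_csv(out / "data.csv", data_header, [data_rows[i] for i in chosen])
        write_csv(out / "label.csv", label_header, [label_rows[i] for i in chosen])
    return {name: len(chosen) for name, chosen in parts.items()}
```

Each output folder receives both files with the original header rows, so the folders match the layout described in Before you begin.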

Results

The dataset is ready to use when its state is Created. If creation fails, see the driver and executor logs in the Spark Applications tab.

What to do next

To view details about the dataset, click the dataset name. To use the dataset in a training run, either create a training model or start a training run.