Create a CSV file dataset

Create a dataset from CSV files.

Procedure

  1. From the cluster management console, select Workload > Spark > Deep Learning.
  2. Select the Datasets tab.
  3. Click New.
  4. Choose to create a dataset from CSV files.
  5. Provide a dataset name.
  6. Specify a Spark instance group.
  7. Provide a training folder.
    You must provide the full absolute path to the training folder.
  8. Specify how the validation and testing CSV files are selected, using one of the following choices:
    • Specify percentages of training files. For this choice, provide the following:
      1. Provide the percentage of CSV files for validation.
      2. Provide the percentage of CSV files for testing.
      3. Specify a split algorithm.
    • Specify folder locations. For this choice, provide the following:
      1. Provide a validation folder.
      2. Provide a testing folder.
  9. Optionally, enable the redefine plug-in option. If you enable it, specify a plug-in file.
  10. Click Create.
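The percentage-based choice in step 8 divides the CSV files in one training folder into training, validation, and testing subsets. The following Python sketch illustrates that idea under stated assumptions: `split_csv_files` is a hypothetical helper that is not part of the product, a seeded random shuffle stands in for whichever split algorithm you select in the console, percentages are whole numbers, and file counts are rounded down. It also mirrors the requirement that the training folder be an absolute path.

```python
import random
from pathlib import Path


def split_csv_files(training_folder, validation_pct, testing_pct, seed=0):
    """Hypothetical helper: partition the CSV files in training_folder
    into (training, validation, testing) lists by percentage."""
    folder = Path(training_folder)
    # The console requires a full absolute path; mirror that check here.
    if not folder.is_absolute():
        raise ValueError(f"training folder must be an absolute path: {folder}")
    if validation_pct < 0 or testing_pct < 0 or validation_pct + testing_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")

    # Sort first so the seeded shuffle is reproducible across runs.
    files = sorted(folder.glob("*.csv"))
    # Assumption: a seeded random shuffle stands in for the product's split algorithm.
    random.Random(seed).shuffle(files)

    n = len(files)
    n_val = int(n * validation_pct // 100)   # round down
    n_test = int(n * testing_pct // 100)     # round down
    validation = files[:n_val]
    testing = files[n_val:n_val + n_test]
    training = files[n_val + n_test:]        # remaining files stay in training
    return training, validation, testing
```

For example, with 10 CSV files, 20 percent for validation, and 10 percent for testing, the sketch returns 7 training files, 2 validation files, and 1 testing file. Because the counts are rounded down, any remainder stays in the training subset.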

Results

The dataset is created when its state changes to Created. If creation fails, check the driver and executor logs in the Spark Applications tab.

What to do next

To view details about the dataset, click the dataset name. To use the dataset in a training run, either create a training model or start a training run.