Importing externally curated tags for COS/S3 by using the import tags application

The import tags application is used to import a set of externally curated tags for Cloud Object Storage and S3 services.

Before you begin

The S3/COS (Cloud Object Storage) data source that is associated with the objects in the external tags file must be scanned before you run an import tags policy.

Tag names must be defined in IBM Spectrum® Discover before the Import Tags policy is run. Use the Open or Characteristics tag types; do not use the Restricted tag type.

Note:
  • You can create a limited number of Open type tags, and the tags must correspond to values in the header row of the comma-separated values (CSV) file. Tags that are not defined before you trigger the policy are not imported.
  • The CSV file must be in the bucket that is defined in the data source.
  • Only a single tag file is supported per policy.

The external CSV file must meet the following requirements:

  • The tags file must be in CSV format.
  • The first row in the file must be a header row or line.
  • The first column must be the full object path or name. For example, if the bucket is auto_data, and the object name is car1/image1.png, then the first column entry is auto_data/car1/image1.png.
  • The value in the header row for the first column is not restricted by IBM Spectrum Discover.
  • The subsequent columns in the CSV file represent the tag values that can be imported into IBM Spectrum Discover for the associated object records.
  • The second through Nth entries in the header row must correspond to valid tags in IBM Spectrum Discover that are defined before you run the import tags policy.
  • Each entry in the CSV file must represent a unique file in the data source.
    Example contents of a CSV file:
    objectname,bus,tree,stop_sign,red_light,yellow_light,green_light,pedestrian
    auto_data/car1/image1.png,1,3,0,1,1,0,1
    auto_data/car2/image1.png,1,6,0,0,0,0,12
    auto_data/car2/image2.png,1,3,0,2,1,0,1
    auto_data/car3/image1.png,1,3,0,2,1,0,2
    

The following tags can be defined in IBM Spectrum Discover from the records available in the CSV file:

  • bus
  • tree
  • stop_sign
  • red_light
  • yellow_light
  • green_light
  • pedestrian
Note: Only a single tag import policy can be run at a time.
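
The CSV requirements above can be checked before the file is uploaded. The following Python sketch is one possible pre-check, not part of IBM Spectrum Discover: the function name check_tags_csv and its parameters are hypothetical, and the set of defined tag names is assumed to be supplied by the caller. It reports header values that have no defined tag, first-column entries that do not start with the bucket name, and duplicate object entries.

```python
import csv
import io


def check_tags_csv(text, bucket, defined_tags):
    """Return a list of findings for the tags CSV text (empty if none).

    Hypothetical helper: bucket is the bucket name, defined_tags is the
    set of tag names assumed to already exist in IBM Spectrum Discover.
    """
    findings = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty: a header row is required"]

    header, data = rows[0], rows[1:]
    # Header entries 2..N should name tags that are already defined.
    for name in header[1:]:
        if name not in defined_tags:
            findings.append(
                f"header '{name}' is not a defined tag (column would not be imported)"
            )

    seen = set()
    for line_no, row in enumerate(data, start=2):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        path = row[0]
        # The first column is the full object path: <bucket>/<object name>.
        if not path.startswith(bucket + "/"):
            findings.append(f"line {line_no}: '{path}' does not start with '{bucket}/'")
        # Each entry must represent a unique object.
        if path in seen:
            findings.append(f"line {line_no}: duplicate entry for '{path}'")
        seen.add(path)
    return findings


if __name__ == "__main__":
    sample = (
        "objectname,bus,tree\n"
        "auto_data/car1/image1.png,1,3\n"
        "auto_data/car1/image1.png,1,6\n"
    )
    for finding in check_tags_csv(sample, "auto_data", {"bus", "tree"}):
        print(finding)
```

An empty result means that the file passed these three checks only; it does not guarantee that the import tags policy accepts the file.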

About this task

The IBM Spectrum Discover import tags application allows a user with the DATA ADMIN role to apply a pre-curated set of labels (tags) from an external CSV file to the S3/COS object records that are stored in IBM Spectrum Discover.

For example, an external analytics job might generate tag information for a set of S3/COS objects, and save this information into a CSV file. The CSV file comprises an entry for each object that contains an object name and an associated list of labels or tags.

The import tags application can merge these tags into the associated object records in IBM Spectrum Discover, extending the records with new information.
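
The merge can be pictured with a small Python sketch. This is a conceptual illustration only and does not show how the import tags application is implemented: the function merge_tags, the in-memory dictionary of records, and the size field are all hypothetical. It also assumes that CSV rows without a matching indexed record are skipped, which the description above does not state.

```python
import csv
import io


def merge_tags(records, csv_text, defined_tags):
    """Return a copy of records with tag values from the CSV merged in.

    records: dict mapping full object path -> dict of existing metadata.
    Only columns whose header value is in defined_tags are merged.
    """
    merged = {path: dict(meta) for path, meta in records.items()}
    reader = csv.DictReader(io.StringIO(csv_text))
    path_column = reader.fieldnames[0]  # first header value is unrestricted
    for row in reader:
        record = merged.get(row[path_column])
        if record is None:
            continue  # assumption: objects without an indexed record are skipped
        for tag, value in row.items():
            # Columns without a defined tag are not imported.
            if tag != path_column and tag in defined_tags:
                record[tag] = value
    return merged


if __name__ == "__main__":
    indexed = {"auto_data/car1/image1.png": {"size": 1024}}  # hypothetical record
    tags_csv = "objectname,bus,tree\nauto_data/car1/image1.png,1,3\n"
    print(merge_tags(indexed, tags_csv, {"bus", "tree"}))
```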

Procedure

  1. Configure a COS/S3 data source connection for IBM Spectrum Discover, with the vault or bucket that gets scanned later.

    The resulting system metadata that is indexed in IBM Spectrum Discover is then enriched with the imported tag data.

    Example

    The COS/S3 data source connection name is camera_vault, and the vault or bucket name is camera_lidar_semantic. The connection is created and a scan is run.

    For more information about configuring and scanning COS/S3 data sources, see Configure data source connections in the Data Cataloging: Concepts, Planning, and Deployment Guide.
  2. Place the external tag file on the data source so that IBM Spectrum Discover can access it.
    A single CSV file comprises a list of objects and tags that are applied to the indexed records for a COS/S3 data source. Upload this file to the configured source storage bucket by using any IBM® COS/S3 compatible data management utility.
    Example
    An IBM Spectrum Discover data source with the connection name camera_vault is configured to scan the vault or bucket camera_lidar_semantic.
  3. Define tag names in IBM Spectrum Discover.
    1. Use the headers from the CSV file, starting with the second column.
    2. Create corresponding tags in IBM Spectrum Discover for any of the columns where you want to import data.
    For example, for a column that you want to import with the header value ColumnA, create a tag named ColumnA. For columns that you do not want to import, do not define a tag whose name matches the header value.
    Example

    The CSV file comprises columns with the header values front, rear, center, car, bicycle, pedestrian, and truck that you want to import. Characteristics tags are created with the names front, rear, center, car, bicycle, pedestrian, and truck.

    For more information about creating tags, see Creating tags.
    For more information about creating multiple tags through the REST API, see /policyengine/v1/tags/:POST in the Data Cataloging: REST API Guide.
  4. Initiate the policy by using the REST API. For more information, see Initiating Policy Using REST API.
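
Step 3 of the procedure derives the tag names from the CSV header row. A minimal Python sketch of that derivation follows, assuming that the CSV text is available locally. The function tag_names_to_define and its skip parameter are hypothetical and are not part of IBM Spectrum Discover. The sketch returns the names to create only; creating the tags is still done in the product or through its REST API.

```python
import csv
import io


def tag_names_to_define(csv_text, skip=()):
    """Return header values from the second column on, minus skipped columns.

    skip lists the header values of columns that should not be imported,
    so that no tag is defined for them.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    skipped = set(skip)
    # The first header value labels the object path column and is not a tag.
    return [name for name in header[1:] if name not in skipped]


if __name__ == "__main__":
    header_line = "objectname,front,rear,center,car,bicycle,pedestrian,truck\n"
    # Tag names for every column except "center".
    print(tag_names_to_define(header_line, skip=["center"]))
```

Each returned name is then created as an Open or Characteristics tag before the policy is initiated.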