
Discovery

API reference

Introduction

The IBM Watson™ Discovery Service is a cognitive search and content analytics engine that you can add to applications to identify patterns, trends and actionable insights to drive better decision-making. Securely unify structured and unstructured data with pre-enriched content, and use a simplified query language to eliminate the need for manual filtering of results.

For details about using Discovery, see the IBM® Cloud docs.

Authentication

IBM Cloud is migrating to token-based Identity and Access Management (IAM) authentication.

  • With some service instances, you authenticate to the API by using IAM. You can pass either a bearer token in an Authorization header or an API key. Tokens support authenticated requests without embedding service credentials in every call. API keys use basic authentication. Learn more about IAM.

  • In other instances, you authenticate by providing the username and password for the service instance. For more information, see Service credentials for Watson services.

To find out which authentication to use, view the service credentials by clicking the service instance on the Dashboard.

IAM authentication. Replace {apikey} and {url} with your service credentials.


curl -u "apikey:{apikey}" "{url}/{method}"
        

Basic authentication. Replace {username}, {password}, and {url} with your service credentials.


curl -u "{username}:{password}" "{url}/{method}"
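Both commands send HTTP basic authentication. With IAM, the user name is the literal string apikey. The following is a minimal Python sketch of the Authorization header value that curl -u builds. It uses only the standard library, and the helper name basic_auth_header is hypothetical:

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value that `curl -u user:pass` sends."""
    # Basic authentication is "Basic " + base64("user:password").
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# IAM instance: the user name is the literal string "apikey".
iam_header = basic_auth_header("apikey", "{apikey}")
# Older instance: service username and password.
legacy_header = basic_auth_header("{username}", "{password}")
```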
        

Service endpoint

The service endpoint is based on the location of the service instance. If your API endpoint URL differs from the default, you must set your endpoint. For example, when Discovery is hosted in Sydney, the base URL is https://gateway-syd.watsonplatform.net. The URL might also be different when you use IBM Cloud Dedicated.

To find out which URL to use, view the service credentials by clicking the service instance on the Dashboard. Use that URL in your requests to Discovery. How you set the service URL depends on the SDK:

  • Java: call the setEndPoint() method of the service instance.
  • Node: use the url parameter when you create the service instance.
  • Python: use the url parameter when you create the service instance, or call the set_url() method of the service instance.
  • Ruby: use the url parameter when you create the service instance, or call the url= method of the service instance.
  • Swift: set the serviceURL property of the service instance.

Service endpoints by location:

  • US South: https://gateway.watsonplatform.net/discovery/api (Default)
  • US East: https://gateway-wdc.watsonplatform.net/discovery/api
  • Germany: https://gateway-fra.watsonplatform.net/discovery/api
  • Sydney: https://gateway-syd.watsonplatform.net/discovery/api
  • United Kingdom: https://gateway.watsonplatform.net/discovery/api

Discovery might not be available in all locations. For details, see Services by region.
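If you call the REST API without an SDK, you must supply the base URL yourself. The following Python sketch prefers the URL from your service credentials and otherwise falls back to the list above. The location labels used as dictionary keys are illustrative only; they are not identifiers that the service defines:

```python
from typing import Optional

# Base URLs copied from the list above. The keys are illustrative labels.
DISCOVERY_ENDPOINTS = {
    "us-south": "https://gateway.watsonplatform.net/discovery/api",
    "us-east": "https://gateway-wdc.watsonplatform.net/discovery/api",
    "germany": "https://gateway-fra.watsonplatform.net/discovery/api",
    "sydney": "https://gateway-syd.watsonplatform.net/discovery/api",
    "united-kingdom": "https://gateway.watsonplatform.net/discovery/api",
}
DEFAULT_LOCATION = "us-south"


def resolve_service_url(credentials_url: Optional[str] = None, location: Optional[str] = None) -> str:
    """Prefer the URL from the service credentials; fall back to the location table."""
    if credentials_url:
        # The credentials URL is authoritative (for example, on IBM Cloud Dedicated).
        return credentials_url.rstrip("/")
    key = location or DEFAULT_LOCATION
    if key not in DISCOVERY_ENDPOINTS:
        raise ValueError(f"unknown location: {key!r}")
    return DISCOVERY_ENDPOINTS[key]
```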

Default URL (US South): https://gateway.watsonplatform.net/discovery/api

Your service instance might not use this URL.

Examples for the US East location in the constructor and after instantiation


Discovery discovery = new Discovery("{version}");
discovery.setEndPoint("https://gateway-wdc.watsonplatform.net/discovery/api");
                

var DiscoveryV1 = require('watson-developer-cloud/discovery/v1');

var discovery = new DiscoveryV1({
    version: '{version}',
    iam_apikey: '{iam_api_key}',
    url: 'https://gateway-wdc.watsonplatform.net/discovery/api'
});
    
                    

from watson_developer_cloud import DiscoveryV1

discovery = DiscoveryV1(
    version='{version}',
    iam_apikey='{iam_api_key}',
    url='https://gateway-wdc.watsonplatform.net/discovery/api'
)

                    

or


discovery.set_url('https://gateway-wdc.watsonplatform.net/discovery/api')
                    

require "ibm_watson"

discovery = IBMWatson::DiscoveryV1.new(
  version: "{version}",
  username: "{username}",
  password: "{password}",
  url: "https://gateway-wdc.watsonplatform.net/discovery/api"
)

                    

or


discovery.url = "https://gateway-wdc.watsonplatform.net/discovery/api"
                

let discovery = Discovery(username: "{username}", password: "{password}", version: "{version}")
discovery.serviceURL = "{url}"
                    

Versioning

API requests require a version parameter that takes a date in the format version=YYYY-MM-DD. When we change the API in a backwards-incompatible way, we release a new version date.

Send the version parameter with every API request. The service uses the API version for the date you specify, or the most recent version before that date. Don't default to the current date. Instead, specify a date that matches a version that is compatible with your app, and don't change it until your app is ready for a later version.

If you use an SDK, specify the version by setting the version parameter when you create the service instance.
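If you call the REST API directly, every URL needs the version query parameter. The following Python sketch appends a pinned version and rejects malformed dates. The helper with_version is hypothetical, and 2018-03-05 is simply the version used in the curl examples in this reference:

```python
import re
from datetime import datetime
from urllib.parse import urlencode

# Pin the version that your app was tested against; do not use today's date.
API_VERSION = "2018-03-05"

_VERSION_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def with_version(url: str, version: str = API_VERSION, **params: str) -> str:
    """Append the required version=YYYY-MM-DD query parameter, plus any others."""
    if not _VERSION_PATTERN.fullmatch(version):
        raise ValueError(f"version must be YYYY-MM-DD, got {version!r}")
    datetime.strptime(version, "%Y-%m-%d")  # also rejects impossible dates
    query = urlencode({**params, "version": version})
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{query}"
```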

Error handling

The Discovery service uses standard HTTP response codes to indicate whether a method completed successfully. A 2xx response indicates success (for example, 200, or 201 when an environment is created). A 4xx response indicates a problem with the request, and a 5xx response usually indicates an internal system error. The response codes for each method are listed with the method.

ErrorResponse
Name Description
code integer

The HTTP error status code.

error string

A message describing the error.
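The following Python sketch shows one way to handle errors on the client side. It assumes that the error body is an ErrorResponse JSON object with code and error fields, and it falls back to the HTTP status when the body is not JSON. DiscoveryError and raise_for_status are hypothetical names:

```python
import json


class DiscoveryError(Exception):
    """Raised for a non-2xx response; carries the ErrorResponse fields."""

    def __init__(self, code: int, error: str) -> None:
        super().__init__(f"HTTP {code}: {error}")
        self.code = code
        self.error = error

    @property
    def is_client_error(self) -> bool:
        return 400 <= self.code < 500

    @property
    def is_server_error(self) -> bool:
        return 500 <= self.code < 600


def raise_for_status(status: int, body: str) -> None:
    """Do nothing for 2xx; otherwise raise DiscoveryError built from the ErrorResponse body."""
    if 200 <= status < 300:
        return
    try:
        payload = json.loads(body)
    except ValueError:  # the body was not JSON (for example, a gateway HTML page)
        payload = None
    if not isinstance(payload, dict):
        payload = {}
    raise DiscoveryError(int(payload.get("code", status)), str(payload.get("error", body)))
```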

Data handling

Data labels

You can remove customer data if you associate the customer and the data when you send the information to a service. First you label the data with a customer ID, and then you can delete the data by the ID.

  • Use the X-Watson-Metadata header to associate a customer ID with the data. By adding a customer ID to a request, you indicate that it contains data that belongs to that customer.

    Specify a random or generic string for the customer ID. Do not include personal data, such as an email address. Pass the string customer_id={id} as the argument of the header. For more information about how to pass headers, see Additional headers.

  • Use the Delete labeled data method to remove data that is associated with a customer ID.

Labeling data is used only by methods that accept customer data. For more information about Discovery and labeling data, see Information security.
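The following is a minimal Python sketch of the labeling step. It assumes that you store the mapping between your own customer records and these IDs so that you can later call Delete labeled data. The helpers new_customer_id and metadata_header are hypothetical, and the email check is only a rough guard:

```python
import uuid


def new_customer_id() -> str:
    """Random, non-personal customer ID (never an email address or a name)."""
    return uuid.uuid4().hex


def metadata_header(customer_id: str) -> dict:
    """Header that labels a request's data with a customer ID."""
    # Rough guard only: reject values that look like an email address.
    if "@" in customer_id:
        raise ValueError("customer ID must not contain personal data such as an email address")
    return {"X-Watson-Metadata": f"customer_id={customer_id}"}
```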

Data collection

By default, all Watson services log requests and their results. Logging is done only to improve the services for future users. The logged data is not shared or made public.

To prevent IBM from accessing your data for general service improvements, set the X-Watson-Learning-Opt-Out request header to true on every request that you do not want IBM to access. Any value other than false or 0 disables request logging for that call.

If you use an SDK, you can set the header once on the service object so that it is sent with every request:

  • Java: call the setDefaultHeaders method of the service object.
  • Node: use the headers parameter when you create the service object.
  • Python: call the set_default_headers method of the service object.
  • Ruby: call the add_default_headers method of the service object.
  • Swift: set the defaultHeaders property of the service object.

Example request


curl -u "apikey:{apikey}" -H "X-Watson-Learning-Opt-Out: true" "{url}/{method}"
            

import java.util.HashMap;
import java.util.Map;

Map<String, String> headers = new HashMap<String, String>();
headers.put("X-Watson-Learning-Opt-Out", "true");

discovery.setDefaultHeaders(headers);
            

var DiscoveryV1 = require('watson-developer-cloud/discovery/v1');

var discovery = new DiscoveryV1({
  version: '{version}',
  iam_apikey: '{iam_api_key}',
  url: '{url}',
  headers: {
    'X-Watson-Learning-Opt-Out': 'true'
  }
});
            

discovery.set_default_headers({'x-watson-learning-opt-out': "true"})
            

discovery.add_default_headers(headers: {"x-watson-learning-opt-out" => "true"})
            

let discovery = Discovery(apiKey: "{iam_api_key}")
discovery.defaultHeaders = ["X-Watson-Learning-Opt-Out": "true"]
            

Environments

Manage an environment to store your documents.

Create an environment

Creates a new environment for private data. An environment must be created before collections can be created.

Note: You can create only one environment for private data per service instance. An attempt to create another environment results in an error.

Request

POST /v1/environments
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

body body CreateEnvironmentRequest

An object that defines an environment name and optional description. The fields in this object are not approved for personal information and cannot be deleted based on customer ID.

CreateEnvironmentRequest
Name Description
name string

Name that identifies the environment.

description string

Description of the environment.

size string

Size of the environment.

Allowable values:
  • XS
  • S
  • MS
  • M
  • ML
  • L
  • XL
  • XXL
  • XXXL

XS

Example request

curl -X POST -u "{username}":"{password}" -H "Content-Type: application/json" -d '{
  "name": "my_environment",
  "description": "My environment"
}' "https://gateway.watsonplatform.net/discovery/api/v1/environments?version=2018-03-05"
        
Example body

{
  "name": "my_environment",
  "description": "My environment"
}
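The following Python sketch builds the same request with the standard library instead of curl. It assumes basic authentication and the default US South URL. create_environment_request is a hypothetical helper, and the request is only constructed here, not sent:

```python
import base64
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"  # replace with your service URL


def create_environment_request(
    username: str,
    password: str,
    name: str,
    description: Optional[str] = None,
    version: str = "2018-03-05",
    base_url: str = BASE_URL,
) -> Request:
    """Build (but do not send) POST /v1/environments, mirroring the curl example."""
    body = {"name": name}
    if description is not None:
        body["description"] = description
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        f"{base_url}/v1/environments?version={version}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


req = create_environment_request("{username}", "{password}", "my_environment", "My environment")
```

To send the request, pass req to urllib.request.urlopen. A successful call returns status 201 with an Environment object.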
        

Response

Environment

Details about an environment.

Name Description
environment_id string

Unique identifier for the environment.

name string

Name that identifies the environment.

description string

Description of the environment.

created DateTime

Creation date of the environment, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

Date of most recent environment update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

status string

Status of the environment.

Possible values:
  • active
  • pending
  • maintenance
read_only boolean

If true, the environment contains read-only collections that are maintained by IBM.

size string

Size of the environment.

Possible values:
  • XS
  • S
  • MS
  • M
  • ML
  • L
  • XL
  • XXL
  • XXXL
index_capacity IndexCapacity

Details about the resource usage and capacity of the environment.

IndexCapacity

Details about the resource usage and capacity of the environment.

Name Description
documents EnvironmentDocuments

Summary of the document usage statistics for the environment.

disk_usage DiskUsage

Summary of the disk usage of the environment.

collections CollectionUsage

Summary of the collection usage in the environment.

memory_usage MemoryUsage

Deprecated: Summary of the memory usage of the environment.

EnvironmentDocuments

Summary of the document usage statistics for the environment.

Name Description
indexed integer

Number of documents indexed for the environment.

maximum_allowed integer

Total number of documents allowed in the environment's capacity.

DiskUsage

Summary of the disk usage statistics for the environment.

Name Description
used_bytes integer

Number of bytes within the environment's disk capacity that are currently used to store data.

maximum_allowed_bytes integer

Total number of bytes available in the environment's disk capacity.

total_bytes integer

Deprecated: Total number of bytes available in the environment's disk capacity.

used string

Deprecated: Amount of disk capacity used, in KB or GB format.

total string

Deprecated: Total amount of the environment's disk capacity, in KB or GB format.

percent_used double

Deprecated: Percentage of the environment's disk capacity that is being used.

CollectionUsage

Summary of the collection usage in the environment.

Name Description
available integer

Number of active collections in the environment.

maximum_allowed integer

Total number of collections allowed in the environment.

MemoryUsage

Deprecated: Summary of the memory usage statistics for this environment.

Name Description
used_bytes integer

Deprecated: Number of bytes used in the environment's memory capacity.

total_bytes integer

Deprecated: Total number of bytes available in the environment's memory capacity.

used string

Deprecated: Amount of memory capacity used, in KB or GB format.

total string

Deprecated: Total amount of the environment's memory capacity, in KB or GB format.

percent_used double

Deprecated: Percentage of the environment's memory capacity that is being used.

Example response


{
  "environment_id" : "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
  "name" : "test_environment",
  "description" : "Test environment",
  "created" : "2016-06-16T10:56:54.957Z",
  "updated" : "2017-05-16T13:56:54.957Z",
  "status" : "active",
  "read_only" : false,
  "index_capacity" : {
    "documents" : {
      "indexed" : 0,
      "maximum_allowed" : 1000000
    },
    "disk_usage" : {
      "used_bytes" : 0,
      "maximum_allowed_bytes" : 85899345920
    },
    "collections" : {
      "available" : 1,
      "maximum_allowed" : 4
    }
  }
}
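To check how close an environment is to its limits, compare each used value in index_capacity with its maximum. The following Python sketch ignores the deprecated fields. Following the field descriptions above, it treats collections.available as the number of collections in use. capacity_usage is a hypothetical helper:

```python
import json

# (used field, maximum field) for each non-deprecated IndexCapacity section.
CAPACITY_FIELDS = {
    "documents": ("indexed", "maximum_allowed"),
    "disk_usage": ("used_bytes", "maximum_allowed_bytes"),
    "collections": ("available", "maximum_allowed"),
}


def capacity_usage(environment: dict) -> dict:
    """Return the used fraction (0.0 to 1.0) of each capacity section in the response."""
    capacity = environment.get("index_capacity", {})
    usage = {}
    for section, (used_key, max_key) in CAPACITY_FIELDS.items():
        stats = capacity.get(section)
        if stats and stats.get(max_key):  # skip missing sections and a zero maximum
            usage[section] = stats.get(used_key, 0) / stats[max_key]
    return usage


example = json.loads("""{
  "index_capacity": {
    "documents": {"indexed": 0, "maximum_allowed": 1000000},
    "disk_usage": {"used_bytes": 0, "maximum_allowed_bytes": 85899345920},
    "collections": {"available": 1, "maximum_allowed": 4}
  }
}""")
print(capacity_usage(example))  # {'documents': 0.0, 'disk_usage': 0.0, 'collections': 0.25}
```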
        

Response Codes

Status Description
201

Environment successfully added.

400

Bad request.

List environments

List existing environments for the service instance.

Request

GET /v1/environments
Parameter Description
name query string

Show only the environment with the given name.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments?version=2018-03-05"
        

Response

ListEnvironmentsResponse
Name Description
environments Environment[]

An array of environments that are available for the service instance.

Environment

Details about an environment. Each object in the environments array has the same fields as the Environment response of Create an environment.

Example response


{
  "environments" : [ {
    "environment_id" : "ecbda78e-fb06-40b1-a43f-a039fac0adc6",
    "name" : "byod_environment",
    "description" : "Private Data Environment",
    "created" : "2017-07-14T12:54:40.985Z",
    "updated" : "2017-07-14T12:54:40.985Z",
    "read_only" : false
  }, {
    "environment_id" : "system",
    "name" : "Watson System Environment",
    "description" : "Watson System environment",
    "created" : "2017-07-13T01:14:20.761Z",
    "updated" : "2017-07-13T01:14:20.761Z",
    "read_only" : true
  } ]
}
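A service instance can have only one private-data environment, and the system environment is read-only. You can therefore pick the writable environment out of the list by its read_only flag. The following is a Python sketch. private_environment is a hypothetical helper, and treating a missing read_only field as false is an assumption:

```python
from typing import Optional


def private_environment(list_response: dict) -> Optional[dict]:
    """Return the writable (private-data) environment, or None if none exists yet."""
    for env in list_response.get("environments", []):
        # Assumption: an environment without a read_only flag is writable.
        if not env.get("read_only", False):
            return env
    return None


example = {
    "environments": [
        {"environment_id": "ecbda78e-fb06-40b1-a43f-a039fac0adc6", "name": "byod_environment", "read_only": False},
        {"environment_id": "system", "name": "Watson System Environment", "read_only": True},
    ]
}
print(private_environment(example)["name"])  # byod_environment
```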
        

Response Codes

Status Description
200

Successful response.

400

Bad request.

Get environment info

Request

GET /v1/environments/{environment_id}
Parameter Description
environment_id path string

The ID of the environment.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}?version=2018-03-05"
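The get, update, and delete methods all use the same URL. The following Python sketch builds it, assuming the default US South base URL. environment_url is a hypothetical helper that percent-encodes the ID so that it stays a single path segment:

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"  # replace with your service URL


def environment_url(environment_id: str, version: str = "2018-03-05", base_url: str = BASE_URL) -> str:
    """URL for GET, PUT, and DELETE /v1/environments/{environment_id}."""
    # Percent-encode the ID so that it is always a single path segment.
    path_id = quote(environment_id, safe="")
    return f"{base_url}/v1/environments/{path_id}?{urlencode({'version': version})}"
```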
        

Response

Environment

Details about an environment. The fields are the same as in the Environment response of Create an environment.

Example response


{
  "environment_id" : "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
  "name" : "test_environment",
  "description" : "Test environment",
  "created" : "2016-06-16T10:56:54.957Z",
  "updated" : "2017-05-16T13:56:54.957Z",
  "status" : "active",
  "read_only" : false,
  "index_capacity" : {
    "documents" : {
      "indexed" : 0,
      "maximum_allowed" : 1000000
    },
    "disk_usage" : {
      "used_bytes" : 0,
      "maximum_allowed_bytes" : 85899345920
    },
    "collections" : {
      "available" : 1,
      "maximum_allowed" : 4
    }
  }
}
        

Response Codes

Status Description
200

Environment fetched.

400

Bad request.

Update an environment

Updates an environment. The environment's name and description parameters can be changed. You must specify a name for the environment.

Request

PUT /v1/environments/{environment_id}
Parameter Description
environment_id path string

The ID of the environment.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

body body UpdateEnvironmentRequest

An object that defines the environment's name and, optionally, description.

UpdateEnvironmentRequest
Name Description
name string

Name that identifies the environment.

description string

Description of the environment.

Example request

curl -X PUT -u "{username}":"{password}" -H "Content-Type: application/json" -d '{
   "name": "Updated name",
   "description": "Updated description"
}' "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}?version=2018-03-05"
        
Example body

{
   "name": "Updated name",
   "description": "Updated description"
}
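The following Python sketch prepares an update and assumes that you have already fetched the environment. The name must always be sent, and read-only environments are rejected with 403, so the hypothetical helpers below check both conditions before the PUT request is made:

```python
import json
from typing import Optional


def update_environment_body(name: str, description: Optional[str] = None) -> bytes:
    """JSON body for PUT /v1/environments/{environment_id}; name is required."""
    if not name or not name.strip():
        raise ValueError("name is required when updating an environment")
    body = {"name": name}
    if description is not None:
        body["description"] = description
    return json.dumps(body).encode("utf-8")


def can_update(environment: dict) -> bool:
    """Read-only environments (such as the system environment) return 403 on update."""
    return not environment.get("read_only", False)
```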
        

Response

Environment

Details about an environment. The fields are the same as in the Environment response of Create an environment.

Example response


{
  "environment_id" : "f822208e-e4c2-45f8-a0d6-c2be950fbcc8",
  "name" : "test_environment",
  "description" : "Test environment",
  "created" : "2016-06-16T10:56:54.957Z",
  "updated" : "2017-05-16T13:56:54.957Z",
  "status" : "active",
  "read_only" : false,
  "index_capacity" : {
    "documents" : {
      "indexed" : 0,
      "maximum_allowed" : 1000000
    },
    "disk_usage" : {
      "used_bytes" : 0,
      "maximum_allowed_bytes" : 85899345920
    },
    "collections" : {
      "available" : 1,
      "maximum_allowed" : 4
    }
  }
}
        

Response Codes

Status Description
200

Environment successfully updated.

400

Bad request.

403

Forbidden. Returned if you attempt to update a read-only environment.

Delete environment

Request

DELETE /v1/environments/{environment_id}
Parameter Description
environment_id path string

The ID of the environment.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" -X DELETE "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}?version=2018-03-05"
        

Response

DeleteEnvironmentResponse
Name Description
environment_id string

The unique identifier for the environment.

status string

Status of the environment.

Possible values:
  • deleted

Response Codes

Status Description
200

Environment successfully deleted.

400

Bad request. Example error messages:

  • Invalid environment id. Please check if the format is correct.
403

Forbidden. Returned if you attempt to delete a read-only environment.

404

Returned any time the environment is not found (even immediately after the environment was successfully deleted).

Example error message:

An environment with ID '2cd8bc72-d737-46e3-b26b-05a585111111' was not found.
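Because 404 is returned even immediately after a successful delete, a retried delete can be treated as complete on either 200 or 404. The following is a Python sketch of that interpretation. environment_gone is a hypothetical helper, and the retry policy is an assumption rather than documented service behavior:

```python
def environment_gone(status: int, body: dict) -> bool:
    """Interpret a DELETE /v1/environments/{environment_id} response.

    Assumption: when a delete is retried, 404 is treated as "already gone".
    """
    if status == 200:
        return body.get("status") == "deleted"
    if status == 404:
        return True
    # 400 (malformed ID), 403 (read-only environment), and 5xx errors: not deleted.
    return False
```

The trade-off is that this approach cannot distinguish an already-deleted environment from a mistyped ID, so validate IDs before you delete.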

List fields across collections

Gets a list of the unique fields (and their types) stored in the indexes of the specified collections.

Request

GET /v1/environments/{environment_id}/fields
Parameter Description
environment_id path string

The ID of the environment.

collection_ids query string[]

A comma-separated list of collection IDs to be queried against.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/fields?collection_ids={id1},{id2}&version=2018-03-05"
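The collection_ids parameter is sent as a single comma-separated query value. The following Python sketch builds this URL, assuming the default US South base URL. fields_url is a hypothetical helper:

```python
from typing import Sequence
from urllib.parse import quote, urlencode

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"  # replace with your service URL


def fields_url(
    environment_id: str,
    collection_ids: Sequence[str],
    version: str = "2018-03-05",
    base_url: str = BASE_URL,
) -> str:
    """URL for GET /v1/environments/{environment_id}/fields."""
    # collection_ids is one comma-separated value; keep the commas unescaped.
    query = urlencode({"collection_ids": ",".join(collection_ids), "version": version}, safe=",")
    return f"{base_url}/v1/environments/{quote(environment_id, safe='')}/fields?{query}"
```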
        

Response

ListCollectionFieldsResponse

The list of fetched fields.

The fields are returned in a fully qualified name format. However, the format differs slightly from the one used by query operations:

  • Fields that contain nested JSON objects are assigned a type of "nested".

  • The names of fields that belong to a nested object include a .properties segment (for example, warnings.properties.severity means that the warnings object has a property called severity).

  • Fields returned from the News collection are prefixed with v{N}-fullnews-t3-{YEAR}.mappings (for example, v5-fullnews-t3-2016.mappings.text.properties.author).

Name Description
fields Field[]

An array containing information about each field in the collections.

Field
Name Description
field string

The name of the field.

type string

The type of the field.

Possible values:
  • nested
  • string
  • date
  • long
  • integer
  • short
  • byte
  • double
  • float
  • boolean
  • binary

Example response


{
  "fields" : [ {
    "field" : "warnings",
    "type" : "nested"
  }, {
    "field" : "warnings.properties.description",
    "type" : "string"
  }, {
    "field" : "warnings.properties.phase",
    "type" : "string"
  }, {
    "field" : "warnings.properties.warning_id",
    "type" : "string"
  } ]
}
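The following Python sketch turns the returned names into dot-notation paths by dropping the .properties segments and skipping nested container fields. It assumes that query operations address nested fields as parent.child. The helpers are hypothetical, and they leave News collection prefixes untouched:

```python
def query_field_name(field: str) -> str:
    """Drop ".properties" nesting markers: warnings.properties.phase -> warnings.phase.

    Assumption: query operations use plain dot notation for nested fields.
    News collection prefixes (v{N}-fullnews-t3-{YEAR}.mappings) are not handled.
    """
    return field.replace(".properties.", ".")


def leaf_fields(response: dict) -> list:
    """Query-style names of all non-nested fields in a ListCollectionFieldsResponse."""
    return [
        query_field_name(f["field"])
        for f in response.get("fields", [])
        if f.get("type") != "nested"
    ]


example = {
    "fields": [
        {"field": "warnings", "type": "nested"},
        {"field": "warnings.properties.description", "type": "string"},
        {"field": "warnings.properties.phase", "type": "string"},
        {"field": "warnings.properties.warning_id", "type": "string"},
    ]
}
print(leaf_fields(example))  # ['warnings.description', 'warnings.phase', 'warnings.warning_id']
```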
        

Response Codes

Status Description
200

The list of fetched fields.


400

Bad request.

Configurations

Manage custom configurations for your environment.

See Configuring your service and Configuration reference in the main documentation set for information about default and custom configurations.

Add configuration

Creates a new configuration.

If the input configuration contains the configuration_id, created, or updated properties, the system ignores and overrides them without returning an error, so you do not need to remove these fields when you copy a configuration.

The configuration can contain unrecognized JSON fields. Any such fields are ignored and do not generate an error. This makes it easier to use newer configuration files with older versions of the API and the service. It also makes it possible for the tooling to add additional metadata and information to the configuration.
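Because the service ignores configuration_id, created, and updated, copying a configuration mostly means giving it a new, unique name. The following is a Python sketch. copy_configuration is a hypothetical helper, and dropping the system fields is optional tidying, not a requirement:

```python
import copy

# Set by the service; ignored if present in an input configuration.
SYSTEM_FIELDS = ("configuration_id", "created", "updated")


def copy_configuration(source: dict, new_name: str) -> dict:
    """Return a new configuration body based on an existing one.

    name must be unique within the environment. Removing the system fields
    is optional because the service ignores them.
    """
    clone = copy.deepcopy(source)  # deep copy so that nested settings are not shared
    for key in SYSTEM_FIELDS:
        clone.pop(key, None)
    clone["name"] = new_name
    return clone
```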

Request

POST /v1/environments/{environment_id}/configurations
Parameter Description
environment_id path string

The ID of the environment.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

configuration body Configuration

An object that lets you customize how your content is ingested and which enrichments are added to your data.

name is required and must be unique within the current environment. All other properties are optional.


Configuration

A custom configuration for the environment.

Name Description
configuration_id string

The unique identifier of the configuration.

name string

The name of the configuration.

created DateTime

The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

description string

The description of the configuration, if available.

conversions Conversions

The document conversion settings for the configuration.

enrichments Enrichment[]

An array of document enrichment settings for the configuration.

normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

source Source

Object containing source parameters for the configuration.
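The created and updated values use the pattern yyyy-MM-dd'T'HH:mm:ss.SSS'Z', which is UTC with millisecond precision. The following minimal Python sketch converts between that string form and a timezone-aware datetime. It assumes that the trailing Z always denotes UTC, as the pattern suggests. The function names are illustrative only.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a value such as 2015-08-24T18:42:25.324Z into an aware UTC datetime."""
    # %f accepts the three-digit millisecond field and scales it to microseconds.
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def format_timestamp(moment: datetime) -> str:
    """Format an aware datetime in the same pattern, truncated to milliseconds."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


created = parse_timestamp("2015-08-24T18:42:25.324Z")
```

Pass only timezone-aware datetimes to format_timestamp. Calling astimezone on a naive value would interpret it in the local time zone of the machine that runs the code.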

Conversions

Document conversion settings.

Name Description
pdf PdfSettings

A list of PDF conversion settings.

word WordSettings

A list of Word conversion settings.

html HtmlSettings

A list of HTML conversion settings.

segment SegmentSettings

A list of Document Segmentation settings.

json_normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

PdfSettings

A list of PDF conversion settings.

Name Description
heading PdfHeadingDetection
PdfHeadingDetection
Name Description
fonts FontSetting[]
FontSetting
Name Description
level integer
min_size integer
max_size integer
bold boolean
italic boolean
name string
WordSettings

A list of Word conversion settings.

Name Description
heading WordHeadingDetection
WordHeadingDetection
Name Description
fonts FontSetting[]
styles WordStyle[]
WordStyle
Name Description
level integer
names string[]
HtmlSettings

A list of HTML conversion settings.

Name Description
exclude_tags_completely string[]
exclude_tags_keep_content string[]
keep_content XPathPatterns
exclude_content XPathPatterns
keep_tag_attributes string[]
exclude_tag_attributes string[]
XPathPatterns
Name Description
xpaths string[]
SegmentSettings

A list of Document Segmentation settings.

Name Description
enabled boolean

Enables/disables the Document Segmentation feature.

Default: false

selector_tags string[]

Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6.
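As an illustration of the SegmentSettings rules, the following Python sketch builds a segment object and rejects any selector tag outside h1 through h6 before the configuration is sent. Because enabled defaults to false, the sketch sets it explicitly. The helper name segment_settings is hypothetical, and the service still performs its own validation.

```python
VALID_SELECTOR_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


def segment_settings(selector_tags: list, enabled: bool = True) -> dict:
    """Hypothetical helper: build a conversions.segment object."""
    invalid = [tag for tag in selector_tags if tag not in VALID_SELECTOR_TAGS]
    if invalid:
        raise ValueError(f"invalid selector_tags: {invalid}; expected h1 to h6")
    # enabled defaults to false on the service side, so always send it explicitly.
    return {"enabled": enabled, "selector_tags": list(selector_tags)}


conversions = {"segment": segment_settings(["h1", "h2"])}
```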

NormalizationOperation
Name Description
operation string

Identifies what type of operation to perform.

copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.

move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Move is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).

merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.

remove - Deletes the source_field field. The destination_field is ignored for this operation.

remove_nulls - Removes all nested null (blank) field values from the JSON tree. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire JSON tree. Because remove_nulls can be time-expensive, it is typically invoked as the last normalization operation, if it is invoked at all.

Allowable values:
  • copy
  • move
  • merge
  • remove
  • remove_nulls
source_field string

The source field for the operation.

destination_field string

The destination field for the operation.
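The following Python sketch makes the order-dependent semantics of these operations concrete by applying a list of NormalizationOperation objects to a local JSON document. It is a hypothetical approximation written from the descriptions above, not the service's implementation. It assumes that dotted field names address nested objects and that only JSON null counts as a blank value for remove_nulls.

```python
import copy
from typing import Any

_MISSING = object()  # sentinel meaning "field does not exist"


def _get(doc: dict, path: str) -> Any:
    """Return the value at a dotted path such as metadata.title, or _MISSING."""
    node: Any = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def _set(doc: dict, path: str, value: Any) -> None:
    """Write a value at a dotted path, creating intermediate objects as needed."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def _delete(doc: dict, path: str) -> None:
    """Remove the field at a dotted path if it exists."""
    *parents, leaf = path.split(".")
    node: Any = doc
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return
    node.pop(leaf, None)


def _strip_nulls(node: Any) -> Any:
    """Recursively drop null values from objects and arrays."""
    if isinstance(node, dict):
        return {k: _strip_nulls(v) for k, v in node.items() if v is not None}
    if isinstance(node, list):
        return [_strip_nulls(v) for v in node if v is not None]
    return node


def apply_normalizations(doc: dict, operations: list) -> dict:
    """Hypothetical local simulator: apply operations in array order."""
    out = copy.deepcopy(doc)
    for op in operations:
        kind = op["operation"]
        src, dst = op.get("source_field"), op.get("destination_field")
        if kind == "remove_nulls":  # whole tree; source/destination ignored
            out = _strip_nulls(out)
        elif kind == "remove":  # destination_field is ignored
            _delete(out, src)
        elif kind in ("copy", "move"):
            value = _get(out, src)
            if value is not _MISSING:
                _set(out, dst, copy.deepcopy(value))  # overwrites an existing value
                if kind == "move":
                    _delete(out, src)  # move = copy followed by remove
        elif kind == "merge":
            current = _get(out, dst)
            if current is _MISSING:
                merged = []  # assumption: a missing destination starts empty
            elif isinstance(current, list):
                merged = current
            else:
                merged = [current]  # destination always becomes an array
            value = _get(out, src)
            if value is not _MISSING:
                merged.append(value)
                _delete(out, src)
            _set(out, dst, merged)
        else:
            raise ValueError(f"unknown operation: {kind}")
    return out


doc = {"extracted_metadata": {"title": "Q3 results", "author": "IBM"},
       "tags": "finance", "extra_tag": "earnings", "summary": None}
ops = [
    {"operation": "move", "source_field": "extracted_metadata.title", "destination_field": "metadata.title"},
    {"operation": "move", "source_field": "extracted_metadata.author", "destination_field": "metadata.author"},
    {"operation": "remove", "source_field": "extracted_metadata"},
    {"operation": "merge", "source_field": "extra_tag", "destination_field": "tags"},
    {"operation": "remove_nulls"},
]
result = apply_normalizations(doc, ops)
```

The sample list mirrors the json_normalizations pattern in the Example response for this method: move fields out of extracted_metadata, then remove the emptied parent. Putting remove_nulls last matches the recommendation above.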

Enrichment
Name Description
description string

Describes what the enrichment step does.

destination_field string

Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.

source_field string

Field to be enriched.

overwrite boolean

Indicates that the enrichments will overwrite the destination_field field if it already exists.

Default: false

enrichment string

Name of the enrichment service to call. Current options are natural_language_understanding and elements.

When using natural_language_understanding, the options object must contain Natural Language Understanding options.

When using elements, the options object must contain Element Classification options. Additionally, when using the elements enrichment, the configuration specified and the files ingested must meet all the criteria specified in the documentation.

Previous API versions also supported alchemy_language.

ignore_downstream_errors boolean

If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.

Default: false

options EnrichmentOptions

A list of options specific to the enrichment.
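You can check the destination_field rule locally before you submit a configuration. The following Python sketch collects the dotted field paths of a sample document and tests a destination against the rule. The helper names are hypothetical. Treating any new top-level field as valid is an assumption based on the Example response for this method, which writes to new top-level fields such as enriched_title.

```python
def field_paths(doc: dict, prefix: str = "") -> set:
    """Hypothetical helper: every dotted field path in a JSON object."""
    paths = set()
    for key, value in doc.items():
        path = prefix + key
        paths.add(path)
        if isinstance(value, dict):
            paths |= field_paths(value, path + ".")
    return paths


def is_valid_destination(existing: set, destination: str) -> bool:
    """True for an existing field or one at most one level below an existing field."""
    if destination in existing:
        return True
    parent, dot, _ = destination.rpartition(".")
    # Assumption: a name without a dot sits one level below the document root.
    return dot == "" or parent in existing


fields = field_paths({"text": "Watson Discovery ingests documents."})
```

With text as a top-level field that has no sub-fields, this reproduces the documented example: text.foo is accepted and text.foo.bar is rejected.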

EnrichmentOptions

Options which are specific to a particular enrichment.

Name Description
features NluEnrichmentFeatures

An object representing the enrichment features that will be applied to the specified field.

language string

ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.

Allowable values:
  • ar
  • en
  • fr
  • de
  • it
  • pt
  • ru
  • es
  • sv
model string

For use with elements enrichments only. The element extraction model to use. Models available are: contract.
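The following Python sketch combines the Enrichment and EnrichmentOptions fields into one entry of the enrichments array. It checks the documented constraints: the two enrichment names, the listed ISO 639-1 language codes, and contract as the only model listed for elements. make_enrichment is a hypothetical helper, and the service remains the authority on what it accepts.

```python
from typing import Optional

ENRICHMENTS = {"natural_language_understanding", "elements"}
LANGUAGES = {"ar", "en", "fr", "de", "it", "pt", "ru", "es", "sv"}
ELEMENT_MODELS = {"contract"}


def make_enrichment(enrichment: str, source_field: str, destination_field: str,
                    options: dict, overwrite: bool = False,
                    ignore_downstream_errors: bool = False,
                    description: Optional[str] = None) -> dict:
    """Hypothetical helper: build one entry of the enrichments array."""
    if enrichment not in ENRICHMENTS:
        raise ValueError(f"unsupported enrichment: {enrichment}")
    language = options.get("language")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unsupported language code: {language}")
    model = options.get("model")
    if model is not None:
        # options.model is documented for elements enrichments only.
        if enrichment != "elements" or model not in ELEMENT_MODELS:
            raise ValueError(f"model {model!r} is not valid for {enrichment}")
    entry = {
        "enrichment": enrichment,
        "source_field": source_field,
        "destination_field": destination_field,
        "overwrite": overwrite,  # documented default: false
        "ignore_downstream_errors": ignore_downstream_errors,  # default: false
        "options": options,
    }
    if description is not None:
        entry["description"] = description
    return entry


elements = make_enrichment("elements", "html", "enriched_html", {"model": "contract"})
```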

NluEnrichmentFeatures
Name Description
keywords NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

entities NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

sentiment NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

emotion NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

categories NluEnrichmentCategories

An object specifying the categories enrichment and related parameters.

semantic_roles NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

relations NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of keywords will be performed on the specified field.

emotion boolean

When true, emotion detection of keywords will be performed on the specified field.

limit integer

The maximum number of keywords to extract for each instance of the specified field.

NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of entities will be performed on the specified field.

emotion boolean

When true, emotion detection of entities will be performed on the specified field.

limit integer

The maximum number of entities to extract for each instance of the specified field.

mentions boolean

When true, the number of mentions of each identified entity is recorded. The default is false.

mention_types boolean

When true, the types of mentions for each identified entity are recorded. The default is false.

sentence_location boolean

When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.

model string

The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, the public model for use with Knowledge Graph en-news, or the default public model alchemy.

NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

Name Description
document boolean

When true, sentiment analysis is performed on the entire field.

targets string[]

A comma-separated list of target strings that will have any associated sentiment analyzed.

NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

Name Description
document boolean

When true, emotion detection is performed on the entire field.

targets string[]

A comma-separated list of target strings that will have any associated emotions detected.
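targets is described as a comma-separated list, but its type is string[], and the Example response for this method sends it as a JSON array. The following minimal Python sketch turns a comma-separated string into sentiment and emotion feature objects with array-valued targets. The helper name is hypothetical.

```python
def targeted_features(targets_csv: str, document: bool = True) -> dict:
    """Hypothetical helper: 'IBM, Watson' -> sentiment and emotion feature objects."""
    targets = [t.strip() for t in targets_csv.split(",") if t.strip()]
    return {
        "sentiment": {"document": document, "targets": targets},
        # Copy the list so the two feature objects do not share state.
        "emotion": {"document": document, "targets": list(targets)},
    }


features = targeted_features("IBM, Watson")
```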

NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

Name Description
entities boolean

When true, entities are extracted from the identified sentence parts.

keywords boolean

When true, keywords are extracted from the identified sentence parts.

limit integer

The maximum number of semantic roles enrichments to extract from each instance of the specified field.

NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

Name Description
model string

For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio or the public model for use with Knowledge Graph, en-news. The default is en-news.

Source

Object containing source parameters for the configuration.

Name Description
type string

The type of source to connect to.

  • box indicates the configuration is to connect to an instance of Enterprise Box.
  • salesforce indicates the configuration is to connect to Salesforce.
  • sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
Allowable values:
  • box
  • salesforce
  • sharepoint
credential_id string

The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.

schedule SourceSchedule

Object containing the schedule information for the source.

options SourceOptions

The options object defines which items to crawl from the source system.

SourceSchedule

Object containing the schedule information for the source.

Name Description
enabled boolean

When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled, except that a Salesforce source is still crawled annually.

Default: true

time_zone string

The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.

Default: America/New_York

frequency string

The crawl schedule in the specified time_zone.

  • daily: Runs every day between 00:00 and 06:00.
  • weekly: Runs every week on Sunday between 00:00 and 06:00.
  • monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Allowable values:
  • daily
  • weekly
  • monthly
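Each frequency value describes a window that starts at 00:00 in the configured time_zone. As a rough planning aid, the following Python sketch computes the next 00:00 window start for each frequency. It takes a naive datetime that is already expressed in that time zone, so converting from UTC (for example with zoneinfo) is left to the caller. The function is hypothetical, and the service chooses the actual start time within the 00:00 to 06:00 window.

```python
from datetime import datetime, timedelta

SUNDAY = 6  # datetime.weekday(): Monday is 0, Sunday is 6


def _first_sunday(year: int, month: int) -> datetime:
    first = datetime(year, month, 1)
    return first + timedelta(days=(SUNDAY - first.weekday()) % 7)


def next_window_start(now: datetime, frequency: str) -> datetime:
    """Hypothetical helper: earliest 00:00 window start at or after `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if frequency == "daily":
        return midnight if midnight >= now else midnight + timedelta(days=1)
    if frequency == "weekly":
        sunday = midnight + timedelta(days=(SUNDAY - now.weekday()) % 7)
        return sunday if sunday >= now else sunday + timedelta(days=7)
    if frequency == "monthly":
        candidate = _first_sunday(now.year, now.month)
        if candidate >= now:
            return candidate
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        return _first_sunday(year, month)
    raise ValueError(f"unknown frequency: {frequency}")


now = datetime(2018, 3, 5, 12, 0)  # a Monday, in America/New_York local time
```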
SourceOptions

The options object defines which items to crawl from the source system.

Name Description
folders SourceOptionsFolder[]

Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.

objects SourceOptionsObject[]

Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.

site_collections SourceOptionsSiteColl[]

Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid, and required, when the type field of the source object is set to sharepoint.

SourceOptionsFolder

Object that defines a Box folder to crawl with this configuration.

Name Description
owner_user_id string

The Box user ID of the user who owns the folder to crawl.

folder_id string

The Box folder ID of the folder to crawl.

limit integer

The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.

SourceOptionsObject

Object that defines a Salesforce document object type to crawl with this configuration.

Name Description
name string

The name of the Salesforce document object to crawl. For example, case.

limit integer

The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.

SourceOptionsSiteColl

Object that defines a Microsoft SharePoint site collection to crawl with this configuration.

Name Description
site_collection_path string

The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.

limit integer

The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
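Each source type pairs with exactly one options array: folders for box, objects for salesforce, and site_collections for sharepoint. The following Python sketch reports a source object that breaks that pairing before it is sent. source_problems is a hypothetical helper, and the sample credential_id is borrowed from the Example response for this method.

```python
OPTIONS_KEY = {"box": "folders", "salesforce": "objects", "sharepoint": "site_collections"}


def source_problems(source: dict) -> list:
    """Hypothetical helper: human-readable problems with a source object."""
    source_type = source.get("type")
    if source_type not in OPTIONS_KEY:
        return [f"unsupported source type: {source_type!r}"]
    options = source.get("options", {})
    expected = OPTIONS_KEY[source_type]
    problems = []
    if expected not in options:  # the matching array is required
        problems.append(f"options.{expected} is required for type {source_type!r}")
    for key in OPTIONS_KEY.values():  # arrays for other types are not valid
        if key != expected and key in options:
            problems.append(f"options.{key} is not valid for type {source_type!r}")
    return problems


sharepoint_source = {
    "type": "sharepoint",
    "credential_id": "00ad0000-0000-11e8-ba89-0ed5f00f718b",
    "options": {"site_collections": [{"site_collection_path": "/sites/TestSiteA", "limit": 10}]},
}
```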

Example request

curl -X POST -u "{username}":"{password}" -H "Content-Type: application/json" -d @config.json "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/configurations?version=2018-03-05"
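The following Python sketch builds the same POST request with the standard library's urllib.request, for readers who are not using an SDK. It only constructs the request; pass the result to urllib.request.urlopen to send it. The base URL is taken from the curl example, but your instance may use a different URL (see Service endpoint). The environment ID and credentials are placeholders. For IAM instances, the curl form -u "apikey:{apikey}" corresponds to the username apikey with the API key as the password.

```python
import base64
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"


def add_configuration_request(environment_id: str, configuration: dict,
                              username: str, password: str,
                              version: str = "2018-03-05",
                              base_url: str = BASE_URL) -> urllib.request.Request:
    """Hypothetical helper: build POST /v1/environments/{environment_id}/configurations."""
    url = (f"{base_url}/v1/environments/{quote(environment_id, safe='')}"
           f"/configurations?{urlencode({'version': version})}")
    # Equivalent of curl -u: HTTP basic authentication.
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(configuration).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {credentials}"},
        method="POST",
    )


request = add_configuration_request("my-environment-id", {"name": "IBM News"}, "user", "pass")
```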
        

Response

Configuration

A custom configuration for the environment.

Name Description
configuration_id string

The unique identifier of the configuration.

name string

The name of the configuration.

created DateTime

The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

description string

The description of the configuration, if available.

conversions Conversions

The document conversion settings for the configuration.

enrichments Enrichment[]

An array of document enrichment settings for the configuration.

normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

source Source

Object containing source parameters for the configuration.

Conversions

Document conversion settings.

Name Description
pdf PdfSettings

A list of PDF conversion settings.

word WordSettings

A list of Word conversion settings.

html HtmlSettings

A list of HTML conversion settings.

segment SegmentSettings

A list of Document Segmentation settings.

json_normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

PdfSettings

A list of PDF conversion settings.

Name Description
heading PdfHeadingDetection
PdfHeadingDetection
Name Description
fonts FontSetting[]
FontSetting
Name Description
level integer
min_size integer
max_size integer
bold boolean
italic boolean
name string
WordSettings

A list of Word conversion settings.

Name Description
heading WordHeadingDetection
WordHeadingDetection
Name Description
fonts FontSetting[]
styles WordStyle[]
WordStyle
Name Description
level integer
names string[]
HtmlSettings

A list of HTML conversion settings.

Name Description
exclude_tags_completely string[]
exclude_tags_keep_content string[]
keep_content XPathPatterns
exclude_content XPathPatterns
keep_tag_attributes string[]
exclude_tag_attributes string[]
XPathPatterns
Name Description
xpaths string[]
SegmentSettings

A list of Document Segmentation settings.

Name Description
enabled boolean

Enables/disables the Document Segmentation feature.

selector_tags string[]

Defines the heading level that splits into document segments. Valid values are h1, h2, h3, h4, h5, h6.

NormalizationOperation
Name Description
operation string

Identifies what type of operation to perform.

copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.

move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. Move is identical to copy, except that the source_field is removed after the value has been copied to the destination_field (it is the same as a copy followed by a remove).

merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.

remove - Deletes the source_field field. The destination_field is ignored for this operation.

remove_nulls - Removes all nested null (blank) field values from the JSON tree. source_field and destination_field are ignored by this operation because remove_nulls operates on the entire JSON tree. Because remove_nulls can be time-expensive, it is typically invoked as the last normalization operation, if it is invoked at all.

Possible values:
  • copy
  • move
  • merge
  • remove
  • remove_nulls
source_field string

The source field for the operation.

destination_field string

The destination field for the operation.

Enrichment
Name Description
description string

Describes what the enrichment step does.

destination_field string

Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.

source_field string

Field to be enriched.

overwrite boolean

Indicates that the enrichments will overwrite the destination_field field if it already exists.

enrichment string

Name of the enrichment service to call. Current options are natural_language_understanding and elements.

When using natural_language_understanding, the options object must contain Natural Language Understanding options.

When using elements, the options object must contain Element Classification options. Additionally, when using the elements enrichment, the configuration specified and the files ingested must meet all the criteria specified in the documentation.

Previous API versions also supported alchemy_language.

ignore_downstream_errors boolean

If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.

options EnrichmentOptions

A list of options specific to the enrichment.

EnrichmentOptions

Options which are specific to a particular enrichment.

Name Description
features NluEnrichmentFeatures

An object representing the enrichment features that will be applied to the specified field.

language string

ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages; automatic detection is recommended.

Possible values:
  • ar
  • en
  • fr
  • de
  • it
  • pt
  • ru
  • es
  • sv
model string

For use with elements enrichments only. The element extraction model to use. Models available are: contract.

NluEnrichmentFeatures
Name Description
keywords NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

entities NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

sentiment NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

emotion NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

categories NluEnrichmentCategories

An object specifying the categories enrichment and related parameters.

semantic_roles NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

relations NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of keywords will be performed on the specified field.

emotion boolean

When true, emotion detection of keywords will be performed on the specified field.

limit integer

The maximum number of keywords to extract for each instance of the specified field.

NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of entities will be performed on the specified field.

emotion boolean

When true, emotion detection of entities will be performed on the specified field.

limit integer

The maximum number of entities to extract for each instance of the specified field.

mentions boolean

When true, the number of mentions of each identified entity is recorded. The default is false.

mention_types boolean

When true, the types of mentions for each identified entity are recorded. The default is false.

sentence_location boolean

When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.

model string

The enrichment model to use with entity extraction. May be a custom model provided by Watson Knowledge Studio, the public model for use with Knowledge Graph en-news, or the default public model alchemy.

NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

Name Description
document boolean

When true, sentiment analysis is performed on the entire field.

targets string[]

A comma-separated list of target strings that will have any associated sentiment analyzed.

NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

Name Description
document boolean

When true, emotion detection is performed on the entire field.

targets string[]

A comma-separated list of target strings that will have any associated emotions detected.

NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

Name Description
entities boolean

When true, entities are extracted from the identified sentence parts.

keywords boolean

When true, keywords are extracted from the identified sentence parts.

limit integer

The maximum number of semantic roles enrichments to extract from each instance of the specified field.

NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

Name Description
model string

For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. May be a custom model provided by Watson Knowledge Studio or the public model for use with Knowledge Graph, en-news. The default is en-news.

Source

Object containing source parameters for the configuration.

Name Description
type string

The type of source to connect to.

  • box indicates the configuration is to connect to an instance of Enterprise Box.
  • salesforce indicates the configuration is to connect to Salesforce.
  • sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
Possible values:
  • box
  • salesforce
  • sharepoint
credential_id string

The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.

schedule SourceSchedule

Object containing the schedule information for the source.

options SourceOptions

The options object defines which items to crawl from the source system.

SourceSchedule

Object containing the schedule information for the source.

Name Description
enabled boolean

When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled, except that a Salesforce source is still crawled annually.

time_zone string

The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.

frequency string

The crawl schedule in the specified time_zone.

  • daily: Runs every day between 00:00 and 06:00.
  • weekly: Runs every week on Sunday between 00:00 and 06:00.
  • monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values:
  • daily
  • weekly
  • monthly
SourceOptions

The options object defines which items to crawl from the source system.

Name Description
folders SourceOptionsFolder[]

Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.

objects SourceOptionsObject[]

Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.

site_collections SourceOptionsSiteColl[]

Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid, and required, when the type field of the source object is set to sharepoint.

SourceOptionsFolder

Object that defines a Box folder to crawl with this configuration.

Name Description
owner_user_id string

The Box user ID of the user who owns the folder to crawl.

folder_id string

The Box folder ID of the folder to crawl.

limit integer

The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.

SourceOptionsObject

Object that defines a Salesforce document object type to crawl with this configuration.

Name Description
name string

The name of the Salesforce document object to crawl. For example, case.

limit integer

The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.

SourceOptionsSiteColl

Object that defines a Microsoft SharePoint site collection to crawl with this configuration.

Name Description
site_collection_path string

The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.

limit integer

The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.

Example response


{
  "configuration_id" : "448e3545-51ca-4530-a03b-6ff282ceac2e",
  "name" : "IBM News",
  "created" : "2015-08-24T18:42:25.324Z",
  "updated" : "2015-08-24T18:42:25.324Z",
  "description" : "A configuration useful for ingesting IBM press releases.",
  "conversions" : {
    "html" : {
      "exclude_tags_keep_content" : [ "span" ],
      "exclude_content" : {
        "xpaths" : [ "/home" ]
      }
    },
    "segment" : {
      "enabled" : true,
      "selector_tags" : [ "h1", "h2" ]
    },
    "json_normalizations" : [ {
      "operation" : "move",
      "source_field" : "extracted_metadata.title",
      "destination_field" : "metadata.title"
    }, {
      "operation" : "move",
      "source_field" : "extracted_metadata.author",
      "destination_field" : "metadata.author"
    }, {
      "operation" : "remove",
      "source_field" : "extracted_metadata"
    } ]
  },
  "enrichments" : [ {
    "enrichment" : "natural_language_understanding",
    "source_field" : "title",
    "destination_field" : "enriched_title",
    "options" : {
      "features" : {
        "keywords" : {
          "sentiment" : true,
          "emotion" : false,
          "limit" : 50
        },
        "entities" : {
          "sentiment" : true,
          "emotion" : false,
          "limit" : 50,
          "mentions" : true,
          "mention_types" : true,
          "sentence_locations" : true,
          "model" : "WKS-model-id"
        },
        "sentiment" : {
          "document" : true,
          "targets" : [ "IBM", "Watson" ]
        },
        "emotion" : {
          "document" : true,
          "targets" : [ "IBM", "Watson" ]
        },
        "categories" : { },
        "concepts" : {
          "limit" : 8
        },
        "semantic_roles" : {
          "entities" : true,
          "keywords" : true,
          "limit" : 50
        },
        "relations" : {
          "model" : "WKS-model-id"
        }
      }
    }
  }, {
    "enrichment" : "elements",
    "source_field" : "html",
    "destination_field" : "enriched_html",
    "options" : {
      "model" : "contract"
    }
  } ],
  "normalizations" : [ {
    "operation" : "move",
    "source_field" : "metadata.title",
    "destination_field" : "title"
  }, {
    "operation" : "move",
    "source_field" : "metadata.author",
    "destination_field" : "author"
  }, {
    "operation" : "move",
    "source_field" : "alchemy_enriched_text.language",
    "destination_field" : "language"
  }, {
    "operation" : "remove",
    "source_field" : "html"
  }, {
    "operation" : "remove",
    "source_field" : "alchemy_enriched_text.status"
  }, {
    "operation" : "remove",
    "source_field" : "alchemy_enriched_text.text"
  }, {
    "operation" : "remove",
    "source_field" : "sire_enriched_text.language"
  }, {
    "operation" : "remove",
    "source_field" : "sire_enriched_text.model"
  }, {
    "operation" : "remove",
    "source_field" : "sire_enriched_text.status"
  }, {
    "operation" : "remove_nulls"
  } ],
  "source" : {
    "type" : "sharepoint",
    "credential_id" : "00ad0000-0000-11e8-ba89-0ed5f00f718b",
    "schedule" : {
      "enabled" : true,
      "time_zone" : "America/New_York",
      "frequency" : "weekly"
    },
    "options" : {
      "site_collections" : [ {
        "site_collection_path" : "/sites/TestSiteA",
        "limit" : 10
      } ]
    }
  }
}
        

Response Codes

Status Description
201

Configuration successfully created.

400

Bad request.

403

Forbidden. Returned if you attempt to add a configuration to a read-only environment.

List configurations

Lists existing configurations for the service instance.

Request

GET /v1/environments/{environment_id}/configurations
Parameter Description
environment_id path string

The ID of the environment.

name query string

Find configurations with the given name.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/configurations?version=2018-03-05"
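The optional name query parameter narrows the list on the server side. The following Python sketch builds the request URL with and without that filter. The base URL comes from the curl example, and the environment ID is a placeholder. Send the URL with the same basic authentication as curl -u. urlencode encodes a space in the name as "+", which is standard for query strings. The helper name is hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"


def list_configurations_url(environment_id: str, version: str = "2018-03-05",
                            name: Optional[str] = None,
                            base_url: str = BASE_URL) -> str:
    """Hypothetical helper: GET URL for List configurations, with optional name filter."""
    params = {"version": version}
    if name is not None:
        params["name"] = name
    return (f"{base_url}/v1/environments/{quote(environment_id, safe='')}"
            f"/configurations?{urlencode(params)}")
```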
        

Response

ListConfigurationsResponse
Name Description
configurations Configuration[]

An array of Configurations that are available for the service instance.

Configuration

A custom configuration for the environment.

Name Description
configuration_id string

The unique identifier of the configuration.

name string

The name of the configuration.

created DateTime

The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

description string

The description of the configuration, if available.

conversions Conversions

The document conversion settings for the configuration.

enrichments Enrichment[]

An array of document enrichment settings for the configuration.

normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

source Source

Object containing source parameters for the configuration.
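The created and updated formats above are written as Java-style date patterns. As a minimal sketch, assuming every timestamp carries a fractional-seconds field and a literal Z (as in the example response for Get configuration details, "2015-08-24T18:42:25.324Z"), the equivalent Python parse is shown below. The function name parse_discovery_timestamp is hypothetical.

```python
from datetime import datetime, timezone


def parse_discovery_timestamp(value):
    """Parse a timestamp such as '2015-08-24T18:42:25.324Z' into an aware UTC datetime."""
    # %f accepts the three-digit millisecond field; the literal 'Z' marks UTC.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


created = parse_discovery_timestamp("2015-08-24T18:42:25.324Z")
print(created.isoformat())  # 2015-08-24T18:42:25.324000+00:00
```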

Conversions

Document conversion settings.

Name Description
pdf PdfSettings

PDF conversion settings.

word WordSettings

Word conversion settings.

html HtmlSettings

HTML conversion settings.

segment SegmentSettings

Document Segmentation settings.

json_normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

PdfSettings

PDF conversion settings.

Name Description
heading PdfHeadingDetection
PdfHeadingDetection
Name Description
fonts FontSetting[]
FontSetting
Name Description
level integer
min_size integer
max_size integer
bold boolean
italic boolean
name string
WordSettings

Word conversion settings.

Name Description
heading WordHeadingDetection
WordHeadingDetection
Name Description
fonts FontSetting[]
styles WordStyle[]
WordStyle
Name Description
level integer
names string[]
HtmlSettings

HTML conversion settings.

Name Description
exclude_tags_completely string[]
exclude_tags_keep_content string[]
keep_content XPathPatterns
exclude_content XPathPatterns
keep_tag_attributes string[]
exclude_tag_attributes string[]
XPathPatterns
Name Description
xpaths string[]
SegmentSettings

Document Segmentation settings.

Name Description
enabled boolean

Enables or disables the Document Segmentation feature.

selector_tags string[]

Defines the heading levels at which the document is split into segments. Valid values are h1, h2, h3, h4, h5, and h6.
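The split points that selector_tags controls can be visualized with a local approximation. This is a minimal sketch using Python's html.parser, not the service's segmentation algorithm, and the segment shape (a title plus body text) is an assumption made for illustration. HeadingSegmenter and segment_html are hypothetical names. Headings whose level is not listed in selector_tags stay inside the current segment as ordinary text.

```python
from html.parser import HTMLParser


class HeadingSegmenter(HTMLParser):
    """Approximate segmentation: start a new segment at each selected heading tag."""

    def __init__(self, selector_tags):
        super().__init__()
        self.selector_tags = set(selector_tags)
        # Content before the first selected heading goes into an untitled segment.
        self.segments = [{"title": None, "text": ""}]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in self.selector_tags:
            self.segments.append({"title": "", "text": ""})
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in self.selector_tags:
            self._in_heading = False

    def handle_data(self, data):
        # Selected heading text becomes the segment title; everything else is body text.
        field = "title" if self._in_heading else "text"
        self.segments[-1][field] += data


def segment_html(html, selector_tags):
    parser = HeadingSegmenter(selector_tags)
    parser.feed(html)
    parser.close()
    # Drop the leading untitled segment when the document starts with a heading.
    return [s for s in parser.segments if s["title"] is not None or s["text"]]


html = "<p>intro</p><h1>A</h1><p>one</p><h2>B</h2><p>two</p><h3>C</h3><p>three</p>"
for segment in segment_html(html, ["h1", "h2"]):
    print(segment)
```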

NormalizationOperation
Name Description
operation string

Identifies what type of operation to perform.

copy - Copies the value of the source_field to the destination_field. If the destination_field already exists, the value of the source_field overwrites its original value.

move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, the value of the source_field overwrites its original value. A move is identical to a copy, except that the source_field is removed after its value has been copied to the destination_field (it is the same as a copy followed by a remove).

merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, the destination_field is still converted into an array (if it is not one already). This conversion ensures that the type of destination_field is consistent across all documents.

remove - Deletes the source_field. The destination_field is ignored for this operation.

remove_nulls - Removes all nested null (blank) field values from the JSON tree. The source_field and destination_field are ignored by this operation because remove_nulls operates on the entire JSON tree. Because it can be time-consuming, remove_nulls is typically invoked only once, as the last normalization operation, if it is used at all.

Possible values:
  • copy
  • move
  • merge
  • remove
  • remove_nulls
source_field string

The source field for the operation.

destination_field string

The destination field for the operation.
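To make the semantics and ordering of the five operations concrete, here is a local Python simulation. It is an illustrative sketch, not the service's implementation, and apply_normalizations is a hypothetical name. It assumes that dotted field names address nested JSON objects (as with metadata.title in the example configuration), that only JSON null counts as null for remove_nulls, and that copy, move, and remove silently skip a missing source_field.

```python
import copy

_MISSING = object()  # sentinel meaning "field not present"


def _get(doc, path):
    """Follow a dotted path such as 'metadata.title'; return _MISSING if absent."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def _set(doc, path, value):
    """Set a dotted path, creating intermediate objects as needed."""
    keys = path.split(".")
    node = doc
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def _delete(doc, path):
    """Delete a dotted path if it exists."""
    keys = path.split(".")
    parent = _get(doc, ".".join(keys[:-1])) if len(keys) > 1 else doc
    if isinstance(parent, dict):
        parent.pop(keys[-1], None)


def _remove_nulls(node):
    """Recursively drop object members whose value is None (JSON null)."""
    if isinstance(node, dict):
        return {k: _remove_nulls(v) for k, v in node.items() if v is not None}
    if isinstance(node, list):
        return [_remove_nulls(v) for v in node]
    return node


def apply_normalizations(document, operations):
    """Apply operations in array order to a copy of the document."""
    doc = copy.deepcopy(document)
    for op in operations:
        kind = op["operation"]
        if kind == "remove_nulls":
            doc = _remove_nulls(doc)
            continue
        src = op["source_field"]
        value = _get(doc, src)
        if kind in ("copy", "move"):
            if value is not _MISSING:
                _set(doc, op["destination_field"], copy.deepcopy(value))
                if kind == "move":
                    _delete(doc, src)
        elif kind == "merge":
            dst = op["destination_field"]
            current = _get(doc, dst)
            if current is _MISSING and value is _MISSING:
                continue  # assumption: nothing to merge and nothing to convert
            merged = [] if current is _MISSING else (current if isinstance(current, list) else [current])
            if value is not _MISSING:
                merged.append(value)
                _delete(doc, src)
            _set(doc, dst, merged)
        elif kind == "remove":
            _delete(doc, src)
        else:
            raise ValueError("unknown operation: {}".format(kind))
    return doc


doc = {"metadata": {"title": "T", "author": "A"}, "html": "<p/>", "tags": "x",
       "new_tag": "y", "empty": None, "nested": {"n": None, "k": 1}}
ops = [
    {"operation": "move", "source_field": "metadata.title", "destination_field": "title"},
    {"operation": "copy", "source_field": "metadata.author", "destination_field": "author"},
    {"operation": "merge", "source_field": "new_tag", "destination_field": "tags"},
    {"operation": "remove", "source_field": "html"},
    {"operation": "remove_nulls"},
]
result = apply_normalizations(doc, ops)
print(result)
```

Because operations run in array order, moving a field and then removing its former parent (as the example configuration does with extracted_metadata) only works if the move comes first.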

Enrichment
Name Description
description string

Describes what the enrichment step does.

destination_field string

The field in which enrichments are stored. This field must already exist or be at most one level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.

source_field string

The field to be enriched.

overwrite boolean

When true, the enrichments overwrite the destination_field if it already exists.

enrichment string

The name of the enrichment service to call. Current options are natural_language_understanding and elements.

When using natural_language_understanding, the options object must contain Natural Language Understanding options.

When using elements, the options object must contain Element Classification options. Additionally, when using the elements enrichment, the configuration and the ingested files must meet all the criteria specified in the documentation.

Previous API versions also supported alchemy_language.

ignore_downstream_errors boolean

If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.

options EnrichmentOptions

Options specific to the enrichment.
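The destination_field depth rule can be checked against a sample document before you create a configuration. This is a minimal sketch that encodes the rule exactly as stated above; is_valid_destination is a hypothetical name. It assumes that a brand-new top-level field counts as one level below the document root, as the enriched_title destination in the example configuration suggests.

```python
def is_valid_destination(document, destination_field):
    """True if destination_field exists, or only its last path segment is missing."""
    keys = destination_field.split(".")
    node = document
    for depth, key in enumerate(keys):
        if isinstance(node, dict) and key in node:
            node = node[key]
            continue
        # First missing segment: allowed only when it is the final one.
        return depth == len(keys) - 1
    return True  # the whole path already exists


sample = {"text": "Some text", "metadata": {"title": "T"}}
for candidate in ("text.foo", "text.foo.bar", "enriched_title"):
    print(candidate, is_valid_destination(sample, candidate))
```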

EnrichmentOptions

Options which are specific to a particular enrichment.

Name Description
features NluEnrichmentFeatures

An object representing the enrichment features that will be applied to the specified field.

language string

ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages, so automatic detection is recommended.

Possible values:
  • ar
  • en
  • fr
  • de
  • it
  • pt
  • ru
  • es
  • sv
model string

For use with elements enrichments only. The element extraction model to use. The only model currently available is contract.

NluEnrichmentFeatures
Name Description
keywords NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

entities NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

sentiment NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

emotion NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

categories NluEnrichmentCategories

An object specifying the categories enrichment and related parameters.

semantic_roles NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

relations NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of keywords will be performed on the specified field.

emotion boolean

When true, emotion detection of keywords will be performed on the specified field.

limit integer

The maximum number of keywords to extract for each instance of the specified field.

NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of entities will be performed on the specified field.

emotion boolean

When true, emotion detection of entities will be performed on the specified field.

limit integer

The maximum number of entities to extract for each instance of the specified field.

mentions boolean

When true, the number of mentions of each identified entity is recorded. The default is false.

mention_types boolean

When true, the types of mentions for each identified entity are recorded. The default is false.

sentence_locations boolean

When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.

model string

The enrichment model to use with entity extraction. Can be a custom model provided by Watson Knowledge Studio, en-news (the public model for use with Knowledge Graph), or alchemy (the default public model).

NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

Name Description
document boolean

When true, sentiment analysis is performed on the entire field.

targets string[]

An array of target strings whose associated sentiment is analyzed.

NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

Name Description
document boolean

When true, emotion detection is performed on the entire field.

targets string[]

An array of target strings whose associated emotions are detected.

NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

Name Description
entities boolean

When true, entities are extracted from the identified sentence parts.

keywords boolean

When true, keywords are extracted from the identified sentence parts.

limit integer

The maximum number of semantic roles enrichments to extract from each instance of the specified field.

NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

Name Description
model string

For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. Can be a custom model provided by Watson Knowledge Studio or en-news, the public model for use with Knowledge Graph. The default is en-news.
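Putting the Enrichment, EnrichmentOptions, and NluEnrichmentFeatures objects together, the sketch below builds one natural_language_understanding entry for a configuration's enrichments array. The helper nlu_enrichment is hypothetical. It only checks the language code against the list given under EnrichmentOptions and leaves language unset by default, because automatic detection is recommended. It does not check which features support which languages.

```python
import json

# Language codes listed for EnrichmentOptions.language above.
SUPPORTED_LANGUAGES = {"ar", "en", "fr", "de", "it", "pt", "ru", "es", "sv"}


def nlu_enrichment(source_field, destination_field, features, language=None):
    """Build one natural_language_understanding entry for the enrichments array."""
    options = {"features": features}
    if language is not None:
        # Setting language overrides the service's automatic detection.
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError("unsupported language code: {}".format(language))
        options["language"] = language
    return {
        "enrichment": "natural_language_understanding",
        "source_field": source_field,
        "destination_field": destination_field,
        "options": options,
    }


enrichment = nlu_enrichment(
    "title",
    "enriched_title",
    features={
        "keywords": {"sentiment": True, "emotion": False, "limit": 50},
        "entities": {"sentiment": True, "mentions": True, "limit": 50},
        "sentiment": {"document": True, "targets": ["IBM", "Watson"]},
    },
)
print(json.dumps(enrichment, indent=2))
```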

Source

Object containing source parameters for the configuration.

Name Description
type string

The type of source to connect to.

  • box indicates the configuration is to connect to an instance of Enterprise Box.
  • salesforce indicates the configuration is to connect to Salesforce.
  • sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
Possible values:
  • box
  • salesforce
  • sharepoint
credential_id string

The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.

schedule SourceSchedule

Object containing the schedule information for the source.

options SourceOptions

The options object defines which items to crawl from the source system.

SourceSchedule

Object containing the schedule information for the source.

Name Description
enabled boolean

When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; however, when false and connecting to Salesforce, the source is crawled annually.

time_zone string

The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.

frequency string

The crawl schedule in the specified time_zone.

  • daily: Runs every day between 00:00 and 06:00.
  • weekly: Runs every week on Sunday between 00:00 and 06:00.
  • monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values:
  • daily
  • weekly
  • monthly
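The three frequency values can be turned into calendar dates. This is a minimal sketch with a hypothetical next_crawl_date function. It works on plain dates only. It assumes the dates are already expressed in the configured time_zone, treats today as eligible if it is a crawl day, and does not model the exact start time that the service picks within the 00:00 to 06:00 window.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday() numbers Monday as 0 and Sunday as 6


def first_sunday(year, month):
    """Return the first Sunday of the given month."""
    first = date(year, month, 1)
    return first + timedelta(days=(SUNDAY - first.weekday()) % 7)


def next_crawl_date(frequency, today):
    """Return the first date on or after `today` that has a crawl window."""
    if frequency == "daily":
        return today
    if frequency == "weekly":
        return today + timedelta(days=(SUNDAY - today.weekday()) % 7)
    if frequency == "monthly":
        candidate = first_sunday(today.year, today.month)
        if candidate >= today:
            return candidate
        if today.month == 12:
            return first_sunday(today.year + 1, 1)
        return first_sunday(today.year, today.month + 1)
    raise ValueError("frequency must be daily, weekly, or monthly")


print(next_crawl_date("weekly", date(2018, 3, 5)))   # 2018-03-11
print(next_crawl_date("monthly", date(2018, 3, 5)))  # 2018-04-01
```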
SourceOptions

The options object defines which items to crawl from the source system.

Name Description
folders SourceOptionsFolder[]

Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.

objects SourceOptionsObject[]

Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.

site_collections SourceOptionsSiteColl[]

Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid, and required, when the type field of the source object is set to sharepoint.

SourceOptionsFolder

Object that defines a Box folder to crawl with this configuration.

Name Description
owner_user_id string

The Box user ID of the user who owns the folder to crawl.

folder_id string

The Box folder ID of the folder to crawl.

limit integer

The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.

SourceOptionsObject

Object that defines a Salesforce document object type to crawl with this configuration.

Name Description
name string

The name of the Salesforce document object to crawl. For example, case.

limit integer

The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.

SourceOptionsSiteColl

Object that defines a Microsoft SharePoint site collection to crawl with this configuration.

Name Description
site_collection_path string

The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.

limit integer

The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
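Each source type pairs with exactly one options array: folders for box, objects for salesforce, and site_collections for sharepoint. This is a minimal client-side check of that pairing, written as a sketch with a hypothetical check_source function. It does not verify credential_id or the schedule, and it treats an empty array as missing, which is an assumption.

```python
OPTIONS_KEY_BY_TYPE = {
    "box": "folders",
    "salesforce": "objects",
    "sharepoint": "site_collections",
}
TYPE_BY_OPTIONS_KEY = {key: source_type for source_type, key in OPTIONS_KEY_BY_TYPE.items()}


def check_source(source):
    """Return a list of problems found in a configuration's source object."""
    source_type = source.get("type")
    if source_type not in OPTIONS_KEY_BY_TYPE:
        return ["unknown source type: {!r}".format(source_type)]
    options = source.get("options", {})
    required = OPTIONS_KEY_BY_TYPE[source_type]
    problems = []
    # The matching array is required (this sketch also rejects an empty array).
    if not options.get(required):
        problems.append("{} sources require options.{}".format(source_type, required))
    # Arrays that belong to other source types are not valid here.
    for key, owner in TYPE_BY_OPTIONS_KEY.items():
        if key != required and key in options:
            problems.append("options.{} is only valid for {} sources".format(key, owner))
    return problems


sharepoint = {
    "type": "sharepoint",
    "options": {"site_collections": [{"site_collection_path": "/sites/TestSiteA", "limit": 10}]},
}
mismatched = {"type": "salesforce", "options": sharepoint["options"]}
print(check_source(sharepoint))
print(check_source(mismatched))
```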

Response Codes

Status Description
200

Successful response.

400

Bad request.

Get configuration details

Request

GET /v1/environments/{environment_id}/configurations/{configuration_id}
Parameter Description
environment_id path string

The ID of the environment.

configuration_id path string

The ID of the configuration.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/configurations/{configuration_id}?version=2018-03-05"
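The same request can be prepared with Python's standard library. This is a minimal sketch; build_get_configuration_request is a hypothetical helper, not part of the Watson SDK. It reproduces what curl -u does by sending HTTP Basic credentials. For instances that use IAM, the Authentication section of this reference passes apikey as the username and the API key as the password.

```python
import base64
from urllib.parse import quote
from urllib.request import Request


def build_get_configuration_request(base_url, environment_id, configuration_id,
                                    username, password, version="2018-03-05"):
    """Build (but do not send) GET .../environments/{id}/configurations/{id}."""
    path = "/v1/environments/{}/configurations/{}".format(
        quote(environment_id, safe=""), quote(configuration_id, safe=""))
    url = "{}{}?version={}".format(base_url.rstrip("/"), path, quote(version))
    # curl -u sends HTTP Basic credentials: base64("username:password").
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(url, headers={"Authorization": "Basic " + token}, method="GET")


req = build_get_configuration_request(
    "https://gateway.watsonplatform.net/discovery/api",
    "my-environment-id",
    "448e3545-51ca-4530-a03b-6ff282ceac2e",
    "apikey",
    "my-api-key",
)
print(req.get_method(), req.full_url)
# To send it: with urllib.request.urlopen(req) as resp: configuration = json.load(resp)
```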
        

Response

Configuration

A custom configuration for the environment.

Name Description
configuration_id string

The unique identifier of the configuration.

name string

The name of the configuration.

created DateTime

The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

description string

The description of the configuration, if available.

conversions Conversions

The document conversion settings for the configuration.

enrichments Enrichment[]

An array of document enrichment settings for the configuration.

normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

source Source

Object containing source parameters for the configuration.

Conversions

Document conversion settings.

Name Description
pdf PdfSettings

PDF conversion settings.

word WordSettings

Word conversion settings.

html HtmlSettings

HTML conversion settings.

segment SegmentSettings

Document Segmentation settings.

json_normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

PdfSettings

PDF conversion settings.

Name Description
heading PdfHeadingDetection
PdfHeadingDetection
Name Description
fonts FontSetting[]
FontSetting
Name Description
level integer
min_size integer
max_size integer
bold boolean
italic boolean
name string
WordSettings

Word conversion settings.

Name Description
heading WordHeadingDetection
WordHeadingDetection
Name Description
fonts FontSetting[]
styles WordStyle[]
WordStyle
Name Description
level integer
names string[]
HtmlSettings

HTML conversion settings.

Name Description
exclude_tags_completely string[]
exclude_tags_keep_content string[]
keep_content XPathPatterns
exclude_content XPathPatterns
keep_tag_attributes string[]
exclude_tag_attributes string[]
XPathPatterns
Name Description
xpaths string[]
SegmentSettings

Document Segmentation settings.

Name Description
enabled boolean

Enables or disables the Document Segmentation feature.

selector_tags string[]

Defines the heading levels at which the document is split into segments. Valid values are h1, h2, h3, h4, h5, and h6.

NormalizationOperation
Name Description
operation string

Identifies what type of operation to perform.

copy - Copies the value of the source_field to the destination_field. If the destination_field already exists, the value of the source_field overwrites its original value.

move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, the value of the source_field overwrites its original value. A move is identical to a copy, except that the source_field is removed after its value has been copied to the destination_field (it is the same as a copy followed by a remove).

merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, the destination_field is still converted into an array (if it is not one already). This conversion ensures that the type of destination_field is consistent across all documents.

remove - Deletes the source_field. The destination_field is ignored for this operation.

remove_nulls - Removes all nested null (blank) field values from the JSON tree. The source_field and destination_field are ignored by this operation because remove_nulls operates on the entire JSON tree. Because it can be time-consuming, remove_nulls is typically invoked only once, as the last normalization operation, if it is used at all.

Possible values:
  • copy
  • move
  • merge
  • remove
  • remove_nulls
source_field string

The source field for the operation.

destination_field string

The destination field for the operation.

Enrichment
Name Description
description string

Describes what the enrichment step does.

destination_field string

The field in which enrichments are stored. This field must already exist or be at most one level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.

source_field string

The field to be enriched.

overwrite boolean

When true, the enrichments overwrite the destination_field if it already exists.

enrichment string

The name of the enrichment service to call. Current options are natural_language_understanding and elements.

When using natural_language_understanding, the options object must contain Natural Language Understanding options.

When using elements, the options object must contain Element Classification options. Additionally, when using the elements enrichment, the configuration and the ingested files must meet all the criteria specified in the documentation.

Previous API versions also supported alchemy_language.

ignore_downstream_errors boolean

If true, then most errors generated during the enrichment process will be treated as warnings and will not cause the document to fail processing.

options EnrichmentOptions

Options specific to the enrichment.

EnrichmentOptions

Options which are specific to a particular enrichment.

Name Description
features NluEnrichmentFeatures

An object representing the enrichment features that will be applied to the specified field.

language string

ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages, so automatic detection is recommended.

Possible values:
  • ar
  • en
  • fr
  • de
  • it
  • pt
  • ru
  • es
  • sv
model string

For use with elements enrichments only. The element extraction model to use. The only model currently available is contract.

NluEnrichmentFeatures
Name Description
keywords NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

entities NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

sentiment NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

emotion NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

categories NluEnrichmentCategories

An object specifying the categories enrichment and related parameters.

semantic_roles NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

relations NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of keywords will be performed on the specified field.

emotion boolean

When true, emotion detection of keywords will be performed on the specified field.

limit integer

The maximum number of keywords to extract for each instance of the specified field.

NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of entities will be performed on the specified field.

emotion boolean

When true, emotion detection of entities will be performed on the specified field.

limit integer

The maximum number of entities to extract for each instance of the specified field.

mentions boolean

When true, the number of mentions of each identified entity is recorded. The default is false.

mention_types boolean

When true, the types of mentions for each identified entity are recorded. The default is false.

sentence_locations boolean

When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.

model string

The enrichment model to use with entity extraction. Can be a custom model provided by Watson Knowledge Studio, en-news (the public model for use with Knowledge Graph), or alchemy (the default public model).

NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

Name Description
document boolean

When true, sentiment analysis is performed on the entire field.

targets string[]

An array of target strings whose associated sentiment is analyzed.

NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

Name Description
document boolean

When true, emotion detection is performed on the entire field.

targets string[]

An array of target strings whose associated emotions are detected.

NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

Name Description
entities boolean

When true, entities are extracted from the identified sentence parts.

keywords boolean

When true, keywords are extracted from the identified sentence parts.

limit integer

The maximum number of semantic roles enrichments to extract from each instance of the specified field.

NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

Name Description
model string

For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. Can be a custom model provided by Watson Knowledge Studio or en-news, the public model for use with Knowledge Graph. The default is en-news.

Source

Object containing source parameters for the configuration.

Name Description
type string

The type of source to connect to.

  • box indicates the configuration is to connect to an instance of Enterprise Box.
  • salesforce indicates the configuration is to connect to Salesforce.
  • sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
Possible values:
  • box
  • salesforce
  • sharepoint
credential_id string

The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.

schedule SourceSchedule

Object containing the schedule information for the source.

options SourceOptions

The options object defines which items to crawl from the source system.

SourceSchedule

Object containing the schedule information for the source.

Name Description
enabled boolean

When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled; however, when false and connecting to Salesforce, the source is crawled annually.

time_zone string

The time zone to base source crawl times on. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.

frequency string

The crawl schedule in the specified time_zone.

  • daily: Runs every day between 00:00 and 06:00.
  • weekly: Runs every week on Sunday between 00:00 and 06:00.
  • monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values:
  • daily
  • weekly
  • monthly
SourceOptions

The options object defines which items to crawl from the source system.

Name Description
folders SourceOptionsFolder[]

Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.

objects SourceOptionsObject[]

Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.

site_collections SourceOptionsSiteColl[]

Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid, and required, when the type field of the source object is set to sharepoint.

SourceOptionsFolder

Object that defines a Box folder to crawl with this configuration.

Name Description
owner_user_id string

The Box user ID of the user who owns the folder to crawl.

folder_id string

The Box folder ID of the folder to crawl.

limit integer

The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.

SourceOptionsObject

Object that defines a Salesforce document object type to crawl with this configuration.

Name Description
name string

The name of the Salesforce document object to crawl. For example, case.

limit integer

The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.

SourceOptionsSiteColl

Object that defines a Microsoft SharePoint site collection to crawl with this configuration.

Name Description
site_collection_path string

The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.

limit integer

The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.

Example response


{
  "configuration_id" : "448e3545-51ca-4530-a03b-6ff282ceac2e",
  "name" : "IBM News",
  "created" : "2015-08-24T18:42:25.324Z",
  "updated" : "2015-08-24T18:42:25.324Z",
  "description" : "A configuration useful for ingesting IBM press releases.",
  "conversions" : {
    "html" : {
      "exclude_tags_keep_content" : [ "span" ],
      "exclude_content" : {
        "xpaths" : [ "/home" ]
      }
    },
    "segment" : {
      "enabled" : true,
      "selector_tags" : [ "h1", "h2" ]
    },
    "json_normalizations" : [ {
      "operation" : "move",
      "source_field" : "extracted_metadata.title",
      "destination_field" : "metadata.title"
    }, {
      "operation" : "move",
      "source_field" : "extracted_metadata.author",
      "destination_field" : "metadata.author"
    }, {
      "operation" : "remove",
      "source_field" : "extracted_metadata"
    } ]
  },
  "enrichments" : [ {
    "enrichment" : "natural_language_understanding",
    "source_field" : "title",
    "destination_field" : "enriched_title",
    "options" : {
      "features" : {
        "keywords" : {
          "sentiment" : true,
          "emotion" : false,
          "limit" : 50
        },
        "entities" : {
          "sentiment" : true,
          "emotion" : false,
          "limit" : 50,
          "mentions" : true,
          "mention_types" : true,
          "sentence_locations" : true,
          "model" : "WKS-model-id"
        },
        "sentiment" : {
          "document" : true,
          "targets" : [ "IBM", "Watson" ]
        },
        "emotion" : {
          "document" : true,
          "targets" : [ "IBM", "Watson" ]
        },
        "categories" : { },
        "concepts" : {
          "limit" : 8
        },
        "semantic_roles" : {
          "entities" : true,
          "keywords" : true,
          "limit" : 50
        },
        "relations" : {
          "model" : "WKS-model-id"
        }
      }
    }
  }, {
    "enrichment" : "elements",
    "source_field" : "html",
    "destination_field" : "enriched_html",
    "options" : {
      "model" : "contract"
    }
  } ],
  "normalizations" : [ {
    "operation" : "move",
    "source_field" : "metadata.title",
    "destination_field" : "title"
  }, {
    "operation" : "move",
    "source_field" : "metadata.author",
    "destination_field" : "author"
  }, {
    "operation" : "move",
    "source_field" : "alchemy_enriched_text.language",
    "destination_field" : "language"
  }, {
    "operation" : "remove",
    "source_field" : "html"
  }, {
    "operation" : "remove",
    "source_field" : "alchemy_enriched_text.status"
  }, {
    "operation" : "remove",
    "source_field" : "alchemy_enriched_text.text"
  }, {
    "operation" : "remove",
    "source_field" : "sire_enriched_text.language"
  }, {
    "operation" : "remove",
    "source_field" : "sire_enriched_text.model"
  }, {
    "operation" : "remove",
    "source_field" : "sire_enriched_text.status"
  }, {
    "operation" : "remove_nulls"
  } ],
  "source" : {
    "type" : "sharepoint",
    "credential_id" : "00ad0000-0000-11e8-ba89-0ed5f00f718b",
    "schedule" : {
      "enabled" : true,
      "time_zone" : "America/New_York",
      "frequency" : "weekly"
    },
    "options" : {
      "site_collections" : [ {
        "site_collection_path" : "/sites/TestSiteA",
        "limit" : 10
      } ]
    }
  }
}
        

Response Codes

Status Description
200

Configuration successfully fetched.

400

Bad request.

Update a configuration

Replaces an existing configuration.

  • Completely replaces the original configuration.
  • The configuration_id, updated, and created fields are accepted in the request but are ignored, and no error is generated. You can also submit an updated configuration that contains none of these three properties.
  • Documents are processed with a snapshot of the configuration as it was when the document was submitted for ingestion, so documents that were already submitted are not affected by later updates to the configuration.

Request

PUT /v1/environments/{environment_id}/configurations/{configuration_id}
Parameter Description
environment_id path string

The ID of the environment.

configuration_id path string

The ID of the configuration.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

configuration body Configuration

Input an object that enables you to update and customize how your data is ingested and what enrichments are added to your data. The name parameter is required and must be unique within the current environment. All other properties are optional, but if they are omitted the default values replace the current value of each omitted property.

If the input configuration contains the configuration_id, created, or updated properties, they are ignored and overridden by the system, and an error is not returned so that the overridden fields do not need to be removed when updating a configuration.

The configuration can contain unrecognized JSON fields. Any such fields are ignored and do not generate an error. This makes it easier to use newer configuration files with older versions of the API and the service. It also makes it possible for the tooling to add additional metadata and information to the configuration.
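Because this PUT replaces the whole configuration and resets any omitted property to its default, one way to avoid accidental resets is to start from the current configuration (for example, the body returned when you fetch it) and overlay only the properties you intend to change. The Python sketch below is a local helper, not part of any IBM SDK; the name `build_update_body` is hypothetical. It also drops configuration_id, created, and updated, which the service would ignore anyway.

```python
import copy

# Fields the service manages itself; they are ignored if sent, so dropping them is optional.
SERVER_MANAGED = ("configuration_id", "created", "updated")


def build_update_body(current: dict, changes: dict) -> dict:
    """Return a full configuration for PUT: current values overlaid with changes.

    Hypothetical helper. Top-level properties in `changes` replace the current
    ones wholesale; properties not mentioned are carried over so that the
    replace-style update does not reset them to their defaults.
    """
    body = copy.deepcopy(current)
    for key in SERVER_MANAGED:
        body.pop(key, None)
    body.update(copy.deepcopy(changes))
    if not body.get("name"):
        raise ValueError("name is required in the configuration body")
    return body


current = {
    "configuration_id": "448e3545-51ca-4530-a03b-6ff282ceac2e",
    "name": "IBM News",
    "created": "2015-08-24T18:42:25.324Z",
    "updated": "2015-08-24T18:42:25.324Z",
    "description": "A configuration useful for ingesting IBM press releases.",
    "conversions": {"segment": {"enabled": True, "selector_tags": ["h1", "h2"]}},
}
body = build_update_body(current, {"description": "Updated description"})
print(sorted(body))  # ['conversions', 'description', 'name']
```

Top-level properties are replaced as a whole, so to change a single enrichment you would still pass the complete enrichments array.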

Configuration

A custom configuration for the environment.

Name Description
configuration_id string

The unique identifier of the configuration.

name string

The name of the configuration.

created DateTime

The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

description string

The description of the configuration, if available.

conversions Conversions

The document conversion settings for the configuration.

enrichments Enrichment[]

An array of document enrichment settings for the configuration.

normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

source Source

Object containing source parameters for the configuration.

Conversions

Document conversion settings.

Name Description
pdf PdfSettings

PDF conversion settings.

word WordSettings

Word conversion settings.

html HtmlSettings

HTML conversion settings.

segment SegmentSettings

Document Segmentation settings.

json_normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

PdfSettings

PDF conversion settings.

Name Description
heading PdfHeadingDetection
PdfHeadingDetection
Name Description
fonts FontSetting[]
FontSetting
Name Description
level integer
min_size integer
max_size integer
bold boolean
italic boolean
name string
WordSettings

Word conversion settings.

Name Description
heading WordHeadingDetection
WordHeadingDetection
Name Description
fonts FontSetting[]
styles WordStyle[]
WordStyle
Name Description
level integer
names string[]
HtmlSettings

HTML conversion settings.

Name Description
exclude_tags_completely string[]
exclude_tags_keep_content string[]
keep_content XPathPatterns
exclude_content XPathPatterns
keep_tag_attributes string[]
exclude_tag_attributes string[]
XPathPatterns
Name Description
xpaths string[]
SegmentSettings

Document Segmentation settings.

Name Description
enabled boolean

Enables or disables the Document Segmentation feature.

Default: false

selector_tags string[]

Defines the heading levels at which documents are split into segments. Valid values are h1, h2, h3, h4, h5, and h6.
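If you want to catch an invalid segment block before the service rejects the request, a local check along these lines is one option. It is a sketch that assumes only the rules stated above (enabled is a boolean; selector_tags come from h1 through h6); `check_segment_settings` is a hypothetical name.

```python
# Heading tags accepted by selector_tags, per the SegmentSettings table.
VALID_SELECTOR_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def check_segment_settings(segment: dict) -> list:
    """Return a list of problems found in a SegmentSettings object (empty if none)."""
    problems = []
    if not isinstance(segment.get("enabled", False), bool):
        problems.append("enabled must be a boolean")
    for tag in segment.get("selector_tags", []):
        if tag not in VALID_SELECTOR_TAGS:
            problems.append(f"invalid selector tag: {tag!r}")
    return problems


print(check_segment_settings({"enabled": True, "selector_tags": ["h1", "h2"]}))  # []
print(check_segment_settings({"enabled": True, "selector_tags": ["h7"]}))
```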

NormalizationOperation
Name Description
operation string

Identifies what type of operation to perform.

copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.

move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. A move is identical to a copy, except that the source_field is removed after its value has been copied to the destination_field (that is, a copy followed by a remove).

merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.

remove - Deletes the source_field field. The destination_field is ignored for this operation.

remove_nulls - Removes all nested null (blank) field values from the JSON tree. The source_field and destination_field are ignored by this operation because remove_nulls operates on the entire JSON tree. Because it can be time-consuming, remove_nulls is typically invoked only once, as the last normalization operation, if it is invoked at all.

Allowable values:
  • copy
  • move
  • merge
  • remove
  • remove_nulls
source_field string

The source field for the operation.

destination_field string

The destination field for the operation.
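To make the five operations concrete, here is a local Python simulation of them over a nested dict, assuming that dotted field names such as metadata.title address nested objects. It illustrates the rules described above; it is not the service's implementation. Behavior the text does not specify (a missing source for copy or move, a merge where neither field exists, "blank" values other than null) is handled by labeled assumptions, and paths through arrays are not supported. All function names are hypothetical.

```python
import copy

_MISSING = object()  # sentinel: field not present in the document


def _get(doc, path):
    """Follow a dotted path such as "metadata.title"; return _MISSING if absent."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def _set(doc, path, value):
    """Set a dotted path, creating intermediate objects (assumed to be dicts)."""
    parts = path.split(".")
    node = doc
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def _delete(doc, path):
    """Remove a dotted path if it exists."""
    parent_path, _, leaf = path.rpartition(".")
    parent = _get(doc, parent_path) if parent_path else doc
    if isinstance(parent, dict):
        parent.pop(leaf, None)


def _drop_nulls(node):
    """Recursively drop object fields whose value is None (JSON null)."""
    if isinstance(node, dict):
        return {k: _drop_nulls(v) for k, v in node.items() if v is not None}
    if isinstance(node, list):
        return [_drop_nulls(v) for v in node]
    return node


def normalize(doc, operations):
    """Apply NormalizationOperation objects, in array order, to a copy of doc."""
    doc = copy.deepcopy(doc)
    for op in operations:
        kind = op["operation"]
        src, dst = op.get("source_field"), op.get("destination_field")
        if kind in ("copy", "move"):
            value = _get(doc, src)
            if value is _MISSING:
                continue  # assumption: a missing source is a no-op
            _set(doc, dst, copy.deepcopy(value))
            if kind == "move":
                _delete(doc, src)
        elif kind == "merge":
            existing, value = _get(doc, dst), _get(doc, src)
            if existing is _MISSING and value is _MISSING:
                continue  # assumption: nothing to merge or convert
            merged = [] if existing is _MISSING else existing
            if not isinstance(merged, list):
                merged = [merged]  # destination always ends up as an array
            if value is not _MISSING:
                merged.append(value)
                _delete(doc, src)
            _set(doc, dst, merged)
        elif kind == "remove":
            _delete(doc, src)  # destination_field is ignored
        elif kind == "remove_nulls":
            doc = _drop_nulls(doc)  # operates on the whole tree
        else:
            raise ValueError(f"unknown operation: {kind}")
    return doc


doc = {
    "metadata": {"title": "Q3 results", "author": "IBM"},
    "html": "<p>...</p>",
    "tags": "news",
    "extra": None,
}
ops = [
    {"operation": "move", "source_field": "metadata.title", "destination_field": "title"},
    {"operation": "copy", "source_field": "metadata.author", "destination_field": "author"},
    {"operation": "merge", "source_field": "author", "destination_field": "tags"},
    {"operation": "remove", "source_field": "html"},
    {"operation": "remove_nulls"},
]
print(normalize(doc, ops))
```

Running the operations in a different order gives a different result. That is why the array order matters and why remove_nulls usually comes last.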

Enrichment
Name Description
description string

Describes what the enrichment step does.

destination_field string

Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.

source_field string

Field to be enriched.

overwrite boolean

When true, the enrichments overwrite the destination_field if it already exists.

Default: false

enrichment string

Name of the enrichment service to call. Current options are natural_language_understanding and elements.

When using natural_language_understanding, the options object must contain Natural Language Understanding options.

When using elements, the options object must contain Element Classification options. Additionally, when using the elements enrichment, the configuration and the ingested files must meet all the criteria specified in the documentation.

Previous API versions also supported alchemy_language.

ignore_downstream_errors boolean

If true, most errors generated during the enrichment process are treated as warnings and do not cause the document to fail processing.

Default: false

options EnrichmentOptions

Options specific to the enrichment.
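The destination_field rule can be checked against a sample document before you submit a configuration. The sketch below assumes the same dotted-path convention used by the field names in the examples. `valid_destination` is a hypothetical helper, and the service remains the authority on what it accepts.

```python
def valid_destination(doc: dict, destination: str) -> bool:
    """Check the destination_field rule against a sample document (hypothetical helper).

    The path is valid if it already exists, or if only its final segment is
    missing, i.e. it is at most one level deeper than an existing field.
    """
    parts = destination.split(".")
    node = doc
    for depth, part in enumerate(parts):
        if isinstance(node, dict) and part in node:
            node = node[part]
            continue
        # First missing segment: acceptable only if it is the last one.
        return depth == len(parts) - 1
    return True  # the full path already exists


sample = {"text": "IBM announces Watson Discovery", "title": "IBM News"}
print(valid_destination(sample, "text.foo"))        # True
print(valid_destination(sample, "text.foo.bar"))    # False
print(valid_destination(sample, "enriched_title"))  # True
```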

EnrichmentOptions

Options that are specific to a particular enrichment.

Name Description
features NluEnrichmentFeatures

An object representing the enrichment features that will be applied to the specified field.

language string

ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages, so automatic detection is recommended.

Allowable values:
  • ar
  • en
  • fr
  • de
  • it
  • pt
  • ru
  • es
  • sv
model string

For use with elements enrichments only. The element extraction model to use. Models available are: contract.
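The following Python sketch assembles a single Enrichment object and applies the option rules above: natural_language_understanding takes features, elements takes model (currently only contract), and language must be one of the listed codes. Two points are assumptions rather than documented behavior: that language is used only with natural_language_understanding, and that features is required for it. `make_enrichment` is a hypothetical name.

```python
# ISO 639-1 codes listed for the language option.
SUPPORTED_LANGUAGES = {"ar", "en", "fr", "de", "it", "pt", "ru", "es", "sv"}


def make_enrichment(enrichment, source_field, destination_field, *,
                    features=None, language=None, model=None):
    """Build one Enrichment object (hypothetical helper, not an SDK call)."""
    options = {}
    if enrichment == "natural_language_understanding":
        if not features:
            raise ValueError("natural_language_understanding needs features")
        options["features"] = features
        if language is not None:  # omit to keep automatic language detection
            if language not in SUPPORTED_LANGUAGES:
                raise ValueError(f"unsupported language code: {language}")
            options["language"] = language
    elif enrichment == "elements":
        if model != "contract":
            raise ValueError("elements currently supports only the contract model")
        options["model"] = model
    else:
        raise ValueError(f"unknown enrichment: {enrichment}")
    return {
        "enrichment": enrichment,
        "source_field": source_field,
        "destination_field": destination_field,
        "options": options,
    }


nlu = make_enrichment("natural_language_understanding", "title", "enriched_title",
                      features={"keywords": {"limit": 50}}, language="en")
elements = make_enrichment("elements", "html", "enriched_html", model="contract")
print(nlu["options"], elements["options"])
```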

NluEnrichmentFeatures
Name Description
keywords NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

entities NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

sentiment NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

emotion NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

categories NluEnrichmentCategories

An object specifying the categories enrichment and related parameters.

semantic_roles NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

relations NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of keywords will be performed on the specified field.

emotion boolean

When true, emotion detection of keywords will be performed on the specified field.

limit integer

The maximum number of keywords to extract for each instance of the specified field.

NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of entities will be performed on the specified field.

emotion boolean

When true, emotion detection of entities will be performed on the specified field.

limit integer

The maximum number of entities to extract for each instance of the specified field.

mentions boolean

When true, the number of mentions of each identified entity is recorded. The default is false.

mention_types boolean

When true, the types of mentions for each identified entity are recorded. The default is false.

sentence_location boolean

When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.

model string

The enrichment model to use with entity extraction. It can be a custom model provided by Watson Knowledge Studio, the public model for use with Knowledge Graph (en-news), or the default public model (alchemy).

NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

Name Description
document boolean

When true, sentiment analysis is performed on the entire field.

targets string[]

An array of target strings whose associated sentiment is analyzed.

NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

Name Description
document boolean

When true, emotion detection is performed on the entire field.

targets string[]

An array of target strings whose associated emotions are detected.

NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

Name Description
entities boolean

When true, entities are extracted from the identified sentence parts.

keywords boolean

When true, keywords are extracted from the identified sentence parts.

limit integer

The maximum number of semantic roles enrichments to extract from each instance of the specified field.

NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

Name Description
model string

For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. It can be a custom model provided by Watson Knowledge Studio or the public model for use with Knowledge Graph (en-news). The default is en-news.
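The feature objects above sit under options.features of a natural_language_understanding enrichment. The short Python sketch below assumes you assemble the configuration as a dict and serialize it with the standard json module. Its values mirror the example configuration in this reference, and "WKS-model-id" is a placeholder for your own Knowledge Studio model ID. Note that targets is sent as a JSON array of strings.

```python
import json

# Feature settings mirroring the example configuration; adjust as needed.
features = {
    "keywords": {"sentiment": True, "emotion": False, "limit": 50},
    "entities": {"sentiment": True, "limit": 50, "mentions": True, "model": "WKS-model-id"},
    "sentiment": {"document": True, "targets": ["IBM", "Watson"]},
    "emotion": {"document": True, "targets": ["IBM", "Watson"]},
    "categories": {},
    "semantic_roles": {"entities": True, "keywords": True, "limit": 50},
    "relations": {"model": "en-news"},
}
enrichment = {
    "enrichment": "natural_language_understanding",
    "source_field": "title",
    "destination_field": "enriched_title",
    "options": {"features": features},
}
payload = json.dumps(enrichment, indent=2)
print(payload)
```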

Source

Object containing source parameters for the configuration.

Name Description
type string

The type of source to connect to.

  • box indicates the configuration is to connect to an instance of Enterprise Box.
  • salesforce indicates the configuration is to connect to Salesforce.
  • sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
Allowable values:
  • box
  • salesforce
  • sharepoint
credential_id string

The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.

schedule SourceSchedule

Object containing the schedule information for the source.

options SourceOptions

The options object defines which items to crawl from the source system.

SourceSchedule

Object containing the schedule information for the source.

Name Description
enabled boolean

When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled, except that a Salesforce source is still crawled annually.

Default: true

time_zone string

The time zone on which source crawl times are based. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.

Default: America/New_York

frequency string

The crawl schedule in the specified time_zone.

  • daily: Runs every day between 00:00 and 06:00.
  • weekly: Runs every week on Sunday between 00:00 and 06:00.
  • monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Allowable values:
  • daily
  • weekly
  • monthly
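To see which calendar day each frequency value selects, here is a small Python sketch that computes the date of the next crawl window. It works with dates only and assumes `today` is already expressed in the configured time_zone. It also assumes "next" means strictly after today, and ignores whether a window is currently running. `next_crawl_date` is a hypothetical helper; it does not query the service.

```python
from datetime import date, timedelta


def next_crawl_date(today: date, frequency: str) -> date:
    """Date of the next 00:00-06:00 crawl window strictly after `today` (hypothetical)."""
    if frequency == "daily":
        return today + timedelta(days=1)
    if frequency == "weekly":
        # date.weekday(): Monday == 0 ... Sunday == 6; a Sunday moves to the next Sunday.
        days_ahead = (6 - today.weekday()) % 7 or 7
        return today + timedelta(days=days_ahead)
    if frequency == "monthly":
        year, month = today.year, today.month
        while True:
            first = date(year, month, 1)
            first_sunday = first + timedelta(days=(6 - first.weekday()) % 7)
            if first_sunday > today:
                return first_sunday
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    raise ValueError(f"unknown frequency: {frequency}")


print(next_crawl_date(date(2018, 3, 5), "weekly"))  # 2018-03-11
```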
SourceOptions

The options object defines which items to crawl from the source system.

Name Description
folders SourceOptionsFolder[]

Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.

objects SourceOptionsObject[]

Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.

site_collections SourceOptionsSiteColl[]

Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid, and required, when the type field of the source object is set to sharepoint.

SourceOptionsFolder

Object that defines a Box folder to crawl with this configuration.

Name Description
owner_user_id string

The Box user ID of the user who owns the folder to crawl.

folder_id string

The Box folder ID of the folder to crawl.

limit integer

The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.

SourceOptionsObject

Object that defines a Salesforce document object type to crawl with this configuration.

Name Description
name string

The name of the Salesforce document object to crawl. For example, case.

limit integer

The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.

SourceOptionsSiteColl

Object that defines a Microsoft SharePoint site collection to crawl with this configuration.

Name Description
site_collection_path string

The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.

limit integer

The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.
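Each source type pairs with exactly one options array: folders for box, objects for salesforce, and site_collections for sharepoint. The Python sketch below checks that pairing locally. It assumes only the "valid, and required" rules stated in the SourceOptions table, and `check_source` is a hypothetical name.

```python
# Which options key each source type expects, per the SourceOptions table.
OPTIONS_KEY_BY_TYPE = {
    "box": "folders",
    "salesforce": "objects",
    "sharepoint": "site_collections",
}


def check_source(source: dict) -> list:
    """Return problems with a Source object's type/options pairing (empty if none)."""
    source_type = source.get("type")
    if source_type not in OPTIONS_KEY_BY_TYPE:
        return [f"unknown source type: {source_type!r}"]
    expected = OPTIONS_KEY_BY_TYPE[source_type]
    options = source.get("options", {})
    problems = []
    if not options.get(expected):
        problems.append(f"{source_type} sources require options.{expected}")
    for key in OPTIONS_KEY_BY_TYPE.values():
        if key != expected and key in options:
            problems.append(f"options.{key} is not valid for {source_type} sources")
    return problems


ok = {"type": "sharepoint",
      "options": {"site_collections": [{"site_collection_path": "/sites/TestSiteA", "limit": 10}]}}
print(check_source(ok))  # []
```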

Example request

curl -X PUT -u "{username}":"{password}" -H "Content-Type: application/json" -d @new_config.json "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/configurations/{configuration_id}?version=2018-03-05"
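For readers working in Python, the following sketch prepares the same PUT request with the standard library's urllib.request but does not send it, so it runs offline. The base URL and version date come from the curl example. The helper name and the placeholder IDs and credentials are hypothetical. For instances that use IAM, the Authentication section's curl example passes the literal username apikey with your API key as the password.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_update_request(base_url, environment_id, configuration_id, config,
                         username, password, version="2018-03-05"):
    """Prepare (but do not send) the PUT request from the curl example."""
    path = "/v1/environments/{}/configurations/{}".format(
        urllib.parse.quote(environment_id, safe=""),
        urllib.parse.quote(configuration_id, safe=""),
    )
    query = urllib.parse.urlencode({"version": version})
    # Basic authentication, equivalent to curl -u "{username}:{password}".
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}{path}?{query}",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="PUT",
    )


req = build_update_request("https://gateway.watsonplatform.net/discovery/api",
                           "my-env-id", "my-config-id", {"name": "IBM News"},
                           "user", "pass")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
print(req.get_method(), req.full_url)
```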
        

Response

Configuration

A custom configuration for the environment.

Name Description
configuration_id string

The unique identifier of the configuration.

name string

The name of the configuration.

created DateTime

The creation date of the configuration in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

The timestamp of when the configuration was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

description string

The description of the configuration, if available.

conversions Conversions

The document conversion settings for the configuration.

enrichments Enrichment[]

An array of document enrichment settings for the configuration.

normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

source Source

Object containing source parameters for the configuration.

Conversions

Document conversion settings.

Name Description
pdf PdfSettings

PDF conversion settings.

word WordSettings

Word conversion settings.

html HtmlSettings

HTML conversion settings.

segment SegmentSettings

Document Segmentation settings.

json_normalizations NormalizationOperation[]

Defines operations that can be used to transform the final output JSON into a normalized form. Operations are executed in the order that they appear in the array.

PdfSettings

PDF conversion settings.

Name Description
heading PdfHeadingDetection
PdfHeadingDetection
Name Description
fonts FontSetting[]
FontSetting
Name Description
level integer
min_size integer
max_size integer
bold boolean
italic boolean
name string
WordSettings

Word conversion settings.

Name Description
heading WordHeadingDetection
WordHeadingDetection
Name Description
fonts FontSetting[]
styles WordStyle[]
WordStyle
Name Description
level integer
names string[]
HtmlSettings

HTML conversion settings.

Name Description
exclude_tags_completely string[]
exclude_tags_keep_content string[]
keep_content XPathPatterns
exclude_content XPathPatterns
keep_tag_attributes string[]
exclude_tag_attributes string[]
XPathPatterns
Name Description
xpaths string[]
SegmentSettings

Document Segmentation settings.

Name Description
enabled boolean

Enables or disables the Document Segmentation feature.

selector_tags string[]

Defines the heading levels at which documents are split into segments. Valid values are h1, h2, h3, h4, h5, and h6.

NormalizationOperation
Name Description
operation string

Identifies what type of operation to perform.

copy - Copies the value of the source_field to the destination_field field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field.

move - Renames (moves) the source_field to the destination_field. If the destination_field already exists, then the value of the source_field overwrites the original value of the destination_field. A move is identical to a copy, except that the source_field is removed after its value has been copied to the destination_field (that is, a copy followed by a remove).

merge - Merges the value of the source_field with the value of the destination_field. The destination_field is converted into an array if it is not already an array, and the value of the source_field is appended to the array. This operation removes the source_field after the merge. If the source_field does not exist in the current document, then the destination_field is still converted into an array (if it is not an array already). This conversion ensures the type for destination_field is consistent across all documents.

remove - Deletes the source_field field. The destination_field is ignored for this operation.

remove_nulls - Removes all nested null (blank) field values from the JSON tree. The source_field and destination_field are ignored by this operation because remove_nulls operates on the entire JSON tree. Because it can be time-consuming, remove_nulls is typically invoked only once, as the last normalization operation, if it is invoked at all.

Possible values:
  • copy
  • move
  • merge
  • remove
  • remove_nulls
source_field string

The source field for the operation.

destination_field string

The destination field for the operation.

Enrichment
Name Description
description string

Describes what the enrichment step does.

destination_field string

Field where enrichments will be stored. This field must already exist or be at most 1 level deeper than an existing field. For example, if text is a top-level field with no sub-fields, text.foo is a valid destination but text.foo.bar is not.

source_field string

Field to be enriched.

overwrite boolean

When true, the enrichments overwrite the destination_field if it already exists.

enrichment string

Name of the enrichment service to call. Current options are natural_language_understanding and elements.

When using natural_language_understanding, the options object must contain Natural Language Understanding options.

When using elements, the options object must contain Element Classification options. Additionally, when using the elements enrichment, the configuration and the ingested files must meet all the criteria specified in the documentation.

Previous API versions also supported alchemy_language.

ignore_downstream_errors boolean

If true, most errors generated during the enrichment process are treated as warnings and do not cause the document to fail processing.

options EnrichmentOptions

Options specific to the enrichment.

EnrichmentOptions

Options that are specific to a particular enrichment.

Name Description
features NluEnrichmentFeatures

An object representing the enrichment features that will be applied to the specified field.

language string

ISO 639-1 code indicating the language to use for the analysis. This code overrides the automatic language detection performed by the service. Valid codes are ar (Arabic), en (English), fr (French), de (German), it (Italian), pt (Portuguese), ru (Russian), es (Spanish), and sv (Swedish). Note: Not all features support all languages, so automatic detection is recommended.

Possible values:
  • ar
  • en
  • fr
  • de
  • it
  • pt
  • ru
  • es
  • sv
model string

For use with elements enrichments only. The element extraction model to use. Models available are: contract.

NluEnrichmentFeatures
Name Description
keywords NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

entities NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

sentiment NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

emotion NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

categories NluEnrichmentCategories

An object specifying the categories enrichment and related parameters.

semantic_roles NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

relations NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

NluEnrichmentKeywords

An object specifying the Keyword enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of keywords will be performed on the specified field.

emotion boolean

When true, emotion detection of keywords will be performed on the specified field.

limit integer

The maximum number of keywords to extract for each instance of the specified field.

NluEnrichmentEntities

An object specifying the Entities enrichment and related parameters.

Name Description
sentiment boolean

When true, sentiment analysis of entities will be performed on the specified field.

emotion boolean

When true, emotion detection of entities will be performed on the specified field.

limit integer

The maximum number of entities to extract for each instance of the specified field.

mentions boolean

When true, the number of mentions of each identified entity is recorded. The default is false.

mention_types boolean

When true, the types of mentions for each identified entity are recorded. The default is false.

sentence_location boolean

When true, a list of sentence locations for each instance of each identified entity is recorded. The default is false.

model string

The enrichment model to use with entity extraction. It can be a custom model provided by Watson Knowledge Studio, the public model for use with Knowledge Graph (en-news), or the default public model (alchemy).

NluEnrichmentSentiment

An object specifying the sentiment extraction enrichment and related parameters.

Name Description
document boolean

When true, sentiment analysis is performed on the entire field.

targets string[]

An array of target strings whose associated sentiment is analyzed.

NluEnrichmentEmotion

An object specifying the emotion detection enrichment and related parameters.

Name Description
document boolean

When true, emotion detection is performed on the entire field.

targets string[]

An array of target strings whose associated emotions are detected.

NluEnrichmentSemanticRoles

An object specifying the semantic roles enrichment and related parameters.

Name Description
entities boolean

When true, entities are extracted from the identified sentence parts.

keywords boolean

When true, keywords are extracted from the identified sentence parts.

limit integer

The maximum number of semantic roles enrichments to extract from each instance of the specified field.

NluEnrichmentRelations

An object specifying the relations enrichment and related parameters.

Name Description
model string

For use with natural_language_understanding enrichments only. The enrichment model to use with relationship extraction. It can be a custom model provided by Watson Knowledge Studio or the public model for use with Knowledge Graph (en-news). The default is en-news.

Source

Object containing source parameters for the configuration.

Name Description
type string

The type of source to connect to.

  • box indicates the configuration is to connect to an instance of Enterprise Box.
  • salesforce indicates the configuration is to connect to Salesforce.
  • sharepoint indicates the configuration is to connect to Microsoft SharePoint Online.
Possible values:
  • box
  • salesforce
  • sharepoint
credential_id string

The credential_id of the credentials to use to connect to the source. Credentials are defined using the credentials method. The source_type of the credentials used must match the type field specified in this object.

schedule SourceSchedule

Object containing the schedule information for the source.

options SourceOptions

The options object defines which items to crawl from the source system.

SourceSchedule

Object containing the schedule information for the source.

Name Description
enabled boolean

When true, the source is re-crawled based on the frequency field in this object. When false, the source is not re-crawled, except that a Salesforce source is still crawled annually.

time_zone string

The time zone on which source crawl times are based. Possible values correspond to the IANA (Internet Assigned Numbers Authority) time zones list.

frequency string

The crawl schedule in the specified time_zone.

  • daily: Runs every day between 00:00 and 06:00.
  • weekly: Runs every week on Sunday between 00:00 and 06:00.
  • monthly: Runs on the first Sunday of every month between 00:00 and 06:00.
Possible values:
  • daily
  • weekly
  • monthly
SourceOptions

The options object defines which items to crawl from the source system.

Name Description
folders SourceOptionsFolder[]

Array of folders to crawl from the Box source. Only valid, and required, when the type field of the source object is set to box.

objects SourceOptionsObject[]

Array of Salesforce document object types to crawl from the Salesforce source. Only valid, and required, when the type field of the source object is set to salesforce.

site_collections SourceOptionsSiteColl[]

Array of Microsoft SharePoint Online site collections to crawl from the SharePoint source. Only valid, and required, when the type field of the source object is set to sharepoint.

SourceOptionsFolder

Object that defines a Box folder to crawl with this configuration.

Name Description
owner_user_id string

The Box user ID of the user who owns the folder to crawl.

folder_id string

The Box folder ID of the folder to crawl.

limit integer

The maximum number of documents to crawl for this folder. By default, all documents in the folder are crawled.

SourceOptionsObject

Object that defines a Salesforce document object type to crawl with this configuration.

Name Description
name string

The name of the Salesforce document object to crawl. For example, case.

limit integer

The maximum number of documents to crawl for this document object. By default, all documents in the document object are crawled.

SourceOptionsSiteColl

Object that defines a Microsoft SharePoint site collection to crawl with this configuration.

Name Description
site_collection_path string

The Microsoft SharePoint Online site collection path to crawl. The path must be relative to the organization_url that was specified in the credentials associated with this source configuration.

limit integer

The maximum number of documents to crawl for this site collection. By default, all documents in the site collection are crawled.

Example response


{
  "configuration_id" : "448e3545-51ca-4530-a03b-6ff282ceac2e",
  "name" : "IBM News",
  "created" : "2015-08-24T18:42:25.324Z",
  "updated" : "2015-08-24T18:42:25.324Z",
  "description" : "A configuration useful for ingesting IBM press releases.",
  "conversions" : {
    "html" : {
      "exclude_tags_keep_content" : [ "span" ],
      "exclude_content" : {
        "xpaths" : [ "/home" ]
      }
    },
    "segment" : {
      "enabled" : true,
      "selector_tags" : [ "h1", "h2" ]
    },
    "json_normalizations" : [ {
      "operation" : "move",
      "source_field" : "extracted_metadata.title",
      "destination_field" : "metadata.title"
    }, {
      "operation" : "move",
      "source_field" : "extracted_metadata.author",
      "destination_field" : "metadata.author"
    }, {
      "operation" : "remove",
      "source_field" : "extracted_metadata"
    } ]
  },
  "enrichments" : [ {
    "enrichment" : "natural_language_understanding",
    "source_field" : "title",
    "destination_field" : "enriched_title",
    "options" : {
      "features" : {
        "keywords" : {
          "sentiment" : true,
          "emotion" : false,
          "limit" : 50
        },
        "entities" : {
          "sentiment" : true,
          "emotion" : false,
          "limit" : 50,
          "mentions" : true,
          "mention_types" : true,
          "sentence_locations" : true,
          "model" : "WKS-model-id"
        },
        "sentiment" : {
          "document" : true,
          "targets" : [ "IBM", "Watson" ]
        },
        "emotion" : {
          "document" : true,
          "targets" : [ "IBM", "Watson" ]
        },
        "categories" : { },
        "concepts" : {
          "limit" : 8
        },
        "semantic_roles" : {
          "entities" : true,
          "keywords" : true,
          "limit" : 50
        },
        "relations" : {
          "model" : "WKS-model-id"
        }
      }
    }
  }, {
    "enrichment" : "elements",
    "source_field" : "html",
    "destination_field" : "enriched_html",
    "options" : {
      "model" : "contract"
    }
  } ],
  "normalizations" : [ {
    "operation" : "move",
    "source_field" : "metadata.title",
    "destination_field" : "title"
  }, {
    "operation" : "move",
    "source_field" : "metadata.author",
    "destination_field" : "author"
  }, {
    "operation" : "move",
    "source_field" : "alchemy_enriched_text.language",
    "destination_field" : "language"
  }, {
    "operation" : "remove",
    "source_field" : "html"
  }, {
    "operation" : "remove",
    "source_field" : "alchemy_enriched_text.status"
  }, {
    "operation" : "remove",
    "source_field" : "alchemy_enriched_text.text"
  }, {
    "operation" : "remove",
    "source_field" : "sire_enriched_text.language"
  }, {
    "operation" : "remove",
    "source_field" : "sire_enriched_text.model"
  }, {
    "operation" : "remove",
    "source_field" : "sire_enriched_text.status"
  }, {
    "operation" : "remove_nulls"
  } ],
  "source" : {
    "type" : "salesforce",
    "credential_id" : "00ad0000-0000-11e8-ba89-0ed5f00f718b",
    "schedule" : {
      "enabled" : true,
      "time_zone" : "America/New_York",
      "frequency" : "weekly"
    },
    "options" : {
      "site_collections" : [ {
        "site_collection_path" : "/sites/TestSiteA",
        "limit" : 10
      } ]
    }
  }
}
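The `normalizations` array in this example lists field operations such as `move`, `remove`, and `remove_nulls`. To make their effect concrete, here is a minimal local sketch in Python. It assumes that operations apply in array order and that dotted names such as `metadata.title` address nested fields. It is only an illustration, not the service's implementation. The helper names are hypothetical, other operation types are rejected, and removing nulls at every nesting level is an assumption.

```python
import copy
from typing import Any, Dict, List


def _pop_path(doc: Dict[str, Any], path: str) -> Any:
    """Remove and return the value at a dotted path, or None if it is absent."""
    *parents, leaf = path.split(".")
    node: Any = doc
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return None
    return node.pop(leaf, None)


def _set_path(doc: Dict[str, Any], path: str, value: Any) -> None:
    """Set a value at a dotted path, creating intermediate dicts as needed."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def _drop_nulls(value: Any) -> Any:
    """Recursively drop None values from dicts and lists (assumed semantics)."""
    if isinstance(value, dict):
        return {k: _drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [_drop_nulls(v) for v in value if v is not None]
    return value


def apply_normalizations(doc: Dict[str, Any], operations: List[Dict[str, str]]) -> Dict[str, Any]:
    """Apply move/remove/remove_nulls operations, in order, to a copy of doc."""
    result = copy.deepcopy(doc)
    for op in operations:
        kind = op["operation"]
        if kind == "move":
            value = _pop_path(result, op["source_field"])
            if value is not None:
                _set_path(result, op["destination_field"], value)
        elif kind == "remove":
            _pop_path(result, op["source_field"])
        elif kind == "remove_nulls":
            result = _drop_nulls(result)
        else:
            raise ValueError(f"operation not handled by this sketch: {kind}")
    return result


doc = {
    "metadata": {"title": "Press Release", "author": "Jake Bake"},
    "html": "<p>...</p>",
    "summary": None,
}
ops = [
    {"operation": "move", "source_field": "metadata.title", "destination_field": "title"},
    {"operation": "remove", "source_field": "html"},
    {"operation": "remove_nulls"},
]
print(apply_normalizations(doc, ops))
# → {'metadata': {'author': 'Jake Bake'}, 'title': 'Press Release'}
```

The input dict is deep-copied, so the original document is never mutated. This makes it easy to compare the before and after states, much like the preview snapshots later in this reference.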
        

Response Codes

Status Description
200

Configuration successfully updated.

400

Bad request.

403

Forbidden. Returned if you attempt to update a read-only configuration or if you attempt to update a configuration in a read-only environment.

Delete a configuration

The deletion is performed unconditionally: a configuration deletion request succeeds even if the configuration is referenced by a collection or document ingestion. Documents that have already been submitted for processing continue to use the deleted configuration, because documents are always processed with a snapshot of the configuration as it existed at the time the document was submitted.

Request

DELETE /v1/environments/{environment_id}/configurations/{configuration_id}
Parameter Description
environment_id path string

The ID of the environment.

configuration_id path string

The ID of the configuration.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -X DELETE -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/configurations/{configuration_id}?version=2018-03-05"
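You can reproduce the curl command above with Python's standard library. The sketch below only builds the `DELETE` request and does not send it, so the URL, query string, and basic-authentication header are easy to inspect. The function names are hypothetical. The base URL and version date are copied from the example, so replace them with the values for your service instance. For IAM-based instances, the authentication section passes `apikey` as the username.

```python
import base64
from urllib.parse import quote, urlencode
from urllib.request import Request

# Default endpoint taken from the examples; use the URL from your service credentials.
BASE_URL = "https://gateway.watsonplatform.net/discovery/api"


def basic_auth_header(username: str, password: str) -> str:
    """Encode username:password the same way curl -u does for basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def delete_configuration_request(
    environment_id: str,
    configuration_id: str,
    username: str,
    password: str,
    version: str = "2018-03-05",
    base_url: str = BASE_URL,
) -> Request:
    """Build (but do not send) DELETE /v1/environments/{id}/configurations/{id}."""
    path = (
        f"/v1/environments/{quote(environment_id, safe='')}"
        f"/configurations/{quote(configuration_id, safe='')}"
    )
    url = f"{base_url}{path}?{urlencode({'version': version})}"
    request = Request(url, method="DELETE")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


req = delete_configuration_request("my-env", "123abc", "apikey", "my-api-key")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```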
        

Response

DeleteConfigurationResponse
Name Description
configuration_id string

The unique identifier for the configuration.

status string

Status of the configuration. A deleted configuration has the status deleted.

Possible values:
  • deleted
notices Notice[]

An array of notice messages, if any.

Notice

A notice produced for the collection.

Name Description
notice_id string

Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action.

created DateTime

The creation date of the notice in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

document_id string

Unique identifier of the document.

query_id string

Unique identifier of the query used for relevance training.

severity string

Severity level of the notice.

Possible values:
  • warning
  • error
step string

Ingestion or training step in which the notice occurred.

description string

The description of the notice.

Example response


{
  "configuration_id" : "123abc",
  "status" : "deleted",
  "notices" : [ {
    "notice_id" : "configuration_in_use",
    "created" : "2016-09-28T12:34:00.000Z",
    "severity" : "warning",
    "description" : "The configuration was deleted, but it is referenced by one or more collections."
  } ]
}
        

Response Codes

Status Description
200

Configuration successfully deleted. The response contains a warning if the configuration was referenced by at least one collection.

400

Bad request.

A bad request is returned any time there is a problem with the request itself.

Example error messages:

  • Invalid Configuration ID - if the configuration ID is not correctly formatted.
  • Invalid configurationId: 2c3a981b-dade-488c-b8c6-01ae8d111111 - if the configuration is not found.
403

Forbidden. Returned if you attempt to delete a read-only configuration, or if you attempt to delete a configuration from a read-only environment.
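A successful delete (200) can still carry warnings, so a client may want to surface them. Here is a minimal Python sketch that assumes the response body has the DeleteConfigurationResponse shape shown above. The function name is hypothetical.

```python
import json
from typing import List, Tuple


def summarize_delete(response_body: str) -> Tuple[bool, List[str]]:
    """Return (deleted?, warning messages) from a DeleteConfigurationResponse body."""
    data = json.loads(response_body)
    deleted = data.get("status") == "deleted"
    warnings = [
        f"{notice.get('notice_id')}: {notice.get('description')}"
        for notice in data.get("notices", [])
        if notice.get("severity") == "warning"
    ]
    return deleted, warnings


example = """{
  "configuration_id": "123abc",
  "status": "deleted",
  "notices": [{
    "notice_id": "configuration_in_use",
    "created": "2016-09-28T12:34:00.000Z",
    "severity": "warning",
    "description": "The configuration was deleted, but it is referenced by one or more collections."
  }]
}"""
deleted, warnings = summarize_delete(example)
print(deleted, warnings)
```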

Test your configuration on a document

Test a configuration by running a sample document through the ingestion process.

Test configuration

Runs a sample document through the default configuration or a configuration that you specify, and returns diagnostic information designed to help you understand how the document was processed. The document is not added to the index.

Request

POST /v1/environments/{environment_id}/preview
Parameter Description
environment_id path string

The ID of the environment.

configuration form string

The configuration to use to process the document. If this part is provided, the provided configuration is used to process the document. If configuration_id is also provided (both are present at the same time), the request is rejected. The maximum supported configuration size is 1 MB; configuration parts larger than 1 MB are rejected. See the GET /configurations/{configuration_id} operation for an example configuration.

step query string

Runs the input document through only the specified step instead of through the entire ingestion workflow. Valid values are convert, enrich, and normalize.

Allowable values:
  • html_input
  • html_output
  • json_output
  • json_normalizations_output
  • enrichments_output
  • normalizations_output
configuration_id query string

The ID of the configuration to use to process the document. If the configuration form part is also provided (both are present at the same time), then the request will be rejected.

file form file

The content of the document to ingest. The maximum supported file size is 50 megabytes. Files larger than 50 megabytes are rejected.

metadata form string

If you're using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: { \"Creator\": \"Johnny Appleseed\", \"Subject\": \"Apples\" }.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.
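You can check the rules above on the client before uploading: send either `configuration` or `configuration_id` (never both), keep the configuration and metadata parts under 1 MB, and keep the file under 50 MB. This Python sketch is a hypothetical pre-flight check, not part of any SDK. It assumes that 1 MB means 1,048,576 bytes, but the service's exact byte thresholds may differ.

```python
from typing import Optional

# Limits as described for the preview parameters; the binary interpretation of "MB" is an assumption.
MAX_CONFIGURATION_BYTES = 1 * 1024 * 1024
MAX_METADATA_BYTES = 1 * 1024 * 1024
MAX_FILE_BYTES = 50 * 1024 * 1024


def check_preview_inputs(
    file_bytes: bytes,
    configuration: Optional[str] = None,
    configuration_id: Optional[str] = None,
    metadata: Optional[str] = None,
) -> None:
    """Raise ValueError if the inputs would be rejected under the documented rules."""
    if configuration is not None and configuration_id is not None:
        raise ValueError("send either the configuration part or configuration_id, not both")
    if len(file_bytes) > MAX_FILE_BYTES:
        raise ValueError("file part exceeds 50 MB")
    if configuration is not None and len(configuration.encode("utf-8")) > MAX_CONFIGURATION_BYTES:
        raise ValueError("configuration part exceeds 1 MB")
    if metadata is not None and len(metadata.encode("utf-8")) > MAX_METADATA_BYTES:
        raise ValueError("metadata part exceeds 1 MB")


# Passes silently: only configuration_id is supplied and the file is tiny.
check_preview_inputs(b"<html>...</html>", configuration_id="e8b9d793-b163-452a-9373-bce07efb510b")
```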

Example request

This example uses the sample file sample1.html, which can be downloaded from http://github.com/watson-developer-cloud/doc-tutorial-downloads/raw/master/discovery/sample1.html.

curl -X POST -u "{username}":"{password}" -F "file=@sample1.html;type=text/html" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/preview?version=2018-03-05&configuration_id={configuration_id}"
        
Example metadata

{
  "Creator": "Johnny Appleseed",
  "Subject": "Apples"
}
        

Response

TestDocument
Name Description
configuration_id string

The unique identifier for the configuration.

status string

Status of the preview operation.

enriched_field_units integer

The number of 10-kB chunks of field data that were enriched. This can be used to estimate the cost of running a real ingestion.

original_media_type string

Format of the test document.

snapshots DocumentSnapshot[]

An array of objects that describe each step in the preview process.

notices Notice[]

An array of notice messages about the preview operation.

DocumentSnapshot
Name Description
step string

Possible values:
  • html_input
  • html_output
  • json_output
  • json_normalizations_output
  • enrichments_output
  • normalizations_output
snapshot object
Notice

A notice produced for the collection.

Name Description
notice_id string

Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action.

created DateTime

The creation date of the notice in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

document_id string

Unique identifier of the document.

query_id string

Unique identifier of the query used for relevance training.

severity string

Severity level of the notice.

Possible values:
  • warning
  • error
step string

Ingestion or training step in which the notice occurred.

description string

The description of the notice.

Example response


{
  "configuration_id" : "e8b9d793-b163-452a-9373-bce07efb510b",
  "status" : "completed",
  "enriched_field_units" : 5,
  "original_media_type" : "text/html",
  "snapshots" : [ {
    "step" : "html_input",
    "snapshot" : {
      "html" : "<html><head><title>IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations</title><meta name=\"author\" content=\"Jake Bake\"></head><body><p><span>Mr. Rhodin will lead the IBM Watson Group,<span>a new IBM business unit headquartered in the heart of New York City's Silicon Alley</span> that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.</span></p></body></html>"
    }
  }, {
    "step" : "html_output",
    "snapshot" : {
      "html" : "<?xml version='1.0' encoding='UTF-8' standalone='yes'?><html><head><title>IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations</title><meta name=\"author\" content=\"Jake Bake\"/></head><body><p>Mr. Rhodin will lead the IBM Watson Group, a new IBM business unit headquartered in the heart of New York City's Silicon Alley that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.</p></body></html>"
    }
  }, {
    "step" : "json_output",
    "snapshot" : {
      "text" : "IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations\n\nMr. Rhodin will lead the IBM Watson Group, a new IBM business unit headquartered in the heart of New York City's Silicon Alley that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.",
      "html" : "<?xml version='1.0' encoding='UTF-8' standalone='yes'?><html><head><title>IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations</title><meta name=\"author\" content=\"Jake Bake\"/></head><body><p>Mr. Rhodin will lead the IBM Watson Group, a new IBM business unit headquartered in the heart of New York City's Silicon Alley that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.</p></body></html>",
      "metadata" : {
        "title" : "Press Release 2014-01-09",
        "category" : "news"
      },
      "extracted_metadata" : {
        "title" : "IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations",
        "author" : "Jake Bake"
      }
    }
  }, {
    "step" : "json_normalizations_output",
    "snapshot" : {
      "text" : "IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations\n\nMr. Rhodin will lead the IBM Watson Group, a new IBM business unit headquartered in the heart of New York City's Silicon Alley that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.",
      "html" : "<?xml version='1.0' encoding='UTF-8' standalone='yes'?><html><head><title>IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations</title><meta name=\"author\" content=\"Jake Bake\"/></head><body><p>Mr. Rhodin will lead the IBM Watson Group, a new IBM business unit headquartered in the heart of New York City's Silicon Alley that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.</p></body></html>",
      "metadata" : {
        "title" : "IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations",
        "author" : "Jake Bake",
        "category" : "news"
      }
    }
  }, {
    "step" : "enrichments_output",
    "snapshot" : {
      "html" : "<?xml version='1.0' encoding='UTF-8' standalone='yes'?><html><head><title>IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations</title><meta name=\"author\" content=\"Jake Bake\"/></head><body><p>Mr. Rhodin will lead the IBM Watson Group, a new IBM business unit headquartered in the heart of New York City's Silicon Alley that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.</p></body></html>",
      "text" : "IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations\n\nMr. Rhodin will lead the IBM Watson Group, a new IBM business unit headquartered in the heart of New York City's Silicon Alley that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.",
      "metadata" : {
        "title" : "IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations",
        "author" : "Jake Bake",
        "category" : "news"
      },
      "alchemy_enriched_text" : {
        "status" : "OK",
        "language" : "english",
        "text" : "IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations\n\nMr. Rhodin will lead the IBM Watson Group, a new IBM business unit headquartered in the heart of New York City's Silicon Alley that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.",
        "keywords" : [ {
          "relevance" : 0.978167,
          "text" : "IBM Watson Group"
        }, {
          "relevance" : 0.932488,
          "text" : "cloud-based cognitive apps"
        }, {
          "relevance" : 0.838797,
          "text" : "IBM business unit"
        }, {
          "relevance" : 0.830576,
          "text" : "New York City"
        }, {
          "relevance" : 0.736028,
          "text" : "Silicon Alley"
        }, {
          "relevance" : 0.67098,
          "text" : "Title sample"
        }, {
          "relevance" : 0.648253,
          "text" : "Mr. Rhodin"
        }, {
          "relevance" : 0.49768,
          "text" : "start-ups"
        }, {
          "relevance" : 0.433437,
          "text" : "heart"
        }, {
          "relevance" : 0.432563,
          "text" : "products"
        } ]
      },
      "sire_enriched_text" : {
        "status" : "OK",
        "language" : "english",
        "model" : "ie-en-news",
        "typedRelations" : [ {
          "arguments" : [ {
            "entities" : [ {
              "id" : "-E3",
              "text" : "Silicon Alley",
              "type" : "GeopoliticalEntity"
            } ],
            "part" : "first",
            "text" : "Silicon Alley"
          }, {
            "entities" : [ {
              "id" : "-E2",
              "text" : "New York City",
              "type" : "GeopoliticalEntity"
            } ],
            "part" : "second",
            "text" : "New York City"
          } ],
          "score" : "0.898437",
          "sentence" : "Mr. Rhodin will lead the IBM Watson Group, a new IBM business unit headquartered in the heart of New York City's Silicon Alley that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.",
          "type" : "locatedAt"
        } ]
      }
    }
  }, {
    "step" : "normalizations_output",
    "snapshot" : {
      "title" : "IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations",
      "author" : "Jake Bake",
      "language" : "english",
      "text" : "IBM Forms New Watson Group to Meet Growing Demand for Cognitive Innovations\n\nMr. Rhodin will lead the IBM Watson Group, a new IBM business unit headquartered in the heart of New York City's Silicon Alley that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.",
      "alchemy_enriched_text" : {
        "keywords" : [ {
          "relevance" : 0.978167,
          "text" : "IBM Watson Group"
        }, {
          "relevance" : 0.932488,
          "text" : "cloud-based cognitive apps"
        }, {
          "relevance" : 0.838797,
          "text" : "IBM business unit"
        }, {
          "relevance" : 0.830576,
          "text" : "New York City"
        }, {
          "relevance" : 0.736028,
          "text" : "Silicon Alley"
        }, {
          "relevance" : 0.67098,
          "text" : "Title sample"
        }, {
          "relevance" : 0.648253,
          "text" : "Mr. Rhodin"
        }, {
          "relevance" : 0.49768,
          "text" : "start-ups"
        }, {
          "relevance" : 0.433437,
          "text" : "heart"
        }, {
          "relevance" : 0.432563,
          "text" : "products"
        } ]
      },
      "sire_enriched_text" : {
        "typedRelations" : [ {
          "arguments" : [ {
            "entities" : [ {
              "id" : "-E3",
              "text" : "Silicon Alley",
              "type" : "GeopoliticalEntity"
            } ],
            "part" : "first",
            "text" : "Silicon Alley"
          }, {
            "entities" : [ {
              "id" : "-E2",
              "text" : "New York City",
              "type" : "GeopoliticalEntity"
            } ],
            "part" : "second",
            "text" : "New York City"
          } ],
          "score" : "0.898437",
          "sentence" : "Mr. Rhodin will lead the IBM Watson Group, a new IBM business unit headquartered in the heart of New York City's Silicon Alley that will develop products and collaborate with start-ups on cloud-based cognitive apps and services powered by Watson.",
          "type" : "locatedAt"
        } ]
      }
    }
  } ],
  "notices" : [ {
    "notice_id" : "xpath_not_found",
    "severity" : "warning",
    "step" : "html-to-html",
    "description" : "xpath '/home' not found"
  } ]
}
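To see at a glance what the preview did, a client can walk the `snapshots` array and pull out values of interest. The Python sketch below lists the steps that ran and the highest-relevance keywords from the `normalizations_output` snapshot. The field path `alchemy_enriched_text.keywords` is taken from this example response and depends on the enrichments in your configuration. The function name is hypothetical.

```python
import json
from typing import Any, Dict, List


def preview_summary(response_body: str, top_n: int = 3) -> Dict[str, Any]:
    """Summarize a preview (TestDocument) response: status, steps, top keywords, notices."""
    data = json.loads(response_body)
    snapshots = data.get("snapshots", [])
    top_keywords: List[str] = []
    for snap in snapshots:
        if snap.get("step") == "normalizations_output":
            keywords = snap["snapshot"].get("alchemy_enriched_text", {}).get("keywords", [])
            ranked = sorted(keywords, key=lambda kw: kw["relevance"], reverse=True)
            top_keywords = [kw["text"] for kw in ranked[:top_n]]
    return {
        "status": data.get("status"),
        "steps": [snap.get("step") for snap in snapshots],
        "enriched_field_units": data.get("enriched_field_units"),
        "top_keywords": top_keywords,
        "notices": [n.get("description") for n in data.get("notices", [])],
    }


# A trimmed-down version of the example response above.
sample = json.dumps({
    "status": "completed",
    "enriched_field_units": 5,
    "snapshots": [
        {"step": "html_input", "snapshot": {}},
        {"step": "normalizations_output", "snapshot": {"alchemy_enriched_text": {"keywords": [
            {"relevance": 0.49768, "text": "start-ups"},
            {"relevance": 0.978167, "text": "IBM Watson Group"},
            {"relevance": 0.736028, "text": "Silicon Alley"},
        ]}}},
    ],
    "notices": [{"notice_id": "xpath_not_found", "severity": "warning", "description": "xpath '/home' not found"}],
})
print(preview_summary(sample, top_n=2))
```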
        

Response Codes

Status Description
200

The document was successfully processed.

400

Bad request if:

  • The request is incorrectly formatted
  • The configuration_id parameter refers to a non-existent configuration
  • The default configuration_id of the collection refers to a non-existent configuration (and no override has been provided).

The error message contains details about what caused the request to be rejected.

Collections

Create a collection for your documents. Each environment can have multiple collections.

Create a collection

Request

POST /v1/environments/{environment_id}/collections
Parameter Description
environment_id path string

The ID of the environment.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

body body CreateCollectionRequest

An object that describes the collection to create.

CreateCollectionRequest
Name Description
name string

The name of the collection to be created.

description string

A description of the collection.

configuration_id string

The ID of the configuration to associate with the collection.

language string

The language of the documents stored in the collection, in the form of an ISO 639-1 language code.

Allowable values:
  • en
  • es
  • de
  • ar
  • fr
  • it
  • ja
  • ko
  • pt
  • nl

Default: en

Example request

curl -X POST -u "{username}":"{password}" -H "Content-Type: application/json" -d '{
  "name": "test_collection",
  "description": "My test collection",
  "configuration_id": "{configuration_id}",
  "language": "en"
}' "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections?version=2018-03-05"
        
Example JSON body

{
  "name": "{collection_name}",
  "description": "{description}",
  "configuration_id": "{configuration_id}",
  "language": "en"
}
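You can assemble and sanity-check the request body in code before sending it. The Python sketch below builds a CreateCollectionRequest body, restricts `language` to the allowable values listed above, and defaults it to `en` (assumed to be the service default). Treating `description` and `configuration_id` as optional is an assumption of this sketch, as is the function name.

```python
import json
from typing import Optional

# Allowable ISO 639-1 codes listed for the language field.
ALLOWED_LANGUAGES = {"en", "es", "de", "ar", "fr", "it", "ja", "ko", "pt", "nl"}


def create_collection_body(
    name: str,
    configuration_id: Optional[str] = None,
    description: Optional[str] = None,
    language: str = "en",
) -> str:
    """Return a JSON string for POST /v1/environments/{environment_id}/collections."""
    if not name:
        raise ValueError("name must be a non-empty string")
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language}")
    body = {"name": name, "language": language}
    if description is not None:
        body["description"] = description
    if configuration_id is not None:
        body["configuration_id"] = configuration_id
    return json.dumps(body)


print(create_collection_body("test_collection", "{configuration_id}", "My test collection"))
```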
        

Response

Collection

A collection for storing documents.

Name Description
collection_id string

The unique identifier of the collection.

name string

The name of the collection.

description string

The description of the collection.

created DateTime

The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

The timestamp of when the collection was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

status string

The status of the collection.

Possible values:
  • active
  • pending
  • maintenance
configuration_id string

The unique identifier of the collection's configuration.

language string

The language of the documents stored in the collection. Permitted values include en (English), de (German), and es (Spanish).

document_counts DocumentCounts

The object providing information about the documents in the collection. Present only when retrieving details of a collection.

disk_usage CollectionDiskUsage

The object providing information about the disk usage of the collection. Present only when retrieving details of a collection.

training_status TrainingStatus

Provides information about the status of relevance training for the collection.

source_crawl SourceStatus

Object containing source crawl status information.

DocumentCounts
Name Description
available long

The total number of available documents in the collection.

processing long

The number of documents in the collection that are currently being processed.

failed long

The number of documents in the collection that failed to be ingested.

CollectionDiskUsage

Summary of the disk usage statistics for this collection.

Name Description
used_bytes integer

Number of bytes used by the collection.

TrainingStatus
Name Description
total_examples integer
available boolean
processing boolean
minimum_queries_added boolean
minimum_examples_added boolean
sufficient_label_diversity boolean
notices integer
successfully_trained DateTime
data_updated DateTime
SourceStatus

Object containing source crawl status information.

Name Description
status string

The current status of the source crawl for this collection. This field returns not_configured if the default configuration for this source does not have a source object defined.

  • running indicates that a crawl to fetch more documents is in progress.
  • complete indicates that the crawl has completed with no errors.
  • complete_with_notices indicates that some notices were generated during the crawl. Notices can be checked by using the notices query method.
  • stopped indicates that the crawl has stopped but is not complete.
Possible values:
  • running
  • complete
  • complete_with_notices
  • stopped
  • not_configured
last_updated DateTime

Date in UTC format indicating when the last crawl was attempted. If null, no crawl was completed.
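The `source_crawl.status` values above map naturally onto a small lookup for logs or dashboards. Here is a Python sketch of that lookup. The message wording and the function name are illustrative, and `not_configured` is interpreted as described above: the configuration has no source object.

```python
from typing import Any, Dict

# Meanings paraphrased from the SourceStatus description.
CRAWL_STATUS_MEANINGS = {
    "running": "a crawl to fetch more documents is in progress",
    "complete": "the crawl completed with no errors",
    "complete_with_notices": "the crawl completed but generated notices",
    "stopped": "the crawl stopped before completing",
    "not_configured": "the configuration defines no source",
}


def describe_source_crawl(source_crawl: Dict[str, Any]) -> str:
    """Turn a SourceStatus object into a one-line human-readable summary."""
    status = source_crawl.get("status")
    meaning = CRAWL_STATUS_MEANINGS.get(status, "unrecognized status")
    last_updated = source_crawl.get("last_updated")
    when = f"last crawl attempted {last_updated}" if last_updated else "no crawl completed"
    return f"{status}: {meaning} ({when})"


print(describe_source_crawl({"status": "complete", "last_updated": "2018-01-05T12:55:40.652Z"}))
# → complete: the crawl completed with no errors (last crawl attempted 2018-01-05T12:55:40.652Z)
```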

Example response


{
  "collection_id" : "800e58e4-198d-45eb-be87-74e1d6df4e96",
  "name" : "test-collection",
  "configuration_id" : "3c4fff84-1500-455c-b125-eaa2d319f6d3",
  "language" : "de",
  "status" : "active",
  "description" : "A test collection to show as an example",
  "created" : "2017-07-14T12:55:40.652Z",
  "updated" : "2017-07-14T12:55:40.652Z",
  "document_counts" : {
    "available" : 0,
    "processing" : 0,
    "failed" : 0
  },
  "disk_usage" : {
    "used_bytes" : 260
  },
  "training_status" : {
    "data_updated" : "",
    "total_examples" : 0,
    "sufficient_label_diversity" : false,
    "processing" : false,
    "minimum_examples_added" : false,
    "successfully_trained" : "",
    "available" : false,
    "notices" : 0,
    "minimum_queries_added" : false
  },
  "source_crawl" : {
    "status" : "complete",
    "last_updated" : "2018-01-05T12:55:40.652Z"
  }
}
        

Response Codes

Status Description
201

Collection successfully created.

400

Bad request if the collection body does not match the expected format or if the configuration_id references a configuration that does not exist. The error string describes why the request was rejected.

403

Forbidden. Returned if you attempt to add a collection to a read-only environment.

List collections

Lists existing collections for the service instance.

Request

GET /v1/environments/{environment_id}/collections
Parameter Description
environment_id path string

The ID of the environment.

name query string

Find collections with the given name.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections?version=2018-03-05"
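When you filter by `name`, the value must be URL-encoded in the query string. Here is a Python sketch that builds the list URL with the standard library. The function name is hypothetical, and the base URL and version are copied from the example.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"  # from the example request


def list_collections_url(
    environment_id: str,
    name: Optional[str] = None,
    version: str = "2018-03-05",
    base_url: str = BASE_URL,
) -> str:
    """Build the GET URL for listing collections, optionally filtered by name."""
    params = {"version": version}
    if name is not None:
        params["name"] = name
    return f"{base_url}/v1/environments/{quote(environment_id, safe='')}/collections?{urlencode(params)}"


print(list_collections_url("my-env", name="my docs"))
# → https://gateway.watsonplatform.net/discovery/api/v1/environments/my-env/collections?version=2018-03-05&name=my+docs
```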
        

Response

ListCollectionsResponse
Name Description
collections Collection[]

An array containing information about each collection in the environment.

Collection

A collection for storing documents.

Name Description
collection_id string

The unique identifier of the collection.

name string

The name of the collection.

description string

The description of the collection.

created DateTime

The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

The timestamp of when the collection was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

status string

The status of the collection.

Possible values:
  • active
  • pending
  • maintenance
configuration_id string

The unique identifier of the collection's configuration.

language string

The language of the documents stored in the collection. Permitted values include en (English), de (German), and es (Spanish).

document_counts DocumentCounts

The object providing information about the documents in the collection. Present only when retrieving details of a collection.

disk_usage CollectionDiskUsage

The object providing information about the disk usage of the collection. Present only when retrieving details of a collection.

training_status TrainingStatus

Provides information about the status of relevance training for the collection.

source_crawl SourceStatus

Object containing source crawl status information.

DocumentCounts
Name Description
available long

The total number of available documents in the collection.

processing long

The number of documents in the collection that are currently being processed.

failed long

The number of documents in the collection that failed to be ingested.

CollectionDiskUsage

Summary of the disk usage statistics for this collection.

Name Description
used_bytes integer

Number of bytes used by the collection.

TrainingStatus
Name Description
total_examples integer
available boolean
processing boolean
minimum_queries_added boolean
minimum_examples_added boolean
sufficient_label_diversity boolean
notices integer
successfully_trained DateTime
data_updated DateTime
SourceStatus

Object containing source crawl status information.

Name Description
status string

The current status of the source crawl for this collection. This field returns not_configured if the default configuration for this source does not have a source object defined.

  • running indicates that a crawl to fetch more documents is in progress.
  • complete indicates that the crawl has completed with no errors.
  • complete_with_notices indicates that some notices were generated during the crawl. Notices can be checked by using the notices query method.
  • stopped indicates that the crawl has stopped but is not complete.
Possible values:
  • running
  • complete
  • complete_with_notices
  • stopped
  • not_configured
last_updated DateTime

Date in UTC format indicating when the last crawl was attempted. If null, no crawl was completed.

Example response


{
  "collections" : [ {
    "collection_id" : "f1360220-ea2d-4271-9d62-89a910b13c37",
    "name" : "example",
    "description" : "this is a demo collection",
    "created" : "2015-08-24T18:42:25.324Z",
    "updated" : "2015-08-24T18:42:25.324Z",
    "status" : "active",
    "configuration_id" : "6963be41-2dea-4f79-8f52-127c63c479b0",
    "language" : "en"
  } ]
}
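A common follow-up is to look up a collection's ID by name in the ListCollectionsResponse. This Python sketch returns every match as a list, because this reference does not say whether collection names are unique within an environment. The function name is hypothetical.

```python
import json
from typing import List


def collection_ids_by_name(response_body: str, name: str) -> List[str]:
    """Return the collection_id of every collection whose name equals `name`."""
    data = json.loads(response_body)
    return [c["collection_id"] for c in data.get("collections", []) if c.get("name") == name]


example = (
    '{"collections": [{"collection_id": "f1360220-ea2d-4271-9d62-89a910b13c37", '
    '"name": "example", "status": "active", "language": "en"}]}'
)
print(collection_ids_by_name(example, "example"))
```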
        

Response Codes

Status Description
200

Successful response.

400

Bad request.

Get collection details

Request

GET /v1/environments/{environment_id}/collections/{collection_id}
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}?version=2018-03-05"
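A collection's `status` can be `pending` or `maintenance` rather than `active`, and `document_counts.processing` shows documents that are still being processed. The Python sketch below polls a caller-supplied `fetch` function until the collection is `active` with nothing processing, or until a timeout expires. The polling approach, interval, and timeout are assumptions, not documented service guidance. `fetch` stands in for whatever code performs the GET request above and returns the parsed Collection.

```python
import time
from typing import Any, Callable, Dict


def wait_for_collection(
    fetch: Callable[[], Dict[str, Any]],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until status is 'active' and no documents are processing; return that Collection."""
    deadline = clock() + timeout_s
    while True:
        collection = fetch()
        processing = collection.get("document_counts", {}).get("processing", 0)
        if collection.get("status") == "active" and processing == 0:
            return collection
        if clock() >= deadline:
            raise TimeoutError(
                f"collection not ready: status={collection.get('status')}, processing={processing}"
            )
        sleep(interval_s)


# Simulated responses instead of real GET calls.
responses = iter([
    {"status": "pending"},
    {"status": "active", "document_counts": {"processing": 2}},
    {"status": "active", "document_counts": {"available": 5, "processing": 0}},
])
ready = wait_for_collection(lambda: next(responses), sleep=lambda s: None)
print(ready["document_counts"])
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.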
        

Response

Collection

A collection for storing documents.

Name Description
collection_id string

The unique identifier of the collection.

name string

The name of the collection.

description string

The description of the collection.

created DateTime

The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

The timestamp of when the collection was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

status string

The status of the collection.

Possible values:
  • active
  • pending
  • maintenance
configuration_id string

The unique identifier of the collection's configuration.

language string

The language of the documents stored in the collection. Permitted values include en (English), de (German), and es (Spanish).

document_counts DocumentCounts

The object providing information about the documents in the collection. Present only when retrieving details of a collection.

disk_usage CollectionDiskUsage

The object providing information about the disk usage of the collection. Present only when retrieving details of a collection.

training_status TrainingStatus

Provides information about the status of relevance training for the collection.

source_crawl SourceStatus

Object containing source crawl status information.

DocumentCounts
Name Description
available long

The total number of available documents in the collection.

processing long

The number of documents in the collection that are currently being processed.

failed long

The number of documents in the collection that failed to be ingested.

CollectionDiskUsage

Summary of the disk usage statistics for this collection.

Name Description
used_bytes integer

Number of bytes used by the collection.

TrainingStatus
Name Description
total_examples integer
available boolean
processing boolean
minimum_queries_added boolean
minimum_examples_added boolean
sufficient_label_diversity boolean
notices integer
successfully_trained DateTime
data_updated DateTime
SourceStatus

Object containing source crawl status information.

Name Description
status string

The current status of the source crawl for this collection. This field returns not_configured if the default configuration for this source does not have a source object defined.

  • running indicates that a crawl to fetch more documents is in progress.
  • complete indicates that the crawl has completed with no errors.
  • complete_with_notices indicates that some notices were generated during the crawl. Notices can be checked by using the notices query method.
  • stopped indicates that the crawl has stopped but is not complete.
Possible values:
  • running
  • complete
  • complete_with_notices
  • stopped
  • not_configured
last_updated DateTime

Date in UTC format indicating when the last crawl was attempted. If null, no crawl was completed.

Example response


{
  "collection_id" : "800e58e4-198d-45eb-be87-74e1d6df4e96",
  "name" : "test-collection",
  "configuration_id" : "3c4fff84-1500-455c-b125-eaa2d319f6d3",
  "language" : "de",
  "status" : "active",
  "description" : "A test collection to show as an example",
  "created" : "2017-07-14T12:55:40.652Z",
  "updated" : "2017-07-14T12:55:40.652Z",
  "document_counts" : {
    "available" : 0,
    "processing" : 0,
    "failed" : 0
  },
  "disk_usage" : {
    "used_bytes" : 260
  },
  "training_status" : {
    "data_updated" : "",
    "total_examples" : 0,
    "sufficient_label_diversity" : false,
    "processing" : false,
    "minimum_examples_added" : false,
    "successfully_trained" : "",
    "available" : false,
    "notices" : 0,
    "minimum_queries_added" : false
  },
  "source_crawl" : {
    "status" : "complete",
    "last_updated" : "2018-01-05T12:55:40.652Z"
  }
}
        

Response Codes

Status Description
200

Collection fetched.

400

Bad request.

Update a collection

Request

PUT /v1/environments/{environment_id}/collections/{collection_id}
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

body body UpdateCollectionRequest

Input an object that allows you to update a collection.

UpdateCollectionRequest
Name Description
name string

The name of the collection.

description string

A description of the collection.

configuration_id string

The ID of the configuration that the collection should use.

Example request

      curl -X PUT -u "{username}":"{password}" -H "Content-Type: application/json" -d '{
  "name": "test_collection",
  "description": "My test collection",
  "configuration_id": "{configuration_id}"
}' "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}?version=2018-03-05"
        
Example JSON body

{
  "name": "{collection_name}",
  "description": "{description}",
  "configuration_id": "{configuration_id}"
}
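The following Python sketch builds the same PUT request as the curl example, using only the standard library. It constructs the request but does not send it. The base URL and version date are copied from the curl example; replace the base URL with your own service URL. `build_update_collection_request` and `basic_auth_header` are hypothetical helpers, not part of the Watson SDK. For IAM-based instances, the credentials convention in this reference is the username `apikey` with the API key as the password.

```python
import base64
import json
import urllib.request

BASE_URL = "https://gateway.watsonplatform.net/discovery/api/v1"  # from the curl example
VERSION = "2018-03-05"


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials the same way curl -u does."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_update_collection_request(environment_id: str, collection_id: str,
                                    name: str, description: str, configuration_id: str,
                                    username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that updates a collection."""
    url = f"{BASE_URL}/environments/{environment_id}/collections/{collection_id}?version={VERSION}"
    body = json.dumps({
        "name": name,
        "description": description,
        "configuration_id": configuration_id,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(username, password),
        },
    )
```

To send the request, pass it to `urllib.request.urlopen(req)`. According to the response codes below, a successful update returns 201.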
        

Response

Collection

A collection for storing documents.

Name Description
collection_id string

The unique identifier of the collection.

name string

The name of the collection.

description string

The description of the collection.

created DateTime

The creation date of the collection in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

The timestamp of when the collection was last updated in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

status string

The status of the collection.

Possible values:
  • active
  • pending
  • maintenance
configuration_id string

The unique identifier of the collection's configuration.

language string

The language of the documents stored in the collection. Permitted values include en (English), de (German), and es (Spanish).

document_counts DocumentCounts

The object providing information about the documents in the collection. Present only when retrieving details of a collection.

disk_usage CollectionDiskUsage

The object providing information about the disk usage of the collection. Present only when retrieving details of a collection.

training_status TrainingStatus

Provides information about the status of relevance training for the collection.

source_crawl SourceStatus

Object containing source crawl status information.

DocumentCounts
Name Description
available long

The total number of available documents in the collection.

processing long

The number of documents in the collection that are currently being processed.

failed long

The number of documents in the collection that failed to be ingested.

CollectionDiskUsage

Summary of the disk usage statistics for this collection.

Name Description
used_bytes integer

Number of bytes used by the collection.

TrainingStatus
Name Description
total_examples integer
available boolean
processing boolean
minimum_queries_added boolean
minimum_examples_added boolean
sufficient_label_diversity boolean
notices integer
successfully_trained DateTime
data_updated DateTime
SourceStatus

Object containing source crawl status information.

Name Description
status string

The current status of the source crawl for this collection. This field returns not_configured if the default configuration for this source does not have a source object defined.

  • running indicates that a crawl to fetch more documents is in progress.
  • complete indicates that the crawl has completed with no errors.
  • complete_with_notices indicates that some notices were generated during the crawl. Notices can be checked by using the notices query method.
  • stopped indicates that the crawl has stopped but is not complete.
Possible values:
  • running
  • complete
  • complete_with_notices
  • stopped
  • not_configured
last_updated DateTime

Date in UTC format indicating when the last crawl was attempted. If null, no crawl was completed.

Example response


{
  "collection_id" : "800e58e4-198d-45eb-be87-74e1d6df4e96",
  "name" : "test-collection",
  "configuration_id" : "3c4fff84-1500-455c-b125-eaa2d319f6d3",
  "language" : "de",
  "status" : "active",
  "description" : "A test collection to show as an example",
  "created" : "2017-07-14T12:55:40.652Z",
  "updated" : "2017-07-14T12:55:40.652Z",
  "document_counts" : {
    "available" : 0,
    "processing" : 0,
    "failed" : 0
  },
  "disk_usage" : {
    "used_bytes" : 260
  },
  "training_status" : {
    "data_updated" : "",
    "total_examples" : 0,
    "sufficient_label_diversity" : false,
    "processing" : false,
    "minimum_examples_added" : false,
    "successfully_trained" : "",
    "available" : false,
    "notices" : 0,
    "minimum_queries_added" : false
  },
  "source_crawl" : {
    "status" : "complete",
    "last_updated" : "2018-01-05T12:55:40.652Z"
  }
}
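The timestamps use the pattern yyyy-MM-dd'T'HH:mm:ss.SSS'Z'. In Python this corresponds to the strptime format "%Y-%m-%dT%H:%M:%S.%fZ". The following minimal sketch parses a Collection response like the one above. `summarize_collection` is a hypothetical helper. It assumes the body is a JSON string. It reads document_counts only when present, because that object appears only when you retrieve collection details.

```python
import json
from datetime import datetime, timezone


def parse_discovery_timestamp(value: str) -> datetime:
    """Parse a yyyy-MM-dd'T'HH:mm:ss.SSS'Z' string into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


def summarize_collection(payload: str) -> dict:
    """Pull a few commonly checked fields out of a Collection response body."""
    collection = json.loads(payload)
    counts = collection.get("document_counts", {})
    return {
        "name": collection["name"],
        "status": collection["status"],
        "updated": parse_discovery_timestamp(collection["updated"]),
        # available + processing + failed
        "documents": sum(counts.get(key, 0) for key in ("available", "processing", "failed")),
        "crawl_status": collection.get("source_crawl", {}).get("status"),
    }
```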
        

Response Codes

Status Description
201

Collection successfully updated.

400

Bad request if the collection body does not match the expected format or if the configuration_id references a configuration that does not exist. The error string will describe why the request was rejected.

403

Forbidden. Returned if you attempt to update a collection in a read-only environment.

Delete a collection

Request

DELETE /v1/environments/{environment_id}/collections/{collection_id}
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" -X DELETE  "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}?version=2018-03-05"
        

Response

DeleteCollectionResponse
Name Description
collection_id string

The unique identifier of the collection that is being deleted.

status string

The status of the collection. The status of a successful deletion operation is deleted.

Possible values:
  • deleted

Response Codes

Status Description
200

Collection successfully deleted.

400

Bad request.

A bad request is returned any time there is a problem with the request itself.

Example error messages:

  • Could not find listed collection: returned if the ID is incorrectly formatted.
403

Forbidden. Returned if you attempt to delete a collection in a read-only environment.

404

Returned any time the collection is not found (even immediately after the collection was successfully deleted).

Example error message:

A collection with ID '2cd8bc72-d737-46e3-b26b-05a585111111' was not found.
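The 404 can arrive even immediately after a successful deletion. A cleanup script that only needs to know whether the collection is gone can therefore treat 404 as success. The following is a minimal Python sketch of that design choice. `collection_is_gone` is hypothetical. It assumes the HTTP status code and the parsed response body are already available.

```python
from typing import Optional


def collection_is_gone(status_code: int, body: Optional[dict] = None) -> bool:
    """Decide whether a DELETE left the collection absent (hypothetical helper)."""
    if status_code == 200:
        # A successful deletion reports status "deleted".
        return (body or {}).get("status") == "deleted"
    if status_code == 404:
        # 404 is returned whenever the collection is not found,
        # including right after it was successfully deleted.
        return True
    # 400 (bad request) and 403 (read-only environment) mean this request
    # did not delete the collection.
    return False
```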

List collection fields

Gets a list of the unique fields (and their types) stored in the index.

Request

GET /v1/environments/{environment_id}/collections/{collection_id}/fields
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/fields?version=2018-03-05"
        

Response

ListCollectionFieldsResponse

The list of fetched fields.

The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations.

  • Fields which contain nested JSON objects are assigned a type of "nested".

  • Fields which belong to a nested object are prefixed with .properties (for example, warnings.properties.severity means that the warnings object has a property called severity).

  • Fields returned from the News collection are prefixed with v{N}-fullnews-t3-{YEAR}.mappings (for example, v5-fullnews-t3-2016.mappings.text.properties.author).

Name Description
fields Field[]

An array containing information about each field in the collection.

Field
Name Description
field string

The name of the field.

type string

The type of the field.

Possible values:
  • nested
  • string
  • date
  • long
  • integer
  • short
  • byte
  • double
  • float
  • boolean
  • binary

Example response


{
  "fields" : [ {
    "field" : "warnings",
    "type" : "nested"
  }, {
    "field" : "warnings.properties.description",
    "type" : "string"
  }, {
    "field" : "warnings.properties.phase",
    "type" : "string"
  }, {
    "field" : "warnings.properties.warning_id",
    "type" : "string"
  } ]
}
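Nested fields are listed with .properties segments, such as warnings.properties.description. The following Python sketch strips those segments to approximate the dotted names used elsewhere, such as warnings.description. This mapping is an assumption based on the description above, not a documented rule. It would also drop a field literally named properties. It does not handle the News collection's v{N}-fullnews-t3-{YEAR}.mappings prefix. Both functions are hypothetical helpers.

```python
import json


def query_style_name(indexed_name: str) -> str:
    """Drop '.properties' path segments (assumed query-style equivalent)."""
    return ".".join(part for part in indexed_name.split(".") if part != "properties")


def leaf_fields(payload: str) -> dict:
    """Map query-style names to types, skipping the 'nested' container entries."""
    fields = json.loads(payload)["fields"]
    return {query_style_name(f["field"]): f["type"] for f in fields if f["type"] != "nested"}
```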
        

Response Codes

Status Description
200

The list of fetched fields.

The fields are returned using a fully qualified name format; however, the format differs slightly from that used by the query operations:

  • Fields which contain nested JSON objects are assigned a type of "nested".

  • Fields which belong to a nested object are prefixed with .properties (for example, warnings.properties.severity means that the warnings object has a property called severity).

  • Fields returned from the News collection are prefixed with v{N}-fullnews-t3-{YEAR}.mappings (for example, v5-fullnews-t3-2016.mappings.text.properties.author).

400

Bad request.

Expansions

Add and manage query expansions for a collection. For additional details, see using query expansion.

Get the expansion list

Returns the current expansion list for the specified collection. If an expansion list is not specified, an object with empty expansion arrays is returned.

Request

GET /v1/environments/{environment_id}/collections/{collection_id}/expansions
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" -X GET "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/expansions?version=2018-03-05"
        

Response

Expansions

The query expansion definitions for the specified collection.

Name Description
expansions Expansion[]

An array of query expansion definitions.

Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional. Bidirectional means that all terms are expanded to all other terms in the object. Unidirectional means that a set list of terms can be expanded into a second list of terms.

To create a bidirectional expansion, specify only an expanded_terms array. When any item in the expanded_terms array is found in a query, it is expanded to the other items in the same array.

To create a unidirectional expansion, specify both an array of input_terms and an array of expanded_terms. When items in the input_terms array are present in a query, they are expanded using the items listed in the expanded_terms array.

Expansion

An expansion definition. Each object represents one set of expandable strings. For example, you could have expansions for the word hot in one object, and expansions for the word cold in another.

Name Description
input_terms string[]

A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.

expanded_terms string[]

A list of terms that this expansion will be expanded to. If specified without input_terms, it also functions as the input term list.
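The two expansion styles differ only in whether input_terms is present. The following Python sketch builds both styles and wraps them in the top-level expansions object. The hot and cold terms echo the example above. `make_expansion` and `expansions_body` are hypothetical helpers.

```python
import json
from typing import List, Optional


def make_expansion(expanded_terms: List[str], input_terms: Optional[List[str]] = None) -> dict:
    """Build one Expansion object.

    With only expanded_terms the expansion is bidirectional; adding input_terms
    makes it unidirectional (only the input_terms trigger the expansion).
    """
    if not expanded_terms:
        raise ValueError("expanded_terms must contain at least one term")
    expansion = {"expanded_terms": list(expanded_terms)}
    if input_terms:
        expansion["input_terms"] = list(input_terms)
    return expansion


def expansions_body(*expansions: dict) -> str:
    """Wrap Expansion objects in the top-level Expansions JSON body."""
    return json.dumps({"expansions": list(expansions)}, indent=2)
```

For example, `expansions_body(make_expansion(["hot", "warm", "heated"]), make_expansion(["frigid", "icy"], input_terms=["cold"]))` defines one bidirectional set and one unidirectional set.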

Response Codes

Status Description
200

Successfully fetched expansions details.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

Create or update expansion list

Create or replace the Expansion list for this collection. The maximum number of expanded terms per collection is 500. The current expansion list is replaced with the uploaded content.
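A client can check the 500-term limit before uploading, because an oversized list is otherwise rejected only by the service. The following Python sketch assumes the limit counts every entry in every expanded_terms array; the reference does not spell out the counting rule. The serialized string can be saved as the expansions.json file used in the curl example below. Both functions are hypothetical helpers.

```python
import json

MAX_EXPANDED_TERMS = 500  # per collection, as stated above


def count_expanded_terms(expansions: dict) -> int:
    """Total entries across every expanded_terms array (assumed counting rule)."""
    return sum(len(e.get("expanded_terms", [])) for e in expansions.get("expansions", []))


def validated_expansions_json(expansions: dict) -> str:
    """Serialize an Expansions object, refusing lists that exceed the limit."""
    total = count_expanded_terms(expansions)
    if total > MAX_EXPANDED_TERMS:
        raise ValueError(f"{total} expanded terms exceeds the limit of {MAX_EXPANDED_TERMS}")
    return json.dumps(expansions, indent=2)
```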

Request

POST /v1/environments/{environment_id}/collections/{collection_id}/expansions
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

body body Expansions

An object that defines the expansion list.

Expansions

The query expansion definitions for the specified collection.

Name Description
expansions Expansion[]

An array of query expansion definitions.

Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional. Bidirectional means that all terms are expanded to all other terms in the object. Unidirectional means that a set list of terms can be expanded into a second list of terms.

To create a bidirectional expansion, specify only an expanded_terms array. When any item in the expanded_terms array is found in a query, it is expanded to the other items in the same array.

To create a unidirectional expansion, specify both an array of input_terms and an array of expanded_terms. When items in the input_terms array are present in a query, they are expanded using the items listed in the expanded_terms array.

Expansion

An expansion definition. Each object represents one set of expandable strings. For example, you could have expansions for the word hot in one object, and expansions for the word cold in another.

Name Description
input_terms string[]

A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.

expanded_terms string[]

A list of terms that this expansion will be expanded to. If specified without input_terms, it also functions as the input term list.

Example request

curl -X POST -u "{username}":"{password}" -H "Content-Type: application/json" -d @expansions.json "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/expansions?version=2018-03-05"
        

Response

Expansions

The query expansion definitions for the specified collection.

Name Description
expansions Expansion[]

An array of query expansion definitions.

Each object in the expansions array represents a term or set of terms that will be expanded into other terms. Each expansion object can be configured as bidirectional or unidirectional. Bidirectional means that all terms are expanded to all other terms in the object. Unidirectional means that a set list of terms can be expanded into a second list of terms.

To create a bidirectional expansion, specify only an expanded_terms array. When any item in the expanded_terms array is found in a query, it is expanded to the other items in the same array.

To create a unidirectional expansion, specify both an array of input_terms and an array of expanded_terms. When items in the input_terms array are present in a query, they are expanded using the items listed in the expanded_terms array.

Expansion

An expansion definition. Each object represents one set of expandable strings. For example, you could have expansions for the word hot in one object, and expansions for the word cold in another.

Name Description
input_terms string[]

A list of terms that will be expanded for this expansion. If specified, only the items in this list are expanded.

expanded_terms string[]

A list of terms that this expansion will be expanded to. If specified without input_terms, it also functions as the input term list.

Response Codes

Status Description
200

The expansion list has been accepted and will be used for all future queries.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

500

Timeout when uploading the expansion list.

Delete the expansion list

Remove the expansion information for this collection. The expansion list must be deleted to disable query expansion for a collection.

Request

DELETE /v1/environments/{environment_id}/collections/{collection_id}/expansions
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -X DELETE -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/expansions?version=2018-03-05"
        

Response

No response body.

Response Codes

Status Description
200

The expansion list was successfully deleted.

400

Bad request.

A bad request is returned any time there is a problem with the request itself.

Documents

Add and update documents in your collection.

Add a document

Add a document to a collection with optional metadata.

  • The version query parameter is still required.

  • Returns immediately after the system has accepted the document for processing.

  • The user must provide document content, metadata, or both. If the request is missing both document content and metadata, it is rejected.

  • The user can set the Content-Type parameter on the file part to indicate the media type of the document. If the Content-Type parameter is missing or is one of the generic media types (for example, application/octet-stream), then the service attempts to automatically detect the document's media type.

  • The following field names are reserved and will be filtered out if present after normalization: id, score, highlight, and any field with the prefix _, +, or -.

  • Fields with empty name values after normalization are filtered out before indexing.

  • Fields containing the following characters after normalization are filtered out before indexing: # and ,.
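The filtering rules in the list above can be checked on the client before upload. The following Python sketch applies them to metadata field names. The reference does not define the normalization step, so this sketch assumes the names are already normalized and applies the rules as written. `is_indexed_field_name` and `kept_fields` are hypothetical helpers.

```python
RESERVED_NAMES = {"id", "score", "highlight"}
RESERVED_PREFIXES = ("_", "+", "-")
FORBIDDEN_CHARS = ("#", ",")


def is_indexed_field_name(name: str) -> bool:
    """Return False if an (already normalized) field name would be filtered out."""
    if not name:
        return False  # empty names are dropped
    if name in RESERVED_NAMES or name.startswith(RESERVED_PREFIXES):
        return False  # reserved names and prefixes
    return not any(ch in name for ch in FORBIDDEN_CHARS)


def kept_fields(metadata: dict) -> dict:
    """Keep only the metadata fields expected to survive filtering."""
    return {key: value for key, value in metadata.items() if is_indexed_field_name(key)}
```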

Request

POST /v1/environments/{environment_id}/collections/{collection_id}/documents
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

file form file

The content of the document to ingest. The maximum supported file size is 50 megabytes. Files larger than 50 megabytes are rejected.

metadata form string

If you're using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: { \"Creator\": \"Johnny Appleseed\", \"Subject\": \"Apples\" }.

Example request

curl -X POST -u "{username}":"{password}" -F file=@sample1.html "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/documents?version=2018-03-05"
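The curl -F option sends a multipart/form-data body with a file part and an optional metadata part. The following Python sketch encodes the same body with only the standard library, checking the size limits stated above.

Some details are assumptions:

  • The sketch treats 1 MB as 1024 × 1024 bytes; the reference does not say which definition it uses.
  • The default content type is application/octet-stream. The description above says the service then attempts to detect the media type itself.
  • `build_document_form` is a hypothetical helper.

```python
import json
import uuid
from typing import Optional, Tuple

# Limits stated above; this sketch assumes 1 MB = 1024 * 1024 bytes.
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_METADATA_BYTES = 1024 * 1024


def build_document_form(file_name: str, file_bytes: bytes,
                        content_type: str = "application/octet-stream",
                        metadata: Optional[dict] = None,
                        boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode the file and optional metadata parts as multipart/form-data.

    Returns (body, value for the Content-Type request header).
    """
    if len(file_bytes) > MAX_FILE_BYTES:
        raise ValueError("file exceeds the 50 MB limit")
    boundary = boundary or uuid.uuid4().hex
    parts = [
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode("utf-8") + file_bytes + b"\r\n"
    ]
    if metadata is not None:
        meta = json.dumps(metadata).encode("utf-8")
        if len(meta) > MAX_METADATA_BYTES:
            raise ValueError("metadata exceeds the 1 MB limit")
        parts.append(
            (f"--{boundary}\r\n"
             'Content-Disposition: form-data; name="metadata"\r\n\r\n').encode("utf-8")
            + meta + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

To send the body, pass it to `urllib.request.Request(url, data=body, method="POST", headers={"Content-Type": header})`. The same form parts apply to "Update a document", which posts to the document's own URL.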
        

Response

DocumentAccepted
Name Description
document_id string

The unique identifier of the ingested document.

status string

Status of the document in the ingestion process.

Possible values:
  • processing
notices Notice[]

Array of notices produced by the document-ingestion process.

Notice

A notice produced for the collection.

Name Description
notice_id string

Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action.

created DateTime

The creation date of the notice in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

document_id string

Unique identifier of the document.

query_id string

Unique identifier of the query used for relevance training.

severity string

Severity level of the notice.

Possible values:
  • warning
  • error
step string

Ingestion or training step in which the notice occurred.

description string

The description of the notice.

Example response


{
  "document_id" : "f1360220-ea2d-4271-9d62-89a910b13c37",
  "status" : "processing"
}
        

Response Codes

Status Description
202

The document has been accepted and will be processed.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

403

Forbidden. Returned if you attempt to add a document to a collection in a read-only environment.

Get document details

Fetch status details about a submitted document. Note: this operation does not return the document itself. Instead, it returns only the document's processing status and any notices (warnings or errors) that were generated when the document was ingested. Use the query API to retrieve the actual document content.

Request

GET /v1/environments/{environment_id}/collections/{collection_id}/documents/{document_id}
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

document_id path string

The ID of the document.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/documents/{document_id}?version=2018-03-05"
        

Response

DocumentStatus

Status information about a submitted document.

Name Description
document_id string

The unique identifier of the document.

configuration_id string

The unique identifier for the configuration.

created DateTime

The creation date of the document in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

updated DateTime

Date of the most recent document update, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

status string

Status of the document in the ingestion process.

Possible values:
  • available
  • available with notices
  • failed
  • processing
status_description string

Description of the document status.

filename string

Name of the original source file (if available).

file_type string

The type of the original source file.

Possible values:
  • pdf
  • html
  • word
  • json
sha1 string

The SHA-1 hash of the original source file (formatted as a hexadecimal string).

notices Notice[]

Array of notices produced by the document-ingestion process.

Notice

A notice produced for the collection.

Name Description
notice_id string

Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action.

created DateTime

The creation date of the notice in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

document_id string

Unique identifier of the document.

query_id string

Unique identifier of the query used for relevance training.

severity string

Severity level of the notice.

Possible values:
  • warning
  • error
step string

Ingestion or training step in which the notice occurred.

description string

The description of the notice.

Example response


{
  "document_id" : "f1360220-ea2d-4271-9d62-89a910b13c37",
  "configuration_id" : "e8b9d793-b163-452a-9373-bce07efb510b",
  "created" : "2015-08-24T18:42:25.324Z",
  "updated" : "2015-08-24T18:42:25.324Z",
  "status" : "available with notices",
  "status_description" : "Document is successfully ingested but was indexed with warnings",
  "filename" : "instructions.html",
  "file_type" : "html",
  "sha1" : "de9f2c7fd25e1b3afad3e85a0bd17d9b100db4b3",
  "notices" : [ {
    "notice_id" : "index_342",
    "severity" : "warning",
    "step" : "indexing",
    "description" : "something bad happened",
    "document_id" : "f1360220-ea2d-4271-9d62-89a910b13c37"
  } ]
}
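Because this operation returns only status and notices, a typical client polls it until ingestion finishes and then reviews any warnings. The following Python sketch parses a DocumentStatus body like the example above. It assumes that every status other than processing is final. `notices_by_severity` and `ingestion_finished` are hypothetical helpers.

```python
import json
from collections import defaultdict


def notices_by_severity(payload: str) -> dict:
    """Group notice descriptions (prefixed with their step) by severity."""
    doc = json.loads(payload)
    grouped = defaultdict(list)
    for notice in doc.get("notices", []):
        grouped[notice.get("severity", "unknown")].append(
            f'{notice.get("step")}: {notice.get("description")}'
        )
    return dict(grouped)


def ingestion_finished(payload: str) -> bool:
    """True once the document has left the 'processing' state."""
    return json.loads(payload).get("status") in {"available", "available with notices", "failed"}
```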
        

Response Codes

Status Description
200

Successfully fetched document details.

400

Bad request.

403

Forbidden. Returned if you attempt to get the status of a document in a collection in a read-only environment.

Update a document

Replace an existing document by ingesting new content, with optional metadata.

Request

POST /v1/environments/{environment_id}/collections/{collection_id}/documents/{document_id}
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

document_id path string

The ID of the document.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

file form file

The content of the document to ingest. The maximum supported file size is 50 megabytes. Files larger than 50 megabytes are rejected.

metadata form string

If you're using the Data Crawler to upload your documents, you can test a document against the type of metadata that the Data Crawler might send. The maximum supported metadata file size is 1 MB. Metadata parts larger than 1 MB are rejected. Example: { \"Creator\": \"Johnny Appleseed\", \"Subject\": \"Apples\" }.

Example request

curl -X POST -u "{username}":"{password}" -F "file=@1.html;type=text/html" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/documents/{document_id}?version=2018-03-05"
        

Response

DocumentAccepted
Name Description
document_id string

The unique identifier of the ingested document.

status string

Status of the document in the ingestion process.

Possible values:
  • processing
notices Notice[]

Array of notices produced by the document-ingestion process.

Notice

A notice produced for the collection.

Name Description
notice_id string

Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action.

created DateTime

The creation date of the notice in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

document_id string

Unique identifier of the document.

query_id string

Unique identifier of the query used for relevance training.

severity string

Severity level of the notice.

Possible values:
  • warning
  • error
step string

Ingestion or training step in which the notice occurred.

description string

The description of the notice.

Example response


{
  "document_id" : "f1360220-ea2d-4271-9d62-89a910b13c37",
  "status" : "processing"
}
        

Response Codes

Status Description
202

The document has been accepted and it will be processed.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

403

Forbidden. Returned if you attempt to add or update a document in a collection in a read-only environment.

Delete a document

If the given document ID is invalid, or if the document is not found, then a success response is returned (HTTP status code 200) with the status set to 'deleted'.

Request

DELETE /v1/environments/{environment_id}/collections/{collection_id}/documents/{document_id}
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

document_id path string

The ID of the document.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -X DELETE -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/documents/{document_id}?version=2018-03-05"
        

Response

DeleteDocumentResponse
Name Description
document_id string

The unique identifier of the document.

status string

Status of the document. A deleted document has the status deleted.

Possible values:
  • deleted

Response Codes

Status Description
200

The document was successfully deleted.

400

Bad request.

A bad request is returned any time there is a problem with the request itself.

403

Forbidden. Returned if you attempt to delete a document in a collection in a read-only environment.

Queries

Query the documents in your service instance.

Query your collection

After your content is uploaded and enriched by the Discovery service, you can build queries to search your content. For details, see the Discovery service documentation.

Request

GET /v1/environments/{environment_id}/collections/{collection_id}/query
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

filter query string

A cacheable query that limits the documents returned to exclude any documents that don't mention the query content. Filter searches are better for metadata type searches and when you are trying to get a sense of concepts in the data set.

query query string

A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.

natural_language_query query string

A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.

passages query boolean

A passages query that returns the most relevant passages from the results.

aggregation query string

An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.

count query integer

Number of results to return. The default is 10.

return query string[]

A comma-separated list of the portions of the document hierarchy to return.

offset query integer

The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10, and the offset is 8, it returns the last two results.

sort query string[]

A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.

highlight query boolean

When true, a highlight field is returned for each result. The field contains the fields that match the query, with <em></em> tags around the matching query terms. Defaults to false.

passages.fields query string[]

A comma-separated list of fields that passages are drawn from. If this parameter is not specified, then all top-level fields are included.

passages.count query integer

The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.

passages.characters query integer

The approximate number of characters that any one passage will have. The default is 400. The minimum is 50. The maximum is 2000.

deduplicate query boolean

When true and used with a Watson Discovery News collection, duplicate results (based on the contents of the title field) are removed. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality. The default is false.

deduplicate.field query string

When specified, duplicate results based on the specified field are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.

similar query boolean

When true, results are returned based on their similarity to the document IDs specified in the similar.document_ids parameter. The default is false.

similar.document_ids query string[]

A comma-separated list of document IDs that will be used to find similar documents.

Note: If the natural_language_query parameter is also specified, it will be used to expand the scope of the document similarity search to include the natural language query. Other query parameters, such as filter and query are subsequently applied and reduce the query scope.

similar.fields query string[]

A comma-separated list of field names that will be used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

X-Watson-Logging-Opt-Out header boolean

If true, queries are not stored in the Discovery Logs endpoint.

Default: false
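The sort parameter's prefix syntax can also be assembled in code. The following Python sketch is illustrative only: build_sort is a hypothetical helper, not part of any Watson SDK, and the field names in the usage line are placeholders.

```python
def build_sort(*fields):
    """Build a value for the `sort` query parameter (hypothetical helper).

    Each argument is a (field, descending) pair. Descending fields get a
    "-" prefix; ascending fields are left unprefixed, because ascending is
    the default direction when no prefix is given.
    """
    parts = []
    for field, descending in fields:
        if not field:
            raise ValueError("field name must be non-empty")
        parts.append(("-" if descending else "") + field)
    return ",".join(parts)


# Newest first, then A-Z by title (placeholder field names).
print(build_sort(("publication_date", True), ("title", False)))
# prints: -publication_date,title
```

The sketch leaves ascending fields unprefixed rather than adding +. In a URL query string an unencoded + is commonly decoded as a space, so an explicit + prefix would have to be percent-encoded as %2B.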

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/query?version=2018-03-05&query=relations.action.lemmatized:acquire&count=15&filter=entities.text:IBM&return=text"
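The same request can be assembled in Python with only the standard library. This is a minimal sketch, not the Watson SDK: collection_query_url is a hypothetical helper, my-env and my-coll are placeholder IDs, and BASE_URL assumes the default endpoint used in the curl example. Sending the request, with the same credentials as curl, is left to whichever HTTP client you use.

```python
from urllib.parse import quote, urlencode

# Default endpoint from the curl example; instances in other regions
# (for example Sydney) use a different base URL.
BASE_URL = "https://gateway.watsonplatform.net/discovery/api/v1"


def collection_query_url(environment_id, collection_id, version="2018-03-05", **params):
    """Build a GET URL for the collection-level query endpoint.

    Path segments are percent-encoded, and urlencode() escapes reserved
    characters such as ":" in the query values.
    """
    path = "/environments/{}/collections/{}/query".format(
        quote(environment_id, safe=""), quote(collection_id, safe="")
    )
    query = {"version": version, **params}
    return BASE_URL + path + "?" + urlencode(query)


# Same parameters as the curl example; "return" is a Python keyword,
# so it is passed through a dict.
url = collection_query_url(
    "my-env",
    "my-coll",
    query="relations.action.lemmatized:acquire",
    count=15,
    filter="entities.text:IBM",
    **{"return": "text"},
)
print(url)
```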
        

Response

QueryResponse

A response containing the documents and aggregations for the query.

Name Description
matching_results integer
results QueryResult[]
aggregations QueryAggregation[]
passages QueryPassages[]
duplicates_removed integer
session_token string

The session token for this query. The session token can be used to add events associated with this query to the query and event log.

Important: Session tokens are case sensitive.

QueryResult
Name Description
id string

The unique identifier of the document.

score double

Deprecated: this field is now part of the result_metadata object.

metadata object

Metadata of the document.

collection_id string

The collection ID of the collection containing the document for this result.

result_metadata QueryResultMetadata

Metadata of the query result.

QueryResultMetadata

Metadata of a query result.

Name Description
score double

An unbounded measure of the relevance of a particular result, dependent on the query and matching document. A higher score indicates a greater match to the query parameters.

confidence double

The confidence score for the given result, calculated from how relevant the result is estimated to be compared to a trained relevancy model. The value of confidence ranges from 0.0 to 1.0; the higher the number, the more relevant the document.
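Because the top-level score is deprecated, client code should read relevance values from result_metadata. A minimal Python sketch, assuming a response already parsed into a dict; confident_results, the 0.5 threshold, and the sample documents are hypothetical.

```python
def confident_results(response, threshold=0.5):
    """Return (document id, confidence) pairs at or above `threshold`,
    highest confidence first.

    Reads result_metadata instead of the deprecated top-level `score`.
    Results without a confidence value are skipped rather than guessed.
    """
    picked = []
    for result in response.get("results", []):
        metadata = result.get("result_metadata") or {}
        confidence = metadata.get("confidence")
        if confidence is not None and confidence >= threshold:
            picked.append((result["id"], confidence))
    picked.sort(key=lambda pair: pair[1], reverse=True)
    return picked


# Hypothetical response fragment for illustration only.
sample = {
    "matching_results": 3,
    "results": [
        {"id": "doc-a", "result_metadata": {"score": 4.2, "confidence": 0.31}},
        {"id": "doc-b", "result_metadata": {"score": 7.9, "confidence": 0.87}},
        {"id": "doc-c", "result_metadata": {"score": 5.5}},
    ],
}
print(confident_results(sample))  # [('doc-b', 0.87)]
```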

QueryAggregation

An aggregation produced by the Discovery service to analyze the input provided.

Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

AggregationResult
Name Description
key string

Key that matched the aggregation type.

matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned in the case of chained aggregations.
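Since each AggregationResult can carry its own aggregations, chained aggregations form a tree. A minimal recursive walk in Python, assuming the aggregations array has already been parsed from JSON; walk_aggregations and the sample data are hypothetical.

```python
def walk_aggregations(aggregations, depth=0):
    """Yield (depth, type, key, matching_results) for every bucket.

    Recurses into the chained aggregations listed under each
    AggregationResult, and into any aggregations listed directly on a
    QueryAggregation.
    """
    for agg in aggregations or []:
        for bucket in agg.get("results") or []:
            yield depth, agg.get("type"), bucket.get("key"), bucket.get("matching_results")
            yield from walk_aggregations(bucket.get("aggregations"), depth + 1)
        yield from walk_aggregations(agg.get("aggregations"), depth + 1)


# Hypothetical parsed "aggregations" array with one chained term aggregation.
sample = [
    {
        "type": "term",
        "results": [
            {
                "key": "IBM",
                "matching_results": 12,
                "aggregations": [
                    {"type": "term", "results": [{"key": "acquire", "matching_results": 4}]}
                ],
            },
            {"key": "Red Hat", "matching_results": 3},
        ],
    }
]

for depth, agg_type, key, count in walk_aggregations(sample):
    print("  " * depth + "{}: {} ({})".format(agg_type, key, count))
# term: IBM (12)
#   term: acquire (4)
# term: Red Hat (3)
```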

Histogram
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

interval integer

Interval of the aggregation (for the histogram type).

Calculation
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

value double

Value of the aggregation.

TopHits
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

size integer

Number of top hits returned by the aggregation.

hits TopHitsResults
TopHitsResults
Name Description
matching_results integer

Number of matching results.

hits QueryResult[]

Top results returned by the aggregation.

Timeslice
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

interval string

Interval of the aggregation. Valid date interval values are second/seconds, minute/minutes, hour/hours, day/days, week/weeks, month/months, and year/years.

anomaly boolean

Used to indicate that anomaly detection should be performed. Anomaly detection is used to locate unusual datapoints within a time series.
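A timeslice aggregation is requested through the aggregation query parameter. The Python sketch below assumes the query-language form timeslice(field,interval[,anomaly:true]), with the interval written as a count plus one of the units listed above (for example 1day or 12hours); verify that syntax against the Query reference before relying on it. The timeslice helper and the publication_date field are hypothetical.

```python
import re

# Interval units listed in this reference for timeslice aggregations.
UNITS = {
    "second", "seconds", "minute", "minutes", "hour", "hours",
    "day", "days", "week", "weeks", "month", "months", "year", "years",
}


def timeslice(field, interval, anomaly=False):
    """Build a timeslice aggregation expression (assumed syntax).

    `interval` is a positive count followed by a unit, e.g. "1day" or
    "12hours"; the unit must be one of UNITS.
    """
    match = re.fullmatch(r"([1-9][0-9]*)([a-z]+)", interval)
    if match is None or match.group(2) not in UNITS:
        raise ValueError("unsupported interval: " + repr(interval))
    args = [field, interval]
    if anomaly:
        # Request anomaly detection on the resulting time series.
        args.append("anomaly:true")
    return "timeslice(" + ",".join(args) + ")"


print(timeslice("publication_date", "1day", anomaly=True))
# prints: timeslice(publication_date,1day,anomaly:true)
```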

Term
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

count integer
Nested
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

path string

The area of the results the aggregation was restricted to.

Filter
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

match string

The match condition that the aggregated results were filtered on.

QueryPassages
Name Description
document_id string

The unique identifier of the document from which the passage has been extracted.

passage_score double

The confidence score of the passage's analysis. A higher score indicates greater confidence.

passage_text string

The content of the extracted passage.

start_offset integer

The position of the first character of the extracted passage in the originating field.

end_offset integer

The position of the last character of the extracted passage in the originating field.

field string

The label of the field from which the passage has been extracted.
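Passages from the same document arrive as separate entries. A minimal Python sketch that groups them by document_id and orders each group by passage_score; passages_by_document and the sample data are hypothetical.

```python
from collections import defaultdict


def passages_by_document(passages):
    """Group QueryPassages entries by document_id, best passage first."""
    grouped = defaultdict(list)
    for passage in passages:
        grouped[passage["document_id"]].append(passage)
    for items in grouped.values():
        # Higher passage_score means greater confidence.
        items.sort(key=lambda p: p["passage_score"], reverse=True)
    return dict(grouped)


# Hypothetical passages; only the fields used here are shown.
passages = [
    {"document_id": "doc-1", "passage_score": 10.2, "field": "text", "passage_text": "first"},
    {"document_id": "doc-2", "passage_score": 8.7, "field": "text", "passage_text": "second"},
    {"document_id": "doc-1", "passage_score": 14.5, "field": "title", "passage_text": "third"},
]
for doc_id, items in passages_by_document(passages).items():
    print(doc_id, [p["passage_text"] for p in items])
# doc-1 ['third', 'first']
# doc-2 ['second']
```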

Example response


{
  "matching_results" : 24,
  "results" : [ {
    "id" : "watson-generated ID",
    "score" : 1
  } ],
  "aggregations" : [ {
    "type" : "term",
    "results" : [ {
      "key" : "active",
      "matching_results" : 34
    } ]
  } ]
}
        

Response Codes

Status Description
200

Query executed successfully.

400

Bad request.

Query system notices

Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training. See the Discovery service documentation for more details on the query language.

Request

GET /v1/environments/{environment_id}/collections/{collection_id}/notices
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

filter query string

A cacheable query that limits the documents returned to exclude any documents that don't mention the query content. Filter searches are better for metadata-type searches and when you are trying to get a sense of concepts in the data set.

query query string

A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.

natural_language_query query string

A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.

passages query boolean

A passages query that returns the most relevant passages from the results.

aggregation query string

An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.

count query integer

Number of results to return.

Default: 10

return query string[]

A comma-separated list of the portions of the document hierarchy to return.

offset query integer

The number of query results to skip at the beginning. For example, if a query matches 10 results and the offset is 8, only the last two results are returned.

sort query string[]

A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.

highlight query boolean

When true, a highlight field is returned for each result. It contains the fields that match the query, with <em></em> tags around the matching query terms. Defaults to false.

passages.fields query string[]

A comma-separated list of fields that passages are drawn from. If this parameter is not specified, all top-level fields are included.

passages.count query integer

The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.

passages.characters query integer

The approximate number of characters that any one passage will have. The default is 400. The minimum is 50. The maximum is 2000.

deduplicate.field query string

When specified, duplicate results based on the specified field are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.

similar query boolean

When true, results are returned based on their similarity to the document IDs specified in the similar.document_ids parameter.

Default: false

similar.document_ids query string[]

A comma-separated list of document IDs that will be used to find similar documents.

Note: If the natural_language_query parameter is also specified, it is used to expand the scope of the document similarity search to include the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the query scope.

similar.fields query string[]

A comma-separated list of field names that will be used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.
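To page through a large result set, combine count with offset. A minimal Python sketch, assuming matching_results was read from an earlier response; page_offsets is a hypothetical helper that only computes the offset values to request.

```python
def page_offsets(matching_results, count=10):
    """Yield (offset, expected_size) for each page of `count` results.

    `matching_results` comes from a previous response. This sketch does not
    enforce any service-side limit on how deep `offset` may go.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    for offset in range(0, matching_results, count):
        # The last page may hold fewer than `count` results.
        yield offset, min(count, matching_results - offset)


print(list(page_offsets(24, count=10)))
# [(0, 10), (10, 10), (20, 4)]
```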

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/notices?version=2018-03-05&filter=entities.text:error"
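This endpoint identifies the collection in its path (GET /v1/environments/{environment_id}/collections/{collection_id}/notices), so no collection_ids parameter is needed. A minimal Python sketch of building the request URL with the standard library; collection_notices_url is a hypothetical helper, the IDs are placeholders, and BASE_URL assumes the default endpoint.

```python
from urllib.parse import quote, urlencode

# Assumed default endpoint; use the URL from your service credentials.
BASE_URL = "https://gateway.watsonplatform.net/discovery/api/v1"


def collection_notices_url(environment_id, collection_id, version="2018-03-05", **params):
    """Build a GET URL for the collection-level notices endpoint."""
    path = "/environments/{}/collections/{}/notices".format(
        quote(environment_id, safe=""), quote(collection_id, safe="")
    )
    return BASE_URL + path + "?" + urlencode({"version": version, **params})


url = collection_notices_url("my-env", "my-coll", filter="entities.text:error")
print(url)
```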
        

Response

QueryNoticesResponse
Name Description
matching_results integer
results QueryNoticesResult[]
aggregations QueryAggregation[]
passages QueryPassages[]
duplicates_removed integer
QueryNoticesResult
Name Description
id string

The unique identifier of the document.

score double

Deprecated: this field is now part of the result_metadata object.

metadata object

Metadata of the document.

collection_id string

The collection ID of the collection containing the document for this result.

result_metadata QueryResultMetadata

Metadata of the query result.

code integer

The internal status code returned by the ingestion subsystem indicating the overall result of ingesting the source document.

filename string

Name of the original source file (if available).

file_type string

The type of the original source file.

Possible values:
  • pdf
  • html
  • word
  • json
sha1 string

The SHA-1 hash of the original source file (formatted as a hexadecimal string).

notices Notice[]

Array of notices for the document.

QueryResultMetadata

Metadata of a query result.

Name Description
score double

An unbounded measure of the relevance of a particular result, dependent on the query and matching document. A higher score indicates a greater match to the query parameters.

confidence double

The confidence score for the given result, calculated from how relevant the result is estimated to be compared to a trained relevancy model. The value of confidence ranges from 0.0 to 1.0; the higher the number, the more relevant the document.

Notice

A notice produced for the collection.

Name Description
notice_id string

Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action.

created DateTime

The date and time when the notice was created, in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

document_id string

Unique identifier of the document.

query_id string

Unique identifier of the query used for relevance training.

severity string

Severity level of the notice.

Possible values:
  • warning
  • error
step string

Ingestion or training step in which the notice occurred.

description string

The description of the notice.
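Notices are attached to each result, so a common task is to summarize them or to find documents that failed ingestion. A minimal Python sketch over a parsed QueryNoticesResponse; notice_summary, documents_with_errors, and the sample data are hypothetical, with xpath_not_found taken from the example response and hypothetical_error invented for illustration.

```python
from collections import Counter


def notice_summary(response):
    """Count notices by (severity, notice_id) across all results."""
    counts = Counter()
    for result in response.get("results", []):
        for notice in result.get("notices", []):
            counts[(notice.get("severity"), notice.get("notice_id"))] += 1
    return counts


def documents_with_errors(response):
    """Return sorted IDs of documents that carry at least one error notice."""
    return sorted(
        result["id"]
        for result in response.get("results", [])
        if any(n.get("severity") == "error" for n in result.get("notices", []))
    )


# Hypothetical response fragment for illustration only.
sample = {
    "results": [
        {"id": "doc-1", "notices": [{"notice_id": "xpath_not_found", "severity": "warning"}]},
        {"id": "doc-2", "notices": [{"notice_id": "hypothetical_error", "severity": "error"}]},
    ]
}
print(notice_summary(sample))
print(documents_with_errors(sample))  # ['doc-2']
```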

QueryAggregation

An aggregation produced by the Discovery service to analyze the input provided.

Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

AggregationResult
Name Description
key string

Key that matched the aggregation type.

matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned in the case of chained aggregations.

Histogram
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

interval integer

Interval of the aggregation (for the histogram type).

Calculation
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

value double

Value of the aggregation.

TopHits
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

size integer

Number of top hits returned by the aggregation.

hits TopHitsResults
TopHitsResults
Name Description
matching_results integer

Number of matching results.

hits QueryResult[]

Top results returned by the aggregation.

QueryResult
Name Description
id string

The unique identifier of the document.

score double

Deprecated: this field is now part of the result_metadata object.

metadata object

Metadata of the document.

collection_id string

The collection ID of the collection containing the document for this result.

result_metadata QueryResultMetadata

Metadata of the query result.

Timeslice
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

interval string

Interval of the aggregation. Valid date interval values are second/seconds, minute/minutes, hour/hours, day/days, week/weeks, month/months, and year/years.

anomaly boolean

Used to indicate that anomaly detection should be performed. Anomaly detection is used to locate unusual datapoints within a time series.

Term
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

count integer
Nested
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

path string

The area of the results the aggregation was restricted to.

Filter
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

match string

The match condition that the aggregated results were filtered on.

QueryPassages
Name Description
document_id string

The unique identifier of the document from which the passage has been extracted.

passage_score double

The confidence score of the passage's analysis. A higher score indicates greater confidence.

passage_text string

The content of the extracted passage.

start_offset integer

The position of the first character of the extracted passage in the originating field.

end_offset integer

The position of the last character of the extracted passage in the originating field.

field string

The label of the field from which the passage has been extracted.

Example response


{
  "matching_results" : 24,
  "results" : [ {
    "id" : "030ba125-29db-43f2-8552-f941ae30a7a8",
    "collection_id" : "f1360220-ea2d-4271-9d62-89a910b13c37",
    "code" : 200,
    "score" : 1,
    "filename" : "instructions.html",
    "file_type" : "html",
    "sha1" : "de9f2c7fd25e1b3afad3e85a0bd17d9b100db4b3",
    "notices" : [ {
      "notice_id" : "xpath_not_found",
      "created" : "2016-09-20T17:26:17.000Z",
      "document_id" : "030ba125-29db-43f2-8552-f941ae30a7a8",
      "severity" : "warning",
      "step" : "html-to-html",
      "description" : "The xpath expression \"boom\" was not found."
    } ]
  } ],
  "aggregations" : [ {
    "type" : "term",
    "results" : [ {
      "key" : "warning",
      "matching_results" : 34
    } ]
  } ]
}
        

Response Codes

Status Description
200

Query for notices executed successfully.

400

Bad request.

Query documents in multiple collections

See the Discovery service documentation for more details.

Request

GET /v1/environments/{environment_id}/query
Parameter Description
environment_id path string

The ID of the environment.

collection_ids query string[]

A comma-separated list of collection IDs to be queried against.

filter query string

A cacheable query that limits the documents returned to exclude any documents that don't mention the query content. Filter searches are better for metadata-type searches and when you are trying to get a sense of concepts in the data set.

query query string

A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.

natural_language_query query string

A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.

aggregation query string

An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.

count query integer

Number of results to return.

Default: 10

return query string[]

A comma-separated list of the portions of the document hierarchy to return.

offset query integer

The number of query results to skip at the beginning. For example, if a query matches 10 results and the offset is 8, only the last two results are returned.

sort query string[]

A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.

highlight query boolean

When true, a highlight field is returned for each result. It contains the fields that match the query, with <em></em> tags around the matching query terms. Defaults to false.

deduplicate query boolean

When true and used with a Watson Discovery News collection, duplicate results (based on the contents of the title field) are removed. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.

Default: false

deduplicate.field query string

When specified, duplicate results based on the specified field are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently Beta functionality.

similar query boolean

When true, results are returned based on their similarity to the document IDs specified in the similar.document_ids parameter.

Default: false

similar.document_ids query string[]

A comma-separated list of document IDs that will be used to find similar documents.

Note: If the natural_language_query parameter is also specified, it is used to expand the scope of the document similarity search to include the natural language query. Other query parameters, such as filter and query, are subsequently applied and reduce the query scope.

similar.fields query string[]

A comma-separated list of field names that will be used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

passages query boolean

A passages query that returns the most relevant passages from the results.

passages.fields query string[]

A comma-separated list of fields that passages are drawn from. If this parameter is not specified, all top-level fields are included.

passages.count query integer

The maximum number of passages to return. The search returns fewer passages if the requested total is not found. The default is 10. The maximum is 100.

passages.characters query integer

The approximate number of characters that any one passage will have. The default is 400. The minimum is 50. The maximum is 2000.
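The documented passage limits can be checked client-side before a request is sent. A minimal Python sketch; passage_params is a hypothetical helper, and its lower bound of 1 for passages.count is an assumption because this reference states only the maximum.

```python
def passage_params(fields=None, count=10, characters=400):
    """Return passage-related query parameters, checking documented limits.

    Per this reference: passages.count defaults to 10 (max 100) and
    passages.characters defaults to 400 (min 50, max 2000). The lower
    bound of 1 for count is an assumption of this sketch.
    """
    if not 1 <= count <= 100:
        raise ValueError("passages.count must be between 1 and 100")
    if not 50 <= characters <= 2000:
        raise ValueError("passages.characters must be between 50 and 2000")
    # Booleans are sent as lowercase strings, since urlencode() would
    # render Python's True as "True".
    params = {
        "passages": "true",
        "passages.count": str(count),
        "passages.characters": str(characters),
    }
    if fields:
        # When passages.fields is omitted, all top-level fields are used.
        params["passages.fields"] = ",".join(fields)
    return params


print(passage_params(fields=["text", "title"], count=5))
```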

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/query?collection_ids={id1},{id2}&version=2018-03-05&query=relations.action.lemmatized:acquire&count=15&filter=entities.text:IBM&return=text"
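A Python equivalent that builds the collection_ids list for this environment-level endpoint. This is a minimal sketch: environment_url is a hypothetical helper, the IDs are placeholders, BASE_URL assumes the default endpoint, and the helper also targets the environment-level notices endpoint, which takes the same collection_ids parameter.

```python
from urllib.parse import quote, urlencode

# Assumed default endpoint; use the URL from your service credentials.
BASE_URL = "https://gateway.watsonplatform.net/discovery/api/v1"


def environment_url(environment_id, collection_ids, endpoint="query",
                    version="2018-03-05", **params):
    """Build a GET URL for an environment-level endpoint spanning several
    collections ("query" or "notices")."""
    if endpoint not in ("query", "notices"):
        raise ValueError("endpoint must be 'query' or 'notices'")
    if not collection_ids:
        raise ValueError("at least one collection ID is required")
    # collection_ids is a single comma-separated value.
    qs = {"version": version, "collection_ids": ",".join(collection_ids), **params}
    return "{}/environments/{}/{}?{}".format(
        BASE_URL, quote(environment_id, safe=""), endpoint, urlencode(qs)
    )


url = environment_url("my-env", ["coll-1", "coll-2"],
                      query="relations.action.lemmatized:acquire", count=15)
print(url)
```

urlencode() escapes each comma in collection_ids as %2C; standard query-string decoding on the server side should turn it back into a comma.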
        

Response

QueryResponse

A response containing the documents and aggregations for the query.

Name Description
matching_results integer
results QueryResult[]
aggregations QueryAggregation[]
passages QueryPassages[]
duplicates_removed integer
session_token string

The session token for this query. The session token can be used to add events associated with this query to the query and event log.

Important: Session tokens are case sensitive.

QueryResult
Name Description
id string

The unique identifier of the document.

score double

Deprecated: this field is now part of the result_metadata object.

metadata object

Metadata of the document.

collection_id string

The collection ID of the collection containing the document for this result.

result_metadata QueryResultMetadata

Metadata of the query result.

QueryResultMetadata

Metadata of a query result.

Name Description
score double

An unbounded measure of the relevance of a particular result, dependent on the query and matching document. A higher score indicates a greater match to the query parameters.

confidence double

The confidence score for the given result, calculated from how relevant the result is estimated to be compared to a trained relevancy model. The value of confidence ranges from 0.0 to 1.0; the higher the number, the more relevant the document.

QueryAggregation

An aggregation produced by the Discovery service to analyze the input provided.

Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

AggregationResult
Name Description
key string

Key that matched the aggregation type.

matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned in the case of chained aggregations.

Histogram
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

interval integer

Interval of the aggregation (for the histogram type).

Calculation
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

value double

Value of the aggregation.

TopHits
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

size integer

Number of top hits returned by the aggregation.

hits TopHitsResults
TopHitsResults
Name Description
matching_results integer

Number of matching results.

hits QueryResult[]

Top results returned by the aggregation.

Timeslice
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

interval string

Interval of the aggregation. Valid date interval values are second/seconds, minute/minutes, hour/hours, day/days, week/weeks, month/months, and year/years.

anomaly boolean

Used to indicate that anomaly detection should be performed. Anomaly detection is used to locate unusual datapoints within a time series.

Term
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

count integer
Nested
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

path string

The area of the results the aggregation was restricted to.

Filter
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

match string

The match condition that the aggregated results were filtered on.

QueryPassages
Name Description
document_id string

The unique identifier of the document from which the passage has been extracted.

passage_score double

The confidence score of the passage's analysis. A higher score indicates greater confidence.

passage_text string

The content of the extracted passage.

start_offset integer

The position of the first character of the extracted passage in the originating field.

end_offset integer

The position of the last character of the extracted passage in the originating field.

field string

The label of the field from which the passage has been extracted.

Example response


{
  "matching_results" : 24,
  "results" : [ {
    "id" : "watson-generated ID",
    "score" : 1
  } ],
  "aggregations" : [ {
    "type" : "term",
    "results" : [ {
      "key" : "active",
      "matching_results" : 34
    } ]
  } ]
}
        

Response Codes

Status Description
200

Query executed successfully.

400

Bad request.

Query multiple collection system notices

Queries for notices (errors or warnings) that might have been generated by the system. Notices are generated when ingesting documents and performing relevance training. See the Discovery service documentation for more details on the query language.

Request

GET /v1/environments/{environment_id}/notices
Parameter Description
environment_id path string

The ID of the environment.

collection_ids query string[]

A comma-separated list of collection IDs to be queried against.

filter query string

A cacheable query that limits the documents returned to exclude any documents that don't mention the query content. Filter searches are better for metadata-type searches and when you are trying to get a sense of concepts in the data set.

query query string

A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.

natural_language_query query string

A natural language query that returns relevant documents by utilizing training data and natural language understanding. You cannot use natural_language_query and query at the same time.

aggregation query string

An aggregation search uses combinations of filters and query search to return an exact answer. Aggregations are useful for building applications, because you can use them to build lists, tables, and time series. For a full list of possible aggregations, see the Query reference.

count query integer

Number of results to return.

Default: 10

return query string[]

A comma-separated list of the portions of the document hierarchy to return.

offset query integer

The number of query results to skip at the beginning. For example, if a query matches 10 results and the offset is 8, only the last two results are returned.

sort query string[]

A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.

highlight query boolean

When true, a highlight field is returned for each result. It contains the fields that match the query, with <em></em> tags around the matching query terms. Defaults to false.

deduplicate.field query string

When specified, duplicate results based on the specified field are removed from the returned results. Duplicate comparison is limited to the current query only; offset is not considered. This parameter is currently in beta.

similar query boolean

When true, results are returned based on their similarity to the document IDs specified in the similar.document_ids parameter. The default is false.

similar.document_ids query string[]

A comma-separated list of document IDs that will be used to find similar documents.

Note: If the natural_language_query parameter is also specified, it will be used to expand the scope of the document similarity search to include the natural language query. Other query parameters, such as filter and query are subsequently applied and reduce the query scope.

similar.fields query string[]

A comma-separated list of field names that will be used as a basis for comparison to identify similar documents. If not specified, the entire document is used for comparison.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Example request

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/notices?collection_ids={id1},{id2}&version=2018-03-05&filter=entities.text:error"
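The same request URL can be assembled in code so that query values are URL-encoded. A minimal sketch with the Python standard library; notices_url is a hypothetical helper, and BASE_URL is the one used in the curl example (substitute the URL from your own service credentials).

```python
from urllib.parse import urlencode

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"


def notices_url(environment_id, collection_ids, version="2018-03-05", **params):
    """Build the URL for GET /v1/environments/{environment_id}/notices.

    collection_ids is joined into a comma-separated list; other query
    parameters (filter, query, count, offset, sort, ...) pass through.
    """
    query = {"collection_ids": ",".join(collection_ids), "version": version}
    query.update(params)
    # Keep ',' and ':' readable; spaces, '+', quotes, etc. are encoded.
    return f"{BASE_URL}/v1/environments/{environment_id}/notices?" + urlencode(query, safe=",:")


url = notices_url("env1", ["c1", "c2"], filter="entities.text:error", count=5)
print(url)
```

Parameters whose names are not Python identifiers, such as deduplicate.field, can be passed as `**{"deduplicate.field": "title"}`.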
        

Response

QueryNoticesResponse
Name Description
matching_results integer
results QueryNoticesResult[]
aggregations QueryAggregation[]
passages QueryPassages[]
duplicates_removed integer
QueryNoticesResult
Name Description
id string

The unique identifier of the document.

score double

Deprecated: This field is now part of the result_metadata object.

metadata object

Metadata of the document.

collection_id string

The collection ID of the collection containing the document for this result.

result_metadata QueryResultMetadata

Metadata of the query result.

code integer

The internal status code returned by the ingestion subsystem indicating the overall result of ingesting the source document.

filename string

Name of the original source file (if available).

file_type string

The type of the original source file.

Possible values:
  • pdf
  • html
  • word
  • json
sha1 string

The SHA-1 hash of the original source file (formatted as a hexadecimal string).

notices Notice[]

Array of notices for the document.

QueryResultMetadata

Metadata of a query result.

Name Description
score double

An unbounded measure of the relevance of a particular result, dependent on the query and matching document. A higher score indicates a greater match to the query parameters.

confidence double

The confidence score for the given result. Calculated based on how relevant the result is estimated to be, compared to a trained relevancy model. confidence can range from 0.0 to 1.0. The higher the number, the more relevant the document.

Notice

A notice produced for the collection.

Name Description
notice_id string

Identifies the notice. Many notices might have the same ID. This field exists so that user applications can programmatically identify a notice and take automatic corrective action.

created DateTime

The creation date of the notice in the format yyyy-MM-dd'T'HH:mm:ss.SSS'Z'.

document_id string

Unique identifier of the document.

query_id string

Unique identifier of the query used for relevance training.

severity string

Severity level of the notice.

Possible values:
  • warning
  • error
step string

Ingestion or training step in which the notice occurred.

description string

The description of the notice.

QueryAggregation

An aggregation produced by the Discovery service to analyze the input provided.

Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

AggregationResult
Name Description
key string

Key that matched the aggregation type.

matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned in the case of chained aggregations.
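Because each AggregationResult can carry its own aggregations, chained aggregation results form a tree. A sketch of a recursive walk that flattens that tree into key paths, assuming the response follows the QueryAggregation and AggregationResult schemas above; the sample data is invented.

```python
def flatten(aggregations, prefix=()):
    """Yield (key_path, matching_results) for every result in a chained aggregation tree."""
    for aggregation in aggregations or []:
        for result in aggregation.get("results", []):
            path = prefix + (result["key"],)
            yield path, result["matching_results"]
            # Descend into chained aggregations, if present.
            yield from flatten(result.get("aggregations"), path)


sample = [{
    "type": "term",
    "results": [{
        "key": "error",
        "matching_results": 5,
        "aggregations": [{
            "type": "term",
            "results": [{"key": "html-to-html", "matching_results": 3}],
        }],
    }],
}]

for path, n in flatten(sample):
    print(" > ".join(map(str, path)), n)
```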

Histogram
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

interval integer

Interval of the aggregation (for the histogram type).

Calculation
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

value double

Value of the aggregation.

TopHits
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

size integer

Number of top hits returned by the aggregation.

hits TopHitsResults
TopHitsResults
Name Description
matching_results integer

Number of matching results.

hits QueryResult[]

Top results returned by the aggregation.

QueryResult
Name Description
id string

The unique identifier of the document.

score double

Deprecated: This field is now part of the result_metadata object.

metadata object

Metadata of the document.

collection_id string

The collection ID of the collection containing the document for this result.

result_metadata QueryResultMetadata

Metadata of the query result.

Timeslice
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

interval string

Interval of the aggregation. Valid date interval values are second/seconds, minute/minutes, hour/hours, day/days, week/weeks, month/months, and year/years.

anomaly boolean

Indicates whether anomaly detection should be performed. Anomaly detection is used to locate unusual data points within a time series.

Term
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

field string

The field where the aggregation is located in the document.

count integer
Nested
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

path string

The area of the results the aggregation was restricted to.

Filter
Name Description
type string

The type of aggregation command used. For example: term, filter, max, min, etc.

results AggregationResult[]
matching_results integer

Number of matching results.

aggregations QueryAggregation[]

Aggregations returned by the Discovery service.

match string

The filter expression that the aggregated results were queried for.

QueryPassages
Name Description
document_id string

The unique identifier of the document from which the passage has been extracted.

passage_score double

The confidence score of the passage's analysis. A higher score indicates greater confidence.

passage_text string

The content of the extracted passage.

start_offset integer

The position of the first character of the extracted passage in the originating field.

end_offset integer

The position of the last character of the extracted passage in the originating field.

field string

The label of the field from which the passage has been extracted.

Example response


{
  "matching_results" : 24,
  "results" : [ {
    "id" : "030ba125-29db-43f2-8552-f941ae30a7a8",
    "collection_id" : "f1360220-ea2d-4271-9d62-89a910b13c37",
    "code" : 200,
    "score" : 1,
    "filename" : "instructions.html",
    "file_type" : "html",
    "sha1" : "de9f2c7fd25e1b3afad3e85a0bd17d9b100db4b3",
    "notices" : [ {
      "notice_id" : "xpath_not_found",
      "created" : "2016-09-20T17:26:17.000Z",
      "document_id" : "030ba125-29db-43f2-8552-f941ae30a7a8",
      "severity" : "warning",
      "step" : "html-to-html",
      "description" : "The xpath expression \"boom\" was not found."
    } ]
  } ],
  "aggregations" : {
    "term" : {
      "results" : [ {
        "key" : "warning",
        "matching_results" : 34
      } ]
    }
  }
}
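When scanning a notices response, it can help to tally notices by severity and parse their created timestamps. A sketch with the Python standard library that runs against a trimmed copy of the example response; the helper names are illustrative.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Trimmed copy of the example response above.
response = json.loads("""
{"matching_results": 24,
 "results": [{"id": "030ba125-29db-43f2-8552-f941ae30a7a8",
              "notices": [{"notice_id": "xpath_not_found",
                           "created": "2016-09-20T17:26:17.000Z",
                           "severity": "warning",
                           "step": "html-to-html"}]}]}
""")


def severity_counts(resp):
    """Count notices per severity across all results in a QueryNoticesResponse."""
    return Counter(
        notice["severity"]
        for result in resp.get("results", [])
        for notice in result.get("notices", [])
    )


def parse_created(value):
    """Parse a yyyy-MM-dd'T'HH:mm:ss.SSS'Z' timestamp as an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)


print(severity_counts(response))  # Counter({'warning': 1})
print(parse_created(response["results"][0]["notices"][0]["created"]).isoformat())
```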
        

Response Codes

Status Description
200

Query for notices executed successfully.

400

Bad request.

Knowledge Graph entity query

See the Knowledge Graph documentation for more details.

Request

POST /v1/environments/{environment_id}/collections/{collection_id}/query_entities
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

entity_query body QueryEntities

An object specifying the entities to query, which functions to perform, and any additional constraints.

QueryEntities
Name Description
feature string

The entity query feature to perform. Supported features are disambiguate and similar_entities.

entity QueryEntitiesEntity

A text string that appears within the entity text field.

context QueryEntitiesContext

Entity text that provides context for the queried entity; results are ranked based on that association. For example, to query the city of London in England, your query would look for London with the context of England.

count integer

The number of results to return. The default is 10. The maximum is 1000.

evidence_count integer

The number of evidence items to return for each result. The default is 0. The maximum number of evidence items per query is 10,000.

QueryEntitiesEntity

A text string that appears within the entity text field.

Name Description
text string

Entity text content.

type string

The type of the specified entity.

QueryEntitiesContext

Entity text that provides context for the queried entity; results are ranked based on that association. For example, to query the city of London in England, your query would look for London with the context of England.

Name Description
text string

Entity text that provides context for the queried entity; results are ranked based on that association. For example, to query the city of London in England, your query would look for London with the context of England.

Example request

curl -u "{username}":"{password}" -H 'content-type: application/json' -d '{
       "feature": "disambiguate",
       "entity": {
         "text": "Steve",
         "type": "Person"
       },
       "context": {
         "text": "iphone"
       },
       "count": 100
     }' \
      "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/query_entities?version=2018-03-05"
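The request body can also be built and checked in code before it is sent. A sketch that enforces only what the QueryEntities table states (the two supported feature names and the 1000-result maximum for count); entity_query_body is a hypothetical helper.

```python
import json

FEATURES = {"disambiguate", "similar_entities"}


def entity_query_body(feature, text, entity_type=None, context=None, count=10, evidence_count=0):
    """Return the JSON body for POST .../query_entities (hypothetical helper)."""
    if feature not in FEATURES:
        raise ValueError(f"unsupported feature: {feature}")
    if not 0 < count <= 1000:
        raise ValueError("count must be between 1 and 1000")
    entity = {"text": text}
    if entity_type:
        entity["type"] = entity_type
    body = {"feature": feature, "entity": entity, "count": count}
    if context:
        body["context"] = {"text": context}
    if evidence_count:
        body["evidence_count"] = evidence_count
    return json.dumps(body)


print(entity_query_body("disambiguate", "Steve", "Person", context="iphone", count=100))
```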
        

Response

QueryEntitiesResponse

An array of entities resulting from the query.

Name Description
entities QueryEntitiesResponseItem[]
QueryEntitiesResponseItem

Object containing Entity query response information.

Name Description
text string

Entity text content.

type string

The type of the result entity.

evidence QueryEvidence[]

List of different evidentiary items to support the result.

QueryEvidence

Description of the evidence location that supports a Knowledge Graph query result.

Name Description
document_id string

The document ID (as indexed in Discovery) of the evidence location.

field string

The field of the document where the supporting evidence was identified.

start_offset integer

The start location of the evidence in the identified field. This value is inclusive.

end_offset integer

The end location of the evidence in the identified field. This value is inclusive.

entities QueryEvidenceEntity[]

An array of entity objects that show evidence of the result.

QueryEvidenceEntity

Entity description and location within evidence field.

Name Description
type string

The entity type for this entity. Possible types vary based on model used.

text string

The original text of this entity as found in the evidence field.

start_offset integer

The start location of the entity text in the identified field. This value is inclusive.

end_offset integer

The end location of the entity text in the identified field. This value is exclusive.
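The offsets above follow different conventions as documented: QueryEvidence.end_offset is inclusive, while QueryEvidenceEntity.end_offset is exclusive. A small sketch of slicing a field's text under those stated conventions; the sample text and offsets are invented.

```python
def evidence_span(field_text, start_offset, end_offset):
    """Slice evidence text; QueryEvidence.end_offset is documented as inclusive."""
    return field_text[start_offset:end_offset + 1]


def entity_span(field_text, start_offset, end_offset):
    """Slice entity text; QueryEvidenceEntity.end_offset is documented as exclusive."""
    return field_text[start_offset:end_offset]


text = "Steve Jobs introduced the iPhone."
print(evidence_span(text, 0, 9))   # Steve Jobs
print(entity_span(text, 0, 10))    # Steve Jobs
```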

Response Codes

Status Description
200

Entity query executed successfully.

400

Bad request.

Knowledge Graph relationship query

See the Knowledge Graph documentation for more details.

Request

POST /v1/environments/{environment_id}/collections/{collection_id}/query_relations
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

relationship_query body QueryRelations

An object that describes the relationships to be queried and any query constraints (such as filters).

QueryRelations

A representation of a relationship query.

Name Description
entities QueryRelationsEntity[]

An array of entities to find relationships for.

context QueryEntitiesContext

Entity text that provides context for the queried entity; results are ranked based on that association. For example, to query the city of London in England, your query would look for London with the context of England.

sort string

The sorting method for the relationships: score or frequency, where frequency is the number of unique times each entity is identified. The default is score.

Allowable values:
  • score
  • frequency
filter QueryRelationsFilter

Filters to apply to the relationship query.

count integer

The number of results to return. The default is 10. The maximum is 1000.

evidence_count integer

The number of evidence items to return for each result. The default is 0. The maximum number of evidence items per query is 10,000.

QueryRelationsEntity
Name Description
text string

Entity text content.

type string

The type of the specified entity.

exact boolean

If false, implicit querying is performed. The default is false.

QueryEntitiesContext

Entity text that provides context for the queried entity; results are ranked based on that association. For example, to query the city of London in England, your query would look for London with the context of England.

Name Description
text string

Entity text that provides context for the queried entity; results are ranked based on that association. For example, to query the city of London in England, your query would look for London with the context of England.

QueryRelationsFilter
Name Description
relation_types QueryFilterType

A list of relation types to include or exclude from the query.

entity_types QueryFilterType

A list of entity types to include or exclude from the query.

document_ids string[]

A comma-separated list of document IDs to include in the query.

QueryFilterType
Name Description
exclude string[]

A comma-separated list of types to exclude.

include string[]

A comma-separated list of types to include. All other types are excluded.

Example request

curl -u "{username}":"{password}" -H 'content-type: application/json' -d '{
  "entities": [
    {
      "text": "Steve Jobs",
      "type": "PERSON",
      "exact": true
    }
  ],
  "context": {
    "text": "iphone"
  },
  "sort": "score",
  "filter": {
    "relation_types": {
      "exclude": ["colocation"],
      "include": ["locatedAt", "employedBy", "managerOf", "founderOf"]
    },
    "entity_types": {
      "exclude": ["EVENT"],
      "include": ["PERSON", "GPE", "ORGANIZATION"]
    },
    "document_ids": ["b95df4c1-d00f-4771-abb2-a52baea0444a", "ad340635-bf3e-47a5-bea5-5e778f600c32"]
  },
  "count": 10
}' \
      "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/query_relations?version=2018-03-05"
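A sketch of assembling a QueryRelations body in code, so the filter object is only included when a constraint is given and sort is checked against its allowable values. relations_query_body is a hypothetical helper; it enforces only the limits stated in the tables above.

```python
import json

SORT_METHODS = {"score", "frequency"}


def relations_query_body(entities, context=None, sort="score", relation_types=None,
                         entity_types=None, document_ids=None, count=10):
    """Return the JSON body for POST .../query_relations (hypothetical helper).

    entities: list of dicts with "text" and optional "type"/"exact".
    relation_types / entity_types: dicts with optional "include"/"exclude" lists.
    """
    if sort not in SORT_METHODS:
        raise ValueError(f"sort must be one of {sorted(SORT_METHODS)}")
    if not 0 < count <= 1000:
        raise ValueError("count must be between 1 and 1000")
    body = {"entities": entities, "sort": sort, "count": count}
    if context:
        body["context"] = {"text": context}
    query_filter = {}
    if relation_types:
        query_filter["relation_types"] = relation_types
    if entity_types:
        query_filter["entity_types"] = entity_types
    if document_ids:
        query_filter["document_ids"] = list(document_ids)
    if query_filter:
        body["filter"] = query_filter
    return json.dumps(body)


print(relations_query_body(
    [{"text": "Steve Jobs", "type": "PERSON", "exact": True}],
    context="iphone",
    relation_types={"include": ["employedBy", "founderOf"]},
))
```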
        

Response

QueryRelationsResponse
Name Description
relations QueryRelationsRelationship[]
QueryRelationsRelationship
Name Description
type string

The identified relationship type.

frequency integer

The number of times the relationship is mentioned.

arguments QueryRelationsArgument[]

Information about the relationship.

evidence QueryEvidence[]

List of different evidentiary items to support the result.

QueryRelationsArgument
Name Description
entities QueryEntitiesEntity[]
QueryEntitiesEntity

A text string that appears within the entity text field.

Name Description
text string

Entity text content.

type string

The type of the specified entity.

QueryEvidence

Description of the evidence location that supports a Knowledge Graph query result.

Name Description
document_id string

The document ID (as indexed in Discovery) of the evidence location.

field string

The field of the document where the supporting evidence was identified.

start_offset integer

The start location of the evidence in the identified field. This value is inclusive.

end_offset integer

The end location of the evidence in the identified field. This value is inclusive.

entities QueryEvidenceEntity[]

An array of entity objects that show evidence of the result.

QueryEvidenceEntity

Entity description and location within evidence field.

Name Description
type string

The entity type for this entity. Possible types vary based on model used.

text string

The original text of this entity as found in the evidence field.

start_offset integer

The start location of the entity text in the identified field. This value is inclusive.

end_offset integer

The end location of the entity text in the identified field. This value is exclusive.

Response Codes

Status Description
200

Relations query executed successfully.

400

Bad request.

Training data

Add training data to your collection to improve the relevance of query results. Note: When working with training data, use the training array returned by the list collection details method to check the status of training, including the number of training data examples, whether the service has enough valid training data to provide improved relevance of returned results, and other relevance-training details. See Improving the relevance of your query results with the API for procedural information on using relevance training.

List training data

Lists the training data for the specified collection.

Request

GET /v1/environments/{environment_id}/collections/{collection_id}/training_data
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Request example

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/training_data?version=2018-03-05"
        

Response

TrainingDataSet
Name Description
environment_id string
collection_id string
queries TrainingQuery[]
TrainingQuery
Name Description
query_id string
natural_language_query string
filter string
examples TrainingExample[]
TrainingExample
Name Description
document_id string
cross_reference string
relevance integer

Response Codes

Status Description
200

Training data for this collection found and returned.

Add query to training data

Adds a query to the training data for this collection. The query can contain a filter and a natural language query.

Request

POST /v1/environments/{environment_id}/collections/{collection_id}/training_data
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

body body NewTrainingQuery

The body of the training data query that is to be added to the collection's training data.

NewTrainingQuery
Name Description
natural_language_query string
filter string
examples TrainingExample[]
TrainingExample
Name Description
document_id string
cross_reference string
relevance integer
Request example

curl -X POST -u "{username}":"{password}" -H "Content-Type: application/json" -d '{
  "natural_language_query": "who is keyser soze",
  "filter": "text:criminology",
  "examples": [
    {
      "document_id": "adaf50f1-2526-4fad-b670-7d6e8a42e6e6",
      "relevance": 2
    },
    {
      "document_id": "63919442-7d5b-4cae-ab7e-56f58b1390fe",
      "cross_reference": "my_id_field:14",
      "relevance": 4
    }
  ]
}' "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/training_data?version=2018-03-05"
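The curl call above can be reproduced with Python's standard library. This sketch only prepares the request and does not send it; it assumes basic authentication as in the curl examples, and the helper name, IDs, and credentials are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def training_query_request(base_url, environment_id, collection_id, username, password,
                           natural_language_query, query_filter=None, examples=(),
                           version="2018-03-05"):
    """Prepare (but do not send) POST .../training_data with basic authentication."""
    body = {"natural_language_query": natural_language_query,
            "examples": [dict(e) for e in examples]}
    if query_filter:
        body["filter"] = query_filter
    url = (f"{base_url}/v1/environments/{environment_id}/collections/"
           f"{collection_id}/training_data?version={version}")
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


req = training_query_request(
    "https://gateway.watsonplatform.net/discovery/api", "env-id", "coll-id",
    "user", "pass", "who is keyser soze", query_filter="text:criminology",
    examples=[{"document_id": "adaf50f1-2526-4fad-b670-7d6e8a42e6e6", "relevance": 2}],
)
# urllib.request.urlopen(req) would send it; that call is omitted here.
print(req.get_method(), req.full_url)
```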
        

Response

TrainingQuery
Name Description
query_id string
natural_language_query string
filter string
examples TrainingExample[]
TrainingExample
Name Description
document_id string
cross_reference string
relevance integer

Response Codes

Status Description
200

The query was successfully added.

400

Bad request.

Delete all training data

Deletes all training data from a collection.

Request

DELETE /v1/environments/{environment_id}/collections/{collection_id}/training_data
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Request example

curl -X DELETE -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/training_data?version=2018-03-05"
        

Response

No response body.

Response Codes

Status Description
204

All training data removed.

400

Bad request.

Get details about a query

Gets details for a specific training data query, including the query string and all examples.

Request

GET /v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

query_id path string

The ID of the query used for training.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Request example

curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}?version=2018-03-05"
        

Response

TrainingQuery
Name Description
query_id string
natural_language_query string
filter string
examples TrainingExample[]
TrainingExample
Name Description
document_id string
cross_reference string
relevance integer

Response Codes

Status Description
200

Details for this training query found and returned.

404

The query does not exist.

Delete a training data query

Removes the training data query and all associated examples from the training data set.

Request

DELETE /v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

query_id path string

The ID of the query used for training.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Request example

curl -X DELETE -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}?version=2018-03-05"
        

Response

No response body.

Response Codes

Status Description
204

Query and all example document references successfully removed from the training set for this collection.

400

Bad request.

List examples for a training data query

List all examples for this training data query.

Request

GET /v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}/examples
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

query_id path string

The ID of the query used for training.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Request example not currently available.
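In the absence of an official example, the following sketch builds URLs for this and the related training_data endpoints from their path templates. training_data_url is hypothetical and uses the base URL from the other examples in this section.

```python
from urllib.parse import quote

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"


def training_data_url(environment_id, collection_id, query_id=None, example_id=None,
                      examples=False, version="2018-03-05"):
    """Build a /training_data URL, optionally down to a single example."""
    parts = ["v1", "environments", environment_id, "collections", collection_id, "training_data"]
    if query_id:
        parts.append(query_id)
        if examples or example_id:
            parts.append("examples")
        if example_id:
            parts.append(example_id)
    # Percent-encode each path segment so IDs cannot break the path.
    path = "/".join(quote(p, safe="") for p in parts)
    return f"{BASE_URL}/{path}?version={version}"


# GET .../training_data/{query_id}/examples (list examples for a query)
print(training_data_url("env", "coll", "q1", examples=True))
```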

Response

TrainingExampleList
Name Description
examples TrainingExample[]
TrainingExample
Name Description
document_id string
cross_reference string
relevance integer

Response Codes

Status Description
200

A list of all training examples added for this query.

404

Query not found.

Add example to training data query

Adds an example to this training data query.

Request

POST /v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}/examples
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

query_id path string

The ID of the query used for training.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

body body TrainingExample

The body of the example that is to be added to the specified query.

TrainingExample
Name Description
document_id string
cross_reference string
relevance integer
Request example

curl -X POST -u "{username}":"{password}" -H "Content-Type: application/json" -d '{
  "document_id": "{document_id}",
  "cross_reference": "{cross_reference}",
  "relevance": 0
}' "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}/examples?version=2018-03-05"
        

Response

TrainingExample
Name Description
document_id string
cross_reference string
relevance integer

Response Codes

Status Description
201

The example was successfully added to the query.

400

Bad request.

Delete example for training data query

Deletes the example document with the given ID from the training data query.

Request

DELETE /v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}/examples/{example_id}
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

query_id path string

The ID of the query used for training.

example_id path string

The ID of the document as it is indexed.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Request example

curl -X DELETE -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}/examples/{example_id}?version=2018-03-05"
        

Response

No response body.

Response Codes

Status Description
204

The example document reference was removed from the query.

400

Bad request.

Change label or cross reference for example

Changes the label or cross reference query for this training data example.

Request

PUT /v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}/examples/{example_id}
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

query_id path string

The ID of the query used for training.

example_id path string

The ID of the document as it is indexed.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

body body TrainingExamplePatch

The updated cross reference or relevance label for the specified example.

TrainingExamplePatch
Name Description
cross_reference string
relevance integer
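A sketch of building a TrainingExamplePatch body that contains only the two fields the schema lists. The helper is hypothetical, and whether the service leaves an omitted field unchanged is not stated here, so send both fields if in doubt.

```python
import json


def example_patch_body(cross_reference=None, relevance=None):
    """Return a TrainingExamplePatch body with only cross_reference and/or relevance."""
    body = {}
    if cross_reference is not None:
        body["cross_reference"] = cross_reference
    if relevance is not None:
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(relevance, int) or isinstance(relevance, bool):
            raise TypeError("relevance must be an integer")
        body["relevance"] = relevance
    if not body:
        raise ValueError("nothing to change")
    return json.dumps(body)


print(example_patch_body(relevance=3))  # {"relevance": 3}
```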
Request example

curl -X PUT -u "{username}":"{password}" -H "Content-Type: application/json" -d '{
  "cross_reference": "{new_cross_reference}",
  "relevance": 3
}' "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}/examples/{example_id}?version=2018-03-05"
        

Response

TrainingExample
Name Description
document_id string
cross_reference string
relevance integer

Response Codes

Status Description
200

The label or cross reference query was successfully applied.

400

Bad request.

Get details for training data example

Gets the details for this training example.

Request

GET /v1/environments/{environment_id}/collections/{collection_id}/training_data/{query_id}/examples/{example_id}
Parameter Description
environment_id path string

The ID of the environment.

collection_id path string

The ID of the collection.

query_id path string

The ID of the query used for training.

example_id path string

The ID of the document as it is indexed.

version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

Request example not currently available.

Response

TrainingExample
Name Description
document_id string
cross_reference string
relevance integer

Response Codes

Status Description
200

Details for this example successfully found and returned.

404

The query or the example does not exist.

User data

Delete data that has been uploaded and labeled with a customer_id.

Delete labeled data

Deletes all data associated with a specified customer ID. The method has no effect if no data is associated with the customer ID.

You associate a customer ID with data by passing the X-Watson-Metadata header with a request that passes data. For more information about personal data and customer IDs, see Information security.
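A sketch of the two pieces involved: the header that labels uploaded data and the URL that later deletes it. The header value format customer_id=<id> is an assumption based on common Watson usage, and both helper names are hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"


def customer_id_header(customer_id):
    """Header that labels uploaded data with a customer ID (value format assumed)."""
    return {"X-Watson-Metadata": f"customer_id={customer_id}"}


def delete_user_data_url(customer_id, version="2018-03-05"):
    """URL for DELETE /v1/user_data, which removes all data labeled with customer_id."""
    return f"{BASE_URL}/v1/user_data?" + urlencode({"customer_id": customer_id, "version": version})


print(customer_id_header("my_customer_ID"))
print(delete_user_data_url("my_customer_ID"))
```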

Request

DELETE /v1/user_data
Parameter Description
customer_id query string

The customer ID for which all data is to be deleted.

Example request

curl -X DELETE -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/user_data?customer_id=customer&version=2018-03-05"
        

Response

No response body.

Response Codes

Status Description
200

OK. The delete request was successfully submitted.

400

Bad Request. The request did not pass a customer ID:

  • No customer ID found in the request

Events and feedback

Query and event logs store information about natural language queries that have been made to a private collection and events associated with those queries. Each event is created using the events endpoint and must be associated with an existing natural language query. The query and event logs can be viewed using the logs endpoint, and statistics on those logs are returned by the various metrics endpoints.

Create event

The Events API can be used to create log entries that are associated with specific queries. For example, you can record which documents in the results set were "clicked" by a user and when that click occurred.

Request

POST /v1/events
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

query_event body CreateEventObject

An object that defines a query event to be added to the log.

CreateEventObject

An object defining the event being created.

Name Description
type string

The event type to be created.

Allowable values:
  • click
data EventData

Data object used to create a query event.

EventData

Query event data object.

Name Description
environment_id string

The environment_id of the query that the event is associated with.

session_token string

The session token that was returned as part of the query results that this event is associated with.

client_timestamp DateTime

The optional timestamp for the event that was created. If not provided, the time at which the event was created in the log is used.

display_rank integer

The rank of the result item that the event is associated with.

collection_id string

The collection_id of the document that this event is associated with.

document_id string

The document_id of the document that this event is associated with.

query_id string

The query identifier stored in the log. The query and any events associated with that query are stored with the same query_id.

Example request

curl -u "{username}":"{password}" -X POST -H "Content-Type: application/json" -d '{"type": "click", "data": { "environment_id": "e6061c99-950a-4dad-aee0-411f7143690a", "session_token":  "1_1dlwjrntwodlw111344", "client_timestamp": "2018-01-29T14:58:39.470Z", "display_rank": 1, "collection_id": "a83aaa222aaa222aac460", "document_id": "584857e87807ff4709f3749eb99a05d3" } }' "https://gateway.watsonplatform.net/discovery/api/v1/events?version=2018-05-23"
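Below is a sketch of the same click event built in Python, assuming the field names shown in CreateEventObject and EventData above. The helper name is hypothetical and authentication is left out. client_timestamp is only included when supplied, because the service falls back to its own log time when it is absent.

```python
import json
import urllib.request

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"  # default endpoint from this section's examples


def create_click_event_request(environment_id, session_token, collection_id, document_id,
                               display_rank, client_timestamp=None, version="2018-05-23"):
    """Build (but do not send) POST /v1/events for a single click event."""
    data = {
        "environment_id": environment_id,
        "session_token": session_token,  # taken from the query results the click belongs to
        "display_rank": display_rank,
        "collection_id": collection_id,
        "document_id": document_id,
    }
    if client_timestamp is not None:
        data["client_timestamp"] = client_timestamp  # e.g. "2018-01-29T14:58:39.470Z"
    body = json.dumps({"type": "click", "data": data}).encode("utf-8")
    req = urllib.request.Request(f"{BASE_URL}/v1/events?version={version}", data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    return req


req = create_click_event_request(
    "e6061c99-950a-4dad-aee0-411f7143690a",
    "1_1dlwjrntwodlw111344",
    "a83aaa222aaa222aac460",
    "584857e87807ff4709f3749eb99a05d3",
    1,
)
```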
        

Response

CreateEventResponse

An object defining the event being created.

Name Description
type string

The event type that was created.

Possible values:
  • click
data EventData

Query event data object.

EventData

Query event data object.

Name Description
environment_id string

The environment_id of the query that the event is associated with.

session_token string

The session token that was returned as part of the query results that this event is associated with.

client_timestamp DateTime

The optional timestamp for the event that was created. If not provided, the time at which the event was created in the log is used.

display_rank integer

The rank of the result item that the event is associated with.

collection_id string

The collection_id of the document that this event is associated with.

document_id string

The document_id of the document that this event is associated with.

query_id string

The query identifier stored in the log. The query and any events associated with that query are stored with the same query_id.

Response Codes

Status Description
201

The event object was successfully accepted.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

401

Request contains invalid content and cannot be added. The error message contains details about what caused the request to be rejected.

Search the query and event log

Searches the query and event log to find query sessions that match the specified criteria. Searching the logs endpoint uses the standard Discovery query syntax for the parameters that are supported.

Request

GET /v1/logs
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

filter query string

A cacheable query that excludes from the results any documents that don't mention the query content. Filter searches are better for metadata-type searches and when you are trying to get a sense of concepts in the data set.

query query string

A query search returns all documents in your data set with full enrichments and full text, but with the most relevant documents listed first. Use a query search when you want to find the most relevant search results. You cannot use natural_language_query and query at the same time.

count query integer

Number of results to return. Default: 10.

offset query integer

The number of query results to skip at the beginning. For example, if the total number of results that are returned is 10, and the offset is 8, it returns the last two results.

sort query string[]

A comma-separated list of fields in the document to sort on. You can optionally specify a sort direction by prefixing the field with - for descending or + for ascending. Ascending is the default sort direction if no prefix is specified.

Example request

    curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/logs?version=2018-03-05&query=test&count=5"
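The following sketch builds logs URLs from the count, offset, and sort parameters described above and computes paging offsets. The helper names are hypothetical. Sorting on created_timestamp is only an illustrative assumption, since this section does not list which fields are sortable.

```python
import urllib.parse

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"  # default endpoint from this section's examples


def logs_url(query=None, filter_=None, count=10, offset=0, sort=None, version="2018-03-05"):
    """Build a GET /v1/logs URL from the documented query parameters."""
    params = {"version": version, "count": count, "offset": offset}
    if query:
        params["query"] = query
    if filter_:
        params["filter"] = filter_
    if sort:
        # Comma-separated field list; "-" means descending, "+" ascending.
        # urlencode turns a literal "+" into %2B so it is not read as a space.
        params["sort"] = ",".join(sort)
    return f"{BASE_URL}/v1/logs?" + urllib.parse.urlencode(params)


def page_offsets(total_results, page_size):
    """Offsets needed to walk through total_results in pages of page_size."""
    return list(range(0, total_results, page_size))


# Equivalent of the curl example, then the offsets for paging through 25 matches.
url = logs_url(query="test", count=5)
offsets = page_offsets(25, 10)
```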
        

Response

LogQueryResponse

Object containing results that match the requested logs query.

Name Description
matching_results integer

Number of matching results.

results LogQueryResponseResult[]
LogQueryResponseResult

Individual result object for a logs query. Each object represents either a query to a Discovery collection or an event that is associated with a query.

Name Description
environment_id string

The environment ID that is associated with this log entry.

customer_id string

The customer_id label that was specified in the header of the query or event API call that corresponds to this log entry.

document_type string

The type of log entry returned.

query indicates that the log represents the results of a call to the single collection query method.

event indicates that the log represents a call to the events API.

Possible values:
  • query
  • event
natural_language_query string

The value of the natural_language_query query parameter that was used to create these results. Only returned with logs of type query.

Note: Other query parameters (such as filter or deduplicate) might have been used with this query, but are not recorded.

document_results LogQueryResponseResultDocuments

Object containing result information that was returned by the query used to create this log entry. Only returned with logs of type query.

created_timestamp DateTime

Date that the log result was created. Returned in YYYY-MM-DDThh:mm:ssZ format.

client_timestamp DateTime

Date specified by the user when recording an event. Returned in YYYY-MM-DDThh:mm:ssZ format. Only returned with logs of type event.

query_id string

Identifier that corresponds to the natural_language_query string used in the original or associated query. All event and query log entries that have the same original natural_language_query string also have the same query_id. This field can be used to recall all event and query log results that have the same original query (event logs do not contain the original natural_language_query field).

session_token string

Unique identifier (within a 24-hour period) that identifies a single query log and any event logs that were created for it.

Note: If the exact same query is run at the exact same time on different days, the session_token for those queries might be identical. However, the created_timestamp differs.

Note: Session tokens are case sensitive. To avoid matching on session tokens that are identical except for case, use the exact match operator (::) when you query for a specific session token.

collection_id string

The collection ID of the document associated with this event. Only returned with logs of type event.

display_rank integer

The original display rank of the document associated with this event. Only returned with logs of type event.

document_id string

The document ID of the document associated with this event. Only returned with logs of type event.

event_type string

The type of event that this object represents. Possible values are:

  • query: the log of a query to a collection

  • click: the result of a call to the events endpoint

Possible values:
  • click
  • query
result_type string

The type of result that this event is associated with. Only returned with logs of type event.

Possible values:
  • document
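Event log entries do not carry natural_language_query, but they share a query_id with their query entry. The following sketch groups parsed log results by query_id so the query text can be recovered for each event. The response fragment is hypothetical and uses only the fields documented above.

```python
from collections import defaultdict


def group_by_query_id(log_response: dict) -> dict:
    """Group log entries into {query_id: {"query": [...], "event": [...]}}.

    Assumes document_type is one of the documented values, query or event.
    """
    groups = defaultdict(lambda: {"query": [], "event": []})
    for entry in log_response.get("results", []):
        groups[entry["query_id"]][entry["document_type"]].append(entry)
    return dict(groups)


# Hypothetical response fragment: one query and one click event sharing a query_id.
sample = {
    "matching_results": 2,
    "results": [
        {"document_type": "query", "query_id": "q1",
         "natural_language_query": "reset password", "session_token": "1_abc"},
        {"document_type": "event", "query_id": "q1", "event_type": "click",
         "document_id": "doc-7", "display_rank": 1, "session_token": "1_abc"},
    ],
}
groups = group_by_query_id(sample)
# The event entry lacks natural_language_query, so read it from the query entry in the same group.
query_text = groups["q1"]["query"][0]["natural_language_query"]
```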
LogQueryResponseResultDocuments

Object containing result information that was returned by the query used to create this log entry. Only returned with logs of type query.

Name Description
results LogQueryResponseResultDocumentsResult[]
count integer

The number of results returned in the query associated with this log.

LogQueryResponseResultDocumentsResult

Each object in the results array corresponds to an individual document returned by the original query.

Name Description
position integer

The result rank of this document. A position of 1 indicates that it was the first returned result.

document_id string

The document_id of the document that this result represents.

score double

The raw score of this result. A higher score indicates a greater match to the query parameters.

confidence double

The confidence score of the result's analysis. A higher score indicates greater confidence.

collection_id string

The collection_id of the document represented by this result.

Response Codes

Status Description
200

Log query executed successfully.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.
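The note on session tokens recommends the exact match operator (::). The following sketch builds a logs search for a single session token that way. It is hypothetical and assumes that the token contains no characters needing escaping in the Discovery query language.

```python
import urllib.parse

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"  # default endpoint from this section's examples


def session_logs_url(session_token, version="2018-03-05"):
    """Build a logs search for one session token with the exact-match operator (::)."""
    # :: avoids matching session tokens that differ from this one only by case.
    query = f"session_token::{session_token}"
    return f"{BASE_URL}/v1/logs?" + urllib.parse.urlencode({"version": version, "query": query})


url = session_logs_url("1_1dlwjrntwodlw111344")
```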

Number of queries over time

Total number of queries using the natural_language_query parameter over a specific time window.

Request

GET /v1/metrics/number_of_queries
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

start_time query DateTime

Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

end_time query DateTime

Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

result_type query string

The type of result to consider when calculating the metric.

Allowable values:
  • document
Example request

    curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/metrics/number_of_queries?version=2018-03-05"
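start_time and end_time must be in YYYY-MM-DDThh:mm:ssZ format. The following sketch formats timezone-aware datetimes that way and builds a metrics URL for a seven-day window. The helper names are hypothetical. The same URL shape applies to the other metrics endpoints in this section that accept these parameters.

```python
import urllib.parse
from datetime import datetime, timedelta, timezone

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"  # default endpoint from this section's examples


def to_metric_timestamp(dt: datetime) -> str:
    """Format an aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def metrics_url(metric, start, end, result_type=None, version="2018-03-05"):
    """Build a GET /v1/metrics/{metric} URL restricted to [start, end]."""
    params = {
        "version": version,
        "start_time": to_metric_timestamp(start),
        "end_time": to_metric_timestamp(end),
    }
    if result_type:
        params["result_type"] = result_type  # only "document" is allowed
    return f"{BASE_URL}/v1/metrics/{metric}?" + urllib.parse.urlencode(params)


end = datetime(2018, 3, 5, tzinfo=timezone.utc)
url = metrics_url("number_of_queries", end - timedelta(days=7), end)
```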
        

Response

MetricResponse

The response generated from a call to a metrics method.

Name Description
aggregations MetricAggregation[]
MetricAggregation

An aggregation analyzing log information for queries and events.

Name Description
interval string

The measurement interval for this metric. Metric intervals are always 1 day (1d).

event_type string

The event type associated with this metric result. This field, when present, will always be click.

results MetricAggregationResult[]
MetricAggregationResult

Aggregation result data for the requested metric.

Name Description
key_as_string DateTime

Date in string form representing the start of this interval.

key long

Unix epoch time equivalent of key_as_string, representing the start of this interval.

matching_results integer

Number of matching results.

event_rate double

The number of queries with associated events divided by the total number of queries for the interval. Only returned with event_rate metrics.

Response Codes

Status Description
200

Metric calculation executed successfully.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

Number of queries with an event over time

Total number of queries using the natural_language_query parameter that have a corresponding "click" event over a specified time window. This metric requires that your application records events by using the Events API.

Request

GET /v1/metrics/number_of_queries_with_event
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

start_time query DateTime

Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

end_time query DateTime

Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

result_type query string

The type of result to consider when calculating the metric.

Allowable values:
  • document
Example request

    curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/metrics/number_of_queries_with_event?version=2018-03-05"
        

Response

MetricResponse

The response generated from a call to a metrics method.

Name Description
aggregations MetricAggregation[]
MetricAggregation

An aggregation analyzing log information for queries and events.

Name Description
interval string

The measurement interval for this metric. Metric intervals are always 1 day (1d).

event_type string

The event type associated with this metric result. This field, when present, will always be click.

results MetricAggregationResult[]
MetricAggregationResult

Aggregation result data for the requested metric.

Name Description
key_as_string DateTime

Date in string form representing the start of this interval.

key long

Unix epoch time equivalent of key_as_string, representing the start of this interval.

matching_results integer

Number of matching results.

event_rate double

The number of queries with associated events divided by the total number of queries for the interval. Only returned with event_rate metrics.

Response Codes

Status Description
200

Metric calculation executed successfully.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

Number of queries with no search results over time

Total number of queries using the natural_language_query parameter that have no results returned over a specified time window.

Request

GET /v1/metrics/number_of_queries_with_no_search_results
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

start_time query DateTime

Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

end_time query DateTime

Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

result_type query string

The type of result to consider when calculating the metric.

Allowable values:
  • document
Example request

    curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/metrics/number_of_queries_with_no_search_results?version=2018-03-05"
        

Response

MetricResponse

The response generated from a call to a metrics method.

Name Description
aggregations MetricAggregation[]
MetricAggregation

An aggregation analyzing log information for queries and events.

Name Description
interval string

The measurement interval for this metric. Metric intervals are always 1 day (1d).

event_type string

The event type associated with this metric result. This field, when present, will always be click.

results MetricAggregationResult[]
MetricAggregationResult

Aggregation result data for the requested metric.

Name Description
key_as_string DateTime

Date in string form representing the start of this interval.

key long

Unix epoch time equivalent of key_as_string, representing the start of this interval.

matching_results integer

Number of matching results.

event_rate double

The number of queries with associated events divided by the total number of queries for the interval. Only returned with event_rate metrics.

Response Codes

Status Description
200

Metric calculation executed successfully.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

Percentage of queries with an associated event

The percentage of queries using the natural_language_query parameter that have a corresponding "click" event over a specified time window. This metric requires that your application records events by using the Events API.
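event_rate is defined as the number of queries with associated events divided by the total number of queries for the interval. The following sketch recomputes it from number_of_queries and number_of_queries_with_event responses as a cross-check. It assumes that the two responses use matching key_as_string values for each interval. The sample responses are hypothetical.

```python
def interval_counts(metric_response: dict) -> dict:
    """Flatten a MetricResponse into {key_as_string: matching_results}."""
    counts = {}
    for aggregation in metric_response.get("aggregations", []):
        for result in aggregation.get("results", []):
            counts[result["key_as_string"]] = result["matching_results"]
    return counts


def event_rate_by_interval(all_queries: dict, queries_with_event: dict) -> dict:
    """Queries with a click divided by all queries, per interval (0.0 when there were no queries)."""
    totals = interval_counts(all_queries)
    clicked = interval_counts(queries_with_event)
    return {key: (clicked.get(key, 0) / total if total else 0.0) for key, total in totals.items()}


# Hypothetical responses for two daily intervals.
all_q = {"aggregations": [{"interval": "1d", "results": [
    {"key_as_string": "2018-03-01T00:00:00Z", "matching_results": 4},
    {"key_as_string": "2018-03-02T00:00:00Z", "matching_results": 0},
]}]}
with_event = {"aggregations": [{"interval": "1d", "event_type": "click", "results": [
    {"key_as_string": "2018-03-01T00:00:00Z", "matching_results": 1},
]}]}
rates = event_rate_by_interval(all_q, with_event)
```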

Request

GET /v1/metrics/event_rate
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

start_time query DateTime

Metric is computed from data recorded after this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

end_time query DateTime

Metric is computed from data recorded before this timestamp; must be in YYYY-MM-DDThh:mm:ssZ format.

result_type query string

The type of result to consider when calculating the metric.

Allowable values:
  • document
Example request

    curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/metrics/event_rate?version=2018-03-05"
        

Response

MetricResponse

The response generated from a call to a metrics method.

Name Description
aggregations MetricAggregation[]
MetricAggregation

An aggregation analyzing log information for queries and events.

Name Description
interval string

The measurement interval for this metric. Metric intervals are always 1 day (1d).

event_type string

The event type associated with this metric result. This field, when present, will always be click.

results MetricAggregationResult[]
MetricAggregationResult

Aggregation result data for the requested metric.

Name Description
key_as_string DateTime

Date in string form representing the start of this interval.

key long

Unix epoch time equivalent of key_as_string, representing the start of this interval.

matching_results integer

Number of matching results.

event_rate double

The number of queries with associated events divided by the total number of queries for the interval. Only returned with event_rate metrics.

Response Codes

Status Description
200

Metric calculation executed successfully.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

Most frequent query tokens with an event

The most frequent query tokens parsed from the natural_language_query parameter and their corresponding "click" event rate within the recording period (queries and events are stored for 30 days). A query token is an individual word or unigram within the query string.
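The following local approximation illustrates what the token metric measures. For each unigram, it counts how many queries contain it and what fraction of those queries received a click. The service's actual tokenizer is not described here, so lowercasing and splitting on word characters is an assumption, as is counting each token once per query. The query history is hypothetical.

```python
import re
from collections import Counter


def top_tokens_with_event_rate(queries, count=10):
    """Approximate the top-token metric from (natural_language_query, had_click) pairs.

    Assumption: tokens are lowercase runs of word characters, counted once per query.
    """
    seen = Counter()
    clicked = Counter()
    for text, had_click in queries:
        for token in set(re.findall(r"\w+", text.lower())):
            seen[token] += 1
            if had_click:
                clicked[token] += 1
    # (token, number of queries containing it, fraction of those queries with a click)
    return [(token, n, clicked[token] / n) for token, n in seen.most_common(count)]


history = [
    ("reset password", True),
    ("Password expired", False),
    ("reset account", False),
]
top = top_tokens_with_event_rate(history, count=2)
```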

Request

GET /v1/metrics/top_query_tokens_with_event_rate
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

count query integer

Number of results to return. Default: 10.

Example request

    curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/metrics/top_query_tokens_with_event_rate?version=2018-03-05"
        

Response

MetricTokenResponse

The response generated from a call to a metrics method that evaluates tokens.

Name Description
aggregations MetricTokenAggregation[]
MetricTokenAggregation

An aggregation analyzing log information for queries and events.

Name Description
event_type string

The event type associated with this metric result. This field, when present, will always be click.

results MetricTokenAggregationResult[]
MetricTokenAggregationResult

Aggregation result data for the requested metric.

Name Description
key string

The content of the natural_language_query parameter used in the query that this result represents.

matching_results integer

Number of matching results.

event_rate double

The number of queries with associated events divided by the total number of queries currently stored (queries and events are stored in the log for 30 days).

Response Codes

Status Description
200

Metric calculation executed successfully.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

Credentials

Credentials are used to connect to supported remote sources to retrieve documents for adding to a Discovery collection.

List credentials

List all the source credentials that have been created for this service instance.

Note: All credentials are sent over an encrypted connection and encrypted at rest.

Request

GET /v1/environments/{environment_id}/credentials
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

environment_id path string

The ID of the environment.

Example request

    curl -u "{username}":"{password}" -X GET "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/credentials?version=2018-03-05"
        

Response

CredentialsList
Name Description
credentials Credentials[]

An array of credential definitions that were created for this instance.

Credentials

Object containing credential information.

Name Description
credential_id string

Unique identifier for this set of credentials.

source_type string

The source that this credentials object connects to.

  • box indicates the credentials are used to connect to an instance of Enterprise Box.
  • salesforce indicates the credentials are used to connect to Salesforce.
  • sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.
Possible values:
  • box
  • salesforce
  • sharepoint
credential_details CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

Name Description
credential_type string

The authentication method for this credentials definition. The credential_type specified must be supported by the source_type. The following combinations are possible:

  • "source_type": "box" - valid credential_types: oauth2
  • "source_type": "salesforce" - valid credential_types: username_password
  • "source_type": "sharepoint" - valid credential_types: saml
Possible values:
  • oauth2
  • saml
  • username_password
client_id string

The client_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2.

enterprise_id string

The enterprise_id of the Box site that these credentials connect to. Only valid, and required, with a source_type of box.

url string

The url of the source that these credentials connect to. Only valid, and required, with a credential_type of username_password.

username string

The username of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password.

organization_url string

The organization_url of the source that these credentials connect to. Only valid, and required, with a credential_type of saml.

site_collection_path string

The site_collection_path of the source that these credentials connect to. Only valid, and required, with a source_type of sharepoint.

client_secret string

The client_secret of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

public_key_id string

The public_key_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

private_key string

The private_key of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

passphrase string

The passphrase of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

password string

The password of the source that these credentials connect to. Only valid, and required, with credential_types of saml and username_password.

Note: When used with a source_type of salesforce, the password consists of the Salesforce password and a valid Salesforce security token concatenated. This value is never returned and is only used when creating or modifying credentials.
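The source_type and credential_type pairings and the "Only valid, and required" rules above can be checked on the client before calling the API. The following sketch is one way to do that. The helper and table names are hypothetical, and the SharePoint field is spelled site_collection_path, as in the example response.

```python
# Pairings and required fields as listed in the CredentialDetails description.
VALID_CREDENTIAL_TYPE = {"box": "oauth2", "salesforce": "username_password", "sharepoint": "saml"}
REQUIRED_BY_CREDENTIAL_TYPE = {
    "oauth2": {"client_id", "client_secret", "public_key_id", "private_key", "passphrase"},
    "username_password": {"url", "username", "password"},
    "saml": {"username", "password", "organization_url"},
}
REQUIRED_BY_SOURCE_TYPE = {"box": {"enterprise_id"}, "sharepoint": {"site_collection_path"}}


def validate_credentials(source_type: str, details: dict) -> list:
    """Return a list of problems; an empty list means the details look complete."""
    expected = VALID_CREDENTIAL_TYPE.get(source_type)
    if expected is None:
        return [f"unsupported source_type: {source_type}"]
    credential_type = details.get("credential_type")
    if credential_type != expected:
        return [f"{source_type} requires credential_type {expected}, got {credential_type}"]
    required = REQUIRED_BY_CREDENTIAL_TYPE[credential_type] | REQUIRED_BY_SOURCE_TYPE.get(source_type, set())
    return [f"missing field: {name}" for name in sorted(required - set(details))]


ok = validate_credentials("salesforce", {
    "credential_type": "username_password",
    "url": "login.salesforce.com",
    "username": "email@server.xyz",
    "password": "{my_salesforce_password}{my_salesforce_security_token}",
})
```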

Example response


{
  "credentials" : [ {
    "credential_id" : "00000d8c-0000-00e8-ba89-0ed5f89f718b",
    "source_type" : "salesforce",
    "credential_details" : {
      "credential_type" : "username_password",
      "url" : "login.salesforce.com",
      "username" : "user@email.address"
    }
  }, {
    "credential_id" : "00000d8c-0000-00e8-ba89-0ed5f89f111c",
    "source_type" : "box",
    "credential_details" : {
      "credential_type" : "oauth2",
      "client_id" : "1234567899bz7micz6x6p5zfnycw98e3",
      "enterprise_id" : "000000001"
    }
  }, {
    "credential_id" : "00000d8c-0000-00e8-ba22-0ed5f89f999d",
    "source_type" : "sharepoint",
    "credential_details" : {
      "credential_type" : "saml",
      "organization_url" : "https://site001.sharepointonline.com",
      "site_collection_path" : "/sites/TestSite1",
      "username" : "userA@sharepointonline.com"
    }
  } ]
}
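Created credentials are referenced from a configuration, so a common step is to find the credential_id for a given source. The following minimal sketch works on a parsed CredentialsList, using a trimmed copy of the example response above. The helper name is hypothetical.

```python
def credential_ids_by_source(credentials_list: dict) -> dict:
    """Map each source_type to the credential_id values stored for it."""
    ids = {}
    for credential in credentials_list.get("credentials", []):
        ids.setdefault(credential["source_type"], []).append(credential["credential_id"])
    return ids


# Trimmed copy of the example response above.
example = {
    "credentials": [
        {"credential_id": "00000d8c-0000-00e8-ba89-0ed5f89f718b", "source_type": "salesforce"},
        {"credential_id": "00000d8c-0000-00e8-ba89-0ed5f89f111c", "source_type": "box"},
        {"credential_id": "00000d8c-0000-00e8-ba22-0ed5f89f999d", "source_type": "sharepoint"},
    ]
}
ids = credential_ids_by_source(example)
```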
        

Response Codes

Status Description
200

The request to list all credentials completed successfully.

400

Bad request if the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

404

Not found. The error message contains details about what caused the request to be rejected.

Create credentials

Creates a set of credentials to connect to a remote source. Created credentials are used in a configuration to associate a collection with the remote source.

Note: All credentials are sent over an encrypted connection and encrypted at rest.

Request

POST /v1/environments/{environment_id}/credentials
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

environment_id path string

The ID of the environment.

credentials_parameter body Credentials

An object that defines an individual set of source credentials.

Credentials

Object containing credential information.

Name Description
credential_id string

Unique identifier for this set of credentials.

source_type string

The source that this credentials object connects to.

  • box indicates the credentials are used to connect to an instance of Enterprise Box.
  • salesforce indicates the credentials are used to connect to Salesforce.
  • sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.
Allowable values:
  • box
  • salesforce
  • sharepoint
credential_details CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

Name Description
credential_type string

The authentication method for this credentials definition. The credential_type specified must be supported by the source_type. The following combinations are possible:

  • "source_type": "box" - valid credential_types: oauth2
  • "source_type": "salesforce" - valid credential_types: username_password
  • "source_type": "sharepoint" - valid credential_types: saml
Allowable values:
  • oauth2
  • saml
  • username_password
client_id string

The client_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2.

enterprise_id string

The enterprise_id of the Box site that these credentials connect to. Only valid, and required, with a source_type of box.

url string

The url of the source that these credentials connect to. Only valid, and required, with a credential_type of username_password.

username string

The username of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password.

organization_url string

The organization_url of the source that these credentials connect to. Only valid, and required, with a credential_type of saml.

site_collection_path string

The site_collection_path of the source that these credentials connect to. Only valid, and required, with a source_type of sharepoint.

client_secret string

The client_secret of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

public_key_id string

The public_key_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

private_key string

The private_key of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

passphrase string

The passphrase of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

password string

The password of the source that these credentials connect to. Only valid, and required, with credential_types of saml and username_password.

Note: When used with a source_type of salesforce, the password consists of the Salesforce password and a valid Salesforce security token concatenated. This value is never returned and is only used when creating or modifying credentials.

Example request

curl -u "{username}":"{password}" -X POST -H "Content-Type: application/json" -d '{  "source_type": "salesforce", "credential_details": { "credential_type": "username_password", "url": "login.salesforce.com", "username": "email@server.xyz", "password": "{my_salesforce_password}{my_salesforce_security_token}"}}' "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/credentials?version=2018-03-05"
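The following sketch builds the same Salesforce request in Python. It shows the password and security token being concatenated, as the note on the password field requires. The helper name and the environment ID my-env are hypothetical, and authentication is omitted.

```python
import json
import urllib.request

BASE_URL = "https://gateway.watsonplatform.net/discovery/api"  # default endpoint from this section's examples


def salesforce_credentials_request(environment_id, username, password, security_token,
                                   url="login.salesforce.com", version="2018-03-05"):
    """Build (but do not send) POST /v1/environments/{environment_id}/credentials for Salesforce."""
    body = {
        "source_type": "salesforce",
        "credential_details": {
            "credential_type": "username_password",
            "url": url,
            "username": username,
            # Salesforce password immediately followed by the security token.
            "password": password + security_token,
        },
    }
    req = urllib.request.Request(
        f"{BASE_URL}/v1/environments/{environment_id}/credentials?version={version}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
    )
    req.add_header("Content-Type", "application/json")
    return req


req = salesforce_credentials_request("my-env", "email@server.xyz", "s3cret", "TOKEN123")
```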
        

Response

Credentials

Object containing credential information.

Name Description
credential_id string

Unique identifier for this set of credentials.

source_type string

The source that this credentials object connects to.

  • box indicates the credentials are used to connect to an instance of Enterprise Box.
  • salesforce indicates the credentials are used to connect to Salesforce.
  • sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.
Possible values:
  • box
  • salesforce
  • sharepoint
credential_details CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

Name Description
credential_type string

The authentication method for this credentials definition. The credential_type specified must be supported by the source_type. The following combinations are possible:

  • "source_type": "box" - valid credential_types: oauth2
  • "source_type": "salesforce" - valid credential_types: username_password
  • "source_type": "sharepoint" - valid credential_types: saml
Possible values:
  • oauth2
  • saml
  • username_password
client_id string

The client_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2.

enterprise_id string

The enterprise_id of the Box site that these credentials connect to. Only valid, and required, with a source_type of box.

url string

The url of the source that these credentials connect to. Only valid, and required, with a credential_type of username_password.

username string

The username of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password.

organization_url string

The organization_url of the source that these credentials connect to. Only valid, and required, with a credential_type of saml.

site_collection.path string

The site_collection.path of the source that these credentials connect to. Only valid, and required, with a source_type of sharepoint.

client_secret string

The client_secret of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

public_key_id string

The public_key_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

private_key string

The private_key of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

passphrase string

The passphrase of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

password string

The password of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password.

Note: When used with a source_type of salesforce, the password is the Salesforce password concatenated with a valid Salesforce security token. This value is never returned and is only used when creating or modifying credentials.

Example response


{
  "credential_id" : "00000d8c-0000-00e8-ba89-0ed5f89f718b",
  "source_type" : "salesforce",
  "credential_details" : {
    "credential_type" : "username_password",
    "url" : "login.salesforce.com",
    "username" : "user@email.address"
  }
}
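If you work in Python, the Credentials object in a response can be read into a small typed structure. The following is a minimal sketch that uses only the standard library; the `Credentials` dataclass, `parse_credentials`, and `SECRET_FIELDS` names are hypothetical and are not part of any Watson SDK. Because secret values such as `password` and `private_key` are never returned, a parsed response is not expected to contain them.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

# Fields that this reference documents as never returned in a response.
SECRET_FIELDS = {"client_secret", "public_key_id", "private_key", "passphrase", "password"}


@dataclass
class Credentials:
    """Hypothetical typed view of the Credentials response object."""
    credential_id: str
    source_type: str
    credential_details: Dict[str, Any] = field(default_factory=dict)


def parse_credentials(body: str) -> Credentials:
    """Parse a Credentials JSON response body into a Credentials instance."""
    data = json.loads(body)
    return Credentials(
        credential_id=data["credential_id"],
        source_type=data["source_type"],
        # credential_details holds only the non-secret fields that were returned.
        credential_details=data.get("credential_details", {}),
    )


sample = """{
  "credential_id": "00000d8c-0000-00e8-ba89-0ed5f89f718b",
  "source_type": "salesforce",
  "credential_details": {
    "credential_type": "username_password",
    "url": "login.salesforce.com",
    "username": "user@email.address"
  }
}"""

creds = parse_credentials(sample)
print(creds.source_type, creds.credential_details["credential_type"])  # prints: salesforce username_password
```

The dataclass keeps `credential_details` as a plain dictionary because the set of fields it contains depends on the credential_type.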
        

Response Codes

Status Description
200

Credentials successfully created.

400

Bad request: the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

404

Not found. The error message contains details about what caused the request to be rejected.

View credentials

Returns details about the specified credentials.

Note: Secure credential information such as a password or SSH key is never returned and must be obtained from the source system.

Request

GET /v1/environments/{environment_id}/credentials/{credential_id}
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

environment_id path string

The ID of the environment.

credential_id path string

The unique identifier for a set of source credentials.

Example request

curl -u "{username}":"{password}" -X GET "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/credentials/{credential_id}?version=2018-03-05"
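The same GET request can be built in Python with the standard library. This is a sketch, not an SDK method: `build_get_credentials_request` is a hypothetical helper, and the base URL is the default from the curl example, so substitute your instance's URL as described under Service endpoint. If your instance uses IAM API keys, the same basic authentication header works with `apikey` as the username, as in the curl examples under Authentication.

```python
import base64
import datetime
import urllib.parse
import urllib.request

# Default URL from the curl example; replace it with your instance's URL.
BASE_URL = "https://gateway.watsonplatform.net/discovery/api"


def build_get_credentials_request(environment_id: str, credential_id: str,
                                  username: str, password: str,
                                  version: str = "2018-03-05",
                                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for one set of source credentials."""
    # Raises ValueError if version does not parse as a YYYY-MM-DD date.
    datetime.datetime.strptime(version, "%Y-%m-%d")
    path = "/v1/environments/{}/credentials/{}".format(
        urllib.parse.quote(environment_id, safe=""),
        urllib.parse.quote(credential_id, safe=""),
    )
    query = urllib.parse.urlencode({"version": version})
    request = urllib.request.Request(f"{base_url}{path}?{query}", method="GET")
    # Equivalent of curl -u "{username}:{password}".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

To send the request, pass it to `urllib.request.urlopen()` and decode the JSON body of the response.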
        

Response

Credentials

Object containing credential information.

Name Description
credential_id string

Unique identifier for this set of credentials.

source_type string

The source that this credentials object connects to.

  • box indicates the credentials are used to connect to an instance of Enterprise Box.
  • salesforce indicates the credentials are used to connect to Salesforce.
  • sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.
Possible values:
  • box
  • salesforce
  • sharepoint
credential_details CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

Name Description
credential_type string

The authentication method for this credentials definition. The credential_type specified must be supported by the source_type. The following combinations are possible:

  • "source_type": "box" - valid credential_types: oauth2
  • "source_type": "salesforce" - valid credential_types: username_password
  • "source_type": "sharepoint" - valid credential_types: saml
Possible values:
  • oauth2
  • saml
  • username_password
client_id string

The client_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2.

enterprise_id string

The enterprise_id of the Box site that these credentials connect to. Only valid, and required, with a source_type of box.

url string

The url of the source that these credentials connect to. Only valid, and required, with a credential_type of username_password.

username string

The username of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password.

organization_url string

The organization_url of the source that these credentials connect to. Only valid, and required, with a credential_type of saml.

site_collection.path string

The site_collection.path of the source that these credentials connect to. Only valid, and required, with a source_type of sharepoint.

client_secret string

The client_secret of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

public_key_id string

The public_key_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

private_key string

The private_key of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

passphrase string

The passphrase of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

password string

The password of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password.

Note: When used with a source_type of salesforce, the password is the Salesforce password concatenated with a valid Salesforce security token. This value is never returned and is only used when creating or modifying credentials.

Example response


{
  "credential_id" : "00000d8c-0000-00e8-ba89-0ed5f89f718b",
  "source_type" : "salesforce",
  "credential_details" : {
    "credential_type" : "username_password",
    "url" : "login.salesforce.com",
    "username" : "user@email.address"
  }
}
        

Response Codes

Status Description
200

The requested credentials object was successfully returned.

400

Bad request: the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

404

Not found. The error message contains details about what caused the request to be rejected.

Update credentials

Updates an existing set of source credentials.

Note: All credentials are sent over an encrypted connection and encrypted at rest.

Request

PUT /v1/environments/{environment_id}/credentials/{credential_id}
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

environment_id path string

The ID of the environment.

credential_id path string

The unique identifier for a set of source credentials.

credentials_parameter body Credentials

An object that defines an individual set of source credentials.

Credentials

Object containing credential information.

Name Description
credential_id string

Unique identifier for this set of credentials.

source_type string

The source that this credentials object connects to.

  • box indicates the credentials are used to connect to an instance of Enterprise Box.
  • salesforce indicates the credentials are used to connect to Salesforce.
  • sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.
Allowable values:
  • box
  • salesforce
  • sharepoint
credential_details CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

Name Description
credential_type string

The authentication method for this credentials definition. The credential_type specified must be supported by the source_type. The following combinations are possible:

  • "source_type": "box" - valid credential_types: oauth2
  • "source_type": "salesforce" - valid credential_types: username_password
  • "source_type": "sharepoint" - valid credential_types: saml
Allowable values:
  • oauth2
  • saml
  • username_password
client_id string

The client_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2.

enterprise_id string

The enterprise_id of the Box site that these credentials connect to. Only valid, and required, with a source_type of box.

url string

The url of the source that these credentials connect to. Only valid, and required, with a credential_type of username_password.

username string

The username of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password.

organization_url string

The organization_url of the source that these credentials connect to. Only valid, and required, with a credential_type of saml.

site_collection.path string

The site_collection.path of the source that these credentials connect to. Only valid, and required, with a source_type of sharepoint.

client_secret string

The client_secret of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

public_key_id string

The public_key_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

private_key string

The private_key of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

passphrase string

The passphrase of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

password string

The password of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password.

Note: When used with a source_type of salesforce, the password is the Salesforce password concatenated with a valid Salesforce security token. This value is never returned and is only used when creating or modifying credentials.
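Before sending a request body, you can check it against the source_type and credential_type combinations and the "Only valid, and required" rules in the table above. The sketch below is a hypothetical client-side check (`validate_credentials` and its lookup tables are not part of the API). It treats `site_collection.path` as a literal key name, as the field is written in the table, and it does not replace the validation that the service itself performs.

```python
from typing import Any, Dict, List

# source_type -> the credential_type listed as valid for it.
VALID_CREDENTIAL_TYPE = {
    "box": "oauth2",
    "salesforce": "username_password",
    "sharepoint": "saml",
}

# credential_type -> fields described as "Only valid, and required" for that type.
REQUIRED_BY_CREDENTIAL_TYPE = {
    "oauth2": ["client_id", "client_secret", "public_key_id", "private_key", "passphrase"],
    "username_password": ["url", "username", "password"],
    "saml": ["username", "password", "organization_url"],
}

# source_type -> additional fields required for that source.
REQUIRED_BY_SOURCE_TYPE = {
    "box": ["enterprise_id"],
    "salesforce": [],
    "sharepoint": ["site_collection.path"],
}


def validate_credentials(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a Credentials request body (empty if none)."""
    source_type = body.get("source_type")
    details = body.get("credential_details", {})
    credential_type = details.get("credential_type")

    if source_type not in VALID_CREDENTIAL_TYPE:
        return [f"unknown source_type: {source_type!r}"]
    expected = VALID_CREDENTIAL_TYPE[source_type]
    if credential_type != expected:
        return [f"source_type {source_type!r} requires credential_type "
                f"{expected!r}, got {credential_type!r}"]

    # Combine the per-credential_type and per-source_type required fields.
    required = REQUIRED_BY_CREDENTIAL_TYPE[credential_type] + REQUIRED_BY_SOURCE_TYPE[source_type]
    return [f"missing required field: {name}" for name in required if name not in details]
```

Returning a list rather than raising on the first problem lets a caller report every missing field at once.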

Example request

curl -u "{username}":"{password}" -X PUT -H "Content-Type: application/json" -d '{  "source_type": "salesforce", "credential_details": { "credential_type": "username_password", "url": "login.salesforce.com", "username": "email@server.xyz", "password": "my_salesforce_passwordmy_salesforce_security_token"}}' "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/credentials/{credential_id}?version=2018-03-05"
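A Python sketch of the same PUT request follows, again using only the standard library. `build_update_credentials_request` and `salesforce_password` are hypothetical helpers; the second one only shows that the Salesforce password and security token are joined with nothing in between, as described for the password field. The environment ID, credential ID, and service credentials below are placeholders, and the request is built but not sent.

```python
import base64
import json
import urllib.parse
import urllib.request

# Default URL from the curl example; replace it with your instance's URL.
BASE_URL = "https://gateway.watsonplatform.net/discovery/api"


def salesforce_password(password: str, security_token: str) -> str:
    """Concatenate the Salesforce password and security token with no separator."""
    return password + security_token


def build_update_credentials_request(environment_id: str, credential_id: str,
                                     payload: dict, username: str, password: str,
                                     version: str = "2018-03-05",
                                     base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request that updates a set of source credentials."""
    url = "{}/v1/environments/{}/credentials/{}?{}".format(
        base_url,
        urllib.parse.quote(environment_id, safe=""),
        urllib.parse.quote(credential_id, safe=""),
        urllib.parse.urlencode({"version": version}),
    )
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="PUT",
    )


payload = {
    "source_type": "salesforce",
    "credential_details": {
        "credential_type": "username_password",
        "url": "login.salesforce.com",
        "username": "email@server.xyz",
        "password": salesforce_password("my_salesforce_password", "my_salesforce_security_token"),
    },
}
# Placeholder IDs and service credentials.
request = build_update_credentials_request("env1", "cred1", payload, "user", "pass")
```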
        

Response

Credentials

Object containing credential information.

Name Description
credential_id string

Unique identifier for this set of credentials.

source_type string

The source that this credentials object connects to.

  • box indicates the credentials are used to connect to an instance of Enterprise Box.
  • salesforce indicates the credentials are used to connect to Salesforce.
  • sharepoint indicates the credentials are used to connect to Microsoft SharePoint Online.
Possible values:
  • box
  • salesforce
  • sharepoint
credential_details CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

CredentialDetails

Object containing details of the stored credentials.

Obtain credentials for your source from the administrator of the source.

Name Description
credential_type string

The authentication method for this credentials definition. The credential_type specified must be supported by the source_type. The following combinations are possible:

  • "source_type": "box" - valid credential_types: oauth2
  • "source_type": "salesforce" - valid credential_types: username_password
  • "source_type": "sharepoint" - valid credential_types: saml
Possible values:
  • oauth2
  • saml
  • username_password
client_id string

The client_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2.

enterprise_id string

The enterprise_id of the Box site that these credentials connect to. Only valid, and required, with a source_type of box.

url string

The url of the source that these credentials connect to. Only valid, and required, with a credential_type of username_password.

username string

The username of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password.

organization_url string

The organization_url of the source that these credentials connect to. Only valid, and required, with a credential_type of saml.

site_collection.path string

The site_collection.path of the source that these credentials connect to. Only valid, and required, with a source_type of sharepoint.

client_secret string

The client_secret of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

public_key_id string

The public_key_id of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

private_key string

The private_key of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

passphrase string

The passphrase of the source that these credentials connect to. Only valid, and required, with a credential_type of oauth2. This value is never returned and is only used when creating or modifying credentials.

password string

The password of the source that these credentials connect to. Only valid, and required, with a credential_type of saml or username_password.

Note: When used with a source_type of salesforce, the password is the Salesforce password concatenated with a valid Salesforce security token. This value is never returned and is only used when creating or modifying credentials.

Example response


{
  "credential_id" : "00000d8c-0000-00e8-ba89-0ed5f89f718b",
  "source_type" : "salesforce",
  "credential_details" : {
    "credential_type" : "username_password",
    "url" : "login.salesforce.com",
    "username" : "user@email.address"
  }
}
        

Response Codes

Status Description
200

Credentials successfully updated.

400

Bad request: the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

401

Authentication to source failed. The error message contains details about what caused the request to be rejected.

404

Not found. The error message contains details about what caused the request to be rejected.
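When a request sent with `urllib.request.urlopen()` fails, Python raises `urllib.error.HTTPError`, whose `code` and body can be mapped to the status codes listed above. In this sketch, `describe_update_error` is a hypothetical helper. It assumes the JSON error body carries its message in an `error` field, which this reference does not confirm, so it falls back to the raw body text when that field is absent or the body is not JSON.

```python
import json
import urllib.error

# Status codes documented for Update credentials.
UPDATE_STATUS_MEANINGS = {
    400: "bad request: the request is incorrectly formatted",
    401: "authentication to the source failed",
    404: "not found",
}


def describe_update_error(err: urllib.error.HTTPError) -> str:
    """Summarize a failed Update credentials call as 'code (meaning): detail'."""
    raw = err.read().decode("utf-8", errors="replace")
    try:
        # Assumption: the JSON error body carries its message in an "error" field.
        detail = json.loads(raw).get("error", raw)
    except (ValueError, AttributeError):
        # The body was not JSON, or was JSON but not an object.
        detail = raw
    meaning = UPDATE_STATUS_MEANINGS.get(err.code, "unexpected status")
    return f"{err.code} ({meaning}): {detail}"
```

In use, wrap the call in `try: urllib.request.urlopen(request)` and pass the caught `urllib.error.HTTPError` to the helper. A 401 here describes a failure to authenticate to the source, so check the stored source credentials as well as your service credentials.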

Delete credentials

Deletes a set of stored credentials from your Discovery instance.

Request

DELETE /v1/environments/{environment_id}/credentials/{credential_id}
Parameter Description
version query date

A date (YYYY-MM-DD) that identifies the specific version of the API to use when processing the request.

environment_id path string

The ID of the environment.

credential_id path string

The unique identifier for a set of source credentials.

Example request

curl -u "{username}":"{password}" -X DELETE "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/credentials/{credential_id}?version=2018-03-05"
        

Response

DeleteCredentials

Object returned after credentials are deleted.

Name Description
credential_id string

The unique identifier of the credentials that have been deleted.

status string

The status of the deletion request.

Possible values:
  • deleted

Response Codes

Status Description
200

Credentials successfully deleted.

400

Bad request: the request is incorrectly formatted. The error message contains details about what caused the request to be rejected.

404

Not found. The error message contains details about what caused the request to be rejected.