Search API (Open Data for Industries)

Search, index, organize, and present the indexed documents on the storage layer.

The Search API provides a mechanism for searching indexed documents that contain structured data. The Search API can index any number of documents.

Documents and indexes are saved in a separate persistent store that is optimized for search operations.

You can use the API to run the following types of search:

full-text search on string fields
range queries on date, numeric, or string fields
geospatial search

To use the Search API, see the Search API reference.

Learn more

Accessing the Search API
Search query operation
Cross-kind search operations
Cross-partition search operations
Search query with cursor
Search API endpoint permissions

Accessing the Search API

Each request to the Search API is validated against the caller's roles and the required headers. Optional headers can also be supplied.

Required roles

The Search API requires dedicated roles for the users. Users must have one of the following roles:

viewers (users.datalake.viewers)
editors (users.datalake.editors)
admins (users.datalake.admins)

These roles can be assigned by using the Entitlements Service API.

Attention: To access the data, a user must also be a member of some data groups.

Required headers

The Open Data for Industries data ecosystem stores data in partitions, depending on the accounts that exist in the Open Data for Industries system. In addition, a user might belong to more than one account.

To use the Search APIs, you must specify the active account in the header attribute named data-partition-id.

The data-partition-id attribute enables the search within the mapped partition. In the following example, the data partition is ODI.

data-partition-id: ODI

The common partition contains all public data in the data ecosystem and is accessible to all users.

data-partition-id: common

Optional headers

The correlation ID (correlation-Id) is used to track the journey of a single request.

Pass the value, which can be a GUID, in a request header with the key correlation-Id.

Use the correlation-Id to track the request through all services.

correlation-Id: 1e0fef08-22fd-49b1-a5cc-dffa21bc0b70

Note: If the service is initiating the request, an ID is generated. If the correlation-Id is not provided, a new ID is generated by the service so that the request is traceable.
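
As a minimal sketch of the header rules above, the following Python helper assembles the required data-partition-id header and an optional correlation-Id. The function name build_search_headers is hypothetical, and authentication headers are outside the scope of this section, so they are left out.

```python
import uuid


def build_search_headers(partition_id, correlation_id=None):
    """Return the Search API headers described in this section.

    Hypothetical helper: only data-partition-id and correlation-Id
    are taken from the documentation.
    """
    if not partition_id:
        raise ValueError("data-partition-id is required")
    return {
        "data-partition-id": partition_id,
        # Optional header. The service generates an ID when it is missing;
        # generating a GUID on the client makes the request traceable
        # from the caller's side as well.
        "correlation-Id": correlation_id or str(uuid.uuid4()),
    }


headers = build_search_headers("ODI")
print(headers)
```

Passing an explicit correlation_id is useful when the same ID should follow a request across several calls.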

Search query operation

The Search API provides a JSON domain-specific language that you can use to run queries.

To use the API, see the Search query API reference.

Note: If you use the Storage API to ingest records, it can take 30 seconds or longer for the records to become searchable.

Note: Offset + Limit cannot be more than 10,000. See the Search query with cursor section for a more efficient way to retrieve results beyond that limit.
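
The 10,000 limit can be checked on the client before a request is sent. The sketch below is illustrative only: build_query_body is a hypothetical helper, and the lowercase request keys offset and limit are assumed spellings of the Offset and Limit parameters named in the note.

```python
import json

MAX_WINDOW = 10_000  # offset + limit cannot be more than this value


def build_query_body(kind, query=None, offset=0, limit=10):
    """Build a query request body and reject windows the API would refuse."""
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    if offset + limit > MAX_WINDOW:
        raise ValueError(
            f"offset + limit is {offset + limit}, which exceeds {MAX_WINDOW}; "
            "use the query with cursor endpoint for deep scrolling"
        )
    body = {"kind": kind, "offset": offset, "limit": limit}
    if query is not None:
        body["query"] = query
    return body


body = build_query_body("common:welldb:wellbore:1.0.0",
                        query="data.Basin:Permian", offset=20, limit=10)
print(json.dumps(body, indent=2))
```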

The following parameters in the Search query API can be used to create a valid query string:

kind parameter

kind is a required parameter that has the following format:

account-id:data-source-id:type:schema-version

To get the list of available values for the kind parameter, use the Storage API.

Users can search documents by providing the kind parameter.

By default, the query returns 10 documents for the specified kind parameter.

Wildcard queries on the kind parameter are also supported. For more information, see Cross-kind queries and Cross-partition queries.

The data ecosystem indexer splits the kind parameter and indexes each part individually. For example, common:welldb:wellbore:1.0.0 is indexed as namespace=common:welldb, type=wellbore, and version=1.0.0. You can query the data ecosystem to search based on any one of these attributes.
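
The split that the indexer applies to a kind can be mirrored on the client, for example to validate a kind string before it is sent. This is a rough sketch, not the indexer's implementation; the names Kind and parse_kind are hypothetical.

```python
from typing import NamedTuple


class Kind(NamedTuple):
    account_id: str
    data_source_id: str
    entity_type: str
    schema_version: str

    @property
    def namespace(self):
        # The namespace is the first two segments joined by a colon.
        return f"{self.account_id}:{self.data_source_id}"


def parse_kind(kind):
    """Split account-id:data-source-id:type:schema-version into its parts."""
    parts = kind.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected four colon-separated segments, got {kind!r}")
    return Kind(*parts)


parsed = parse_kind("common:welldb:well:1.0.0")
print(parsed.namespace, parsed.entity_type, parsed.schema_version)
```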

query parameter

The data ecosystem provides comprehensive query options in Lucene query syntax.

The query string is parsed into a series of terms and operators. A term can be a single word, such as producing or well, or a phrase surrounded by double quotation marks, such as "producing well". For a phrase, the API searches for all the words in the phrase in the same order.

The default operator for query is OR.

A field in the document can be searched by using <field-name>:<value>.

If field is not defined, it defaults to all queryable fields. The query automatically attempts to determine the existing fields in the index’s mapping that are queryable and search on those fields.

Example fields in the query parameter

The best way to learn the query language is to start with a few basic examples.

Note: The kind parameter is required but is omitted for brevity in the following examples.

Note: All storage record properties are in a data block. Any reference to a field inside the block is prefixed with data.

Search all fields that contain the text well
```
{
  "query": "well"
}
```
Note: If no field name is specified, the query attempts to determine the existing fields in the index mapping that are queryable and runs the search on those fields. Performance of the search query improves if the field name is specified in the query.

Search the Basin field that contains the text Permian.
```
{
  "query": "data.Basin:Permian"
}
```

Search the Rig_Contractor field that contains the text Ocean or Drilling.
```
{
  "query": "data.Rig_Contractor:(Ocean OR Drilling)"
}
```
Because OR is the default operator, the following query is equivalent:
```
{
  "query": "data.Rig_Contractor:(Ocean Drilling)"
}
```

Search the Rig_Contractor field that contains the exact phrase Ocean Drilling.
```
{
  "query": "data.Rig_Contractor:\"Ocean Drilling\""
}
```
Search whether any of the fields ValueList.OriginalValue, ValueList.Value, or ValueList.AppDataType contains the text PRODUCING or DUAINE. The asterisk (*) in the field name must be escaped with a backslash, and the backslash is itself doubled inside the JSON string.
```
{
  "query": "data.ValueList.\\*:(PRODUCING DUAINE)"
}
```
Search for records in which the Status field has any non-null value. Use the _exists_ prefix to test whether a field exists.
```
{
  "query": "_exists_:data.Status"
}
```

query operations

The query parameter supports the following operations to do a robust search on query strings:

Grouping.
Multiple terms or clauses can be grouped with parentheses to form subqueries.
```
{
  "query": "data.Rig_Contractor:(Ocean OR Drilling) AND Exploration NOT Basin"
}
```
Reserved characters.
The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
If you need to use any of the characters that function as operators in your query itself, you can escape them with a leading backslash. For example, to search for (1+1)=2, your query is \(1\+1\)\=2. Failing to escape these special characters correctly might lead to a syntax error that prevents your query from running.
Note: < and > cannot be escaped at all. The only way to prevent them from being interpreted as a range query is to remove them from the query string.
Wildcard.
Wildcard searches can be run on individual terms by using a question mark (?) to replace a single character and an asterisk (*) to replace zero or more characters.

Wildcard queries can use an enormous amount of memory and can degrade performance, so use them sparingly.
Note: Leading wildcards are not supported by the data ecosystem search. A wildcard at the beginning of a word is expensive because every term in the index must be examined in case it matches.
```
{
  "query": "data.Rig_Contractor:Oc?an Dr*"
}
```
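
The escaping rules for reserved characters described above can be applied mechanically to a user-supplied value before it is placed in a query string. The following sketch assumes a hypothetical helper named escape_term. It escapes each reserved character with a backslash and drops < and >, which cannot be escaped. Apply it to individual values only, not to a whole query, or the operators in the query are escaped as well.

```python
import json

# Reserved characters that can be escaped with a leading backslash.
# && and || are covered by escaping each & and | individually.
_ESCAPABLE = set('+-=&|!(){}[]^"~*?:\\/')


def escape_term(text):
    """Escape reserved query characters in a single search value."""
    out = []
    for ch in text:
        if ch in "<>":
            continue  # cannot be escaped, so remove them
        if ch in _ESCAPABLE:
            out.append("\\")
        out.append(ch)
    return "".join(out)


escaped = escape_term("(1+1)=2")
print(escaped)
# json.dumps doubles each backslash when it encodes the request body.
print(json.dumps({"query": escaped}))
```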

Date format

The valid format for date is "date with optional time".

date-element ['T' [time-element] [offset]]

Table 1. Date element format
Element	Format
`date-element`	yyyy ['-' MM ['-' dd]]
`time-element`	HH [minute-element] | [fraction]
`minute-element`	':' mm [second-element] | [fraction]
`second-element`	':' ss [fraction]
`fraction`	('.' | ',') digit+
`offset`	'Z' | (('+' | '-') HH [':' mm [':' ss [('.' | ',') SSS]]])

Range queries.
Ranges can be specified for date, numeric, or string fields. Inclusive ranges are specified with brackets [min TO max]. Exclusive ranges are specified with curly brackets {min TO max}.
- All SpudDate in 2012:
```
{
  "query": "data.SpudDate:[2012-01-01 TO 2012-12-31]"
}
```
- Count 1 - 5:
```
{
  "query": "data.Count:[1 TO 5]"
}
```
- Count from 10 upwards:
```
{
  "query": "data.Count:[10 TO *]"
}
```
- Ranges with one side unbounded can use the following syntax:
```
{
  "query": "data.ProjDepth:>10" 
}
```
- To combine an upper and lower bound with the simplified syntax, join two clauses with an AND operator:
```
{
  "query": "data.ProjDepth:(>=10 AND <20)"
}
```
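
The bracket syntax for ranges lends itself to a small builder. The sketch below uses a hypothetical function named range_clause and covers only the [min TO max] and {min TO max} forms, with * standing in for an unbounded side.

```python
def range_clause(field, lower="*", upper="*", inclusive=True):
    """Return a range clause such as data.Count:[1 TO 5].

    Square brackets make both bounds inclusive and curly brackets
    make both bounds exclusive. An asterisk leaves a side unbounded.
    """
    open_b, close_b = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_b}{lower} TO {upper}{close_b}"


print(range_clause("data.SpudDate", "2012-01-01", "2012-12-31"))
print(range_clause("data.Count", 10))
print(range_clause("data.Count", 1, 5, inclusive=False))
```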

sort parameter

The sort feature supports the int, float, double, long, and datetime data types. It does not support array objects, nested objects, or string fields. Records that contain unsupported types do not appear in the response. Records that do not have the sorted fields, or that have empty values in them, are listed last in the result.

The following example shows how these rules affect the results of a query that uses the sort parameter, given the request and the conditions that are listed after it.

```
{
  "kind": "common:welldb:*:*",
  "sort": {
    "field": ["data.Id"],
    "order": ["ASC"]
  }
}
```

Two kinds match the request: common:welldb:wellbore:1.0.0 and common:welldb:well:1.0.0.
data.Id in common:welldb:wellbore:1.0.0 is ingested as INTEGER, but the data.Id field in common:welldb:well:1.0.0 is ingested as TEXT.
common:welldb:wellbore:1.0.0 has 10 records in total and 5 of them have empty values in the data.Id field.
common:welldb:well:1.0.0 also has 10 records in total and all of them have values in the data.Id field.

The request payload asks the Search API to sort on the data.Id field in ascending order. As a result, the response has totalCount: 10 instead of 20. The 10 returned records come only from common:welldb:wellbore:1.0.0 because the data.Id field in common:welldb:well:1.0.0 is of data type string, which is not supported. Within the response, the five records that have empty data.Id fields are listed last.

Note: The Search API does not validate the provided sort parameter, whether it exists or is of the supported data types. Different kinds might have attributes with the same names but are different data types. Therefore, it is the user's responsibility to be aware and validate the workflow.
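
Because the sort parameter carries field names and orders as two parallel lists, it can help to build it from (field, order) pairs so that the lists stay aligned. This is a minimal sketch with a hypothetical helper named build_sort. The text shows only the ASC value; DESC is assumed here as its descending counterpart. The helper cannot check data types, which remains the caller's responsibility as the note explains.

```python
def build_sort(*pairs):
    """Turn (field, order) pairs into the sort object of a request body."""
    if not pairs:
        raise ValueError("at least one (field, order) pair is required")
    fields, orders = [], []
    for field, order in pairs:
        order = order.upper()
        # DESC is an assumption; only ASC appears in the documentation.
        if order not in ("ASC", "DESC"):
            raise ValueError(f"unknown sort order: {order!r}")
        fields.append(field)
        orders.append(order)
    return {"field": fields, "order": orders}


body = {
    "kind": "common:welldb:wellbore:1.0.0",  # keep the kind narrow when sorting
    "sort": build_sort(("data.Id", "asc")),
}
print(body)
```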

Queries that use the sort parameter can be time-consuming, especially if the value of the kind parameter is too broad (for example, "kind": "*:*:*:*"). The current timeout threshold is 60 seconds, and a 504 error (Request timed out after waiting for 1m) is returned if the request times out. To avoid this issue, make the kind parameter as narrow as possible when you use the sort parameter.

spatialFilter parameter

The data ecosystem supports geopoint geospatial data, which is expressed as latitude and longitude pairs.

The spatialFilter and query groups in the request have an AND relationship. If both criteria are defined in the request, the Search API returns only results that match both clauses.

The query elements are geo distance, bounding box, and geo polygon. Only one of the spatial criteria can be used to define a filter.

- Geo distance is a query that includes only hits that exist within a specific distance from a geographical point. It is represented by the byDistance API parameter. For more information, see the Search query API reference.
- Bounding box is a query that filters hits based on a point location within a bounding box. It is represented by the byBoundingBox API parameter. For more information, see the Search query API reference.
- Geo polygon is a query that includes only hits that fall within a polygon of points. It is represented by the byGeoPolygon API parameter. For more information, see the Search query API reference.
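
The rule that only one spatial criterion can define a filter can be enforced before a request is sent. The sketch below assumes the criteria appear as the keys byDistance, byBoundingBox, and byGeoPolygon inside the spatialFilter object. The helper name check_spatial_filter is hypothetical, and the inner structure of each criterion is defined by the Search query API reference, so it is left empty here.

```python
# Assumed key names for the three spatial criteria.
SPATIAL_CRITERIA = ("byDistance", "byBoundingBox", "byGeoPolygon")


def check_spatial_filter(spatial_filter):
    """Return the single spatial criterion in the filter, or raise."""
    present = [name for name in SPATIAL_CRITERIA if name in spatial_filter]
    if len(present) != 1:
        raise ValueError(
            "exactly one of byDistance, byBoundingBox, or byGeoPolygon "
            f"is allowed, found: {present or 'none'}"
        )
    return present[0]


# The empty dict is a placeholder for the criterion's real contents.
print(check_spatial_filter({"byBoundingBox": {}}))
```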

Cross-kind search operations

The Search API supports cross-kind queries. A typical kind parameter has the following format:

account-id:data-source-id:type:schema-version

Each of the colon-separated segments can be replaced with a wildcard character (*) to support cross-kind search.

Search across all data sources, types, and versions in the common data partition:
```
{
  "kind": "common:*:*:*"
}
```
Search across all data sources for the type well with schema version 1.0.0:
```
{
  "kind": "common:*:well:1.0.0"
}
```
Search across all types and versions for the welldb data source in the common data partition:
```
{
  "kind": "common:welldb:*:*"
}
```

To use the API, see the Search cross cluster API reference.
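
To reason about which kinds a wildcard pattern selects, the segment-by-segment comparison can be reproduced locally. This is an illustration only and not how the service matches kinds internally. The function kind_matches is hypothetical and treats * as a replacement for a whole segment, as in the examples above.

```python
def kind_matches(pattern, kind):
    """Return True if a concrete kind is selected by a wildcard kind pattern."""
    p_parts = pattern.split(":")
    k_parts = kind.split(":")
    if len(p_parts) != 4 or len(k_parts) != 4:
        raise ValueError("both values need four colon-separated segments")
    # A segment matches when the pattern has * there or the text is identical.
    return all(p == "*" or p == k for p, k in zip(p_parts, k_parts))


kinds = ["common:welldb:wellbore:1.0.0", "common:welldb:well:1.0.0"]
print([k for k in kinds if kind_matches("common:*:well:1.0.0", k)])
```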

Cross-partition search operations

Use the cross-partition queries to search records from multiple partitions.

To run cross-partition searches, provide a comma-separated list of partitions in the data-partition-id header.

Cross-partition searches deal with larger data sets and entitlements from multiple partitions. As a result, single-partition searches have better performance than cross-partition searches.

Note: Cross-partition queries are only supported for private and common partitions.
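
A cross-partition request differs from a single-partition request only in the header value. As a small sketch, the hypothetical helper partition_header joins partition names into the comma-separated form that the data-partition-id header expects and removes duplicates.

```python
def partition_header(partitions):
    """Build the data-partition-id header for one or more partitions."""
    cleaned = [p.strip() for p in partitions if p.strip()]
    if not cleaned:
        raise ValueError("at least one partition is required")
    # dict.fromkeys removes duplicates while keeping the original order.
    unique = list(dict.fromkeys(cleaned))
    return {"data-partition-id": ",".join(unique)}


print(partition_header(["ODI", "common"]))
```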

Search query with cursor

A search request returns a single page of results. You can use the Query with cursor endpoint to retrieve a larger set of results (or all results) from a single search request, in much the same way as you would use a cursor on a traditional database.

The Query with cursor endpoint is not intended for real-time requests, but rather for processing large amounts of data.

For more information, see the Search query with cursor API reference.

The parameters that are passed in the request body are the same as in the Query endpoint, except for the offset and cursor values. Offset is not a valid parameter in the Query with cursor endpoint, and the cursor value is passed only in follow-up requests.

The results from a request reflect the state of the index at the time when the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update, or delete) affect later search requests only.

The successful response from the request includes a cursor that you can specify in your next call to the Query with cursor endpoint to retrieve the next batch of results. As the next batches of results are retrieved by the Query with cursor endpoint, the cursor value might not change. Do not expect a different cursor value in each response.

Note: To process the next query_with_cursor request, the Search API keeps the search context alive for 1 minute, which is the time that is required to process the next batch of results. Each cursor request sets a new expiry time. The cursor expires after 1 minute and does not return any more results if the requests are not made in that time.
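
The cursor workflow described in this section can be sketched as a loop that feeds each returned cursor into the next request. The example makes several assumptions: fetch_all is a hypothetical helper, send stands for any caller-supplied function that posts a body to the Query with cursor endpoint and returns the parsed response, and the response is assumed to expose the keys results and cursor. The loop stops on an empty batch or a missing cursor rather than on a changed cursor value, because the cursor might stay the same between responses.

```python
def fetch_all(send, kind, query=None, limit=100):
    """Collect every record for a search by following the cursor.

    send(body) must return the decoded response as a dict. Each call
    should follow the previous one within 1 minute, or the cursor expires.
    """
    base = {"kind": kind, "limit": limit}  # note: no offset parameter
    if query is not None:
        base["query"] = query

    records = []
    body = dict(base)
    while True:
        page = send(body)
        results = page.get("results", [])  # assumed response key
        records.extend(results)
        cursor = page.get("cursor")
        if not results or not cursor:
            break
        body = {**base, "cursor": cursor}
    return records


# Stand-in for a real HTTP call, used here to show the control flow.
_pages = iter([
    {"cursor": "abc", "results": [{"id": 1}, {"id": 2}]},
    {"cursor": "abc", "results": [{"id": 3}]},
    {"cursor": None, "results": []},
])
print(fetch_all(lambda body: next(_pages), "common:welldb:*:*", limit=2))
```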

Search API endpoint permissions

Table 2. API method permissions
Endpoint URL	Method	Minimum permissions required	Data permissions required
/api/search/v2/query	POST	users.datalake.viewers	Yes
/api/search/v2/ccs/query	POST	users.datalake.viewers	Yes
/api/search/v2/query_with_cursor	POST	users.datalake.viewers	Yes