Search API (Open Data for Industries)
Search, index, organize, and present the indexed documents on the storage layer.
The Search API provides a mechanism for searching indexed documents that contain structured data. The Search API can index any number of documents.
Documents and indexes are saved in a separate persistent store that is optimized for search operations.
The Search API supports the following types of search:
- full text search on string fields
- range queries on date, numeric, or string fields
- geospatial search
To use the Search API, see the Search API reference.
Accessing the Search API
- Required roles
- The Search API requires dedicated roles for the users. Users must have one of the following
roles:
- viewers (users.datalake.viewers)
- editors (users.datalake.editors)
- admins (users.datalake.admins)
Attention: To access the data, a user must also be a member of some data groups.
- Required headers
-
The Open Data for Industries data ecosystem stores data in partitions, depending on the accounts that exist in the Open Data for Industries system. In addition, a user might belong to more than one account.
To use the Search APIs, you must specify the active account in the header attribute named data-partition-id.
The data-partition-id attribute enables the search within the mapped partition. In the following example, the data partition is ODI.
data-partition-id: ODI
The common partition contains all public data in the data ecosystem and is accessible to all users.
data-partition-id: common
- Optional headers
-
The correlation ID (correlation-Id) is used to track the journey of a single request through all services.
The correlation-Id value can be a GUID that is sent in the header under the correlation-Id key.
correlation-Id: 1e0fef08-22fd-49b1-a5cc-dffa21bc0b70
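The header rules above can be sketched as a small helper. This is a minimal illustration, not the service's client library: the function name `build_search_headers` is hypothetical, and authentication headers are out of scope because this section does not describe them.

```python
import uuid


def build_search_headers(partition_id, correlation_id=None):
    """Build the Search API headers described in this section.

    partition_id goes into the required data-partition-id header.
    correlation_id is optional; when it is omitted, a new GUID is
    generated so the request can still be tracked across services.
    """
    if not partition_id:
        raise ValueError("data-partition-id is required")
    return {
        "data-partition-id": partition_id,
        "correlation-Id": correlation_id or str(uuid.uuid4()),
    }


# Example: search within the common partition with a known correlation ID.
headers = build_search_headers(
    "common", "1e0fef08-22fd-49b1-a5cc-dffa21bc0b70"
)
```

Generating the correlation ID on the client side is a design choice in this sketch; it keeps every request traceable even when the caller does not supply one.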
Search query operation
The Search API provides a JSON domain-specific language that you can use to run queries.
The following parameters in the Search query API can be used to create a valid query string:
- kind parameter
-
kind is a required parameter that has the following format:
account-id:data-source-id:type:schema-version
- query parameter
- The data ecosystem provides comprehensive query options in Lucene query syntax.
- Example fields in the query parameter
-
The best way to learn the query language is to start with a few basic examples.
Note: The kind parameter is required but is omitted for brevity in the following examples.
Note: All storage record properties are in a data block. Any reference to a field inside the block is prefixed with data.
- query operations
- The query parameter supports the following operations to do a robust search
on query strings:
- Grouping.
Multiple terms or clauses can be grouped with parentheses to form sub queries.
{ "query": "data.Rig_Contractor:(Ocean OR Drilling) AND Exploration NOT Basin" }
- Reserved characters.
The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
If you need to use any of the characters that function as operators in your query itself, you can escape them with a leading backslash. For example, to search for (1+1)=2, your query is \(1\+1\)\=2. Failing to escape these special characters correctly might lead to a syntax error that prevents your query from running.
Note: < and > can't be escaped at all. The only way to prevent them from creating a range query is to remove them from the query string.
- Wildcard.
Wildcard searches can be run on individual terms by using a question mark ? to replace a single character, and an asterisk * to replace zero or more characters. Wildcard queries can use an enormous amount of memory and can affect performance. Use them sparingly.
Note: Leading wildcards are not supported by the data ecosystem search. Allowing a wildcard at the beginning of a word is not recommended because all terms in the index need to be examined in case they match.
{ "query": "data.Rig_Contractor:Oc?an Dr*" }
- Date format
The valid format for date is "date with optional time".
date-element ['T' [time-element] [offset]]
Table 1. Date element format
| Element | Format |
|---|---|
| date-element | yyyy ['-' MM ['-' dd]] |
| time-element | HH [minute-element] \| [fraction] |
| minute-element | ':' mm [second-element] \| [fraction] |
| second-element | ':' ss [fraction] |
| fraction | ('.' \| ',') digit+ |
| offset | 'Z' \| (('+' \| '-') HH [':' mm [':' ss [('.' \| ',') SSS]]]) |
- Range queries.
Ranges can be specified for date, numeric, or string fields. Inclusive ranges are specified with brackets [min TO max]. Exclusive ranges are specified with curly brackets {min TO max}.
- All SpudDate in 2012:
{ "query": "data.SpudDate:[2012-01-01 TO 2012-12-31]" }
- Count 1 - 5:
{ "query": "data.Count:[1 TO 5]" }
- Count from 10 upwards:
{ "query": "data.Count:[10 TO *]" }
- Ranges with one side unbounded can use the following syntax:
{ "query": "data.ProjDepth:>10" }
- To combine an upper and lower bound with the simplified syntax, join two clauses with an AND operator:
{ "query": "data.ProjDepth:(>=10 AND <20)" }
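The escaping rules for reserved characters described above can be sketched as follows. This is an illustrative helper under stated assumptions, not part of the Search API: the name `escape_query_term` is hypothetical, and the sketch escapes each reserved character individually with a leading backslash while removing < and >, which cannot be escaped.

```python
import json
import re

# Every reserved character except < and >, which cannot be escaped.
_ESCAPABLE = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_query_term(term):
    """Escape a literal term so that it is not parsed as query operators."""
    # < and > can only be neutralized by removing them.
    stripped = term.replace("<", "").replace(">", "")
    # Prefix each remaining reserved character with a backslash.
    return _ESCAPABLE.sub(r"\\\1", stripped)


# Example from the text: searching for the literal string (1+1)=2.
escaped = escape_query_term("(1+1)=2")
body = json.dumps({"query": escaped})
```

Apply the helper only to literal values, not to a whole query string, because operators such as AND, parentheses used for grouping, and wildcards must stay unescaped to keep their meaning.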
- sort parameter
- The sort feature supports the int, float, double, long, and datetime data types. It does not support array object, nested object, or string fields. Records that contain unsupported types do not appear in the response. Records that do not have the sorted fields or have empty values are listed last in the result.
- spatialFilter parameter
- The data ecosystem supports geopoint geospatial data, which is expressed as latitude and longitude pairs.
Cross-kind search operations
The kind parameter has the following format:
account-id:data-source-id:type:schema-version
Each segment that is separated by a colon (:) can be replaced with a wildcard character to support cross-kind search.
- Search across all data sources, types, and versions for the common data partition:
{ "kind": "common:*:*:*" }
- Search across all data sources for the type well with schema version 1.0.0:
{ "kind": "common:*:well:1.0.0" }
- Search across all types and versions for the welldb namespace in the common data partition:
{ "kind": "common:welldb:*:*" }
To use the API, see the Search cross cluster API reference.
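The wildcard substitution for cross-kind searches can be sketched with a small builder. The function name `build_kind` and its keyword arguments are hypothetical and exist only for this illustration; any segment that is left out defaults to the * wildcard.

```python
def build_kind(account_id, data_source_id="*", type_name="*", schema_version="*"):
    """Join the four kind segments, using * for any segment left open."""
    parts = [account_id, data_source_id, type_name, schema_version]
    for part in parts:
        # A colon inside a segment would change the number of segments.
        if not part or ":" in part:
            raise ValueError(f"invalid kind segment: {part!r}")
    return ":".join(parts)


# The three cross-kind examples from this section.
all_common = build_kind("common")
well_v1 = build_kind("common", type_name="well", schema_version="1.0.0")
welldb = build_kind("common", data_source_id="welldb")
```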
Cross-partition queries
Use the cross-partition queries to search records from multiple partitions.
To run cross-partition searches, provide a comma-separated list of partitions in the data-partition-id header.
Cross-partition searches deal with larger data sets and entitlements from multiple partitions. As a result, single-partition searches have better performance than cross-partition searches.
private and common partitions.
Search query with cursor
A search request returns a single page of results. You can use the Query with cursor endpoint to retrieve larger (or all) results from a single search request, much as you would use a cursor on a traditional database.
The Query with cursor endpoint is not intended for real-time requests, but
rather for processing large amounts of data.
For more information, see the Search query with cursor API reference.
The parameters that are passed in the request body are the same as in the Query endpoint, except for the offset and cursor values. The offset parameter is not valid for the Query with cursor endpoint.
The results from a request reflect the state of the index at the time when the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update, or delete) affect later search requests only.
Pass the cursor value that is returned in the response to the Query with cursor endpoint to retrieve the next batch of results. As the next batches of results are retrieved by the Query with cursor endpoint, the cursor value might not change. Do not expect a different cursor value in each response.
Search API endpoint permissions
| Endpoint URL | Method | Minimum permissions required | Data permissions required |
|---|---|---|---|
| /api/search/v2/query | POST | users.datalake.viewers | Yes |
| /api/search/v2/ccs/query | POST | users.datalake.viewers | Yes |
| /api/search/v2/query_with_cursor | POST | users.datalake.viewers | Yes |
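The Query with cursor flow for the /api/search/v2/query_with_cursor endpoint could be sketched as the loop below. Several details are assumptions that this section does not confirm: the response is assumed to be a JSON object with `results` and `cursor` fields, the original request body is assumed to be resent together with the cursor, and `post` is a hypothetical callable that sends one request body and returns the parsed response. Because the cursor value might not change between responses, the loop stops on an empty batch or a missing cursor rather than on a changed cursor.

```python
def iter_cursor_results(post, body):
    """Yield every record from a Query with cursor search.

    post: hypothetical callable that POSTs a request body to the
          query_with_cursor endpoint and returns the parsed JSON response.
    body: request body as for the Query endpoint.
    """
    # offset is not valid for this endpoint, and the first call has no cursor.
    request = {k: v for k, v in body.items() if k not in ("offset", "cursor")}
    while True:
        response = post(request)
        results = response.get("results") or []  # assumed field name
        if not results:
            return  # an empty batch means there is nothing left to read
        yield from results
        cursor = response.get("cursor")  # assumed field name
        if not cursor:
            return
        # The cursor may be identical to the previous one; pass it on anyway.
        request["cursor"] = cursor
```

Taking `post` as a parameter keeps the paging logic separate from the HTTP client, headers, and authentication, so the loop can be exercised with a stub before it is pointed at a real service.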