/db2whrest/v1/search: POST

Searches a database for data.

The following table shows which roles can access this REST API endpoint:
Table 1. Access by role
Data admin Data user Collection Admin Admin Service user
1 1 Χ Χ
1 The search is restricted to documents that are tagged with collections for which the user ID is assigned the datauser role.
The search endpoint has the following format:
/db2whrest/v1/search -H 'Authorization: Bearer <token>' -X POST -d@<data.json> -H "Content-Type: application/json"
The endpoint accepts requests in JSON format and returns the response as a self-describing data set, together with the query time and the number of elements in the result set. If an endpoint operation takes more than 10 seconds to complete, it is converted to an asynchronous operation. For more information on asynchronous endpoint operations, see Asynchronous endpoints.
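As a rough illustration of the request format above, the following Python sketch builds the same POST request with the standard library. The helper name build_search_request, the host name, and the token value are placeholders chosen for this example and are not part of the API. The request is only constructed, not sent, because sending it requires a live system. Unlike the curl examples, which pass -k, this sketch does not disable certificate verification.

```python
import json
import urllib.request


def build_search_request(host, token, body):
    """Build (but do not send) a POST request for /db2whrest/v1/search.

    host, token, and the name of this helper are illustrative placeholders.
    """
    url = f"https://{host}/db2whrest/v1/search"
    data = json.dumps(body).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


request = build_search_request(
    "discover.example.com", "TOKEN", {"query": "", "limit": 10}
)
# To send it against a real system (not done here):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```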
You can refine a search request with the following fields. An example follows this list:
query
Specifies a search query string.
filters
Specifies an array of dictionaries that filter the results of the query. Each dictionary must contain the following three fields:
key
The name of the field or column to filter on.
operator
One of the following operators: =, >, <, <>, <=, >=, in, like.
value
The value of the field or column to filter on.
group_by
Specifies a list of fields to be used to summarize search results. If the group_by field is not specified, the search returns record-level information.
Note: The combination "group_by": ["filename"] causes the query to be applied to the duplicate file table. All other group_by combinations cause the query to be applied to the summary tables.
sort_by
Specifies an array of dictionary objects. Each dictionary object must specify a field or column name to be sorted on and a sort direction. Valid sort directions are asc and desc.
limit
Specifies the maximum number of rows that can be returned by the response.
The following example illustrates how to specify these fields:
{
    "query": "platform='Spectrum Scale'",
    "filters": [
      {
        "key": "project",
        "operator": "is",
        "value": "null"
      }
    ],
    "group_by": ["Filesystem","Owner","Site"],
    "sort_by": [{"Filesystem": "asc"},{"Owner": "asc"}],
    "limit": 100000
}
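The structural rules in the field list can be checked before a request is submitted. The following Python sketch is one possible way to do that; check_search_body is a hypothetical helper, not part of the product. It checks only that each filter has the three required fields and that each sort direction is asc or desc. It deliberately does not check the operator, because the example above uses "is", which does not appear in the documented operator list.

```python
import json

REQUIRED_FILTER_FIELDS = {"key", "operator", "value"}
SORT_DIRECTIONS = {"asc", "desc"}


def check_search_body(body):
    """Return a list of problems found in a search request body (empty if none)."""
    problems = []
    # Every filter dictionary must contain key, operator, and value.
    for i, flt in enumerate(body.get("filters", [])):
        missing = REQUIRED_FILTER_FIELDS - set(flt)
        if missing:
            problems.append(f"filters[{i}] is missing: {sorted(missing)}")
    # Every sort_by dictionary maps a field name to asc or desc.
    for i, entry in enumerate(body.get("sort_by", [])):
        for field, direction in entry.items():
            if direction not in SORT_DIRECTIONS:
                problems.append(
                    f"sort_by[{i}] has invalid direction {direction!r} for {field!r}"
                )
    return problems


body = {
    "query": "platform='Spectrum Scale'",
    "filters": [{"key": "project", "operator": "is", "value": "null"}],
    "group_by": ["Filesystem", "Owner", "Site"],
    "sort_by": [{"Filesystem": "asc"}, {"Owner": "asc"}],
    "limit": 100000,
}
problems = check_search_body(body)
# Serialized form, suitable for saving as a request file such as search.json.
payload = json.dumps(body, indent=2)
```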

Synopsis of the request URL

curl -k -H 'Authorization: Bearer <token>' \
  https://<spectrum_discover_host>/db2whrest/v1/search \
  -X POST -d@search.json -H "Content-Type: application/json"

Supported request types and response formats

Supported request types:
  • POST
Supported response formats:
  • JSON

Examples

  1. The following example shows how to define search parameters and format the data that is returned:
    1. Step 1: Define the search parameters in a file named search.json:
      {
          "query": "platform='Spectrum Scale'",
          "filters": [
            {
              "key": "project",
              "operator": "is",
              "value": "null"
            }
          ],
          "group_by": ["Filesystem","Owner","Site"],
          "sort_by": [{"Filesystem": "asc"},{"Owner": "asc"}],
          "limit": 100000
      }
      
    2. Step 2: Submit the request:
      curl -k -H 'Authorization: Bearer <token>' \
        https://<spectrum_discover_host>/db2whrest/v1/search \
        -X POST -d@search.json -H "Content-Type: application/json"
      
    The following response is returned:
    {
    "facet_tree": {
    "OWNER": "[{\"owner\":\"nobody\",\"count\":8},{\"owner\":\"benjamin\",\"count\":2},
    {\"owner\":\"_NULL_\",\"count\":989},{\"owner\":\"borgato\",\"count\":2},
    {\"owner\":\"boston\", \"count\":4554},{\"owner\":\"root\",\"count\":366104932},
    {\"owner\":\"beale\",\"count\":203545},
    {\"owner\":\"baldwin\",\"count\":2},{\"owner\":\"behr\",\"count\":785144},
    {\"owner\":\"babcock\", \"count\":9082943},{\"owner\":\"broadwood\",\"count\":375}]",
    "PLATFORM": "[{\"platform\":\"Spectrum Scale\",\"count\":376182496}]",
    "FILESYSTEM": "[{\"filesystem\":\"fs11-1m-me1\",\"count\":185493076},
    {\"filesystem\":\"filesys1\",\"count\":10077057},{\"filesystem\":\"fs10-1m-me1\",
    \"count\":180612363}]",
    "CLASSIFICATION": "[{\"classification\":null,\"count\":376182496}]",
    "DEPARTMENT": "[{\"department\":null,\"count\":376182496}]",
    "CLUSTER": "[{\"cluster\":\"host.ibm.com\",\"count\":366105439},{\"cluster\":
    \"filesys1.university.edu\",\"count\":10077057}]",
    "TIER": "[{\"tier\":\"system\",\"count\":376182496}]",
    "ARCHIVE": "[{\"archive\":null,\"count\":376182496}]",
    "SITE": "[{\"site\":\"host\",\"count\":366105439},{\"site\":\"\",\"count\":10077057}]",
    "PROJECT": "[{\"project\":null,\"count\":376182496}]"},
    "query_time_secs": 1.174846,
    "rows": "[{\"filesystem\":\"fs10-1m-me1\",\"owner\":\"_NULL_\",\"site\":\"host\",
    \"count\":468,
    \"sum\":52933001066},{\"filesystem\":\"fs10-1m-me1\",\"owner\":\"nobody\",\"site\":
    \"host\", \"count\":4,\"sum\":20976},{\"filesystem\":\"fs10-1m-me1\",\"owner\":\"root\",
    \"site\":\"host\",\"count\":180611891,\"sum\":2002705351263},{\"filesystem\":
    \"fs11-1m-me1\",\"owner\":\"_NULL_\", \"site\":\"host\",\"count\":521,\"sum\":58473947666},
    {\"filesystem\":\"fs11-1m-me1\",\"owner\":\"nobody\",\"site\":\"host\",\"count\":4,\"sum\":20976},{\"filesystem\":\"fs11-1m-me1\",
    \"owner\":\"root\",\"site\":\"host\",\"count\":185492551,\"sum\":2174830065250},
    {\"filesystem\":\"filesys1\",\"owner\":\"baldwin\",\"site\":\"\",\"count\":2,
    \"sum\":16388},{\"filesystem\":\"filesys1\",\"owner\":\"behr\",\"site\":\"\",
    \"count\":785144,\"sum\":5000771899189}, {\"filesystem\":\"filesys1\",
    \"owner\":\"boston\",\"site\":\"\",\"count\":4554,\"sum\":57670240959030},
    {\"filesystem\":\"filesys1\",\"owner\":\"beale\",\"site\":\"\",\"count\":203545,
    \"sum\":69505800364825},{\"filesystem\":\"filesys1\",\"owner\":\"root\",\"site\":
    \"\",\"count\":490,\"sum\":265209686947},{\"filesystem\":\"filesys1\",\"owner\":
    \"broadwood\",\"site\":\"\",\"count\":375,\"sum\":48214210142151},{\"filesystem\":
    \"filesys1\",\"owner\":\"benjamin\",\"site\":\"\",\"count\":2,\"sum\":3140759161},
    {\"filesystem\":\"filesys1\",\"owner\":\"babcock\",\"site\":\"\",\"count\":9082943,
    \"sum\":48534270142857},{\"filesystem\":\"filesys1\",\"owner\":\"borgato\",
    \"site\":\"\",\"count\":2,\"sum\":4415704787}]","count": 15,"doc_count": 376182496
    }
    
    In the following code block, the information from the rows field of the response is reflowed so that it is easier to read. The sum column is omitted. You can see that the rows are grouped by file system, owner, and site and are sorted by file system and owner:
    "rows": "[
    {\"filesystem\":\"fs10-1m-me1\", \"owner\":\"_NULL_\", \"site\":\"host\", \"count\":468},
    {\"filesystem\":\"fs10-1m-me1\", \"owner\":\"nobody\", \"site\":\"host\", \"count\":4},
    {\"filesystem\":\"fs10-1m-me1\", \"owner\":\"root\",   \"site\":\"host\", \"count\":180611891},
    {\"filesystem\":\"fs11-1m-me1\", \"owner\":\"_NULL_\", \"site\":\"host\", \"count\":521},
    {\"filesystem\":\"fs11-1m-me1\", \"owner\":\"nobody\", \"site\":\"host\", \"count\":4},
    {\"filesystem\":\"fs11-1m-me1\", \"owner\":\"root\",   \"site\":\"host\", \"count\":185492551},
    {\"filesystem\":\"filesys1\",    \"owner\":\"baldwin\",\"site\":\"\",     \"count\":2},
    {\"filesystem\":\"filesys1\",    \"owner\":\"behr\",   \"site\":\"\",     \"count\":785144},
    {\"filesystem\":\"filesys1\",    \"owner\":\"boston\", \"site\":\"\",     \"count\":4554},
    {\"filesystem\":\"filesys1\",    \"owner\":\"beale\",  \"site\":\"\",     \"count\":203545},
    {\"filesystem\":\"filesys1\",    \"owner\":\"root\",   \"site\":\"\",     \"count\":490},
    {\"filesystem\":\"filesys1\",    \"owner\":\"broadwood\",\"site\":\"\",   \"count\":375},
    {\"filesystem\":\"filesys1\",    \"owner\":\"benjamin\", \"site\":\"\",   \"count\":2},
    {\"filesystem\":\"filesys1\",    \"owner\":\"babcock\",\"site\":\"\",     \"count\":9082943},
    {\"filesystem\":\"filesys1\",    \"owner\":\"borgato\",\"site\":\"\",     \"count\":2}
    ]",
    
  2. The following example shows how to search for duplicate files with a size greater than 1 MiB:
    1. Step 1: Define the search parameters in a file named search.json:
      {
          "query": "",
          "filters": [
            {
              "key": "size",
              "operator": ">",
              "value": 1048576
            }
          ],
          "group_by": ["size", "filename"],
          "sort_by": [{"size": "asc"}, {"filename": "asc"}],
          "limit": 100
      }
      
    2. Step 2: Submit the request:
      curl -k -H 'Authorization: Bearer <token>' \
        https://<spectrum_discover_host>/db2whrest/v1/search \
        -X POST -d@search.json -H "Content-Type: application/json"
    The following response is returned. Some of the rows of the response are omitted:
    {
    "facet_tree":{"OWNER": "[{\"owner\":\"behr\",\"count\":160096},{\"owner\":\"babcock\",
    \"count\":14609},{\"owner\":\"beale\",\"count\":612},{\"owner\":\"root\",\"count\":34}]", 
    "DEPARTMENT":"[{\"department\":null,\"count\":175351}]","FILESYSTEM": "[{\"filesystem\":
    \"filesys1\",\"count\":175351}]",    "PROJECT": "[{\"project\":\"TCGA_kirc\",\"count\":86},
    {\"project\":\"TCGA_ucs\",\"count\":14},{\"project\":\"TCGA_stad\",\"count\":21},
    {\"project\":\"TCGA_lusc\",\"count\":20},{\"project\":\"acc\",\"count\":2},{\"project\":
    \"TCGA_meso\",\"count\":54},{\"project\":\"TCGA_skcm\",\"count\":107},{\"project\":
    \"TCGA_ov\",\"count\":81},{\"project\":\"kich\",\"count\":10},{\"project\":\"TCGA_lgg\",
    \"count\":137},{\"project\":\"TCGA_prad\",\"count\":662},{\"project\":\"TCGA_laml\",
    \"count\":6},{\"project\":\"TCGA_thym\",\"count\":54},{\"project\":\"TCGA_thca\",\"count\":
    255},{\"project\":\"kirp\",\"count\":256},{\"project\":\"hnsc\",\"count\":22},{\"project\":
    \"TCGA_pcpg\",\"count\":2},{\"project\":\"Eichler\",\"count\":877},{\"project\":
    \"TCGA_ucec\",\"count\":238},{\"project\":\"TCGA_luad\",\"count\":1417},{\"project\":
    \"Level1\",\"count\":6},{\"project\":\"brca\",\"count\":14},{\"project\":null,\"count\":
    168855},{\"project\":\"TCGA_paad\",\"count\":196},{\"project\":\"TCGA_read\",\"count\":244}
    ,{\"project\":\"cesc\",\"count\":613},{\"project\":\"coad\",\"count\":176},{\"project\":
    \"dlbc\",\"count\":20},{\"project\":\"blca\",\"count\":471},{\"project\":\"TCGA_sarc\",
    \"count\":90},{\"project\":\"TCGA_lihc\",\"count\":206},{\"project\":\"gbm\",\"count\":120},
    {\"project\":\"esca\",\"count\":19}]","CLASSIFICATION": "[{\"classification\":null,
    \"count\":175351}]","ARCHIVE": "[{\"archive\":null,\"count\":175351}]"},"query_time_secs":
    1.377808,"rows": "[{\"size\":1048608,\"filename\":\"NA-C-ms22-cm0.mcdat\",\"count\":4106,
    \"sum\":4305584448},{\"size\":1048608,\"filename\":\"asm_g100-C-ms22-cm0.mcdat\",
    \"count\":2,\"sum\":2097216},{\"size\":1048608,\"filename\":\"asm_g1083-C-ms22-cm0.mcdat\",
    \"count\":2,\"sum\":2097216},{\"size\":1048608,\"filename\":\"asm_g111-C-ms22-cm0.mcdat\",
    \"count\":2,\"sum\":2097216},{\"size\":1048608,\"filename\":\"asm_g1122-C-ms22-cm0.mcdat\",
    \"count\":2,\"sum\":2097216},{\"size\":1048608,\"filename\":\"asm_g134-C-ms22-cm0.mcdat\",
    \"count\":2,\"sum\":2097216},{\"size\":1048608,\"filename\":\"asm_g145-C-ms22-cm0.mcdat\",
    \"count\":2,\"sum\":2097216},{\"size\":1048608,\"filename\":\"asm_g147-C-ms22-cm0.mcdat\",
    \"count\":3,\"sum\":3145824},{\"size\":1048608,\"filename\":\"asm_g149-C-ms22-cm0.mcdat\",
    \"count\":4,\"sum\":4194432},{\"size\":1048608,\"filename\":\"asm_g151-C-ms22-cm0.mcdat\",
    \"count\":2,\"sum\":2097216},{\"size\":1048608,\"filename\":\"asm_g153-C-ms22-cm0.mcdat\",
    \"count\":2,
    ...
    8583},{\"size\":1053910,\"filename\":\"asm_g2981.asm\",\"count\":2,\"sum\":2107820},
    {\"size\":1053914,\"filename\":\"seqDB.v006.dat\",\"count\":2,\"sum\":2107828},
    {\"size\":1054721,\"filename\":\"asm_g3384.asm\",\"count\":2,\"sum\":2109442}]","count":
    100,"doc_count":4318
    }
    
    In the following code block, the information from the rows field of the response is reflowed so that it is easier to read. You can see that the rows are grouped by size and file name and that they are sorted by size and file name in ascending order:
    "rows":
    "[
    {\"size\":1048608,\"filename\":\"NA-C-ms22-cm0.mcdat\",   \"count\":4106,\"sum\":4305584448},
    {\"size\":1048608,\"filename\":\"asm_g100-C-ms22-cm0.mcdat\",   \"count\":2,\"sum\":2097216},
    {\"size\":1048608,\"filename\":\"asm_g1083-C-ms22-cm0.mcdat\",  \"count\":2,\"sum\":2097216},
    {\"size\":1048608,\"filename\":\"asm_g111-C-ms22-cm0.mcdat\",   \"count\":2,\"sum\":2097216},
    {\"size\":1048608,\"filename\":\"asm_g1122-C-ms22-cm0.mcdat\",  \"count\":2,\"sum\":2097216},
    {\"size\":1048608,\"filename\":\"asm_g134-C-ms22-cm0.mcdat\",   \"count\":2,\"sum\":2097216},
    {\"size\":1048608,\"filename\":\"asm_g145-C-ms22-cm0.mcdat\",   \"count\":2,\"sum\":2097216},
    {\"size\":1048608,\"filename\":\"asm_g147-C-ms22-cm0.mcdat\",   \"count\":3,\"sum\":3145824},
    {\"size\":1048608,\"filename\":\"asm_g149-C-ms22-cm0.mcdat\",   \"count\":4,\"sum\":4194432},
    {\"size\":1048608,\"filename\":\"asm_g151-C-ms22-cm0.mcdat\",   \"count\":2,\"sum\":2097216},
    {\"size\":1048608,\"filename\":\"asm_g153-C-ms22-cm0.mcdat\",   \"count\":2,
    ...
    {\"size\":1053910,\"filename\":\"asm_g2981.asm\",               \"count\":2,\"sum\":2107820},
    {\"size\":1053914,\"filename\":\"seqDB.v006.dat\",              \"count\":2,\"sum\":2107828},
    {\"size\":1054721,\"filename\":\"asm_g3384.asm\",               \"count\":2,\"sum\":2109442}
    ]",
  3. The following example shows how to perform a non-grouped search (record-level results) for files owned by benjamin:
    1. Step 1: Define the search parameters in a file named search.json:
      
      {
          "query": "",
          "filters": [
            {
              "key": "owner",
              "operator": "=",
              "value": "benjamin"
            }
          ],
          "group_by": [],
          "sort_by": [],
          "limit": 100
      }
      
    2. Step 2: Submit the following request:
      curl -k -H 'Authorization: Bearer <token>' \
        https://<spectrum_discover_host>/db2whrest/v1/search \
        -X POST -d@search.json -H "Content-Type: application/json"
    The following response is returned:
    {
    "query_time_secs": 1422.651552,"rows": "[{\"filesystem\":\"filesys1\",\"revision\":\"MO1\",
    \"site\":\"\",\"platform\":\"Spectrum Scale\",\"cluster\":\"filesys1.university.edu\",
    \"inode\":257173,\"owner\":\"benjamin\",\"group\":\"iacs\",\"permissions\":\"-r--r--r--\",
    \"fileset\":\"hg19\",\"uid\":null,\"gid\":null,\"path\":\"\\/filesys1\\/hg19\\/\",
    \"filename\":\"Homo_sapiens_assembly19.fasta.fai\",\"filetype\":\"fasta\",\"migstatus\":
    \"resdnt\",\"migloc\":\"NA\",\"mtime\":\"2014-08-08T19:37:13.000Z\",\"atime\":
    \"2017-08-27T22:21:11.000Z\",\"ctime\":\"2016-02-22T19:54:30.000Z\",\"inserttime\":
    \"2018-08-02T15:47:25.000Z\",\"tier\":\"system\",\"size\":2780,\"qpart\":1,\"fkey\":
    \"filesys1.university.edufilesys1257173\",\"project\":null,\"department\":null,
    \"archive\":null,\"classification\":null,\"tag5\":null,\"tag6\":null,\"tag7\":null,
    \"tag8\":null,\"tag9\":null,\"tag10\":null,\"tag11\":null,\"tag12\":null,\"tag13\":null,
    \"tag14\":null,\"tag15\":null,\"tag16\":null},{\"filesystem\":\"filesys1\",\"revision\":
    \"MO1\",\"site\":\"\",\"platform\":\"Spectrum Scale\",\"cluster\":
    \"filesys1.university.edu\",\"inode\":322176,\"owner\":\"benjamin\",\"group\":\"iacs\",
    \"permissions\":\"-r--r--r--\",\"fileset\":\"hg19\",\"uid\":null,\"gid\":null,\"path\":
    \"\\/filesys1\\/hg19\\/\",\"filename\":\"Homo_sapiens_assembly19.fasta\",\"filetype\":
    \"fasta\",\"migstatus\":\"resdnt\",\"migloc\":\"NA\",\"mtime\":
    \"2014-08-08T19:37:13.000Z\",\"atime\":\"2017-08-27T22:21:15.000Z\",\"ctime\":
    \"2016-02-22T19:44:09.000Z\",\"inserttime\":\"2018-08-02T15:47:25.000Z\",\"tier\":
    \"system\",\"size\":3140756381,\"qpart\":4,
    \"fkey\":\"filesys1.university.edufilesys1322176\",\"project\":null,\"department\":null,
    \"archive\":null,\"classification\":null,\"tag5\":null,\"tag6\":null,\"tag7\":null,\"tag8\":null,
    \"tag9\":null,\"tag10\":null,\"tag11\":null,\"tag12\":null,\"tag13\":null,\"tag14\":null,
    \"tag15\":null,\"tag16\":null}]","doc_count": 2,"count": 2,"facet_tree": {"OWNER":
     "[{\"owner\":\"benjamin\",\"count\":2.0}]","FILESYSTEM": "[{\"filesystem\":\"filesys1\",
    \"count\":2.0}]","ARCHIVE": "[{\"archive\":null,\"count\":2.0}]","CLUSTER": 
    "[{\"cluster\":\"filesys1.university.edu\",\"count\":2.0}]","SITE": "[{\"site\":\"\",
    \"count\":2.0}]","CLASSIFICATION": "[{\"classification\":null,\"count\":2.0}]",
    "DEPARTMENT": "[{\"department\":null,\"count\":2.0}]","PLATFORM": "[{\"platform\":
    \"Spectrum Scale\",\"count\":2.0}]","TIER": "[{\"tier\":\"system\",\"count\":2.0}]",
    "PROJECT": "[{\"project\":null,\"count\":2.0}]"}
    }
    
    In the following code block, the information from the rows field of the response is reflowed for better viewing. Many columns are omitted. You can see that the response returns two rows in which the owner field is benjamin:
    "rows": "[
    {\"filesystem\":\"filesys1\",\"revision\":\"MO1\",\"site\":\"\", ...\"owner\":\"benjamin\", ...},
    {\"filesystem\":\"filesys1\",\"revision\":\"MO1\",\"site\":\"\", ...\"owner\":\"benjamin\", ...}
    ]",
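In the responses shown in these examples, the rows field and each facet_tree value are strings that themselves contain JSON arrays, so a client needs a second decoding pass to work with them. The following Python sketch shows one way to do that. The function name decode_search_response is hypothetical, and the sample is a heavily abridged stand-in shaped like the first example response, not real output. It also assumes that an actual response does not contain the raw line breaks that appear in the wrapped listings above.

```python
import json


def decode_search_response(text):
    """Parse a search response and decode its string-encoded JSON fields."""
    response = json.loads(text)
    # "rows" is a string containing a JSON array, so it needs a second pass.
    response["rows"] = json.loads(response["rows"])
    # Each facet_tree value is likewise a string containing a JSON array.
    response["facet_tree"] = {
        name: json.loads(encoded)
        for name, encoded in response.get("facet_tree", {}).items()
    }
    return response


# Abridged sample shaped like the first example response (two rows, one facet).
sample = json.dumps({
    "facet_tree": {
        "PLATFORM": json.dumps([{"platform": "Spectrum Scale", "count": 376182496}])
    },
    "query_time_secs": 1.174846,
    "rows": json.dumps([
        {"filesystem": "fs10-1m-me1", "owner": "_NULL_", "site": "host",
         "count": 468, "sum": 52933001066},
        {"filesystem": "fs10-1m-me1", "owner": "nobody", "site": "host",
         "count": 4, "sum": 20976},
    ]),
    "count": 2,
    "doc_count": 376182496,
})

decoded = decode_search_response(sample)
for row in decoded["rows"]:
    print(row["filesystem"], row["owner"], row["count"])
```

After decoding, rows is an ordinary list of dictionaries, so grouped columns such as count and sum can be read directly by key.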