/db2whrest/v1/report: POST

Creates a curation report definition and runs it immediately.

The following table shows which roles can access this REST API endpoint:
Table 1. Access by role
Data admin Data user Collection Admin Admin Service user
Χ Χ Χ Χ

Synopsis of the request URL

curl -k -H 'Authorization: Bearer <token>' https://<spectrum_discover_host>/db2whrest/v1/report -X POST -d@report.json -H "Content-Type: application/json"
This endpoint has the same parameters as the /db2whrest/v1/search endpoint and also has a name parameter:
query
Specifies a search query string.
filters
Specifies an array of dictionaries that filter the results of the query. Each dictionary must contain the following three fields:
key
The name of the field or column to filter on.
operator
One of the following operators: =, >, <, <>, <=, >=, is, like.
value
The value of the field or column to filter on.
group_by
Specifies a list of fields to be used to summarize search results. Grouped queries return output columns for count and sum, whereas nongrouped queries return all columns for each row. If the group_by field is not specified, the search returns record-level information.
sort_by
Specifies an array of dictionary objects. Each dictionary object must specify a field or column name to be sorted on and a sort direction. Valid sort directions are asc and desc.
limit
Specifies the maximum number of rows that can be returned by the response.
name
Specifies a name for the report output file. If this parameter is not specified, the endpoint uses a randomly generated UUID as the base of the file name.
The following example illustrates how to specify these fields:
{
    "name": "Unassigned Project Report",
    "query": "platform='Spectrum Scale'",
    "filters": [
      {
        "key": "project",
        "operator": "is",
        "value": "null"
      }
    ],
    "group_by": ["Filesystem","Owner","Site"],
    "sort_by": [{"Filesystem": "asc"},{"Owner": "asc"}],
    "limit": 100000
}
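As a rough illustration of the parameter rules above, the following Python sketch assembles a request body and rejects filters that lack one of the three required fields or that use an operator outside the listed set. The helper name build_report_body and its error messages are assumptions made for this example; they are not part of IBM Spectrum Discover.

```python
import json

# Operators taken from the "operator" description above.
ALLOWED_OPERATORS = {"=", ">", "<", "<>", "<=", ">=", "is", "like"}
REQUIRED_FILTER_FIELDS = {"key", "operator", "value"}


def build_report_body(query, filters=(), group_by=(), sort_by=(),
                      limit=None, name=None):
    """Hypothetical helper: return a dict that can be saved as report.json."""
    for flt in filters:
        missing = REQUIRED_FILTER_FIELDS - set(flt)
        if missing:
            raise ValueError(f"filter is missing fields: {sorted(missing)}")
        if flt["operator"] not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {flt['operator']!r}")
    for entry in sort_by:
        for field, direction in entry.items():
            if direction not in ("asc", "desc"):
                raise ValueError(f"invalid sort direction for {field}: {direction!r}")
    body = {
        "query": query,
        "filters": list(filters),
        "group_by": list(group_by),
        "sort_by": list(sort_by),
    }
    # limit and name are added only when the caller supplies them.
    if limit is not None:
        body["limit"] = limit
    if name is not None:
        body["name"] = name
    return body


if __name__ == "__main__":
    body = build_report_body(
        query="platform='Spectrum Scale'",
        filters=[{"key": "project", "operator": "is", "value": "null"}],
        group_by=["Filesystem", "Owner", "Site"],
        sort_by=[{"Filesystem": "asc"}, {"Owner": "asc"}],
        limit=100000,
        name="Unassigned Project Report",
    )
    print(json.dumps(body, indent=4))
```

Checking the body on the client side only catches mistakes early; the endpoint remains the authority on what it accepts.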

Supported request types and response formats

Supported request types:
  • POST
Supported response formats:
  • JSON

Status codes

201
The operation is successful.
All other status code values
The operation failed.

Examples

  1. The following example shows how to create a report:
    1. Step 1: Define the search parameters in a file named report.json:
      {
          "name": "Unassigned Project Report",
          "query": "platform='Spectrum Scale'",
          "filters": [
            {
              "key": "project",
              "operator": "is",
              "value": "null"
            }
          ],
          "group_by": ["Filesystem","Owner","Site"],
          "sort_by": [{"Filesystem": "asc"},{"Owner": "asc"}],
          "limit": 100000
      }
      
    2. Step 2: Submit the following request:
      curl -k -H 'Authorization: Bearer <token>' https://<spectrum_discover_host>/db2whrest/v1/report -X POST -d@report.json -H "Content-Type: application/json"
  2. The response includes the ID of the new report and the status of the operation ("report created"):
    {"report": "b5ff3126-353d-4d7d-857a-750cc20b8bab", "status": "report created"}

Report scripts

IBM Spectrum® Discover contains a set of predefined JSON files and SQL scripts that generate specific reports.
  1. JSON Examples
    • The following example lists the files or objects that are accessed in the last 0 - 30 days. This report is created in the UI.

      age_report_0-30_days_since_access_detail.json
      {
          "name": "Age Report Detail. 0-30 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 30 DAYS"
            }
          ],
          "group_by": [],
          "sort_by": []
      }
    • This example summarizes the files or objects that are accessed in the last 0 - 30 days and are grouped by data source. This report is created in the UI.

      age_report_0-30_days_since_access_summary.json
      {
          "name": "Age Report Summary. 0-30 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 30 DAYS"
            }
          ],
          "group_by": ["datasource"],
          "sort_by": [{"datasource": "asc"}]
      }
    • This example lists the files or objects that are accessed in the last 30 - 60 days. This report is created in the UI.

      age_report_30-60_days_since_access_detail.json
      {
          "name": "Age Report Detail. 30-60 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 30 DAYS"
            },
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 60 DAYS"
            }
          ],
          "group_by": [],
          "sort_by": []
      }
    • This example summarizes the files or objects that are accessed in the last 30 - 60 days and are grouped by data source. This report is created in the UI.

      age_report_30-60_days_since_access_summary.json
      {
          "name": "Age Report Summary. 30-60 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 30 DAYS"
            },
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 60 DAYS"
            }
          ],
          "group_by": ["datasource"],
          "sort_by": [{"datasource": "asc"}]
      }
    • This example lists the files or objects that are accessed in the last 60 - 90 days. This report is created in the UI.

      age_report_60-90_days_since_access_detail.json
      {
          "name": "Age Report Detail. 60-90 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 60 DAYS"
            },
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 90 DAYS"
            }
          ],
          "group_by": [],
          "sort_by": []
      }
    • The following example summarizes the files or objects that are accessed in the last 60 - 90 days and are grouped by data source. This report is created in the UI.

      age_report_60-90_days_since_access_summary.json
      {
          "name": "Age Report Summary. 60-90 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 60 DAYS"
            },
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 90 DAYS"
            }
          ],
          "group_by": ["datasource"],
          "sort_by": [{"datasource": "asc"}]
      }
    • The following example lists the files or objects that are accessed in the last 90 - 180 days. This report is created in the UI.

      age_report_90-180_days_since_access_detail.json
      {
          "name": "Age Report Detail. 90-180 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 90 DAYS"
            },
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 180 DAYS"
            }
          ],
          "group_by": [],
          "sort_by": []
      }
    • The following example summarizes the files or objects that are accessed in the last 90 - 180 days and are grouped by data source. This report is created in the UI.

      age_report_90-180_days_since_access_summary.json
      {
          "name": "Age Report Summary. 90-180 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 90 DAYS"
            },
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 180 DAYS"
            }
          ],
          "group_by": ["datasource"],
          "sort_by": [{"datasource": "asc"}]
      }
    • The following example lists the files or objects that are accessed in the last 180 - 360 days. This report is created in the UI.

      age_report_180-360_days_since_access_detail.json
      {
          "name": "Age Report Detail. 180-360 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 180 DAYS"
            },
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 360 DAYS"
            }
          ],
          "group_by": [],
          "sort_by": []
      }
    • The following example summarizes the files or objects that are accessed in the last 180 - 360 days and are grouped by data source. This report is created in the UI.

      age_report_180-360_days_since_access_summary.json
      {
          "name": "Age Report Summary. 180-360 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 180 DAYS"
            },
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 360 DAYS"
            }
          ],
          "group_by": ["datasource"],
          "sort_by": [{"datasource": "asc"}]
      }
    • The following example lists the files or objects that are accessed in the last 360 - 720 days. This report is created in the UI.

      age_report_360-720_days_since_access_detail.json
      {
          "name": "Age Report Detail. 360-720 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 360 DAYS"
            },
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 720 DAYS"
            }
          ],
          "group_by": [],
          "sort_by": []
      }
    • The following example summarizes the files or objects that are accessed in the last 360 - 720 days and are grouped by data source. This report is created in the UI.

      age_report_360-720_days_since_access_summary.json
      {
          "name": "Age Report Summary. 360-720 Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 360 DAYS"
            },
            {
              "key": "atime",
              "operator": ">",
              "value": "NOW() - 720 DAYS"
            }
          ],
          "group_by": ["datasource"],
          "sort_by": [{"datasource": "asc"}]
      }
    • The following example lists the files or objects that are not accessed in the last 720 days. This report is created in the UI.

      age_report_720+_days_since_access_detail.json
      {
          "name": "Age Report Detail. 720+ Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 720 DAYS"
            }
          ],
          "group_by": [],
          "sort_by": []
      }
    • The following example summarizes the files or objects that are not accessed in the last 720 days and are grouped by data source. This report is created in the UI.

      age_report_720+_days_since_access_summary.json
      {
          "name": "Age Report Summary. 720+ Days",
          "query": "",
          "filters": [
            {
              "key": "atime",
              "operator": "<=",
              "value": "NOW() - 720 DAYS"
            }
          ],
          "group_by": ["datasource"],
          "sort_by": [{"datasource": "asc"}]
      }
  2. SQL scripts
    • This script provides the count of potentially duplicate files across the heterogeneous environment. This report is not created in the UI.

      duplicate_files_by_count.sql
      select filename, size, count(fkey) from metaocean group by filename,
          size order by count(fkey) desc limit 20;
    • This script provides the size of potentially duplicate files across the heterogeneous environment. This report is not created in the UI.

      duplicate_files_by_total_size.sql
      select filename,entrysize,entrycount,totalsize from (select filename, 
          size as entrysize, count(fkey) as entrycount, count(fkey)*size as TotalSize 
          from metaocean group by filename,size) where entrycount>1 order by totalsize desc;
    • This script provides a snapshot of each data source: the record count, the total size in GiB, and the most recent modification and access times. This report is not created in the UI.

      size_snap.sql
      select datasource,count(*),sum(size)/1024/1024/1024,max(mtime),max(atime) 
          from metaocean group by datasource with ur
    • This script provides a view of the capacity that is consumed per collection. This report is not created in the UI.

      space_per_collection.sql
      select metaocean.collection,count(*),sum(size)/1024/1024/1024,max(mtime),datasource,
          tier from metaocean group by metaocean.collection,datasource,tier order by max(mtime)
          desc with ur
    • This script provides a view of the capacity that is consumed per file type. This report is not created in the UI.

      space_per_filetype.sql
      select filetype,datasource,tier,count(*),sum(size)/1024/1024/1024 from metaocean 
          group by filetype,datasource,tier order by filetype,datasource,tier desc with ur
    • This script provides a view of the capacity that is consumed per user. This report is not created in the UI.

      space_per_user.sql
      select owner,tier,count(*),sum(size)/1024/1024/1024,max(mtime),max(atime) from metaocean 
          group by owner,tier order by owner,tier with ur
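
The age report JSON definitions listed earlier in this section follow a regular pattern: an upper bound on atime for the lower edge of the bucket, a lower bound on atime for the upper edge, and grouping by datasource for the summary variants. The following Python sketch reproduces that pattern for an arbitrary bucket. The function age_report is a hypothetical helper written for this example and is not shipped with IBM Spectrum Discover.

```python
def age_report(min_days, max_days=None, summary=False):
    """Return an age-report definition for files last accessed between
    min_days and max_days ago (max_days=None means no upper limit)."""
    filters = []
    if min_days > 0:
        # Last access is at least min_days in the past.
        filters.append({"key": "atime", "operator": "<=",
                        "value": f"NOW() - {min_days} DAYS"})
    if max_days is not None:
        # Last access is less than max_days in the past.
        filters.append({"key": "atime", "operator": ">",
                        "value": f"NOW() - {max_days} DAYS"})
    label = f"{min_days}+" if max_days is None else f"{min_days}-{max_days}"
    kind = "Summary" if summary else "Detail"
    return {
        "name": f"Age Report {kind}. {label} Days",
        "query": "",
        "filters": filters,
        # Summary reports are grouped and sorted by data source.
        "group_by": ["datasource"] if summary else [],
        "sort_by": [{"datasource": "asc"}] if summary else [],
    }
```

For example, age_report(30, 60, summary=True) yields the same definition as age_report_30-60_days_since_access_summary.json, and age_report(720) yields the 720+ days detail definition.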
Reports can also be generated by using one of two CLI utilities that are provided with IBM Spectrum Discover. Both exist on the IBM Spectrum Discover nodes:
  • /opt/ibm/metaocean/reports/generate_report.py
    To run the utility, log in to an IBM Spectrum Discover node and run generate_report.py with the following usage:
    python generate_report.py [-h] [-o filename] -u username [infile]
    • Run a report with an SQL input file:
      python generate_report.py -u sdadmin -o report.csv sql/space_per_user.sql
      Enter password for SD user 'sdadmin':
    • Run a report with a JSON input file:
      python generate_report.py -u sdadmin sql/age_report_0-30_days_since_access_detail.json
      Enter password for SD user 'sdadmin':

    The tool requires an IBM Spectrum Discover user name and an input file. You must have the Data Admin role to generate reports. The tool prompts for a password.

    Input file examples are stored in the /opt/ibm/metaocean/reports/sql directory. There is a mix of JSON and SQL files in the directory. All JSON files create a report in the IBM Spectrum Discover UI. The SQL files create a CSV file.

  • /opt/ibm/metaocean/reports/generate_path_age_report.py
    Important: This tool does not create reports in the IBM Spectrum Discover UI.
    Here is the usage of the tool:
    generate_path_age_report.py [-h] -u username -r report -p pathlevel -a archive
    
    optional arguments:
      -h, --help            show this help message and exit
      -u username, --user username
                            User name with authority to create reports
      -r report, --report report
                            The report type to be generated (ARDS, ARDSOW, AROW, ARPL, 
                            CPPL, FTMB, CPFT, FTMB50)
      -p pathlevel, --pathlevel pathlevel
                            The level of path used in some reports
      -a archive, --archive archive
                            The archive threshold in months
    
    The generate_path_age_report.py tool needs a minimum of three parameters:
    • pathlevel
      The level of path used in some reports, for example:
      1 = /x/, 2 = /x/y/
      If your report does not use this parameter, a value of 1 must be used.
    • archive

      The archive threshold is the number of months after which a file is considered relevant for archiving for reporting purposes. If your report does not use this parameter, a value of 1 must be used.

    • report
      The report type is one of the following codes:
      ARDS   = Summary of archivable capacity grouped by datasource
      ARDSOW = Summary of archivable capacity grouped by datasource and owner
      AROW   = Summary of archivable capacity grouped by owner
      ARPL   = Summary of archivable capacity grouped by specified path level
      CPPL   = Summary of capacity grouped by specified path level
      FTMB   = Summary of filetype usage by month, previous 12 months
      CPFT   = Summary of capacity grouped by filetype
      FTMB50 = Summary of filetype usage by month, previous 12 months, top 50 filetypes
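
The path level notation (1 = /x/, 2 = /x/y/) can be illustrated with a short Python sketch that truncates a path to its first N directory components. This is only a reading of the notation; how generate_path_age_report.py computes path levels internally is not documented here, and the function name path_at_level is an assumption.

```python
def path_at_level(path, level):
    """Return the leading `level` components of an absolute path,
    with a trailing slash, for example level 2 of /x/y/z is /x/y/."""
    parts = [p for p in path.split("/") if p]
    prefix = parts[:level]
    if not prefix:
        return "/"
    return "/" + "/".join(prefix) + "/"


if __name__ == "__main__":
    for lvl in (1, 2):
        print(lvl, path_at_level("/x/y/z/file.dat", lvl))
```

In this sketch, a path with fewer components than the requested level is returned whole; that choice is also an assumption.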
Here is an example of running the generate_path_age_report.py tool:
python generate_path_age_report.py -u sdadmin -r ARDSOW -p 2 -a 12
Starting to create report type 'ARDSOW' for user: sdadmin
Enter password for SD user 'sdadmin':
Setting up path level summary table
Generating path level summary table
Generating path level summary table complete.
Generating report
Generating report - building temporary table.
Generating report - querying temporary table.
Generating report - writing output file.
Report generated successfully. Results are in 'rpt_ar_ds_ow.csv'.
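
When the tool is driven from another script, it can help to check the report code before starting it. The following Python sketch builds the argument list for the command shown above. The helper path_age_report_argv is hypothetical; the flags and report codes are the ones listed in the usage text, and the defaults of 1 follow the rule that unused pathlevel and archive parameters must be set to 1.

```python
# Report codes from the usage text of generate_path_age_report.py.
REPORT_CODES = {"ARDS", "ARDSOW", "AROW", "ARPL", "CPPL", "FTMB", "CPFT", "FTMB50"}


def path_age_report_argv(user, report, pathlevel=1, archive=1):
    """Hypothetical helper: return the argv list for the report tool."""
    if report not in REPORT_CODES:
        raise ValueError(f"unknown report type: {report!r}")
    return [
        "python", "generate_path_age_report.py",
        "-u", user,
        "-r", report,
        "-p", str(pathlevel),
        "-a", str(archive),
    ]


if __name__ == "__main__":
    print(" ".join(path_age_report_argv("sdadmin", "ARDSOW", pathlevel=2, archive=12)))
```

The resulting list can be passed to subprocess.run on an IBM Spectrum Discover node. The tool still prompts for the password interactively.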