Queries on uploaded files and data sets

Queries on uploaded files and data sets are processed by the Query service and the Compute service. This co-processing improves query performance.

The Compute service processes each query, either partially or entirely, and returns the result to the Query service. When the Compute service processes the whole query, the Query service might only need to perform additional, local processing of the result.

Tip: The Compute service and Query service reside on the same computer, and by default communicate with each other by using an ephemeral port requested from the operating system.

Upgrading data to the new Parquet format

The Parquet format that is used to store uploaded files and data sets changed between Cognos® Analytics versions 11.0.x and 11.1. Run the ParquetUpgrade command before users start running dashboards and reports so that all workloads immediately benefit from the Compute service performance gains. If a query uses data that wasn’t converted, the Query service internally initiates the conversion, and users experience a one-time performance degradation when they run the dashboards, stories, reports, or explorations in Cognos Analytics 11.1. Subsequent queries that are run by the Compute service use the converted data.

For more information, see Upgrading uploaded files and data sets to the new Parquet format.

Best practices for improving query performance on uploaded files and data sets

Use the following best practices when working with queries based on uploaded files and data sets:
  • Save frequently calculated expressions as columns.

    This practice reduces the amount of expression evaluation at run time. Projecting, comparing, and sorting simple column references and simple values (literals) is more efficient than evaluating expressions. For one way to prepare a file in this manner before upload, see the sketch after this list.

  • Avoid storing large numbers of columns that are never used by queries.

    While data is both compressed and encoded to reduce the amount of storage, it’s still recommended that you avoid storing redundant or unnecessary columns.

  • Sort the input on the column that is most frequently used in filters.

    For large uploaded files and data sets, sorting the input can enhance the evaluation of predicates. Sorting the data on the common column that is used in a filter, for example Country or Store, groups rows with the same value. If a query includes predicates on that column, the query can determine more efficiently which blocks of data it can ignore as it navigates the data. Use the sort option when you create a data set, and sort the input before you upload a file, as shown in the sketch after this list.
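
For illustration only, the following sketch shows one way to prepare a file before upload so that a frequently calculated expression is saved as a column and the rows are sorted on the most common filter column. It uses pandas, and the file and column names (sales.csv, Country, Quantity, UnitPrice) are hypothetical; adapt them to your own data.

    import pandas as pd

    # Hypothetical input file and column names; adjust them to your data.
    df = pd.read_csv("sales.csv")

    # Save a frequently calculated expression as a column so that queries
    # project a simple column reference instead of evaluating the expression
    # at run time.
    df["Revenue"] = df["Quantity"] * df["UnitPrice"]

    # Sort on the column that is most frequently used in filters so that
    # rows with the same value are grouped together.
    df = df.sort_values("Country")

    # Write the prepared file, ready to be uploaded.
    df.to_csv("sales_prepared.csv", index=False)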

Data types used to store data from uploaded files and data sets

The data in uploaded files and data sets is stored in the following data types:
  • All integer types (small, integer, and bigint) are stored as bigint.
  • All approximate numeric types (real, float, and double) are stored as double.
  • All precise numeric values are stored as decimal to the maximum precision of 38.
  • All character types (char, nchar, varchar, nvarchar, clob, nclob) are stored as national varchar with no maximum precision.
  • All temporal types (date, timestamp, time, timestamp/time with time zone) are stored as timestamp.
  • Interval types are stored in a format that is understood to be an interval. In previous releases, the value was stored as a string. The report server renders interval values.

If a source value is a decimal data type with a precision greater than 38, the Query service attempts to store the value as a decimal type with a precision of 38. If a value is too large to fit, the Query service returns an error that indicates the source column, the value, and the logical row number in the input data.
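
As a rough pre-upload check, you can count the digits in a decimal value to see whether it exceeds the 38-digit limit. The following sketch is illustrative only, and the sample values are hypothetical.

    from decimal import Decimal

    # Hypothetical values; in practice they might come from a column in the input file.
    values = [
        Decimal("12345678901234567890123456789012345678"),   # 38 digits: fits
        Decimal("123456789012345678901234567890123456789"),  # 39 digits: too wide
    ]

    for row_number, value in enumerate(values, start=1):
        digits = len(value.as_tuple().digits)
        if digits > 38:
            print(f"Row {row_number}: value has {digits} digits and cannot be "
                  "stored as a decimal with precision 38")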

Trailing spaces are removed from any character values.

Timestamps and times with time zones are normalized to a value that is based on Coordinated Universal Time (UTC).
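
For example, a hypothetical source value of 2023-06-01 09:30:00 with a -05:00 offset is stored as the UTC value 2023-06-01 14:30:00, as the following sketch illustrates.

    from datetime import datetime, timezone, timedelta

    # Hypothetical source value with a -05:00 time zone offset.
    source = datetime(2023, 6, 1, 9, 30, tzinfo=timezone(timedelta(hours=-5)))

    # Normalized to UTC: 2023-06-01 14:30:00+00:00
    print(source.astimezone(timezone.utc))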