Reading data from a data source

You can read Cognos Analytics data in a notebook using the Python or R programming languages.

You can read the following types of Cognos Analytics data sources in a notebook:
  • Uploaded CSV or XLS files
  • Data sets
  • Data modules
  • Framework Manager packages, including OLAP data
Note: Cognos Analytics does not support reading from data that requires user input. For example, a query subject in Framework Manager with a parameterized filter.
To quickly insert the read_data() method in a notebook cell, do the following steps:
  1. Create or edit a notebook and position your cursor in the cell after which you want to do the read.
  2. Click Sources sources icon.
  3. Navigate to a data source, select it, and then click Open. The method is inserted in the cell with the data source to read specified.
Alternatively, you can type the code in a cell:
In Python
data = CADataConnector.read_data(parameters as described in the following sections)
One of the following items is returned from the method:
DataFrame
Returned by default.
List of DataFrame
Returned when the sheet_name parameter is specified and multiple sheets are requested.
Iterator of DataFrame
Returned when the chunksize or iterator parameter is specified.

Parameters common to all data sources

For examples of how to code the read_data() method, see Python notebook examples.

The following parameters can be specified for all supported data source types:
Parameter Required or optional Description

path

id

Specify one of path or id
Path to the data source. If the data source is in My content, specify .my_folders at the start of the path. If the data source is in Team content, specify .public_folders at the start of the path. For example, to specify a file called sales-notebook that is stored in My content, specify
path=".my_folders/sales-notebook"
The ID of the file. For information about how to get the ID of a file, see Finding the ID of a file. For example:
id="i1F8D76C0FAD34J9CA50118746935D9X7"
usecols optional
List of integers or strings that identifies a subset of columns to be returned. You can specify one or more columns by the position number of the column in the file or by column name. For example, to request columns City and Quantity, specify
In Python
usecols=["City","Quantity"]
If not specified, then all columns are returned.
Note: This parameter is not applicable for OLAP data in a Framework Manager package.
chunksize optional
An integer. If specified, the method returns an iterator. The value specifies the number of rows to return each time the method is invoked. For example, to return 3 rows, specify
chunksize=3

If both chunksize and iterator are not specified, then a DataFrame is returned.

iterator optional
If specified, the method returns an iterator. Specifying
iterator=True
without chunksize specified returns one row each time the method is invoked.

Specifying iterator=True with chunksize specified, returns chunksize rows each time the method is invoked.

Reset the iterator using the reset method:
In Python
iterator.reset()
method.
nrows optional

The maximum number of rows returned. Specify an integer.

If omitted, the maximum number of rows returned is 10,000. If
nrows=0
is specified, then the column headings are returned.

XLS file parameters

Uploaded files are treated as sheets of data. Reading one of these data sources returns the rows for the specified sheet of data in the data source.

The following parameter applies only to XLS files:
sheet_name
Optional. Integer, string, or a list of integers or strings. Specifies the integer position of a specific sheet or tab in an XLS file. The first sheet is 0. If you specify a list of strings or integers, then a list of the corresponding sheets is returned. If not specified, the value defaults to 0. Example:
In Python
sheet_name=["sheet2","sheet3"]

Data module parameters

Reading data from a data module must reflect the modeling done on the data source. For example, a join between two tables and aggregation types to apply.

The following parameters apply only to data modules:
Parameter Required or optional Description
table_name Optional

String or list of strings. Restrict the scope of the request to specific table(s) in the data module. Specify the table(s) by table name(s).

If you omit table_name, the read_data() method returns a DataFrame that contains the names of the tables defined in the data module. Example:
In Python
table_name=["table1","table2"]
calculation Applies to version 11.1.5 and subsequent versions unless specifically overridden. Optional String or list of strings. Read a calculation from a data module. Specify a calculated column in the data module.

Package parameters

Reading data from a package must reflect the modeling done on the data source. For example, the relationship between objects in the package.

The following parameters apply only to packages:
Parameter Required or optional Description
query_subject Optional String or list of strings. Restrict the scope of the request to the specific query subjects(s) in the package. Specify the query subject(s) by query subject name(s). Example:
query_subject="query1"
folder_name Optional String or list of strings. Restrict the scope of the request to a folder in the package. Displays the contents of all folder(s) with that name in the package. If you want to see only the content of folder c that's in folder b that's in folder a specify the folder_name parameter as a list:
In Python
folder_name=["a","b","c"]
This returns only the folder c in this path, even if there are other folders with the name c in the package.
metadata Optional. Applies only to OLAP data in a Framework Manager package. Value can be True or False.
Specifying
metadata=True
returns the query subjects in the package. Specifying
metadata=False
returns the data in the package. If not specified, then a default value of False is used.
Note: You can use either single or double quotation marks in a method but not a mix of both.