Reading data from a data source

You can read Cognos Analytics data in a notebook using the Python or R programming languages.

You can read the following types of Cognos Analytics data sources in a notebook:

Uploaded CSV or XLS files
Data sets
Data modules
Framework Manager packages, including OLAP data

Note: Cognos Analytics does not support reading from data that requires user input. For example, a query subject in Framework Manager with a parameterized filter.

To quickly insert the read_data() method in a notebook cell, do the following steps:

Create or edit a notebook and position your cursor in the cell after which you want to do the read.
Click Sources .
Navigate to a data source, select it, and then click Open. The method is inserted in the cell with the data source to read specified.

Alternatively, you can type the code in a cell:

In Python

data = CADataConnector.read_data(parameters as described in the following sections)

One of the following items is returned from the method:

DataFrame: Returned by default.
List of DataFrame: Returned when the sheet_name parameter is specified and multiple sheets are requested.
Iterator of DataFrame: Returned when the chunksize or iterator parameter is specified.

Parameters common to all data sources

For examples of how to code the read_data() method, see Python notebook examples.

The following parameters can be specified for all supported data source types:

Parameter	Required or optional	Description
path id	Specify one of path or id	Path to the data source. If the data source is in My content, specify .my_folders at the start of the path. If the data source is in Team content, specify .public_folders at the start of the path. For example, to specify a file called `sales-notebook` that is stored in My content, specify `path=".my_folders/sales-notebook"` The ID of the file. For information about how to get the ID of a file, see Finding the ID of a file. For example: `id="i1F8D76C0FAD34J9CA50118746935D9X7"`
usecols	optional	List of integers or strings that identifies a subset of columns to be returned. You can specify one or more columns by the position number of the column in the file or by column name. For example, to request columns City and Quantity, specify In Python `usecols=["City","Quantity"]` If not specified, then all columns are returned. Note: This parameter is not applicable for OLAP data in a Framework Manager package.
chunksize	optional	An integer. If specified, the method returns an iterator. The value specifies the number of rows to return each time the method is invoked. For example, to return 3 rows, specify `chunksize=3` If both chunksize and iterator are not specified, then a DataFrame is returned.
iterator	optional	If specified, the method returns an iterator. Specifying `iterator=True` without chunksize specified returns one row each time the method is invoked. Specifying `iterator=True` with chunksize specified, returns chunksize rows each time the method is invoked. Reset the iterator using the reset method: In Python `iterator.reset()` method.
nrows	optional	The maximum number of rows returned. Specify an integer. If omitted, the maximum number of rows returned is 10,000. If `nrows=0` is specified, then the column headings are returned.

XLS file parameters

Uploaded files are treated as sheets of data. Reading one of these data sources returns the rows for the specified sheet of data in the data source.

The following parameter applies only to XLS files:

sheet_name

Optional. Integer, string, or a list of integers or strings. Specifies the integer position of a specific sheet or tab in an XLS file. The first sheet is 0. If you specify a list of strings or integers, then a list of the corresponding sheets is returned. If not specified, the value defaults to 0. Example:

In Python

sheet_name=["sheet2","sheet3"]

Data module parameters

Reading data from a data module must reflect the modeling done on the data source. For example, a join between two tables and aggregation types to apply.

The following parameters apply only to data modules:

Parameter	Required or optional	Description
table_name	Optional	String or list of strings. Restrict the scope of the request to specific table(s) in the data module. Specify the table(s) by table name(s). If you omit table_name, the `read_data()` method returns a `DataFrame` that contains the names of the tables defined in the data module. Example: In Python `table_name=["table1","table2"]`
calculation	Optional	String or list of strings. Read a calculation from a data module. Specify a calculated column in the data module.

Package parameters

Reading data from a package must reflect the modeling done on the data source. For example, the relationship between objects in the package.

The following parameters apply only to packages:

Parameter	Required or optional	Description
query_subject	Optional	String or list of strings. Restrict the scope of the request to the specific query subjects(s) in the package. Specify the query subject(s) by query subject name(s). Example: `query_subject="query1"`
folder_name	Optional	String or list of strings. Restrict the scope of the request to a folder in the package. Displays the contents of all folder(s) with that name in the package. If you want to see only the content of folder c that's in folder b that's in folder a specify the folder_name parameter as a list: In Python `folder_name=["a","b","c"]` This returns only the folder c in this path, even if there are other folders with the name c in the package.
metadata	Optional. Applies only to OLAP data in a Framework Manager package.	Value can be True or False. Specifying `metadata=True` returns the query subjects in the package. Specifying `metadata=False` returns the data in the package. If not specified, then a default value of False is used.

Note: You can use either single or double quotation marks in a method but not a mix of both.