Cleaning data
Data is often messy and inconsistent. You might want to clean your data so that it's clearer and easier to read.
About this task
The Clean options that are available for a column depend on the column data type. Some options can be specified for multiple columns with the same data type, and some for singular columns only.
The following options are available to clean your data:
- Whitespace
- Trim leading and trailing whitespace
Select this check box to remove leading and trailing whitespace from strings.
- Convert case to
- UPPERCASE, lowercase, Do not
change
Use this option to change the case of all characters in the string to either uppercase or lowercase, or to ensure that the case of each individual character is unchanged.
- Return a substring of characters
- Return a string that includes only part of the original string in each value. For example, an
employee code can be stored as CA096670, but you need only the number 096670 so you use this option
to remove the CA part. You can specify this option for singular columns only.
For the Start value, type a number that represents the position of a character in the string that will start the substring. Number 1 represents the first character in the string. For the Length value, specify the number of characters that will be included in the substring.
- NULL values
- Specify NULL-handling options for columns with text, numeric, date, and time data types that
allow NULL values. When Cognos Analytics detects that a column does not allow NULL values, these
options are not available for that column.
The default value for each option depends on the column data type. For text data, the default is an empty string. For numbers, the default is 0. For dates, the default is 2000-01-01. For time, the default is 12:00:00. For date and time (timestamp), the default is 2000-01-01T12:00:00.
The entry field for each option also depends on the column data type. For text, the entry field accepts alphanumeric characters, for numbers, the entry field accepts only numeric input. For dates, a date picker is provided to select the date, and for time, a time picker is provided to select the time.
The following NULL-handling options are available:
Procedure
Example
If you want to use an empty string instead of NULL in a given column, but your uploaded file sometimes uses the string n/a to indicate that the value is unknown, you can replace n/a with NULL, and then replace NULL with the empty string.