Length analysis is used to refine the existing length metadata definition for selected columns, such as string columns, based on the actual data values that are present in the column.
Length analysis is useful if the original column length was set without knowledge of, or regard for, the actual data values that the column would contain (for example, VarChar 255). If analysis determines a different length for a column, you can use that length to change the column's existing metadata in the original data source or to define the column in a new target schema for the data.
Each data value in a column's frequency distribution is analyzed to infer the length required to store that individual value. Then, all of the individual length inferences for the column are summarized by length to develop a frequency distribution of inferred lengths for the column. The system selects the longest inferred length in that frequency distribution, because that length can store all of the column's data values.
During column analysis processing, the system constructs each column's frequency distribution and then analyzes each distinct value to determine the length needed to store it. After every individual data value has been analyzed, the system summarizes the individual results into a frequency distribution by inferred length for that column. The system uses the longest length as the inferred length for the column, because it can hold all of the existing data values. This system-inferred length is then recorded in the repository as the inferred selection and, by default, becomes the chosen selection, as shown in the following example.
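The inference steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: it assumes that the length needed to store a value is simply its character count, and the function names and sample data are hypothetical. Each distinct value is examined once, its frequency count is added to the bucket for its length, and the longest length becomes the inferred length.

```python
from collections import Counter


def infer_length_distribution(value_freq: dict[str, int]) -> Counter:
    """Summarize a value frequency distribution by inferred length.

    Assumption: the length needed to store a value is its character
    count; a real tool may apply type-specific rules.
    """
    by_length: Counter = Counter()
    for value, count in value_freq.items():
        # Each distinct value is analyzed once; its frequency count is
        # added to the bucket for the length it requires.
        by_length[len(value)] += count
    return by_length


def infer_column_length(value_freq: dict[str, int]) -> int:
    """Return the longest inferred length, which can store every value."""
    distribution = infer_length_distribution(value_freq)
    return max(distribution) if distribution else 0


# Hypothetical frequency distribution for a column defined as VarChar 255.
city_freq = {"Paris": 40, "Oslo": 25, "Berlin": 30, "Amsterdam": 5}
print(sorted(infer_length_distribution(city_freq).items()))
# [(4, 25), (5, 40), (6, 30), (9, 5)]
print(infer_column_length(city_freq))
# 9
```

Working from the frequency distribution rather than from the raw rows means each distinct value is measured only once, however many rows contain it.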

If length analysis applies to a column, you view its results when you review the column analysis results. In the detailed column view, length analysis has its own results panel on the Properties Analysis tab. From this panel, you can accept the system-inferred length or use a drop-down list to override it with another length. If you override the system-inferred length, your selection is recorded in the repository as the chosen length. The process is complete when you have reviewed all of the column properties and marked the column property function as reviewed.
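The relationship between the inferred selection, the chosen selection, and the review status can be modeled as a small record. The sketch below is hypothetical: the class and field names are illustrative and do not describe the product's repository schema. It shows only that the chosen length defaults to the inferred length until a user overrides it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LengthProperty:
    """Hypothetical record of a column's length selections.

    Field names are illustrative; they are not the product's schema.
    """
    column: str
    inferred: int                  # system-inferred length
    chosen: Optional[int] = None   # defaults to the inferred length
    reviewed: bool = False

    def __post_init__(self) -> None:
        # The inferred selection becomes the chosen selection by default.
        if self.chosen is None:
            self.chosen = self.inferred

    def override(self, length: int) -> None:
        """Record a user-selected length as the chosen length."""
        if length <= 0:
            raise ValueError("length must be positive")
        self.chosen = length

    def mark_reviewed(self) -> None:
        """Flag the length property as reviewed (simplified)."""
        self.reviewed = True


prop = LengthProperty(column="CITY", inferred=9)
prop.override(20)  # e.g. leave room for longer future values
prop.mark_reviewed()
print(prop)
# LengthProperty(column='CITY', inferred=9, chosen=20, reviewed=True)
```

Keeping both values, rather than overwriting the inferred length, preserves the system's inference for later comparison with the user's decision.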
The length analysis summary provides a column's required length. Many columns produce only a single inferred length for all of their data values. In that case, accept the system's inference unless you know of future data values that will exceed the inferred length.
However, when there are multiple inferred lengths, check the frequency count for the selected inferred length. If that count is low relative to the row count of the table, a few invalid data values might be causing an excessive length to be inferred. (Drilling down from the inferred length in the summary shows which data values require that length.) In that case, you can either override the inferred length or flag those data values as “invalid” and have the system rerun the inference.
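The outlier check described above can be sketched as follows, again assuming that a value's required length is its character count. The function names, the `invalid` set of flagged values, and the `threshold` cutoff (a fraction of the row count) are hypothetical; the source gives no specific threshold. The function excludes flagged values, re-infers the length, reports whether the selected length rests on suspiciously few rows, and returns the drill-down list of values that require that length.

```python
from collections import Counter


def length_distribution(value_freq: dict[str, int]) -> Counter:
    """Frequency distribution of inferred lengths (character count assumed)."""
    dist: Counter = Counter()
    for value, count in value_freq.items():
        dist[len(value)] += count
    return dist


def review_inferred_length(value_freq, invalid=frozenset(), threshold=0.01):
    """Infer a length while ignoring values flagged invalid.

    Returns (selected_length, suspicious, drill_down). `threshold` is a
    hypothetical cutoff expressed as a fraction of the table's row count.
    """
    valid = {v: c for v, c in value_freq.items() if v not in invalid}
    dist = length_distribution(valid)
    if not dist:
        return None, False, []
    row_count = sum(value_freq.values())  # all rows, including invalid ones
    selected = max(dist)
    # Only meaningful when there is more than one inferred length.
    suspicious = len(dist) > 1 and dist[selected] / row_count < threshold
    # Drill-down: the values that require the selected length.
    drill_down = sorted(v for v in valid if len(v) == selected)
    return selected, suspicious, drill_down


codes = {"US": 500, "DE": 300, "FR": 199, "N/A - unknown": 1}
print(review_inferred_length(codes))
# (13, True, ['N/A - unknown'])
print(review_inferred_length(codes, invalid={"N/A - unknown"}))
# (2, False, ['DE', 'FR', 'US'])
```

In this example, a single row out of 1,000 drives the inferred length from 2 to 13; flagging that value as invalid and re-inferring yields the length the rest of the data actually needs.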
Another common situation in length analysis occurs when a variable-length string column is defined with a length of 128 or 255. In such cases, the system infers a different length based on the data value with the most characters. If it does, whether to change the column's metadata definition to the inferred length or leave it at the “standard” length of 128 or 255 might be a database administration design decision.
As with other column properties, there is an advantage in keeping length assignments consistent across columns in the data environment.
For length, you have only one decision to make for each applicable column: accept the system-inferred length, or override it by selecting another length.
After you make this decision, you can continue to review the other column properties or can mark the column properties review as complete.
There are no significant system performance considerations for the length analysis function.