IDAX.PCA - Build a PCA model

Use this stored procedure to build a PCA model.

Authorization

The privileges held by the authorization ID of the statement must include the IDAX_USER role.

To create a PCA model, ensure that you meet the following prerequisites:

  1. The input columns must contain only continuous columns.
  2. The input columns must contain at least two continuous columns.
  3. The number of input columns must not be greater than 78.
  4. The input table must not be empty.

Syntax

IDAX.PCA(in parameter_string varchar(32672))

Parameter descriptions

parameter_string
Mandatory one-string parameter that contains pairs of <parameter>=<value> entries that are separated by a comma.
Data type: VARCHAR(32672)
The following list shows the parameter values:
model
Mandatory.
The name of the PCA model that is to be built.
Data type: VARCHAR(ANY)
intable
Mandatory.
The name of the input table.
Data type: VARCHAR(ANY)
id
Mandatory.
The column of the input table that identifies a sequence ID.
Data type: VARCHAR(ANY)
incolumn
Optional.
The columns of the input table that have specific properties, which are separated by a semi-colon (;).
If this parameter is not specified, all columns of the input table have default properties.
Default: none
Data type: VARCHAR(ANY)
incolumn
Optional.
The columns of the input table that have specific properties, which are separated by a semi-colon (;).
Each column is succeeded by one or more of the following properties:
  • By type nominal (":nom", ":nominal") or by type continuous (":cont", ":continuous"). By default, numerical types are continuous, and all other types are nominal.
  • By type nominal (:nom) or by type continuous (:cont). By default, numerical types are continuous, and all other types are nominal.
  • By role id (":id"), target (":target"), input (":active", ":in", ":input") or ignore (":ignore", ":inactive").
  • By role :id, :target, :input, or :ignore.
If this parameter is not specified, all columns of the input table have default properties.
Default: none
Data type: VARCHAR(ANY)
coldeftype
Optional.
The default type of the input table columns.
Allowed values are nom and cont.
Allowed values are:
  • "nom" and "nominal" for type nominal
  • "cont" and "continuous" for type continuous
If the parameter is not specified, all columns are continuous columns.
Default: none
Data type: VARCHAR(ANY)
Data type: VARCHAR(10)
coldefrole
Optional.
The default role of the input table columns.
Allowed values are input and ignore.
Allowed values are:
  • "active", "in", and "input" for role input
  • "ignore" and "inactive" for role ignore
If the parameter is not specified, all columns are input columns.
Default: input
Data type: VARCHAR(ANY)
Data type: VARCHAR(8)
colPropertiesTable
Optional.
The input table where properties of the columns of the input table are stored in the same format as the output format of the IDAX.COLUMN_PROPERTIES() stored procedure.
If this parameter is not specified, the column properties of the input table column properties are detected automatically.
Restriction: The following tables are not supported:
  • colPropertiesTable with "COLROLE" column with value 'objweight'
  • colPropertiesTable with "COLWEIGHT" column with value '<wgt>'
Default: none
Data type: VARCHAR(ANY)
forceeigensolve
Optional.
A flag that indicates whether the eigenvalue decomposition is to be forced when the model is calculated which might be marginally faster that the standard singular value decomposition.
This parameter value is ignored for the Spark implementation of PCA. It is, however, still accepted as a parameter to be compatible with the previous SQL-based version of PCA.
Allowed values are 'true' and 'false'.
Setting this flag to true might be marginally faster than the standard singular value decomposition. If the input table is a singular matrix, setting the flag to true does not produce a result.
Set this flag to true only if the modeling process for an input table seems to be slow or seems to produce wrong results.
Default: false
Data type: VARCHAR(ANY)
centerdata
Optional.
A flag that indicates whether the input values are to be shifted to zero-centured.
Allowed values are 'true' and 'false'.
Default: true
Data type: VARCHAR(ANY)
scaledata
Optional.
A flag that indicates whether unit variance scaling is to be automatically applied for the input values.
Allowed values are 'true' and 'false'.
Default: false
Data type: VARCHAR(ANY)
savescores
Optional.
A flag that indicates whether the score on each PCA is to be calculated and stored in the PCA model.
Allowed values are 'true' and 'false'.
Default: false
Data type: VARCHAR(ANY)

Returned information

A result set that contains the string 't' (for 'true') if the PCA model was built successfully. If the PCA model was not built successfully, an error message is shown.

Example

CALL IDAX.PCA('model=customer_churn_pca, intable=customer_churn_train, id=cust_id, coldefrole=ignore, 
incolumn=DURATION;AVG_SPENT_RETAIN_PM;AVG_SQ_SPENT_RETAIN_PM;ANNUAL_REVENUE_MIL;TOTAL_EMPLOYEES;TOTAL_BUY;TOTAL_BUY_FREQ;TOTAL_BUY_FREQ_SQ,
centerData=true, scaleData=true, forceEigenSolve=false, saveScores=true');