IDAX.GLM - Build a generalized linear model (GLM)

Use this stored procedure to synchronously or asynchronously build a GLM model.

Authorization

The privileges held by the authorization ID of the statement must include the IDAX_USER role.

Syntax

IDAX.GLM(in parameter_string varchar(32672), in synchronous boolean default true)

Parameter descriptions

parameter_string
Mandatory one-string parameter that contains pairs of <parameter>=<value> entries that are separated by a comma.
Data type: VARCHAR(32672)
The following list shows the parameter values:
model
Mandatory.
The name of the generalized linear model that is to be built.
Data type: VARCHAR(64)
intable
Mandatory.
The name of the input table.
Data type: VARCHAR(128)
id
Mandatory.
The column of the input table that identifies a unique instance ID.
Data type: VARCHAR(128)
target
The name of the column of the input table column for which a value is predicted.
Only numeric target columns are accepted.
Data type: NVARCHAR(128)
incolumn
Optional.
The columns of the input table that have specific properties, which are separated by a semi-colon (;).
Each column is succeeded by one or more of the following properties:
  • By type nominal (":nom", ":nominal") or by type continuous (":cont", ":continuous"). By default, numerical types are continuous, and all other types are nominal.
  • By type nominal (:nom) or by type continuous (:cont). By default, numerical types are continuous, and all other types are nominal.
  • By role id (":id"), target (":target"), input (":active", ":in", ":input") or ignore (":ignore", ":inactive").
  • By role :id, :target, :input, or :ignore.
If this parameter is not specified, all columns of the input table have default properties.
Default: none
Data type: VARCHAR(ANY)
coldeftype
Optional.
The default type of the input table columns.
Allowed values are nom and cont.
Allowed values are:
  • "nom" and "nominal" for type nominal
  • "cont" and "continuous" for type continuous
If the parameter is not specified, all numeric columns are continuous columns, and all other columns are nominal columns.
Default: none
Data type: VARCHAR(ANY)
Data type: VARCHAR(10)
coldefrole
Optional.
The default role of the input table columns.
Allowed values are input and ignore.
Allowed values are:
  • "active", "in", and "input" for role input
  • "ignore" and "inactive" for role ignore
If the parameter is not specified, all columns are input columns.
Default: input
Data type: VARCHAR(ANY)
Data type: VARCHAR(8)
intercept
Optional.
A flag that indicates whether the model is built with an intercept value.
Allowed values are 'true' and 'false'.
Default: true
Data type: VARCHAR(ANY)
effect
Optional.
The definition of the allowed effects or interactions between input columns.
The effect or interaction is a list of factors or covariates that is separated by a semicolon (;). Categorical columns are used as factors in the model; continuous fields are used as covariates.
By default, all input columns are considered as independent. They do not interact with each other.
You can combine factors and covariates by a star (*) and build nested terms for your model. You can use nested terms for modeling the effect of a factor or covariate the values of which do not interact with the levels of another factor. For example, a grocery store chain might follow the spending habits of its customers at several store locations. Because each customer frequents only one of these locations, the customer effect can be said to be nested within the store location effect.
Additionally, you can include interaction effects, such as polynomial terms that involve the same covariate, or add multiple levels of nesting to the nested term.
Nested terms have the following restrictions:
  • All factors within an interaction must be unique. Thus, if A is a factor, specifying A*A is invalid.
  • All factors within a nested effect must be unique. Thus, if A is a factor, specifying A(A) is invalid.
  • No effect can be nested within a covariate. Thus, if A is a factor and X is a covariate, specifying A(X) is invalid.
Default: c1; c2; c3; ...., cn
Where c1, c2, c3, ..cn are the input columns.
Data type: VARCHAR(ANY)
family
Optional.
The type of the distribution.
Allowed values are bernoulli, gaussian, poisson, binomial, negativebinomial, wald (inversegaussian), and gamma.
Default: bernoulli
Data type: VARCHAR(ANY)
link
Optional.
The type of the link function.
Allowed values are 'clog', 'cloglog', 'gaussit', 'identity', 'log', 'logit', 'oddspower', 'power', 'probit', and 'sqrt'.
Default: logit
Data type: VARCHAR(ANY)
link_param
Optional.
An extra parameter that is used for the links 'oddspower' and 'power'.
The range of value depends on the link function that is used.
Default: 1
Data type: DOUBLE
maxit
Optional.
The maximum number of iterations.
Default: 20
Data type: INTEGER
tol
Optional.
The tolerance for the linear equation solver when to consider a value to be equal to zero.
Default: 1e-7
Data type: DOUBLE
trials
The column of the input table that contains the number of trials for the binomial distribution.
This parameter is mandatory when the value of the family parameter is binomial.
For other distributions, this parameter is ignored.
Default: ''
Data type: VARCHAR(ANY)
spark_partitions
Optional.
Number of Spark partitions that are created with the input data.
This number determines the degree of parallelism that is used to create the model.
Note: If the number of rows per partition is too small, the resulting model might be imprecise. For small input data sets, even the default value might be too large. Therefore, explicitly set a smaller value.
Default: Number of available CPUs divided by 2.
Data type: INTEGER
synchronous
Optional.
Synchronous or asynchronous execution of the GLM stored procedure.
In a synchronous execution, the GLM stored procedure waits until the execution is completed. In an asynchronous execution, the GLM procedure returns as soon as it is submitted.
To get the status and the submission ID of an asynchronously executed GLM stored procedure, call the IDAX.APP_STATUS() stored procedure after you launch the GLM stored procedure.
To cancel an asynchronously executed GLM stored procedure, call the IDAX.CANCEL_APP() stored procedure while the GLM stored procedure is running.
Default: true

Returned information

The procedure does not return any SQL results.

Example

CALL IDAX.GLM('model=adult_mdl, intable=adult_train, id=id, target=age, link=identity, family=gaussian');