IDAX.GLM - Build a generalized linear model (GLM)
Use this stored procedure to synchronously or asynchronously build a GLM model.
Authorization
The privileges held by the authorization ID of the statement must include the IDAX_USER role.
Syntax
IDAX.GLM(in parameter_string varchar(32672), in synchronous boolean default true)
Parameter descriptions
- parameter_string
- Mandatory one-string parameter that contains pairs of <parameter>=<value> entries that are separated by a comma.
- Data type: VARCHAR(32672)
- The following list shows the parameter values:
-
- model
- Mandatory.
- The name of the generalized linear model that is to be built.
- Data type: VARCHAR(64)
- intable
- Mandatory.
- The name of the input table.
- Data type: VARCHAR(128)
- id
- Mandatory.
- The column of the input table that identifies a unique instance ID.
- Data type: VARCHAR(128)
- target
- The name of the column of the input table column for which a value is predicted.
- Only numeric target columns are accepted.
- Data type: NVARCHAR(128)
- incolumn
- Optional.
- The columns of the input table that have specific properties, which are separated by a semi-colon (;).
- Each column is succeeded by one or more of the following properties:
- By type nominal (":nom", ":nominal") or by type continuous (":cont", ":continuous"). By default, numerical types are continuous, and all other types are nominal.
- By type nominal (
:nom
) or by type continuous (:cont
). By default, numerical types are continuous, and all other types are nominal. - By role id (":id"), target (":target"), input (":active", ":in", ":input") or ignore (":ignore", ":inactive").
- By role
:id
,:target
,:input
, or:ignore
.
- If this parameter is not specified, all columns of the input table have default properties.
- Default: none
- Data type: VARCHAR(ANY)
- coldeftype
- Optional.
- The default type of the input table columns.
- Allowed values are
nom
andcont
. - Allowed values are:
- "nom" and "nominal" for type nominal
- "cont" and "continuous" for type continuous
- If the parameter is not specified, all numeric columns are continuous columns, and all other columns are nominal columns.
- Default: none
- Data type: VARCHAR(ANY)
- Data type: VARCHAR(10)
- coldefrole
- Optional.
- The default role of the input table columns.
- Allowed values are
input
andignore
. - Allowed values are:
- "active", "in", and "input" for role input
- "ignore" and "inactive" for role ignore
- If the parameter is not specified, all columns are input columns.
- Default: input
- Data type: VARCHAR(ANY)
- Data type: VARCHAR(8)
- intercept
- Optional.
- A flag that indicates whether the model is built with an intercept value.
- Allowed values are 'true' and 'false'.
- Default: true
- Data type: VARCHAR(ANY)
- effect
- Optional.
- The definition of the allowed effects or interactions between input columns.
- The effect or interaction is a list of factors or covariates that is separated by a semicolon (;). Categorical columns are used as factors in the model; continuous fields are used as covariates.
- By default, all input columns are considered as independent. They do not interact with each other.
- You can combine factors and covariates by a star (*) and build nested terms for your model. You can use nested terms for modeling the effect of a factor or covariate the values of which do not interact with the levels of another factor. For example, a grocery store chain might follow the spending habits of its customers at several store locations. Because each customer frequents only one of these locations, the customer effect can be said to be nested within the store location effect.
- Additionally, you can include interaction effects, such as polynomial terms that involve the same covariate, or add multiple levels of nesting to the nested term.
- Nested terms have the following restrictions:
- All factors within an interaction must be unique. Thus, if A is a factor, specifying A*A is invalid.
- All factors within a nested effect must be unique. Thus, if A is a factor, specifying A(A) is invalid.
- No effect can be nested within a covariate. Thus, if A is a factor and X is a covariate, specifying A(X) is invalid.
- Default: c1; c2; c3; ...., cn
- Where c1, c2, c3, ..cn are the input columns.
- Data type: VARCHAR(ANY)
- family
- Optional.
- The type of the distribution.
- Allowed values are bernoulli, gaussian, poisson, binomial, negativebinomial, wald (inversegaussian), and gamma.
- Default: bernoulli
- Data type: VARCHAR(ANY)
- link
- Optional.
- The type of the link function.
- Allowed values are 'clog', 'cloglog', 'gaussit', 'identity', 'log', 'logit', 'oddspower', 'power', 'probit', and 'sqrt'.
- Default: logit
- Data type: VARCHAR(ANY)
- link_param
- Optional.
- An extra parameter that is used for the links 'oddspower' and 'power'.
- The range of value depends on the link function that is used.
- Default: 1
- Data type: DOUBLE
- maxit
- Optional.
- The maximum number of iterations.
- Default: 20
- Data type: INTEGER
- tol
- Optional.
- The tolerance for the linear equation solver when to consider a value to be equal to zero.
- Default: 1e-7
- Data type: DOUBLE
- trials
- The column of the input table that contains the number of trials for the binomial distribution.
- This parameter is mandatory when the value of the family parameter is binomial.
- For other distributions, this parameter is ignored.
- Default: ''
- Data type: VARCHAR(ANY)
- spark_partitions
- Optional.
- Number of Spark partitions that are created with the input data.
- This number determines the degree of parallelism that is used to create the model.
-
Note: If the number of rows per partition is too small, the resulting model might be imprecise. For small input data sets, even the default value might be too large. Therefore, explicitly set a smaller value.
- Default: Number of available CPUs divided by 2.
- Data type: INTEGER
- synchronous
- Optional.
- Synchronous or asynchronous execution of the GLM stored procedure.
- In a synchronous execution, the GLM stored procedure waits until the execution is completed. In an asynchronous execution, the GLM procedure returns as soon as it is submitted.
- To get the status and the submission ID of an asynchronously executed GLM stored procedure, call the IDAX.APP_STATUS() stored procedure after you launch the GLM stored procedure.
- To cancel an asynchronously executed GLM stored procedure, call the IDAX.CANCEL_APP() stored procedure while the GLM stored procedure is running.
- Default: true
Returned information
The procedure does not return any SQL results.
Example
CALL IDAX.GLM('model=adult_mdl, intable=adult_train, id=id, target=age, link=identity, family=gaussian');