Start of change

AI_SIMILARITY scalar function

The AI_SIMILARITY function computes a similarity score between two values.

Read syntax diagramSkip visual syntax diagramAI_SIMILARITY(expression-1USINGMODEL COLUMNcolumn-name, expression-2USINGMODEL COLUMNcolumn-name)

The schema is SYSIBM.

The AI_SIMILARITY function computes a similarity score using the values returned by expression-1 and expression-2. The machine learning model used to compute the score is determined by the columns specified in expression-1 and expression-2, or you can explicitly specify it in the column-name in the USING MODEL COLUMN clause.

expression-1
An expression that specifies the first value on which the similarity score is computed. The value returned by expression-1 must be a built-in data type that is not a binary data type, a LOB data type, XML, ROWID, CHAR FOR BIT DATA, or VARCHAR FOR BIT DATA.
expression-2
An expression that specifies the second value on which the similarity score is computed. The value returned by expression-2 must be a built-in data type that is not a binary data type, a LOB data type, XML, ROWID, CHAR FOR BIT DATA, or VARCHAR FOR BIT DATA.
column-name

An identifier that specifies a column to be used as the model column, which determines the machine learning model used to evaluate the function.

Arguments to AI_SIMILARITY must specify a machine learning model and a machine learning model column that is used to evaluate the function. The following rules are used to determine which model and model columns are used for each argument:
  • If the expression argument is a standalone column reference and no model column is explicitly specified with the USING MODEL COLUMN clause, the standalone column is the model column that is used to evaluate the function, and the model is the model table associated with the table that the column belongs to.
  • If the expression argument is a standalone column reference, but the USING MODEL COLUMN clause specifies a different column, the column specified in the USING MODEL COLUMN clause is the model column, and the model is the model table associated with the table that the model column belongs to.
  • Any kind of expression other than a standalone column reference either must specify a model column name using the USING MODEL COLUMN clause, or the model column must be able to be inferred according to these rules:
    • The model column can be explicitly specified by the USING MODEL COLUMN clause. The model is determined by the table that the model column belongs to.
    • The model column of an argument that is not a standalone column-reference and does not have a column specified by the USING MODEL COLUMN clause will be inferred by the model column of the other argument. At least one of the arguments must have a model column.
  • All model columns specified in the function invocation must refer to columns that belong to the same table or view.
  • AI must be enabled for the table or view that the model is associated with and the model must be trained. All model columns must be included in the model.
  • If either expression specifies a model column that was identified during model training as a primary key column, the other expression must specify the same model column or must not specify any model column.

Each model column, either specified as a standalone column reference or with the USING MODEL COLUMN clause, may be a qualified name. The qualifier must not be a synonym name or a correlation name of a table expression. The qualifier must refer to a table name or a view name, or to an alias to a table name or view name.

If the model column is a numeric column as indicated during the training of the model, the value of the expression is cast to DOUBLE during execution of the function. The value of the numeric column is converted into a string token based on the clustering of the value. You can find the minimum values for each cluster in the Model details panel. If the value is outside the range of DOUBLE, the result of the function is a null value

The result is a double-precision floating point number (FLOAT) that is the similarity score. The result is a number between -1.0 and 1.0, where -1.0 means that the values are least similar, and 1.0 means that they are most similar.

The result can be null; if any argument is null, the result is the null value. If expression-1 or expression-2 evaluates to a value that was not seen during model training, and the model column used is a categorical one as indicated during the training of the model, the result is the null value. This does not apply when the model column a numeric one.

Notes

Configuration requirement
SQL Data Insights must be configured in Db2 to use this function.
AI_SIMILARITY must not be specified in a CREATE VIEW statement
AI_SIMILARITY must not be specified in a CREATE VIEW statement, and must not be specified in the fullselect of a materialized-query-definition of a CREATE TABLE statement.
AI_SIMILARITY is not deterministic
The AI_SIMILARITY function reads data from the table associated with the model. The function is not deterministic. Training of the model is also not deterministic. The model may change slightly, even if trained again using the same data, which can cause small differences in the similarity scores produced by the function. This can affect the similarity ordering of results.

Examples

Find the top five customers by ID most similar to the customer with ID '3668-QPYBK'.

SELECT AI_SIMILARITY(CUSTOMERID, '3668-QPYBK'), CHURN.*
FROM CHURN
ORDER BY 1 DESC
FETCH FIRST 5 ROWS ONLY;

Find the top three payment methods most similar to 'YES' in the CHURN column.

SELECT DISTINCT AI_SIMILARITY(PAYMENTMETHOD, 'YES' USING MODEL COLUMN CHURN), 
PAYMENTMETHOD
FROM CHURN
ORDER BY 1 DESC
FETCH FIRST 3 ROWS ONLY
End of change