Start of change

AI_ANALOGY scalar function

The AI_ANALOGY function computes an analogy score between two sets of values.

Read syntax diagramSkip visual syntax diagramAI_ANALOGY(source-1, target-1,source-2,target-2)

source or target:

Read syntax diagramSkip visual syntax diagram expressionUSINGMODEL COLUMNcolumn-name

The schema is SYSIBM.

The AI_ANALOGY function computes an analogy score using the values returned by the arguments. The arguments to the function specify two pairs, where there is a relationship between source-1 and source-2 and a relationship between target-1 and target-2. The interaction between the pairs form an analogy, which can be thought of as a human language analogy:source-1 is to target-1 as source-2 is to target-2.

source-1

The expression specifies the first source value for the analogy. The value must not be a binary, LOB, XML, ROWID, CHAR FOR BIT DATA, or VARCHAR FOR BIT DATA data type.

The column-name is an identifier in the USING MODEL COLUMN clause, which can be used to specify which machine learning model and column to use for the evaluation of the function.

target-1

The expression specifies the first target value for the analogy. The value must not be a binary, LOB, XML, ROWID, CHAR FOR BIT DATA, or VARCHAR FOR BIT DATA data type.

The column-name is an identifier in the USING MODEL COLUMN clause, which can be used to specify which machine learning model and column to use for the evaluation of the function.

source-2

The expression specifies the second source value for the analogy. The value must not be a binary, LOB, XML, ROWID, CHAR FOR BIT DATA, or VARCHAR FOR BIT DATA data type . The values for source-1 and source-2 must not be the same.

If the source-1 and source-2 model columns are both a SQL DI numeric data type, Db2 may return an error even if the two numeric values are different. This happens when the numeric values belong to the same cluster during model training. In this case, SQL DI treats them as the same value (token). See the Model details page of the SQL DI UI for more information how numeric values in a cluster are processed in function arguments.

The column-name is an identifier in the USING MODEL COLUMN clause, which can be used to specify which machine learning model and column to use for the evaluation of the function.

target-2

The expression specifies the second target value for the analogy. The value must not be a binary, LOB, XML, ROWID, CHAR FOR BIT DATA, or VARCHAR FOR BIT DATA data type.

The column-name is an identifier in the USING MODEL COLUMN clause, which can be used to specify which machine learning model and column to use for the evaluation of the function.

Arguments to AI_ANALOGY must specify a machine learning model and machine learning model columns that are used to evaluate the function. The following rules are used to determine which model and model column is used for each argument:
  • If an expression argument is a standalone column reference and no model column is explicitly specified with the USING MODEL COLUMN clause, the standalone column is the model column that is used to evaluate the function, and the model is the model table associated with the Db2 table that the column belongs to.
  • If the expression argument is a standalone column reference, but the USING MODEL COLUMN clause specifies a different column, the column specified in the USING MODEL COLUMN clause is the model column, and the model is the model table associated with the Db2 table that the model column belongs to.
  • A model column must be determined for at least one of source-1 and source-2. A model column can be explicitly specified in the USING MODEL COLUMN clause, or the expression must be a simple column reference. If a model column can be determined for one of source-1 or source-2 but not the other, then the model column that is determined is used for both source-1 and source-2. If a model column is specified for both source-1 and source-2, they must be the same model column.
  • A model column must be determined for at least one of target-1 and target-2. The model column can be explicitly specified in the USING MODEL COLUMN clause, or the expression must be a simple column reference. If a model column can be determined for one of target-1 or target-2 but not the other, then the model column that is determined is used for both target-1 and target-2. If a model column is determined for both target-1 and target-2, they must be the same model column.
  • All model columns specified in the function invocation must refer to columns that belong to the same table or view.
  • AI must be enabled for the table or view that the model is associated with and the model must be trained. All model columns must be included in the model.

The model column, either specified as a standalone column reference or with the USING MODEL COLUMN clause, can be a qualified name. The qualifier must not be a synonym name or a correlation name of a table expression. The qualifier must refer to a table name or a view name, or to an alias to a table name or view name.

The result is a double-precision floating point number (FLOAT) that is the analogy score. Larger-valued positive results indicate a better analogy than smaller results. If the expressions in target-1 and target-2 return the same value, the result of the function is -1 indicating a poor analogy, unless the expressions in source-1 and source-2 return values that are very similar (the similarity score >= 0.9).

The result can be null; if any argument is null, the result is the null value. If the arguments to the function contain values that were not seen during model training, the result is the null value.

By default, Db2 returns all results from a query. If you want to limit the result set to just the source-2 argument, specify the WHERE predicate in your query.

Notes

Configuration requirement
SQL Data Insights must be configured in Db2 to use this function.
AI_ANALOGY must not be specified in a CREATE VIEW statement
AI_ANALOGY must not be specified in a CREATE VIEW statement, and must not be specified in the fullselect of a materialized-query-definition of a CREATE TABLE statement.
AI_ANALOGY is not deterministic
The function reads data from the table associated with the model. The AI_ANALOGY function is not deterministic. Training of the model is also not deterministic. The model may change slightly, even if trained again using the same data, which can cause small differences in the analogy scores produced by the function which can affect the ordering of results.

Examples

The customer with ID ‘1066_JKSGK’ has churned. Given the relationship of that customer to ‘YES’ in the churn column, find customers with the same relationship to ‘NO’ in the churn column, in other words, customers unlikely to churn.

SELECT AI_ANALOGY('YES' USING MODEL COLUMN CHURN,
		'1066_JKSGK' USING MODEL COLUMN CUSTOMERID,
		'NO' USING MODEL COLUMN CHURN,
		CUSTOMERID),
	CHURN.*
FROM CHURN
ORDER BY 1 DESC
FETCH FIRST 5 ROWS ONLY

The source-2 argument specifies the column for contract terms. If you want to see just the results from source-2 and the rows where the CONTRACT is "One year," add the WHERE CONTRACT = 'One year' predicate to your query as shown in the following example:

SELECT * FROM  
 (SELECT DISTINCT AI_ANALOGY( 
           'Month-to-month' USING MODEL COLUMN CONTRACT,       
           'Electronic check' USING MODEL COLUMN PAYMENTMETHOD,
           'One year',  
           PAYMENTMETHOD) AS SIMILARITY, 
         CONTRACT, PAYMENTMETHOD 
  FROM ADMF001.CHURN   
  WHERE CONTRACT = 'One year') 
WHERE SIMILARITY > 0.0  
ORDER BY SIMILARITY DESC 
FETCH FIRST 5 ROWS ONLY; 
End of change