IBM Streams 4.2.1

Toolkit com.ibm.streams.mining 2.1.0


General Information

The Mining Toolkit includes operators that you can use to mine data streams by applying models. The models can involve complex data mining algorithms that are integrated from InfoSphere Warehouse.

Mining data streams to extract relevant information or intelligence is critical for many stream processing applications, ranging from fraud detection to customer segmentation to churn or intrusion prevention. In most cases, stream mining requires applying models that were learned from historical data to streaming data in order to detect patterns of interest. These models can involve complex data mining algorithms that the InfoSphere Streams built-in operators might not easily support. The Mining Toolkit supports scoring with these complex data mining algorithms. The toolkit integrates algorithms from IBM InfoSphere Warehouse by using the Predictive Model Markup Language (PMML) standard.

PMML is a standard XML representation for specifying mining models, ensembles of models, and their associated preprocessing. Several state-of-the-art statistics and data mining tools support PMML, including InfoSphere Warehouse, R / Rattle, SAS Enterprise Miner, SPSS, and Weka. The supported algorithms include both supervised approaches, where data labels and ground truth are available during model training, and unsupervised approaches, where no ground truth is available during model training. The toolkit supports the following mining algorithms:
  • Classification algorithms, including:
    • Decision Trees
    • Naive Bayes
    • Logistic Regression
  • Clustering algorithms, including:
    • Demographic Clustering
    • Kohonen Clustering
  • Regression algorithms, including:
    • Linear Regression
    • Polynomial Regression
    • Transform Regression
  • Associations algorithms, including:
    • Association Rules
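
In PMML, each model element (for example `TreeModel` or `ClusteringModel`) carries a `functionName` attribute that names its mining function. The following Python sketch, an illustration rather than part of the toolkit, reads that attribute to suggest which of the four algorithm categories above a PMML file belongs to. The helper names `model_category` and `CATEGORY_BY_FUNCTION` are hypothetical, and the mapping from `functionName` values to categories is an assumption based on the PMML specification.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Assumed mapping from the PMML functionName attribute to the algorithm
# categories listed above (hypothetical helper, not a toolkit API).
CATEGORY_BY_FUNCTION = {
    "classification": "Classification",
    "clustering": "Clustering",
    "regression": "Regression",
    "associationRules": "Associations",
}

# PMML model elements used by the algorithms listed above.
MODEL_ELEMENTS = {
    "TreeModel",
    "NaiveBayesModel",
    "RegressionModel",
    "ClusteringModel",
    "AssociationModel",
}


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://www.dmg.org/PMML-3_0}'."""
    return tag.rsplit("}", 1)[-1]


def model_category(pmml_text: str) -> Tuple[str, str]:
    """Return (model element, category) for the first model in a PMML document."""
    root = ET.fromstring(pmml_text)
    for child in root:
        name = local_name(child.tag)
        if name in MODEL_ELEMENTS:
            function = child.get("functionName", "")
            if function not in CATEGORY_BY_FUNCTION:
                raise ValueError(f"unrecognized functionName: {function!r}")
            return name, CATEGORY_BY_FUNCTION[function]
    raise ValueError("no recognized model element found")


# Trimmed example: a real model also has a MiningSchema and a model body.
sample = """<PMML version="3.0" xmlns="http://www.dmg.org/PMML-3_0">
  <Header/>
  <DataDictionary numberOfFields="0"/>
  <TreeModel functionName="classification"/>
</PMML>"""

print(model_category(sample))  # ('TreeModel', 'Classification')
```

A logistic regression model is typically stored as a `RegressionModel` with `functionName="classification"`, so this sketch reports it as a classification model, which matches its place in the list above. The sketch cannot tell Demographic from Kohonen clustering, because both use `ClusteringModel`.
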
The Mining Toolkit uses algorithms that are integrated from InfoSphere Warehouse. These algorithms support the following versions of PMML:
  • Decision Trees algorithm:
    • PMML versions 2.0 - 3.0
  • Naive Bayes algorithm:
    • PMML versions 2.0 - 3.2
  • Logistic Regression algorithm:
    • PMML versions 2.0 - 3.2
  • Demographic Clustering algorithm:
    • PMML versions 2.0 - 3.0
  • Kohonen Clustering algorithm:
    • PMML versions 2.0 - 3.0
  • Linear Regression algorithm:
    • PMML versions 2.0 - 3.0
  • Polynomial Regression algorithm:
    • PMML versions 2.0 - 3.0
  • Transform Regression algorithm:
    • PMML versions 2.0 - 3.0
  • Association Rules algorithm:
    • PMML versions 2.0 - 3.2
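
Because the supported PMML range differs by algorithm, it can help to check a model file before you deploy it. The following Python sketch transcribes the list above into a lookup table and compares it with the `version` attribute on the PMML root element. The names `SUPPORTED_PMML`, `parse_version`, and `is_supported` are hypothetical. The caller must supply the algorithm name, because a PMML model element alone does not distinguish, for example, Demographic from Kohonen clustering.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Supported PMML version ranges (inclusive), transcribed from the list above.
SUPPORTED_PMML = {
    "Decision Trees": ("2.0", "3.0"),
    "Naive Bayes": ("2.0", "3.2"),
    "Logistic Regression": ("2.0", "3.2"),
    "Demographic Clustering": ("2.0", "3.0"),
    "Kohonen Clustering": ("2.0", "3.0"),
    "Linear Regression": ("2.0", "3.0"),
    "Polynomial Regression": ("2.0", "3.0"),
    "Transform Regression": ("2.0", "3.0"),
    "Association Rules": ("2.0", "3.2"),
}


def parse_version(text: str) -> Tuple[int, int]:
    """Turn '3.2' into (3, 2) so versions compare numerically, not as strings."""
    parts = text.strip().split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return int(parts[0]), minor


def is_supported(algorithm: str, pmml_text: str) -> bool:
    """Check the PMML root's version attribute against the algorithm's range."""
    version = ET.fromstring(pmml_text).get("version")
    if version is None:
        raise ValueError("PMML root element has no version attribute")
    low, high = SUPPORTED_PMML[algorithm]
    return parse_version(low) <= parse_version(version) <= parse_version(high)


doc = '<PMML version="3.1" xmlns="http://www.dmg.org/PMML-3_1"><Header/></PMML>'
print(is_supported("Naive Bayes", doc))     # True  (3.1 is within 2.0 - 3.2)
print(is_supported("Decision Trees", doc))  # False (3.1 is above 3.0)
```

Comparing integer tuples instead of raw strings keeps the check correct even if a minor version ever reaches two digits.
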

If your data mining software cannot produce models at these PMML versions, you can use a model integration technique that does not rely on PMML, such as the IBM SPSS Analytics Toolkit for InfoSphere Streams Version 1.0 or the InfoSphere Streams R-project Toolkit:

  • The SPSS Analytics Toolkit contains InfoSphere Streams operators that integrate with the SPSS Modeler and SPSS Collaboration and Deployment Services products to bring various aspects of SPSS Modeler predictive analytics into your InfoSphere Streams applications. The SPSS Analytics Toolkit provides the SPSSScoring, SPSSPublish, and SPSSRepository operators for scoring models. The SPSS Analytics Toolkit is installed by the SPSS Modeler Solution Publisher product, which is included in SPSS Collaboration and Deployment Services 5.0 or later.
  • The R-project Toolkit provides the RScript operator, which integrates InfoSphere Streams with the R environment. You can use the R-project Toolkit to score mining models that are built in the R environment. The R-project Toolkit is included when you install InfoSphere Streams.

NOTE: The Mining Toolkit is not supported on RHEL 7 platforms.

Developing and running applications that use the Mining Toolkit
Version: 2.1.0
Required Product Version: 4.0.0.0

Namespaces

com.ibm.streams.mining.scoring
The Mining Toolkit has four operators.
Operators