1 reply Latest Post - ‏2013-03-26T17:15:08Z by SystemAdmin
SystemAdmin

SPL compilation error. Classification. PMML.

‏2013-03-26T09:41:04Z |
Hi.

We are trying to use the mining toolkit to perform a trivial classification (logistic regression) for testing purposes. We have exported the PMML from SPSS Modeler 15 and are trying to use it in a Classification node in our Streams application (Streams v3.0).

The model is a binomial logistic regression on the target variable "Above_1.5" with one continuous independent variable ("teamTrafikkNorm"). The model is trained by simply setting "Above_1.5" to True if "teamTrafikkNorm" is above 1.5.
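For reference, here is a quick sanity check of what such a model should do, as a minimal Python sketch. It assumes the usual logistic form p(T) = 1 / (1 + exp(-(b0 + b1 * x))) with "F" as the reference category, and it uses the two beta values from the PMML file posted further down. The names `prob_true`, `B0` and `B1` are only illustrative.

```python
import math

# Coefficients copied from the <ParamMatrix> of the PMML file in this post.
B0 = -14984.0601305434   # parameter P0000001 (constant)
B1 = 9989.85623210975    # parameter P0000002 (teamTrafikkNorm)


def prob_true(x):
    """Probability of Above_1.5 == "T" for a given teamTrafikkNorm value.

    Assumes a plain two-class logistic model; written in a numerically
    stable way because the linear predictor is very large in magnitude.
    """
    eta = B0 + B1 * x
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# The decision boundary is where the linear predictor is zero.
boundary = -B0 / B1

print("decision boundary:", boundary)
print("p(T | x = 1.4):", prob_true(1.4))
print("p(T | x = 1.6):", prob_true(1.6))
```

With these coefficients the boundary comes out just under 1.5, and the predicted probability switches from essentially 0 to essentially 1 around it, which is consistent with how the target variable was constructed.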

Trying to compile the SPL code (posted below) gives the following error messages:

" CDISP0232E A user error was encountered while generating code for operator 'Classification_1' " (Line 37, Column 3)

" CDISP9164E Model file '<model file path>/above15.pmml' is not a valid PMML document" (Line 1, Column 1)

We are new to the whole "exporting PMML files in SPSS and using them in Streams" thing, so we do not know where to start looking for errors. The PMML code is posted below as well.

Does anyone see what's wrong?

Thanks

SPL code (Main.spl):

use com.ibm.streams.mining.scoring::Classification ;

composite Main
{
    graph
        () as FileSink_1 = FileSink(Classification_1_out0)
        {
            param
                file : "out.csv" ;
                format : csv ;
                flush : 10u ;
        }

        (stream<float64 x, float64 y, float64 z, float64 t, rstring info> FileSource_1_out0) as FileSource_1 = FileSource()
        {
            param
                file : "walk2.csv" ;
                format : csv ;
            output
                FileSource_1_out0 : info = FileName() ;
        }

        (stream<rstring info, float64 t, float64 x, float64 y, float64 z, float64 teamTrafikkNorm> Aggregate_1_out0) as Aggregate_1 = Aggregate(FileSource_1_out0 as inPort0Alias)
        {
            window
                inPort0Alias : tumbling, count(10), partitioned ;
            param
                partitionBy : info ;
            output
                Aggregate_1_out0 : teamTrafikkNorm = Average(abs(x) + abs(y) + abs(z)) ;
        }

        (stream<rstring info, float64 t, float64 teamTrafikkNorm, rstring klasse, float64 confidence> Classification_1_out0) as Classification_1 = Classification(Aggregate_1_out0)
        {
            param
                model : "above15standard4regr.pmml" ;
                teamTrafikkNorm : "teamTrafikkNorm" ;
        }
}


PMML file:

<?xml version="1.0" encoding="UTF-8"?>
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_0 pmml-4-0.xsd">
  <Header copyright="(C) Copyright IBM Corp. 1989, 2011.">
    <Application name="IBM SPSS Statistics 20.0" version="20.0.0.1"/>
  </Header>
  <DataDictionary numberOfFields="2">
    <DataField dataType="string" displayName="Above_1.5" name="Above_1.5" optype="categorical">
      <Extension extender="spss.com" name="format" value="1"/>
      <Extension extender="spss.com" name="width" value="1"/>
      <Extension extender="spss.com" name="decimals" value="0"/>
      <Value displayValue="F" property="valid" value="F"/>
      <Value displayValue="T" property="valid" value="T"/>
    </DataField>
    <DataField dataType="double" displayName="teamTrafikkNorm" name="teamTrafikkNorm" optype="continuous">
      <Extension extender="spss.com" name="format" value="5"/>
      <Extension extender="spss.com" name="width" value="10"/>
      <Extension extender="spss.com" name="decimals" value="3"/>
    </DataField>
  </DataDictionary>
  <GeneralRegressionModel functionName="classification" modelName="x-contrastLogistic" modelType="multinomialLogistic" targetVariableName="Above_1.5">
    <Extension extender="spss.com" name="numberParameters" value="2"/>
    <MiningSchema>
      <MiningField missingValueTreatment="asIs" name="Above_1.5" usageType="predicted"/>
      <MiningField missingValueTreatment="asIs" name="teamTrafikkNorm" usageType="active"/>
    </MiningSchema>
    <ParameterList>
      <Parameter label="Constant" name="P0000001"/>
      <Parameter label="teamTrafikkNorm" name="P0000002"/>
    </ParameterList>
    <CovariateList>
      <Predictor name="teamTrafikkNorm"/>
    </CovariateList>
    <PPMatrix>
      <PPCell parameterName="P0000002" predictorName="teamTrafikkNorm" value="1"/>
    </PPMatrix>
    <ParamMatrix>
      <PCell beta="-14984.0601305434" df="1" parameterName="P0000001" targetCategory="T"/>
      <PCell beta="9989.85623210975" df="1" parameterName="P0000002" targetCategory="T"/>
    </ParamMatrix>
  </GeneralRegressionModel>
</PMML>
  • SystemAdmin
    ACCEPTED ANSWER

    Re: SPL compilation error. Classification. PMML.

    ‏2013-03-26T17:15:08Z  in response to SystemAdmin
    The Streams Mining Toolkit supports only specific versions of PMML. See:
    http://pic.dhe.ibm.com/infocenter/streams/v3r0/topic/com.ibm.swg.im.infosphere.streams.mining-toolkit.doc/doc/supportedpmmlversions.html

    You are seeing this error because the PMML produced by SPSS Modeler is version 4.0, which is not compatible with the Mining Toolkit.
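
A quick way to see which PMML version a file declares, before handing it to the toolkit, is to read the root element. Below is a minimal sketch in Python using only the standard library. The helper name `pmml_version` is made up for illustration and is not part of Streams or SPSS.

```python
import xml.etree.ElementTree as ET


def pmml_version(source):
    """Return (version, namespace) declared on the root <PMML> element.

    `source` may be a file path or an open binary file object.
    """
    root = ET.parse(source).getroot()
    # ElementTree writes a namespaced tag as "{namespace-uri}localname".
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
    else:
        namespace = ""
    return root.get("version"), namespace
```

Calling `pmml_version("above15.pmml")` on the file posted above should return version "4.0" and the namespace "http://www.dmg.org/PMML-4_0", which you can then compare against the versions listed on the supported-versions page.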

    Here is some general information on model scoring in Streams.

    There are two basic options:

    Mining toolkit
    This is included as part of the Streams install. It is limited to "older" PMML versions. It works well with PMML generated from IBM InfoSphere Warehouse. (Note: if you use SPSS Modeler and install the Data Warehouse integration, you can use SPSS Modeler to build PMML models in the warehouse; that PMML will be at the right version to use in the Streams Mining Toolkit.)

    IBM SPSS Analytics Toolkit for InfoSphere Streams Version 1.0
    The SPSS Analytics Toolkit contains InfoSphere Streams operators that integrate with IBM SPSS Modeler and SPSS Collaboration and Deployment Services products to implement various aspects of SPSS Modeler predictive analytics in your InfoSphere Streams applications. Specifically it provides SPSSScoring, SPSSPublish, and SPSSRepository operators.

    The SPSS Analytics Toolkit is installed by the SPSS Modeler Solution Publisher product, which is shipped in SPSS Collaboration and Deployment Services release 5.0 and later (see http://www.ibm.com/software/analytics/spss/products/deployment/cds/ for details).
    I've attached the PDF manual for the SPSS Analytics Toolkit for your convenience.

    Hope this helps,
    Mike