Looking for a good tip.
I have a statistics table of millions of events with three key fields: PROD (productivity), TYPE (4 classes) and SIZE (binned into 3 classes).
We want to remove the events whose PROD falls below the 5th percentile or above the 95th percentile of their TYPE × SIZE set (12 sets in total), and then use the remaining events for the analysis.
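The trimming rule itself is small when written out. Here is a minimal sketch in plain Python. It assumes each event is a dict with keys TYPE, SIZE and PROD (a hypothetical layout; the real rows would come from the .sav file or a database). It uses `statistics.quantiles` with its default "exclusive" method, which places the p-th percentile at rank p(n+1). As far as I know, that is also how EXAMINE's default HAVERAGE definition works, but edge handling in small sets may differ, so compare a few limits against the EXAMINE report before relying on it.

```python
from collections import defaultdict
from statistics import quantiles


def trim_by_group(events, low_pct=5, high_pct=95):
    """Keep events whose PROD lies strictly between its set's low/high percentiles.

    events: a list of dicts with keys "TYPE", "SIZE", "PROD" (hypothetical layout).
    Returns (kept, limits), where limits maps (TYPE, SIZE) -> (low, high).
    """
    # Collect PROD values per (TYPE, SIZE) set: 4 x 3 = 12 sets in this data.
    groups = defaultdict(list)
    for ev in events:
        groups[(ev["TYPE"], ev["SIZE"])].append(ev["PROD"])

    # quantiles(n=100) returns the 99 cut points P1..P99, so Pk is at index k - 1.
    # The default method="exclusive" positions Pk at rank k * (len + 1) / 100.
    # Each set needs at least 2 values (older Pythons raise StatisticsError otherwise).
    limits = {}
    for key, values in groups.items():
        cuts = quantiles(values, n=100)
        limits[key] = (cuts[low_pct - 1], cuts[high_pct - 1])

    # Second pass: keep only events strictly inside their own set's limits.
    kept = []
    for ev in events:
        low, high = limits[(ev["TYPE"], ev["SIZE"])]
        if low < ev["PROD"] < high:
            kept.append(ev)
    return kept, limits
```

For a set whose PROD values are 1..100, this gives limits of about 5.05 and 95.95, so PROD 6..95 are kept.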
So I'd like to know if anyone has a better way to do this in Modeler. Below is the high-level flow of the current method (without Modeler) and of the approach I am taking with Modeler.
In particular, I am looking for something better than the Modeler approach of FOR loops over the sets.
- Binning does not allow me to compute bins within groups defined by categorical variables.
- How do I get SPSS EXAMINE to work in Modeler so that I can use its output table?
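On the FOR-loop question above, one alternative is rank-based trimming. This is a swapped-in technique, not a Modeler feature: sort once by TYPE, SIZE, PROD and keep each event whose within-set rank fraction i/(n+1) lies strictly between 0.05 and 0.95. No limit table is computed or looked up. Below is a sketch in plain Python, using the same hypothetical dict layout. With distinct PROD values, this keeps the same events as strict limits computed with the p(n+1) percentile definition. With tied PROD values, the two approaches can differ.

```python
from itertools import groupby
from operator import itemgetter


def trim_by_rank(events, low=0.05, high=0.95):
    """Keep events whose within-set rank fraction i / (n + 1) is strictly inside (low, high).

    events: a list of dicts with keys "TYPE", "SIZE", "PROD" (hypothetical layout).
    One sort plus one sequential pass; no per-set limits are stored.
    """
    ordered = sorted(events, key=itemgetter("TYPE", "SIZE", "PROD"))
    kept = []
    # Sorting by (TYPE, SIZE, PROD) makes each set a contiguous run for groupby.
    for _, run in groupby(ordered, key=itemgetter("TYPE", "SIZE")):
        run = list(run)
        n = len(run)
        for i, ev in enumerate(run, start=1):  # i = 1-based rank of PROD within its set
            if low < i / (n + 1) < high:
                kept.append(ev)
    return kept
```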
CURRENT METHOD (without Modeler, using Python scripts)
1. Invoke SPSS EXAMINE on all events (PROD by TYPE × SIZE) to get the 5th and 95th percentile PROD limits for each of the 12 sets.
2. Extract the 5% and 95% limits from the EXAMINE report.
3. Build an SPSS script that classifies each event according to its set:
for TYPE = t and SIZE = s: if 5%_LIMIT < PROD < 95%_LIMIT, classify the event as INTERIOR; otherwise classify it as OUTLIER.
4. Execute the SPSS script across all events.
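Steps 3 and 4 can be collapsed into a single lookup keyed by (TYPE, SIZE), which avoids generating one branch per set. The sketch below does this in Python rather than as generated SPSS syntax. It assumes the 12 limit pairs from step 2 have been loaded into a dict. The `classify` helper, the dict layout and the example limits are all hypothetical.

```python
def classify(event, limits):
    """Label one event INTERIOR (kept) or OUTLIER (removed).

    limits maps (TYPE, SIZE) -> (p5, p95); one dict lookup replaces the
    12 hand-written if-branches. A set missing from limits raises KeyError on purpose.
    """
    low, high = limits[(event["TYPE"], event["SIZE"])]
    return "INTERIOR" if low < event["PROD"] < high else "OUTLIER"


# Made-up limits for two of the 12 sets; real values would come from step 2.
limits = {("A", "x"): (5.05, 95.95), ("A", "y"): (50.5, 959.5)}
events = [
    {"TYPE": "A", "SIZE": "x", "PROD": 3},
    {"TYPE": "A", "SIZE": "x", "PROD": 40},
    {"TYPE": "A", "SIZE": "y", "PROD": 40},
]
labels = [classify(ev, limits) for ev in events]
print(labels)  # ['OUTLIER', 'INTERIOR', 'OUTLIER']
```

The inequalities are strict, as in step 3, so an event exactly on a limit is labelled OUTLIER.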
APPROACH USING MODELER (under development)
I am using a standalone script to build the stream dynamically:
Create a statistics node [containing all the events].
Let me know if anyone has a better method.