Documentation on spssaux?
  • 19 replies
  • Latest Post - 2013-02-03T20:21:48Z by SystemAdmin
ULandreman

2011-07-26T21:31:12Z
Some Python code that used to work in previous versions of SPSS Statistics has stopped working since I upgraded to SPSS Statistics 19.
I suspect my problems involve the spssaux module, in particular some recent changes to it. Where can I find the documentation for this module?

Urban Landreman
  • SystemAdmin

    Re: Documentation on spssaux?

    2011-07-26T21:34:53Z
    There should not be any breakage due to using a newer version of the spssaux.py module. Maybe we can figure out the problem if you post more information.

    The functions and classes in that module are documented via docstrings in the code. Each public function has a docstring immediately following its declaration that explains what it does and what its parameters are.
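    To browse those docstrings without opening the source file, the standard inspect module can list each public function with the first line of its docstring. This is a minimal sketch, assuming spssaux imports cleanly in the Python installation that Statistics uses; the helper name summarize_docstrings is hypothetical, and the json module stands in for spssaux only so the example runs anywhere. For the full text of a single function, help(spssaux.GetMissingValues) at an interactive prompt works as well.

```python
import inspect


def summarize_docstrings(module):
    """Return {function name: first docstring line} for a module's public functions."""
    summary = {}
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = inspect.getdoc(obj) or ""  # getdoc() also strips the indentation
        summary[name] = doc.splitlines()[0] if doc else "(no docstring)"
    return summary


if __name__ == "__main__":
    import json  # stand-in; inside Statistics you would pass spssaux instead
    for name, line in sorted(summarize_docstrings(json).items()):
        print("%s - %s" % (name, line))
```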

    Regards,
    Jon Peck
  • ULandreman

    Re: Documentation on spssaux?

    2011-07-26T22:18:30Z
    I've attached a .jpg screenshot showing the error messages I get.
    The Python code generates a .csv file with data dictionary information about a .sav file.
    Originally I was getting an error message connected with the line
        spss.Submit(cmd)

    I added the lines:
        import locale
        locale.setlocale(locale.LC_ALL, "english")

    and that got me past that error message.
    Now I'm getting a second error message, connected with the line
        vMissingValues=spssaux.GetMissingValues(i)
    The entire Python code is:
        # Write dictionary about variables in an SPSS dataset
        from easygui import *
        import sys
        import spss
        import spssaux

        import locale
        locale.setlocale(locale.LC_ALL, "english")
        msgBox = msgbox(msg="Select the SPSS .sav file of interest", title="Data Dictionary Generator", ok_button="OK")
        spssSavFile = fileopenbox(msg=None, title=None, default=None)
        if spssSavFile == None:
            print "No file selected"
        else:
            cmd = 'GET FILE="'+spssSavFile+'".'
            print "About to run "+ cmd
            spss.Submit(cmd)
            # spss.Submit()
            print "Got past spss.Submit(cmd)"
            print "GET FILE='"+spssSavFile+"'."
            outputfile = filesavebox(msg="Where do you want the results saved?", title="Data Dictionary Results", default="DataDictionary.csv")
            varcount=spss.GetVariableCount()
            print varcount
            varInfoAll = []
            dbOrderNum = 0
            for i in xrange(varcount):
                dbOrderNum = dbOrderNum + 1
                varInfo = []
                vVarLabel = spss.GetVariableLabel(i).replace(chr(34),'')
                vFormat1 = spss.GetVariableFormat(i)[0:1]
                valLabels = spssaux.GetValueLabels(i)

                if len(valLabels) > 0:
                    vCodeList=[]
                    for vCode in valLabels.keys():
                        print "vVarLabel - " + vCode
                        if vFormat1 == "F":
                            try:
                                int(vCode)
                                VCodeX = int(vCode)
                            except:
                                VCodeX = vCode

                            vCodeList.append(VCodeX)
                        else:
                            vCodeList.append(vCode)
                    vCodeList.sort()
                    vValLabels=chr(34)
                    for vCode in vCodeList:
                        if vFormat1 == "F":
                            vCode2 = str(vCode)
                            valLabels2 = valLabels[str(vCode)]
                        else:
                            vCode2 = vCode
                            valLabels2 = valLabels[vCode]
                        vValLabels = vValLabels + vCode2 + ": " + (valLabels2.strip()).replace(chr(34),'') +"\n"
                    vValLabels = vValLabels[0:len(vValLabels) - 1] + chr(34)
                else:
                    vValLabels="None"
                if vValLabels[1] == "-" and len(vValLabels)>252:
                    vValLabels = chr(34) + "minus " + vValLabels[2:len(vValLabels)]
                vMissingValues=spssaux.GetMissingValues(i)
                if len(vMissingValues) == 0:
                    vMissingValuesEdited = ''
                else:
                    vMissingValuesEdited = chr(34)+ vMissingValues + chr(34)
                varInfo.extend([spss.GetVariableName(i).upper(), vVarLabel, spss.GetVariableFormat(i), vValLabels,vMissingValuesEdited,dbOrderNum])
                varInfoAll.append(varInfo)

            varInfoAll.sort()
            ofile = open(outputfile, 'wb')
            oline = "Variable,Variable Label,Format,Value Labels,Missing Values,Dataset Order Number\n"
            ofile.write(oline)

            for (vName, vLabel, vFormat,vValLabels,vMissingValues,dbOrderNum) in varInfoAll:
                oline = vName + ',' + chr(34) + vLabel + chr(34) + ',' + vFormat + "," + vValLabels + ","+vMissingValues+","+str(dbOrderNum)+'\n'
                ofile.write(oline)
            ofile.close()
    ****************
    Thanks
  • SystemAdmin

    Re: Documentation on spssaux?

    2011-07-27T00:31:29Z
    Here is my diagnosis. There was a change made in Statistics 19 to a table that spssaux.GetMissingValues uses. I didn't find out about this until later, so I updated the spssaux.py code too late for it to go into the plugin. The updated module is, however, posted on this site in the Python Modules collection. Download that and replace the one that was installed with the Python Essentials. It is probably in /python26/lib/site-packages/spssaux. Delete the .pyc file to make sure that the new version is recompiled and used. That should fix the problem.
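    Finding the installed copy and its compiled file can also be scripted. This is a minimal sketch, assuming the Python 2 layout used by the Python 2.6 that ships with Statistics 19, where the compiled module sits next to the source as spssaux.pyc. The helper names are hypothetical. Inside Statistics you would pass spssaux.__file__, which reports where the module was actually imported from and may itself point at the .pyc.

```python
import os


def source_path(module_file):
    """Map a module's __file__ (which may name the .pyc) to its .py source."""
    root, ext = os.path.splitext(module_file)
    return root + ".py" if ext in (".pyc", ".pyo") else module_file


def remove_stale_bytecode(module_file):
    """Delete the .pyc/.pyo compiled next to a module's source (Python 2 layout).

    Returns the list of files removed, so the caller can confirm what happened.
    Python recompiles the module from the .py file on the next import.
    """
    root = os.path.splitext(source_path(module_file))[0]
    removed = []
    for ext in (".pyc", ".pyo"):
        compiled = root + ext
        if os.path.exists(compiled):
            os.remove(compiled)
            removed.append(compiled)
    return removed

# Hypothetical usage inside Statistics:
#   import spssaux
#   print(source_path(spssaux.__file__))   # where to copy the new spssaux.py
#   print(remove_stale_bytecode(spssaux.__file__))
```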

    As for the locale setting, you would normally just set your locale in Statistics with SET LOCALE. Once set, Statistics remembers it; if it has never been set, the OS locale setting is used.

    HTH,
    Jon Peck
  • ULandreman

    Re: Documentation on spssaux?

    2011-07-27T18:49:52Z
    Thanks for the info on the latest version of spssaux.py.
    I replaced the installed version, but the error came back.
    After a little research, it seems that the problem is tied to the size of the .sav file, in particular the number of variables, not the number of cases.

    The Python code works fine with other .sav files that have fewer variables, but it crashes, as we've seen, with this particular .sav file, which has 3422 variables. The code crashes when it reaches variable 2334.

    So my guess is that a memory or buffer size limitation is the culprit.

    Do you know of any way to increase the size of system files beyond their default settings?

    Thanks.

    Urban
  • SystemAdmin

    Re: Documentation on spssaux?

    2011-07-27T19:12:26Z
    Interesting. There should not be any size limits here. Can you post a version of the sav file (minus most of the data) so that I can try to reproduce this? Sav files certainly can have far more variables than you have here, but there might be something in the path of the code that is accessing the variable dictionary that has an unintended limit.

    Regards,
    Jon
  • ULandreman

    Re: Documentation on spssaux?

    2011-07-27T19:19:26Z
    I've attached a copy of the .sav file with 10 records in it.
    Let me know if there is anything you want me to try.

    Urban
  • SystemAdmin

    Re: Documentation on spssaux?

    2011-07-27T19:28:51Z
    The posting mangled the indentation of the Python code. Can you post it as an attachment?
  • ULandreman

    Re: Documentation on spssaux?

    2011-07-27T19:59:40Z
    Here's the .py file
  • SystemAdmin

    Re: Documentation on spssaux?

    2011-07-28T02:50:43Z
    I was able to reproduce the crash. I think this code is exposing a problem in the memory management of the xmlworkspace, which the MissingValues code uses. I'll have to dig further to figure this out, but the problem is not in the spssaux.py code or in your code.

    Here's what I suggest. There is an API that returns the missing values for a variable, but it requires the variable index rather than the name. So first build a dictionary that maps names to indexes (unless your code already has some other way to get this).

    E.g.,

        indexes = {}
        for v in range(spss.GetVariableCount()):
            indexes[spss.GetVariableName(v)] = v

    Then in your variable loop, you can use this to get the missing value tuple:

        missingvalues = spss.GetVarMissingValues(indexes[z])

    where z is a variable name.

    This should get around the problem, and as a bonus, it will be much faster than the current code for this function.
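    The same idea packaged as functions, so the logic can be tried outside Statistics. This is a sketch with hypothetical names: `names` stands for the variable names in file order (what spss.GetVariableName(i) returns for i = 0, 1, ...) and `get_missing` stands for spss.GetVarMissingValues. Passing them in lets the code run on plain Python data. The dictionary is built once, so each later lookup is a constant-time dict access. It avoids dict comprehensions so it also runs on Python 2.6.

```python
def build_index(names):
    """Map each variable name to its 0-based position in the file."""
    return dict((name, i) for i, name in enumerate(names))


def missing_for(varnames, names, get_missing):
    """Return {variable name: missing-value result} for the requested variables.

    get_missing is called with a variable *index*, mirroring how
    spss.GetVarMissingValues is described in the post above.
    """
    indexes = build_index(names)
    return dict((z, get_missing(indexes[z])) for z in varnames)


if __name__ == "__main__":
    # Inside Statistics (sketch, not run here):
    #   import spss
    #   names = [spss.GetVariableName(i) for i in range(spss.GetVariableCount())]
    #   result = missing_for(["age", "income"], names, spss.GetVarMissingValues)
    fake = {0: "missing spec for age", 1: "missing spec for income"}
    print(missing_for(["income"], ["age", "income"], fake.get))
```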

    HTH,
    Jon
  • SystemAdmin

    Re: Documentation on spssaux?

    2011-07-28T14:03:15Z
    If I had been more awake when I wrote the previous post, I would have noted that the spssaux.VariableDict class already has an alternative API that uses the method I outlined.

    Instead of the MissingValues property, use the MissingValues2 property. It returns a 4-tuple with the missing-value specification but does not go through the elaborate path that MissingValues uses. MissingValues takes that path because the necessary API was not available when the original code was written (V15).
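    For a data dictionary you usually want that 4-tuple as readable text. This is a sketch under an assumption: it assumes the layout documented for spss.GetVarMissingValues, a type code followed by three value slots, where type 0 means up to three discrete values, 1 means a range (low, high), 2 means a range plus one discrete value, and None marks an unused slot. Check that layout against the Python reference for your release before relying on it. The function names are hypothetical.

```python
def _fmt(value):
    """Show 9.0 as '9' but leave 9.5 and string values unchanged."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def describe_missing(spec):
    """Turn an (assumed) 4-tuple missing-value spec into text such as '1 THRU 5, 99'.

    Returns '' when no missing values are defined, matching the empty
    column the original script writes in that case.
    """
    kind, a, b, c = spec
    if kind == 0:  # assumed: discrete values in the three slots
        return ", ".join(_fmt(v) for v in (a, b, c) if v is not None)
    if kind == 1:  # assumed: range a..b
        return "%s THRU %s" % (_fmt(a), _fmt(b))
    if kind == 2:  # assumed: range a..b plus discrete value c
        return "%s THRU %s, %s" % (_fmt(a), _fmt(b), _fmt(c))
    raise ValueError("unexpected missing-value type: %r" % (kind,))

# Hypothetical use inside Statistics:
#   for v in spssaux.VariableDict():
#       print(v.VariableName, describe_missing(v.MissingValues2))
```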

    Regards,
    Jon
  • ULandreman

    Re: Documentation on spssaux?

    2011-07-29T20:19:23Z
    Thanks for the info about the MissingValues2 property.
    I think I got everything to work now.
    I've attached my final .py file if you're interested.

    Urban
  • SystemAdmin

    Re: Documentation on spssaux?

    2011-07-29T22:02:10Z
    Glad this is working. The original program, which fails on Statistics 19, works properly on the upcoming version, but I couldn't pin down exactly what in the program was causing the crash. It appears to be a combination of things, not just the code that the MissingValues function has to use, since running just that part over the dataset works in V19.
    Regards,
    Jon
  • SystemAdmin

    Re: Documentation on spssaux?

    2013-02-01T12:17:47Z
    Hi,

    I'm trying to customize this Python module created by Urban so that each value label is in its own cell, spread across the columns, and so that the value labels are exported in the format VALUECODE "VALUELABEL" (so that I can copy and paste them if I ever want to re-label the values via syntax). I am struggling with getting them across the columns and with making sure that a comma in the label itself is not treated as a delimiter.

    Attached is an example of the desired output. Any help would be much appreciated.

    Thanks
    Jignesh
  • SystemAdmin

    Re: Documentation on spssaux?

    2013-02-01T15:00:21Z
    First, please don't attach a completely new question to an existing thread. Your question has nothing to do with spssaux documentation.

    Here is code that will create a csv file such as the one you attached, assuming that the data file is already open. You might want to make this into a function that takes the file name as its argument. You can customize the csv delimiters by modifying the csv.writer call. See the Python help on csv for details.

        import spssaux
        import csv

        output = "c:/temp/dictinfo.csv"
        vardict = spssaux.VariableDict()
        f = open(output, "wb")
        csvwriter = csv.writer(f, quoting=csv.QUOTE_ALL)

        for v in vardict:
            csvwriter.writerow([v.VariableName, v.VariableLabel] + \
                [item for parts in v.ValueLabels.items() for item in parts])
        f.close()
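    The worry about commas inside labels is what csv.writer's quoting handles: with QUOTE_ALL every field is wrapped in double quotes, so an embedded comma stays inside its cell, and a double quote inside a label is escaped by doubling it. The sketch below shows this with made-up labels and an in-memory buffer instead of a file. It is written for Python 3, where csv files are opened in text mode with newline=""; the "wb" mode in the code above is the Python 2 convention. The function name is hypothetical.

```python
import csv
import io


def dictionary_csv(rows):
    """Write rows of cells as CSV text, quoting every field."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical dictionary content: note the comma and the quotes in the labels.
    text = dictionary_csv([
        ["region", "Sales region", "1", "North, inner", "2", 'The "South"'],
    ])
    print(text)
    # Reading it back shows that each cell survived intact.
    print(next(csv.reader(io.StringIO(text))))
```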
    
  • SystemAdmin

    Re: Documentation on spssaux?

    2013-02-01T16:17:03Z
    Hi Jon! Thanks!

    The code you provided doesn't quite produce the value labels in the format I intended. Currently it splits the value and its label into separate cells. I want to combine them into a single cell, with the label wrapped in double quotes.

    I've tried playing around with the code, but to no avail. If I could get some help with this, I could implement it in the main code, where the value labels are ordered in the same order as the dataset (I hope!)
  • SystemAdmin

    Re: Documentation on spssaux?

    2013-02-01T16:22:19Z
    I've just noticed that the example employee data map I attached has incorrect value codes (that's a bugger, and I need to check how that happened); however, its format is how I want to export the labels, i.e., the code followed by a space and the label wrapped in double quotes.
  • SystemAdmin

    Re: Documentation on spssaux?

    2013-02-01T17:22:44Z
    You realize, I hope, that the format you want to use is ambiguous. If you have a string variable where the value can contain a blank and it has a label, you won't know where the value stops and the label starts.

    But this code will give you the merged cells.

        import spss, spssaux
        import csv

        vardict = spssaux.VariableDict()
        output = "c:/temp/dictinfo.csv"
        f = open(output, "wb")
        csvwriter = csv.writer(f, quoting=csv.QUOTE_ALL)

        for v in vardict:
            vlabels = [item for parts in v.ValueLabels.items() for item in parts]
            vlabels = [" ".join([vlabels[i], vlabels[i+1]])
                for i in range(0, len(vlabels), 2)]
            csvwriter.writerow([v.VariableName, v.VariableLabel] + vlabels)
        f.close()
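    The code above joins each value and label with a space but does not put the label in double quotes, which is the VALUECODE "VALUELABEL" format requested for pasting into syntax. This is a sketch of that extra step with hypothetical helper names, working on a plain {code: label} dict of strings such as v.ValueLabels. Codes that look numeric are sorted numerically, and a double quote inside a label is doubled, which is how SPSS syntax escapes a quote inside a double-quoted string. For string variables the code is quoted too, which addresses the blank-in-value ambiguity raised above. csv.writer still adds its own quoting layer around each cell; a spreadsheet removes that layer on import.

```python
def syntax_quote(text):
    """Wrap text in double quotes, doubling any embedded double quote."""
    return '"' + text.replace('"', '""') + '"'


def _sort_key(code):
    """Sort numeric-looking codes by value, then other codes alphabetically."""
    try:
        return (0, float(code), "")
    except ValueError:
        return (1, 0.0, code)


def label_cells(value_labels, string_var=False):
    """Return one 'CODE "LABEL"' cell per value label, in code order.

    value_labels is a {code: label} dict of strings. For string variables
    the code is quoted as well, so codes containing blanks stay unambiguous.
    """
    cells = []
    for code in sorted(value_labels, key=_sort_key):
        shown = syntax_quote(code) if string_var else code
        cells.append("%s %s" % (shown, syntax_quote(value_labels[code].strip())))
    return cells

# Hypothetical use with the loop above (untested inside Statistics):
#   csvwriter.writerow([v.VariableName, v.VariableLabel] + label_cells(v.ValueLabels))
```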
    
  • SystemAdmin

    Re: Documentation on spssaux?

    2013-02-03T18:57:41Z
    Hi Jon, thanks for your help. I've managed to get the code running. However, in this case it seems much quicker to run DISPLAY DICTIONARY, capture the output via OMS, apply the necessary transformations to get it into the desired format, and SAVE TRANSLATE to csv.

    Is the Python code that extracts the value labels just very time consuming?

    Is there any way to speed up the Python approach? Or is the OMS approach best for a large dataset? I have 15,000+ variables, some of which have 300+ values.

    The OMS approach takes around 5 minutes, whereas the Python approach takes close to 30 minutes.
  • SystemAdmin

    Re: Documentation on spssaux?

    2013-02-03T20:21:48Z
    That's a really big dictionary. The spssaux method will be slow in that case. You can get much faster performance there by using the spss.Dataset class, which was developed after the spssaux.VariableDict class was created.
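    A rough sketch of the spss.Dataset route, not benchmarked here. The speed claim is Jon's. read_dictionary only runs inside Statistics, and the attribute names it uses (varlist, name, label, valueLabels, and especially valueLabels.data as a dict) are assumptions to check against the spss.Dataset and Variable documentation for your release. The data step is closed in a finally clause so an error does not leave it open. dictionary_rows is a hypothetical pure-Python helper that can be tested anywhere.

```python
def read_dictionary():
    """Collect (name, label, {code: label}) for every variable via spss.Dataset.

    Only runs inside Statistics; the attribute names are assumptions (see lead-in).
    """
    import spss  # available only within Statistics
    spss.StartDataStep()
    try:
        ds = spss.Dataset()  # the active dataset
        return [(var.name, var.label, dict(var.valueLabels.data))
                for var in ds.varlist]
    finally:
        spss.EndDataStep()  # always close the data step, even after an error


def dictionary_rows(entries):
    """Turn (name, label, {code: label}) entries into CSV-ready rows.

    Value labels appear as 'code label' cells in the order the dict yields them.
    """
    rows = []
    for name, label, labels in entries:
        cells = ["%s %s" % (code, text) for code, text in labels.items()]
        rows.append([name, label] + cells)
    return rows

# Hypothetical use inside Statistics:
#   for row in dictionary_rows(read_dictionary()):
#       csvwriter.writerow(row)
```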