Topic
  • 10 replies
  • Latest Post - ‏2012-10-24T18:06:27Z by SystemAdmin
SystemAdmin
SystemAdmin
2077 Posts

Pinned topic Modify Value Labels with Python

‏2012-10-18T20:43:01Z |
Hi all,

I have a data situation where my data file is littered with HTML tags in Variable Labels and Value Labels. Otherwise, all the information is fine.

I am able to strip the HTML tags from Variable Labels using Python and regex fairly simply:

BEGIN PROGRAM .
import re
import spss, spssaux

# 1. clean names/labels
vdict = spssaux.VariableDict()

for i in range(len(vdict)):
    myvarname = vdict[i].VariableName
    myvarlabel = vdict[i].VariableLabel
    myvarlabel = re.sub(r'<[^>]*>', '', myvarlabel)   # strip HTML tags
    myvarlabel = re.sub(r'&amp;', '&', myvarlabel)    # decode the ampersand entity
    myvarlabel = re.sub(r'&nbsp;', ' ', myvarlabel)   # decode the non-breaking space entity
    spss.Submit(r"""VARIABLE LABELS %s "%s" .""" % (myvarname, myvarlabel))
END PROGRAM .
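
The substitutions above can be tried outside SPSS on a plain string. The following is only a minimal sketch: the helper name `clean_label` and the sample text are made up for illustration, and the two entities handled (`&amp;` and `&nbsp;`) are an assumption about what the labels contain.

```python
import re

TAG_RE = re.compile(r'<[^>]*>')  # '<', then anything that is not '>', then '>'

def clean_label(text):
    """Return text with HTML tags removed and two common entities decoded."""
    text = TAG_RE.sub('', text)
    text = text.replace('&nbsp;', ' ')
    # Decode '&amp;' last so that a literal '&amp;nbsp;' is not decoded twice.
    text = text.replace('&amp;', '&')
    return text

# Hypothetical label, not taken from the original data file.
sample = '<b>Overall&nbsp;satisfaction</b> &amp; loyalty<br />'
print(clean_label(sample))
```

Keeping the cleaning in one function means the same logic can be reused for variable labels and value labels alike.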

Is there a similar method that can be used to modify value labels? (Also, is there a method that can modify SPSS variable dictionary information without generating SPSS syntax?)

Thanks,

Dan
Updated on 2012-10-24T18:06:27Z at 2012-10-24T18:06:27Z by SystemAdmin
  • SystemAdmin
    SystemAdmin
    2077 Posts

    Re: Modify Value Labels with Python

    ‏2012-10-18T20:47:59Z  
    Looks like the submit window garbled my code. The code above is displayed correctly in the attached .sps syntax file.
  • SystemAdmin
    SystemAdmin
    2077 Posts

    Re: Modify Value Labels with Python

    ‏2012-10-18T20:55:43Z  
    You can bypass syntax entirely in both cases. Use the spss.Dataset class or the spssdata.Spssdata class. You can get and set all the metadata properties in the SPSS variable dictionary. For value labels in particular, you get the value labels as a Python dictionary, and you can modify it and then assign the modified dictionary back.

    HTH,
    Jon Peck
  • SystemAdmin
    SystemAdmin
    2077 Posts

    Re: Modify Value Labels with Python

    ‏2012-10-19T12:41:33Z  
    Thanks Jon.

    I will explore using that.

    Thanks,

    Dan
  • SystemAdmin
    SystemAdmin
    2077 Posts

    Re: Modify Value Labels with Python

    ‏2012-10-23T18:37:57Z  
    Hi Jon,

    While I was able to figure out how to modify Variable Labels using DataStep(), I am still struggling with modifying Value Labels.

    I have attached SPS files with the python blocks (I import the spss modules on SPSS startup).

    My main issue is that when I print out the Value Labels, it appears as a python dictionary, but I am having trouble iterating over the dictionary items in SPSS. When I run the attached code I get the following error message:

    Traceback (most recent call last):
    File "<string>", line 9, in <module>
    File "C:\Python26\lib\site-packages\spss190\spss\dataStep.py", line 2272, in __getitem__
    return self.dataattvalue
    KeyError: 0


    I also created a standalone .py script in which the logic seems correct. What I am trying to do is shown below:

    PYTHON LOGIC
    import re
    b = {1.0: 'Poor
    1', 2.0: '2', 3.0: '3', 4.0: '4', 5.0: '5', 6.0: '6', 7.0: 'Excellent
    7', 8.0: 'Not sure'}

    for key in b:
        bkey = re.sub(r'<^>*>','',bkey)

    print a

    Any guidance here would be greatly appreciated. Please let me know if you need additional info or if the question is not clear.

    Thanks,

    Dan
  • SystemAdmin
    SystemAdmin
    2077 Posts

    Re: Modify Value Labels with Python

    ‏2012-10-24T00:50:01Z  
    I can't tell exactly what you are trying to do, since the code is mangled on the page, but:
    - a is undefined
    - what is bkey? It isn't assigned to anything
    - is the code supposed to read

        b[key] = something

      ?
    - try the loop a little differently:

        for key, value in b.items(): b[key] = re.sub(...., value)

    Regards,
    Jon
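
    The loop suggested above can be checked on an ordinary dictionary. This is a rough sketch only: the sample labels and the tags in them are hypothetical, and the pattern is the tag-stripping regex used elsewhere in this thread.

```python
import re

# Hypothetical value labels; the tags are an assumption about the raw data.
b = {1.0: '<b>Poor</b> 1', 2.0: '2', 7.0: '<b>Excellent</b> 7', 8.0: 'Not sure'}

# Assigning to keys that already exist does not resize the dictionary,
# so it is safe inside the loop; list() iterates over a snapshot anyway.
for key, value in list(b.items()):
    b[key] = re.sub(r'<[^>]*>', '', value)

print(b)
```

    Note that this works because `b` is a real dict. Whether the object returned by SPSS supports `items()` is a separate question.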
  • SystemAdmin
    SystemAdmin
    2077 Posts

    Re: Modify Value Labels with Python

    ‏2012-10-24T02:29:24Z  
    Hi Jon,

    Thanks for getting back.

    Sorry, my message was not clear. I am attaching the .py script since it was garbled.

    The .py script works; I included it as an example of what I am trying to achieve with the value labels code in the .sps file.

    Basically, I am able to capture the value labels from the SAV file as a Python dictionary using DataStep. However, I am having trouble iterating through the dictionary.

    The code that produces the error is attached to the previous message. I've tried to include it below, but it may get garbled:

    BEGIN PROGRAM .
    with spss.DataStep():
        dataset = spss.Dataset()
        variable_list = dataset.varlist

        b = variable_list['q_1'].valueLabels
        
        for key in b:
            variable_list['q_1'].valueLabels[key] = re.sub(expression,'',b)

    END PROGRAM .

    When I modify the code to "key, value in b.items" I get a different error:
    Traceback (most recent call last):
    File "<string>", line 9, in <module>
    AttributeError: ValueLabel instance has no attribute 'iteritems'

    I believe the problem is how I am trying to access the dictionary within DataStep
    variable_list['q_1'].valueLabels[key]
    but am not sure what the correct approach would be.
  • SystemAdmin
    SystemAdmin
    2077 Posts

    Re: Modify Value Labels with Python

    ‏2012-10-24T17:57:25Z  
    The valueLabels object that you get back in your code is not a dictionary. It is an object that contains a dictionary of value labels, among other things.

    You can access the value labels through its data attribute:
        for key, value in b.data.items():

    HTH,
    Jon Peck
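
    One plausible reading of the earlier `KeyError: 0` is that the object defines `__getitem__` but no iterator, so a plain `for key in b` makes Python try `b[0]`, `b[1]`, and so on. The sketch below reproduces that behaviour with a hypothetical stand-in class; `LabelWrapper` is invented for illustration and is not the real SPSS class.

```python
class LabelWrapper(object):
    """Hypothetical stand-in for the valueLabels object; NOT the SPSS class."""

    def __init__(self, labels):
        self.data = dict(labels)   # the actual dictionary lives here

    def __getitem__(self, key):
        return self.data[key]      # lookup by value, e.g. wrapper[1.0]


b = LabelWrapper({1.0: 'Poor', 2.0: 'Fair'})

# With no __iter__ defined, "for key in b" falls back to b[0], b[1], ...
# and the very first lookup fails because 0 is not a labelled value.
try:
    for key in b:
        pass
    error = None
except KeyError as exc:
    error = exc

print(repr(error))

# Going through .data gives ordinary dictionary behaviour.
pairs = sorted(b.data.items())
print(pairs)
```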
  • SystemAdmin
    SystemAdmin
    2077 Posts

    Re: Modify Value Labels with Python

    ‏2012-10-24T18:06:20Z  
    Hi Jon,

    I've gotten it to work. Thanks for your help on this as you got me pointed in the right direction.

    Below is the syntax that I was able to use. I am not sure if it is the most elegant or efficient code, but it does work.

    
    BEGIN PROGRAM.
    from __future__ import with_statement
    import re
    import spss

    with spss.DataStep():
        dataset = spss.Dataset()
        variable_list = dataset.varlist

        dvals = []
        dlabs = []

        # read each label through .data and strip the HTML tags
        for key in variable_list['q_1'].valueLabels.data:
            a = re.sub(r'<[^>]*>', '', variable_list['q_1'].valueLabels.data[key])
            print variable_list['q_1'].valueLabels.data[key]
            dvals.append(key)
            dlabs.append(a)

        # replace the whole set of value labels with the cleaned dictionary
        variable_list['q_1'].valueLabels = dict(zip(dvals, dlabs))
    END PROGRAM .
    


    I have also attached an .sps file in case the code above gets mangled.

    The two big pieces for me were:
    1. To access the information as a python dictionary, it is necessary to add ".data" to the valueLabels object.
    2. I wasn't able to modify any elements of the dictionary. I had to create a new dictionary and replace the entire dictionary.

    Let me know if I am off on these claims, and thanks again for your help.