Topic
8 replies Latest Post - ‏2009-12-02T11:28:18Z by SystemAdmin
SystemAdmin
2077 Posts
ACCEPTED ANSWER

Pinned topic Unicode & the spss api

‏2009-11-27T16:27:35Z |
(sorry if I posted this twice, I didn't see my previous post)

I am trying to open an Excel sheet with GET DATA via spss.Submit. The sheet name is "enquête". This causes a TypeError, because of the unicode character.
What is the easiest way to deal with this? I wrote the function below, but I was hoping for a shorter solution, e.g. with SET UNICODE ON perhaps?

Thanks in advance!
Albert-Jan



def replace_chars():
    trans = {}
    funnychars = u"éèêëóòôöáàâäúùüûÉÈÊËÓÒÔÖÁÀÂÄÚÙÜÛ"
    asciichars = "eeeeooooaaaauuuuEEEEOOOOAAAAUUUU"
    for f, a in zip(funnychars, asciichars):
        trans[ord(f)] = ord(a)
    return trans

... (code omitted)
    except UnicodeEncodeError:
        sheet_name = sheet.name
        sheet_name = str(sheet_name.translate(replace_chars()))
... (code omitted)

  • SystemAdmin
    ‏2009-11-30T16:09:05Z  in response to SystemAdmin
    Where is the error arising? In general, being in Unicode mode makes things go better, but it shouldn't be necessary in this case. However, if the sheet name is exposed as a literal in your Python code and the source encoding is not marked as cp1252 or something similar, it will cause trouble when the literal is compiled.


    What version of SPSS are you on? V14 doesn't support Unicode.


    Regards,

    Jon Peck
  • SystemAdmin
    ‏2009-11-30T19:00:56Z  in response to SystemAdmin
    hi Jon,

    Thank you for your reply. The error is arising when I do something like
    sheet_name = "Enquête"
    spss.Submit("""
    GET DATA /TYPE=XLS
       /FILE='C:/temp/Map1.xls'
       /SHEET=name '%s'
       /CELLRANGE=full
       /READNAMES=on
       /ASSUMEDSTRWIDTH=32767.""" % sheet_name)

    Normally I'd loop through the sheets and let xlrd get the sheet names for me, but the code above is basically equivalent. sheet_name is in unicode, and since it contains a character with an ordinal above 127, I can't simply convert it to str. spss.Submit raises a TypeError because sheet_name is in unicode.

    So my approach was to ditch the accents using the transformation table shown in the original post, then convert the sheet_name to str. I was hoping for a simpler solution than writing the transformation table.
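One shorter alternative to a hand-written table is Unicode decomposition from the standard library. The sketch below is not something proposed in this thread; `strip_accents` is a hypothetical helper name, and it is written to run on Python 3 (the script in this thread targets Python 2.4, where the result would still be a unicode object that needs `str()` afterwards). Characters with no decomposition, such as "ø" or "ß", pass through unchanged, so this only covers ordinary accented letters.

```python
# -*- coding: utf-8 -*-
import unicodedata

def strip_accents(text):
    """Return text with combining accents removed (hypothetical helper)."""
    # NFKD splits an accented letter such as "ê" into "e" plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping the combining marks leaves only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

sheet_name = strip_accents(u"enquête")
print(sheet_name)
```

Like the translation table, this changes the name, so it only helps when the sheet is copied to a renamed temporary workbook first.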

    Best wishes,
    Albert-Jan
  • SystemAdmin
    ‏2009-11-30T20:13:16Z  in response to SystemAdmin
    In the code above, the sheet_name literal is not in Unicode as far as Python is concerned. You would need to write the literal prefixed with u. But are you still using a pre-Unicode version of SPSS?


    You can transcribe the syntax into code page 1252 using a codec. That will get rid of the type error problem.


    Suppose your entire Submit string is a Unicode string named command. Then use (after import codecs)

    spss.Submit(codecs.encode(command, "cp1252"))


    That will convert the text to code page 1252, which contains all the normal accented roman characters.
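A minimal sketch of that suggestion, assuming Python 3 and a hypothetical helper `build_get_data` that is not part of the spss module. It builds the whole command as one text string and then encodes it. The `spss.Submit` call is left commented out because it needs the SPSS Python plug-in, and whether a given SPSS version accepts the encoded bytes is not something this sketch can confirm.

```python
# -*- coding: utf-8 -*-
import codecs

def build_get_data(xls_path, sheet_name):
    """Build GET DATA syntax as one text string (hypothetical helper)."""
    return (u"GET DATA /TYPE=XLS /FILE='%s' /SHEET=name '%s' "
            u"/CELLRANGE=full /READNAMES=on /ASSUMEDSTRWIDTH=32767."
            % (xls_path, sheet_name))

command = build_get_data(u"C:/temp/Map1.xls", u"enquête")
# Encode the finished command, sheet name included, to code page 1252.
encoded = codecs.encode(command, "cp1252")
# spss.Submit(encoded)  # not run here: requires the SPSS Python plug-in
```

Characters that code page 1252 does not contain would raise a UnicodeEncodeError at the `codecs.encode` step.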


    HTH,

    Jon
  • SystemAdmin
    ‏2009-11-30T22:20:01Z  in response to SystemAdmin
    Hi again Jon,

    Thank you! I followed your advice, but the code still raises exceptions. I attached the complete code, including your suggestions. If I comment out the following line (line 31 in the code), the code gives the errors shown below:
    sheet_name = str(sheet_name.translate(replace_chars()))

    <type 'unicode'> ###
    Blêd1 --- UnicodeError!
    Traceback (most recent call last):
      File "C:\Documents and Settings\Administrator\Bureaublad\test3.py", line 43, in -toplevel-
        merge_sheets2sav(xls = "d:/temp/out.xls")
      File "C:\Documents and Settings\Administrator\Bureaublad\test3.py", line 38, in merge_sheets2sav
        get_data(xls_tmp, sheet_name)
      File "C:\Documents and Settings\Administrator\Bureaublad\test3.py", line 16, in get_data
        spss.Submit("dataset name %s." % sheet_name)
      File "C:\Python24\lib\site-packages\spss\spss150\spss.py", line 1124, in Submit
        raise SpssError,error
    SpssError: errLevel 1004 Invalid argument type.

    Maybe I have to read up on codecs. I've been doing some reading about it today and it's quite interesting.

    Best wishes,
    Albert-Jan
    
    # -*- coding: cp1252 -*-

    import xlrd, xlwt, os.path, spss, codecs

    def replace_chars():
        trans = {}
        funnychars = u"éèêëóòôöáàâäúùüûÉÈÊËÓÒÔÖÁÀÂÄÚÙÜÛ"
        asciichars = "eeeeooooaaaauuuuEEEEOOOOAAAAUUUU"
        for f, a in zip(funnychars, asciichars):
            trans[ord(f)] = ord(a)
        return trans

    def get_data(xls, sheet_name):
        command = """get data /type=xls /file='%s' /sheet=name '%s' /cellrange=full /readnames=on /assumedstrwidth=32767.\n""" % (xls, sheet_name)
        spss.Submit(codecs.encode(command, "cp1252"))
        spss.Submit("dataset name %s." % sheet_name)

    def merge_sheets2sav(xls):
        xls_tmp  = xls[:-4] + ".tmp"
        wb = xlrd.open_workbook(xls)
        wb_tmp = xlwt.Workbook()
        addfilescmd = ["add files"]
        for sheet in wb.sheets():
            try:
                print type(sheet.name), "###"
                sheet_name = str(sheet.name)
                get_data(xls, sheet_name)
            except UnicodeEncodeError:
                sheet_name = sheet.name
                print sheet_name, "--- UnicodeError! ###"
                sheet_name = str(sheet_name.translate(replace_chars()))
                ws = wb_tmp.add_sheet(sheet_name)
                for row in range(sheet.nrows):
                    for col in range(sheet.ncols):
                        cell = sheet.cell(row, col)
                        ws.write(row, col, cell.value)
                wb_tmp.save(xls_tmp)
                get_data(xls_tmp, sheet_name)
            addfilescmd.append(" /file = " + sheet_name)
        spss.Submit(" ".join(addfilescmd) + ".")
        out = os.path.dirname(xls) + "/out.sav"
        spss.Submit("save outfile = '%s'." % out)
    merge_sheets2sav(xls = "d:/temp/out.xls")
    

  • SystemAdmin
    ‏2009-11-30T22:22:29Z  in response to SystemAdmin
    gosh, sorry, replies look so messy! No hard returns! Is it Firefox?

    By the way, I'm using SPSS v14 in the office and v15 at home.

    Albert-Jan
  • SystemAdmin
    ‏2009-12-01T02:02:13Z  in response to SystemAdmin
    Could you email the code to me (peck@us.ibm.com)? I'm having trouble scraping it out of the window.


    BTW, for better editing behavior, use the Reply link instead of the Reply button. It brings up a pseudo-HTML window rather than a plain text box, and that way you can attach files, too.
  • SystemAdmin
    ‏2009-12-01T04:42:30Z  in response to SystemAdmin
    Looking again, I see that you have transcoded everything EXCEPT the critical thing: sheetname. Create your command string with the sheetname parameter substituted in and then encode the resulting string.
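The order of operations described here (substitute first, encode afterwards) can be sketched as follows, assuming Python 3. `to_cp1252` and `commands_for_sheet` are hypothetical helper names, not part of the spss module. Note that the traceback earlier in the thread points at the "dataset name" command, so the sketch encodes that string as well. Whether SPSS v14 or v15 accepts an accented dataset name at all is not established in this thread.

```python
# -*- coding: utf-8 -*-
import codecs

def to_cp1252(syntax):
    """Encode a complete syntax string to code page 1252 (hypothetical helper)."""
    return codecs.encode(syntax, "cp1252")

def commands_for_sheet(xls_path, sheet_name):
    """Substitute the sheet name first, then encode each finished command."""
    get_data = (u"get data /type=xls /file='%s' /sheet=name '%s' "
                u"/cellrange=full /readnames=on /assumedstrwidth=32767."
                % (xls_path, sheet_name))
    dataset = u"dataset name %s." % sheet_name
    return [to_cp1252(get_data), to_cp1252(dataset)]

for cmd in commands_for_sheet(u"d:/temp/out.xls", u"Blêd1"):
    # In the real script each encoded command would go to spss.Submit.
    print(repr(cmd))
```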
  • SystemAdmin
    ‏2009-12-02T11:28:18Z  in response to SystemAdmin
    Hi Jon,

    Still, I can't get the program to work without using the replace_chars() function. Or did I misunderstand your advice? I will send you the code off-list.

    Best wishes,
    Albert-Jan