  • 18 replies
  • Latest Post - 2009-07-06T17:07:52Z by SystemAdmin
SystemAdmin
881 Posts

Spss Odbc Driver - International Characters

2009-04-07T00:35:15Z
I'm using the standalone ODBC driver for SPSS (32- or 64-bit depending on the machine) with the default settings.

Using the ODBC driver works well for querying SPSS files, but I've run into an issue.
When fetching data from the value label tables, the rows containing "international chars" come back empty.

Looking at the trace log, we see that it returns with return code 100 (No data).
Any ideas? Is it possible to specify which encoding to use when reading from the file, or am I screwed?

devenv  e78-2684  ENTER SQLGetData
        HSTMT        00CDDEF8
        UWORD        2
        SWORD        -8 <SQL_C_WCHAR>
        PTR          0x14229800
        SQLLEN       4094
        SQLLEN *     0x1C82EC70

devenv  e78-2684  EXIT  SQLGetData  with return code 0 (SQL_SUCCESS)
        HSTMT        00CDDEF8
        UWORD        2
        SWORD        -8 <SQL_C_WCHAR>
        PTR          0x14229800
        SQLLEN       4094
        SQLLEN *     0x1C82EC70 (0)

devenv  e78-2684  ENTER SQLFetch
        HSTMT        00CDDEF8

devenv  e78-2684  EXIT  SQLFetch  with return code 100 (SQL_NO_DATA_FOUND)
        HSTMT        00CDDEF8

devenv  e78-2684  ENTER SQLCancel
        HSTMT        00CDDEF8

devenv  e78-2684  EXIT  SQLCancel  with return code 0 (SQL_SUCCESS)
        HSTMT        00CDDEF8

devenv  e78-2684  ENTER SQLFreeHandle
        SQLSMALLINT  3 <SQL_HANDLE_STMT>
        SQLHANDLE    00CDDEF8

devenv  e78-2684  EXIT  SQLFreeHandle  with return code 0 (SQL_SUCCESS)
        SQLSMALLINT  3 <SQL_HANDLE_STMT>
        SQLHANDLE    00CDDEF8

devenv  e78-2684  ENTER SQLDisconnect
        HDBC         08BE4090
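For reference when reading the trace above: in ODBC, return code 0 is SQL_SUCCESS and 100 is SQL_NO_DATA (the trace prints the older name SQL_NO_DATA_FOUND), which is what SQLFetch returns once there are no more rows. The SQLGetData call just before it also succeeded, and the "(0)" after its last SQLLEN * argument is the length written back by the driver, so the label came back as an empty string rather than as an error. The helper below is only a hypothetical reading aid in Python (it is not part of the driver or of any ODBC library) that maps the return codes in EXIT records to their standard names.

```python
import re

# Standard ODBC return codes (values as defined in the ODBC headers sql.h/sqlext.h).
ODBC_RETURN_CODES = {
    0: "SQL_SUCCESS",
    1: "SQL_SUCCESS_WITH_INFO",
    2: "SQL_STILL_EXECUTING",
    99: "SQL_NEED_DATA",
    100: "SQL_NO_DATA (for SQLFetch: no more rows)",
    -1: "SQL_ERROR",
    -2: "SQL_INVALID_HANDLE",
}

# Matches records like "EXIT  SQLFetch  with return code 100 (SQL_NO_DATA_FOUND)",
# even when the trace wraps "with" and "return code" onto separate lines.
EXIT_RECORD = re.compile(r"EXIT\s+(\w+)\s+with\s+return code\s+(-?\d+)")


def summarize_exits(trace_text):
    """Return (ODBC function, meaning of its return code) for each EXIT record."""
    summary = []
    for match in EXIT_RECORD.finditer(trace_text):
        func, code = match.group(1), int(match.group(2))
        summary.append((func, ODBC_RETURN_CODES.get(code, f"unknown code {code}")))
    return summary


sample = """
devenv  e78-2684  EXIT  SQLGetData  with return code 0 (SQL_SUCCESS)
devenv  e78-2684  EXIT  SQLFetch  with return code 100 (SQL_NO_DATA_FOUND)
"""
for func, meaning in summarize_exits(sample):
    print(func, "->", meaning)
```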

  • SystemAdmin
    2009-04-07T02:46:17Z
    The driver should be able to handle this, depending on which version you are using. If you have a current version (or maybe one version back), you can set the encoding and/or Unicode mode; you will find the details in the API documentation. Older versions might be picking up the wrong encoding.


    HTH,

    Jon Peck
  • SystemAdmin
    2009-04-07T07:03:25Z
    Thanks for the reply!

    Just for the record: I'm using version 17 of the driver.
    I'll check the documentation...

    //Mårten
  • SystemAdmin
    2009-04-07T11:00:35Z
    Hi again,

    I've been searching high and low for how to get this working, but with no results so far.

    Do you have any more concrete information you can provide?

    //Mårten
  • SystemAdmin
    2009-04-07T15:50:34Z
    Sorry, I was thinking that you were using the I/O DLL, but you are using the ODBC driver. Can you tell us more about how you are using it? What happens if, say, you read a sav file into Excel with it?


    I'm not sure how the current version determines the encoding, but sav files created since SPSS version 15 have the encoding recorded internally, so that should be used. Older files would probably assume the system code page. Have you tried with a freshly created sav file?


    If you have a nonconfidential current sav file that doesn't work, please send it (peck@spss.com), and we'll investigate.


    Regards,

    Jon Peck
  • SystemAdmin
    2009-04-07T19:56:13Z
    Thanks for taking an interest.

    I've prepared an SPSS file with two variables. This file doesn't contain any case data, but I'm sure that doesn't
    matter, as the problem occurs in all SPSS files with international chars, whether they contain case data or not.
    The first variable, "TestVarENG", contains three value labels and works perfectly, as it only contains English characters.


    The second variable, "TestVarINT", contains five value labels, of which only value labels 3 and 4 are fetched, because
    they are English-only. Value labels 1, 2 and 5 are fetched as empty labels, as they contain "international chars".


    To use this with the ODBC driver, I created a System DSN with the following properties:

    Data Source Name & Description = "spss",
    Service Name = "StatisticsSAVDriverStandalone",
    Server Data Source = "SAVDB",
    Statistics Data File Name = "c:\spssTestFile.sav"



    If I use this DSN to import data into Excel, or into SPSS Statistics itself,
    and select everything from the table spsstestfile.VLVARTestVarINT (the table containing the value labels),
    the result is this:

    1.00 = {empty string}
    2.00 = {empty string}
    3.00 = "Working label"
    4.00 = "Another working"
    5.00 = {empty string}
    (Of course, {empty string} is actually shown as an empty field.)

    Apart from testing this with Excel and SPSS, I also have a simple test tool written in C#
    which just establishes an ODBC connection and fetches data from a table in the SPSS file.
    It shows the same symptom.

    The SPSS ODBC driver has its default configuration and is running on Vista 32-bit English
    and Vista 64-bit English (with the 64-bit version of the ODBC driver).


    I'm not sure if there's anything else I can provide at this stage to
    help you, but please let me know.

    I'll also mail you the test file.


    Appreciate your help!
    //Mårten
  • SystemAdmin
    2009-04-12T22:43:37Z
    We have retested the ODBC access with the current driver, and it seems to be working. Here's what the engineer said.


    I have been testing this today with version 6.0.2.16 of the standalone sav ODBC driver (circa 17.0.2).


    It is working well for me in both Statistics and excel.


    And,

    This one is posted on the SPSS website; it is actually version 6.0.0.17, which is circa 17.0.0.

    Win32 standalone ODBC driver: http://www.spss.com/drivers/clientSPSSOA.htm


    When viewing cars.sav (a doctored version with extended ASCII characters inserted in the value labels) with the data file ODBC driver, it shows 7 tables:





    1. cars.Cases: data view of the data
    2. cars.CasesView: data view of the data, but with value labels substituted for the numbers where applicable
    3. cars.Properties: data set encoding
    4. cars.Variables: data dictionary, also the variable view of the data set
    5. cars.VLVARcylinder: value label map for the cylinder variable
    6. cars.VLVARfilter_$: value label map for the filter_$ variable
    7. cars.VLVARorigin: value label map for the origin variable
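The naming in this list, together with spsstestfile.VLVARTestVarINT from the earlier post, suggests a simple pattern: four fixed tables per file plus one VLVAR<variable> table per labelled variable, all prefixed with the lowercased file name. The Python sketch below only reproduces that observed pattern; the lowercasing rule and the assumption that only variables with value labels get a VLVAR table are inferred from these two examples, not taken from driver documentation.

```python
from pathlib import PureWindowsPath


def expected_tables(sav_path, labelled_variables):
    """List the table names the SAV ODBC driver appeared to expose in this thread.

    Assumptions (inferred, not documented): the prefix is the lowercased file
    name without its extension, and each variable with value labels gets its
    own VLVAR<name> table.
    """
    prefix = PureWindowsPath(sav_path).stem.lower()
    fixed = ["Cases", "CasesView", "Properties", "Variables"]
    tables = [f"{prefix}.{name}" for name in fixed]
    tables += [f"{prefix}.VLVAR{var}" for var in labelled_variables]
    return tables


for table in expected_tables(r"c:\spssTestFile.sav", ["TestVarENG", "TestVarINT"]):
    print(table)

# A query for one variable's value labels would then look like this:
print("SELECT * FROM " + expected_tables("cars.sav", ["origin"])[-1])
```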




    Might you have an old driver? Installing new SPSS versions doesn't necessarily update the driver.


    Regards,

    Jon
  • SystemAdmin
    2009-04-15T19:51:21Z
    Hi, and thanks for your reply.

    My ODBC driver version is 6.00.02.33, so that shouldn't be the problem.

    Did you try the file I sent you, and were you able to open and read it?

    I noticed that if I changed the character encoding setting from "Locale's
    writing system" (the default) to "Unicode" and re-saved the file(s) that
    didn't work, I was able to read them with the ODBC driver with all characters intact.

    But as you probably understand, we can't ask our customers to re-save all their
    files in Unicode just so we can read them with ODBC. So the question
    remains: is there a way to force the driver to read the files as Unicode? I
    mean, theoretically it should work, as SPSS Statistics itself can read those
    non-Unicode files natively.

    I'm thankful for your help so far; just hoping we can sort this out.

    Regards,
    Mårten

  • SystemAdmin
    2009-04-15T20:48:47Z
    We had no trouble with the file or similar ones. That Unicode works suggests that the assumption being made about the character set in the file is going wrong in some cases. For newish sav files, the character encoding is recorded in the sav file and is used when the file is read. If you resave a code page file that is older, it should record the character set. Does such a file work?


    If the file does not have a character encoding, then the current locale encoding is assumed. That would typically be the process locale (or the SPSS locale within SPSS). If you have not set any locale in your program, it is probably running in the C locale and would fail with extended characters. If you set the process locale explicitly consistent with the code page of the file, that should be used.


    SPSS never runs in the C locale, so reading an older sav file will always be assuming something other than C.
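The fallback described above can be modelled as a small decision function. This is a hypothetical sketch of the behaviour as described in this post, not the driver's code: the function names are made up, and the C locale is approximated as plain ASCII. It shows why a file with no recorded encoding, read from a process in the C locale, loses a label such as "Mårten", while a recorded encoding or a Western European locale decodes it.

```python
def choose_encoding(recorded_encoding, process_locale_encoding):
    """Hypothetical model of the rule described above.

    recorded_encoding: encoding stored in the sav file (files from v15 on), or None.
    process_locale_encoding: encoding implied by the reading process's locale.
    """
    if recorded_encoding:
        return recorded_encoding       # newer files: trust what the file says
    return process_locale_encoding     # older files: fall back to the locale


def decode_label(raw, recorded_encoding, process_locale_encoding):
    """Decode raw label bytes; return None if the chosen encoding cannot decode them."""
    encoding = choose_encoding(recorded_encoding, process_locale_encoding)
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


label = "Mårten".encode("cp1252")  # label bytes as a Windows-1252 file would store them
print(decode_label(label, None, "ascii"))      # old file, C locale (approximated as ASCII)
print(decode_label(label, None, "cp1252"))     # old file, Western European locale
print(decode_label(label, "cp1252", "ascii"))  # encoding recorded in the file itself
```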


    HTH,

    Jon
  • SystemAdmin
    2009-04-19T19:05:07Z
    To answer your question: taking an old SPSS file and saving it in Unicode mode works fine with ODBC, and I can also see that
    the Properties table contains the value "UTF-8" in the "Encoding" column. The files that don't work instead
    have the value "Windows-1252".

    On the test machine I've set the system locale to Swedish just to be on the safe side, and also set it in my test C#
    program, but the result is no different.
    The tool "odbcisql.exe" that is included with the driver also reports Swedish as the current locale.

    To be honest, I'm starting to run out of ideas for how to find the solution.

    I'm using Sysinternals' Process Monitor to trace which files and registry keys the ODBC driver reads, and I can see
    that it looks for keys like "CP_Encoding", but there seems to be no difference whether I set this to "UTF-8" or a
    completely bogus value like "dummyencoding". Doesn't the driver actually use these registry keys?

    I've also looked at the different workaround codes, but none seem to apply to my problem.

    Any tips on how to proceed?

    Regards,
    Mårten
  • SystemAdmin
    2009-04-19T19:17:14Z
    Once you resave the file in Unicode mode, it is marked with the encoding. Similarly, a file from v15 or later should be marked with a code page encoding, and the driver should be using that. For older files, within SPSS, it would use the SPSS locale. If you are in your own program, the driver would use the locale of your program. If you haven't set that, it would generally be the C locale, and things would not work. Have you tried explicitly setting your program's locale to Swedish or any other Western European locale? Check your process locale to be sure.
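As a rough illustration of setting and then checking a process locale, the Python sketch below tries a few candidate locale names for LC_CTYPE and reports which one took effect. The candidate names are assumptions: Windows uses names such as "Swedish_Sweden.1252", POSIX systems use names such as "sv_SE.ISO8859-1", and which ones exist depends on the machine. Whether an ODBC driver loaded into the process actually sees a locale set this way (it may use its own C runtime) is itself an assumption worth verifying, for instance with odbcisql.exe as mentioned earlier in the thread.

```python
import locale

# Platform-specific candidate names (assumptions; availability varies by machine).
CANDIDATES = [
    "Swedish_Sweden.1252",  # Windows naming
    "sv_SE.ISO8859-1",      # common POSIX naming for Latin-1 Swedish
    "sv_SE.UTF-8",
    "C",                    # always available; per this thread, unsuitable for extended characters
]


def set_first_available_locale(candidates):
    """Set LC_CTYPE to the first accepted candidate and return its name, or None."""
    for name in candidates:
        try:
            return locale.setlocale(locale.LC_CTYPE, name)
        except locale.Error:
            continue  # this locale is not installed here; try the next one
    return None


active = set_first_available_locale(CANDIDATES)
print("Requested LC_CTYPE:", active)
# Calling setlocale without a locale argument only queries the current setting.
print("Current LC_CTYPE:  ", locale.setlocale(locale.LC_CTYPE))
```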


    I'll be away most of this week and probably unable to pursue this until I return.


    Regards,

    Jon
  • SystemAdmin
    2009-05-04T08:33:58Z
    Hi again,

    I was away last week and wasn't able to investigate further.

    Yes, I've set the locale of my test tool to Swedish, and it made no difference.
    Is it possible to debug the SPSS driver in any way to see which locale it picked up?

    Otherwise, I'm starting to run out of ideas for how to proceed with this.

    I've also noticed that the driver looks up Windows registry settings that aren't mentioned in the
    documentation, like cp_encoding. Does the driver actually use these values, so that it's possible to
    force it to use a specified encoding?

    Regards,
    Mårten
  • SystemAdmin
    2009-05-06T20:48:14Z
    Is it possible to get the log files generated by the ODBC driver? They are located at:


    C:\Documents and Settings\<user>\Application Data\SPSSInc\SPSSStatistics17DataFileDriver\Standalone\logging\*.log
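The path above is the Windows XP layout. On Vista, which is what Mårten is running, the per-user application data folder is normally C:\Users\<user>\AppData\Roaming, and the %APPDATA% environment variable points to the right folder on either system. Assuming the driver keeps the same relative path under that folder (not confirmed in this thread), a short Python script can collect the logs; the folder names are copied from the path above, and find_driver_logs is a hypothetical helper.

```python
import os
from pathlib import Path

# Relative path copied from the post; assumed to be the same under %APPDATA% on Vista.
LOG_SUBDIR = Path("SPSSInc", "SPSSStatistics17DataFileDriver", "Standalone", "logging")


def find_driver_logs(appdata=None):
    """Return the driver's *.log files, newest first (empty list if none are found)."""
    base = Path(appdata or os.environ.get("APPDATA", ""))
    log_dir = base / LOG_SUBDIR
    if not log_dir.is_dir():
        return []
    return sorted(log_dir.glob("*.log"), key=lambda p: p.stat().st_mtime, reverse=True)


for log_file in find_driver_logs():
    print(log_file)
```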





    What is the output of the SQL command below when run through the Statistics Data File ODBC driver? It should give the encoding used in the SAV file.


    SELECT Encoding FROM Properties;





    The SAV driver always converts to Unicode characters, so if the encoding in the SAV file is not 'UTF-8', I think the problem arises when it converts to UTF-8 using UTF8DataSource.
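To illustrate the failure mode suggested here (a sketch under assumptions, not the driver's actual code): bytes written as Windows-1252 are generally not valid UTF-8, so a conversion that treats them as UTF-8 has nothing valid to return for a label containing å, ä or ö, whereas decoding with the encoding reported by SELECT Encoding FROM Properties recovers the text. ASCII-only labels such as "Working label" are byte-for-byte identical in both encodings, which is consistent with labels 3 and 4 coming through. The label "Mycket nöjd" is made up for the example, and returning an empty string on failure is an assumption chosen to mirror the empty labels reported in this thread. Python's codecs module accepts both the "windows-1252" and "UTF-8" spellings that the Properties table returns.

```python
import codecs


def to_unicode(raw, declared_encoding, assume_utf8=False):
    """Convert label bytes to str.

    declared_encoding: value from the Properties table, e.g. "windows-1252" or "UTF-8".
    assume_utf8=True models a converter that ignores the declared encoding;
    it returns "" on failure to mirror the empty labels seen in this thread.
    """
    encoding = "utf-8" if assume_utf8 else codecs.lookup(declared_encoding).name
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return ""


raw_label = "Mycket nöjd".encode("cp1252")  # hypothetical Swedish label, Windows-1252 bytes
print(repr(to_unicode(raw_label, "windows-1252", assume_utf8=True)))  # treated as UTF-8
print(repr(to_unicode(raw_label, "windows-1252")))                    # declared encoding
print(codecs.lookup("windows-1252").name, codecs.lookup("UTF-8").name)
```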


  • SystemAdmin
    2009-05-14T00:00:01Z
    Hi Again,

    I can only give you a partial answer at this moment as I'm facing some computer issues.

    But I know for a fact that the Encoding column in Properties returns "windows-1252"
    if the file was saved with the "Locale's writing system" option selected in SPSS
    (in which case the ODBC driver can't read international chars).

    If the file is instead saved with the "Unicode" option, Encoding correctly returns
    "UTF-8" and the file is read successfully.


    Should I interpret your last statement as meaning that we're going to have a hard time using the
    ODBC driver for files with windows-1252 encoding?

    Regards,
    Mårten
  • SystemAdmin
    2009-05-28T11:36:14Z
    Hi,

    I've sent you a mail with some output from the tracing, along with some other information.

    Hope you'll have some use for it.

    Regards,
    Mårten
  • SystemAdmin
    2009-06-30T16:21:50Z
    Just wanted to bump this thread and see if you have had any time to look at this?

    Or can you give me any advice on which direction to take?

    Currently we're using spssio.dll to import data from sav files into a database.
    Frankly, this works very badly in a multiuser environment. It's used in a
    web application where users upload their files, and for whatever reason the
    component sometimes crashes.
    These are the most common error messages:

    SPSS function spssGetVarNValueLabels returned error code SPSS_NO_MEMORY
    SPSS function spssOpenRead returned error code SPSS_NO_MEMORY
    SPSS function spssOpenRead returned error code SPSS_INVALID_FILE
    Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

    When one of these errors occurs, all further calls to the component throw similar
    exceptions, after which the whole web server has to be restarted to make it
    work again.

    As you can imagine, this is very bad for the users.

    So the idea was to replace it with the ODBC driver, but it seems we're not
    going to be able to do this because of the issues we've discussed in this thread.
    The users' files are encoded in the system locale, not UTF-8,
    and we can't really force them to resave their files before uploading.


    Is there a way to read SPSS files via .NET that we are unaware of?

    Regards,
    Mårten
  • SystemAdmin
    2009-07-01T17:31:25Z
    First, I don't think any of these approaches will be thread safe. You would need these in separate processes.

    Second, using the .NET plugin, you can issue standard SPSS commands entirely under the control of your app without an SPSS user interface present. That requires SPSS to be installed on that box. This won't be thread safe, of course, but you could serialize access to the backend APIs.
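One way to serialize access to a non-thread-safe backend is a single worker thread that owns the library and processes upload jobs from a queue, so no two calls ever overlap. The sketch below is written in Python for brevity; read_sav_file is a hypothetical placeholder for whatever wraps spssio.dll, the ODBC driver or the .NET plugin. Serializing within one process does not protect against the memory corruption described in the previous post, so running the worker in a separate process, as suggested above, is the more robust variant of the same design.

```python
import queue
import threading


def read_sav_file(path):
    """Hypothetical placeholder for a call into a non-thread-safe reader."""
    return f"imported {path}"


class SerializedReader:
    """Runs every read on one dedicated worker thread, one job at a time."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            path, reply = self._jobs.get()
            try:
                reply.put(("ok", read_sav_file(path)))
            except Exception as exc:  # hand failures back to the caller
                reply.put(("error", exc))

    def submit(self, path, timeout=None):
        """Queue a file and block until the worker has processed it."""
        reply = queue.Queue(maxsize=1)
        self._jobs.put((path, reply))
        status, value = reply.get(timeout=timeout)
        if status == "error":
            raise value
        return value


reader = SerializedReader()
# Several request threads submit at once; the worker still handles them one by one.
threads = [threading.Thread(target=lambda p=p: print(reader.submit(p)))
           for p in ("a.sav", "b.sav", "c.sav")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```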


    HTH,

    Jon
  • SystemAdmin
    2009-07-06T09:26:56Z
    Hi,

    Thanks for your reply. We'll have to settle for some kind of backend queuing system, then.

    Do you have any plans for developing a thread-safe component?
    Considering how common SPSS files are for case data, it would be really good to have integrated support for them in applications.

    Regards,
    Mårten
  • SystemAdmin
    2009-07-06T17:07:52Z
    I have passed on your suggestion to the product planning folks. It's a good suggestion, but I don't know how much work would be entailed.


    Regards,

    Jon