Topic
  • 7 replies
  • Latest Post - ‏2013-05-17T13:28:25Z by Albert-Jan
nhoebel
nhoebel
4 Posts

Pinned topic Merging hundreds of data sets

‏2013-05-08T15:36:38Z |

Hello everyone and thanks for taking the time to read into my problem. After successfully dealing with a lot of problems that come with using hundreds of identical datasets, I am now stumped by what I thought would be the easiest task: merging them all.

Here's what I have and want to do: I have a few hundred datasets all of which contain exactly one case. The variables are identical in each so that I can easily merge them via the dialogue "Data -> Merge Data -> Add Cases"... but that would take ages. I am looking for a way to do this operation automatically,, like a syntax/macro or a python script. I have tried some solutions with Python Essentials and SPSSINC PROCESS FILES but to no avail. This may easily be because of my inferior skills.

I would greatly appreciate any advice in this matter. Thanks a lot and have a good day!

Nicolas

  • nhoebel
    nhoebel
    4 Posts
    ACCEPTED ANSWER

    Re: Merging hundreds of data sets

    ‏2013-05-11T14:47:19Z  

    Thanks again, Jon and RCCG. Your combined suggestions solved my problem!

    OK, so here's what I did and I hope future googlers typing "merging multiple data sets" or "merging hundreds of data sets" find this response and save time. I still find it very complicated and wonder why SPSS doesn't have this on board anyway but then again SPSS is lacking a lot of functions I consider elemental.

    1. I made sure all the data sets I wished to combine were in *.sav-format and stored in the same folder.
    2. I created a file list of that folder with the Windows command prompt by navigating to said folder and executing the command "dir /b > list.txt"
    3. I edited this list and added '/FILE "{directory}/" at the beginning of each line (where {directory} equals the file's directory of course) and a simple " at the end of each line. (quite easily accomplished by putting them into seperate columns of an excel table and then copying and pasting the result into a text editor - make sure to find and replace all the blanks/separators between columns)
    4. I created a new syntax as follows:

    ADD FILES /FILE=*

    EXECUTE.

    5. Ultimately I proceeded to copy and paste 49 lines of the above at a time in between "ADD FILES" and "EXECUTE.", making sure to not forget the "." at the end of the last line of each package and ran the syntax.
    6. I proceeded with step 5 until I had copied, pasted and executed all of the lines of my txt.

    Took me about twenty minutes to half an hour but just because I had some other stuff messed up in the data sets. Maybe there is an easier solution to this, but this was quite doable and definitely faster than doing it via the UI of SPSS.

    Thanks to JonPeck and RCCG_荣_黄 for helping me find this solution. Good day to everyone and happy analyzing!

    Regards
    Nico

  • JonPeck
    JonPeck
    269 Posts

    Re: Merging hundreds of data sets

    ‏2013-05-09T22:57:16Z  

    ADD FILES can merge up to 50 files at a time, so that would be one place to start.  You could use PROCESS FILES, but it would generally merge these one at a time.

    With ADD FILES, the challenge would be to list all the file names.  So if you have a spec like "c:/data/*.sav", you could use the glob.glob Python function to get all the file names and then build a few ADD FILES commands and run them with spss.Submit.

    If that fits your scenario, post  a few details, and we can help you with that code.

  • RCCG_荣_黄
    RCCG_荣_黄
    20 Posts

    Re: Merging hundreds of data sets

    ‏2013-05-10T09:20:10Z  

    I think it easy to deal wiht.

    First,you can get all the data file list using "dir" command in the dos command window,then paste them in a text.

    Second, you should edit the text with the software called "UltraEdit" which can edit hundreds of lines at the same time.

    After editing the syntax command with  "UltraEdit", the last step you should run the syntax in the syntax window and theck the result.

    If you have any quetion you can call me.

  • nhoebel
    nhoebel
    4 Posts

    Re: Merging hundreds of data sets

    ‏2013-05-10T14:42:10Z  
    • JonPeck
    • ‏2013-05-09T22:57:16Z

    ADD FILES can merge up to 50 files at a time, so that would be one place to start.  You could use PROCESS FILES, but it would generally merge these one at a time.

    With ADD FILES, the challenge would be to list all the file names.  So if you have a spec like "c:/data/*.sav", you could use the glob.glob Python function to get all the file names and then build a few ADD FILES commands and run them with spss.Submit.

    If that fits your scenario, post  a few details, and we can help you with that code.

    Dear Jon,

    thanks so much for your help and your will to help me with the code. Indeed, my files are all stored in one folder, so that would be a good shot. glob.glob is Chinese to me, I am afraid. But as far as I can tell RCCG's suggestion points into a similar direction and I think I might be able to get that done.

    If I understand correctly, the first thing is to get a list of all filenames in some way and parse it into an ADD FILES command. I will go ahead and try that and tell you about my results.

    Thanks again, Jon! You've been a great help.

    Regards
    Nico

  • nhoebel
    nhoebel
    4 Posts

    Re: Merging hundreds of data sets

    ‏2013-05-10T14:47:30Z  

    I think it easy to deal wiht.

    First,you can get all the data file list using "dir" command in the dos command window,then paste them in a text.

    Second, you should edit the text with the software called "UltraEdit" which can edit hundreds of lines at the same time.

    After editing the syntax command with  "UltraEdit", the last step you should run the syntax in the syntax window and theck the result.

    If you have any quetion you can call me.

    Hello RCCG,

    thanks for taking the time to answer. I really appreciate your help! That does sound reasonable and manageable - I will go ahead and try that now and keep you posted!

    Regards
    Nico

  • nhoebel
    nhoebel
    4 Posts

    Re: Merging hundreds of data sets

    ‏2013-05-11T14:47:19Z  

    Thanks again, Jon and RCCG. Your combined suggestions solved my problem!

    OK, so here's what I did and I hope future googlers typing "merging multiple data sets" or "merging hundreds of data sets" find this response and save time. I still find it very complicated and wonder why SPSS doesn't have this on board anyway but then again SPSS is lacking a lot of functions I consider elemental.

    1. I made sure all the data sets I wished to combine were in *.sav-format and stored in the same folder.
    2. I created a file list of that folder with the Windows command prompt by navigating to said folder and executing the command "dir /b > list.txt"
    3. I edited this list and added '/FILE "{directory}/" at the beginning of each line (where {directory} equals the file's directory of course) and a simple " at the end of each line. (quite easily accomplished by putting them into seperate columns of an excel table and then copying and pasting the result into a text editor - make sure to find and replace all the blanks/separators between columns)
    4. I created a new syntax as follows:

    ADD FILES /FILE=*

    EXECUTE.

    5. Ultimately I proceeded to copy and paste 49 lines of the above at a time in between "ADD FILES" and "EXECUTE.", making sure to not forget the "." at the end of the last line of each package and ran the syntax.
    6. I proceeded with step 5 until I had copied, pasted and executed all of the lines of my txt.

    Took me about twenty minutes to half an hour but just because I had some other stuff messed up in the data sets. Maybe there is an easier solution to this, but this was quite doable and definitely faster than doing it via the UI of SPSS.

    Thanks to JonPeck and RCCG_荣_黄 for helping me find this solution. Good day to everyone and happy analyzing!

    Regards
    Nico

  • JonPeck
    JonPeck
    269 Posts

    Re: Merging hundreds of data sets

    ‏2013-05-13T17:53:05Z  

    As I suggested, using programmability it would be easy to automate this, although you would have to learn the basics of Python with SPSS.  You might want to download the Programming and Data Management book from the Books and Articles section of the Community site to see how to solve many SPSS data management tasks with this technology.

  • Albert-Jan
    Albert-Jan
    16 Posts

    Re: Merging hundreds of data sets

    ‏2013-05-17T13:28:25Z  
    • JonPeck
    • ‏2013-05-13T17:53:05Z

    As I suggested, using programmability it would be easy to automate this, although you would have to learn the basics of Python with SPSS.  You might want to download the Programming and Data Management book from the Books and Articles section of the Community site to see how to solve many SPSS data management tasks with this technology.

    import re

    import os

    import spss

     

    ## sample data

    syntax = """\

    data list free / x (f).

    begin data

    1

    end data.

    save outfile = '%%temp%%/test_%03d.sav'."""

    spss.Submit([syntax % i for i in range(123)])

     

    ## actual function

    def superAddFiles(p, resultfile, resultvar="origin"):

        """Use Spss ADD FILES on all the .sav or .zsav files in path <p> and write

        to <resultfile>. This may be more than the ADD FILES limit (50). If

        <resultfile> is not None (and a valid variable name), a variable containing

        the source file names is created. """

        savs = [sav for sav in os.listdir(p) if re.match(r".*\.z?sav$", sav, re.I)]

        cmds = "  /file=%(sav)r /in=%(sav)s" if resultvar else "  /file=%(sav)r"

        cmds = [cmds % locals() for n, sav in enumerate(savs)]

        split = ".\nexecute.\nadd files /file=*"

        crap = [cmds.insert(i, split) for i in range(n) if i and i % 50 == 0]

        adds = "add files\n%s%s" % ("\n".join(cmds), "" if n % 50 == 0 else ".")

        label, save = "", "save outfile = %r" % resultfile

        if resultvar:

            label = ("if(%(sav)s)%(resultvar)s=%(i)s.\n" +

                     "add value labels %(resultvar)s %(i)s %(sav)r.\n")

            label = "\n".join([label % locals() for i, sav in enumerate(savs)])

            save += "/drop = %s." % "\n  ".join(savs)

        spss.Submit(["cd %r." % p , adds, label, save])

     

    superAddFiles(os.getenv("temp"), "result.sav")