Topic
  • 3 replies
  • Latest Post - ‏2012-12-07T11:31:47Z by SystemAdmin
SystemAdmin
396 Posts

Pinned topic Can R do this? Or is it just more efficient to do with SPSS syntax?

‏2012-12-06T15:16:33Z |
Greetings,
We have 1680 .txt files in one directory. All files are pipe-delimited with 77 variables and no headers. The data are not rectangular: each case spans 7 lines, 77 variables per case. There is missing data in the files, shown as blanks. We want R (or Python) to:
loop through the directory, open/read all the files, make the data rectangular, merge them, assign variable names, and export to SPSS.
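A rough sketch of that pipeline in Python (the function name, paths, and the assumption that each case's 77 fields are split evenly across its 7 lines are illustrative, not tested against the real files):

```python
import csv
import glob


def combine_pipe_files(input_dir, output_csv, lines_per_case=7):
    """Read every .txt file in input_dir, join each group of
    lines_per_case lines into one flat record, and write a single
    combined CSV that SPSS can import in one step."""
    with open(output_csv, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(glob.glob(input_dir + "/*.txt")):
            with open(path) as f:
                lines = [ln.rstrip("\n") for ln in f]
            # group every lines_per_case lines into one case
            for i in range(0, len(lines), lines_per_case):
                chunk = lines[i:i + lines_per_case]
                record = []
                for ln in chunk:
                    record.extend(ln.split("|"))
                writer.writerow(record)  # blanks stay as empty fields
```

The combined CSV could then be opened once in SPSS, where a single syntax file assigns the variable names and saves the .sav file.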

We are rebuilding a dataset. Normally we get 14 files a month, open each one by point and click, run our syntax, then use merge files/add cases to make the 14 into one SPSS file. I need to do this for 10 years' worth of data. We have some R and some Python experience. Is R more efficient for this? Or should we just use manual labor?

SPSS 19
R 12.10.1

Thanks

Robin
  • SystemAdmin
    396 Posts

    Re: Can R do this? Or is it just more efficient to do with SPSS syntax?

    ‏2012-12-06T15:31:03Z  
    You can probably do all this reasonably easily using Statistics plus one extension command.

    Here's the approach I would suggest.
    First, take one file and get it read correctly. You can probably use File > Read Text Data to construct the syntax.

    Then, with the Python Essentials for your version installed, download and install the SPSSINC PROCESS FILES extension command. If you are on Win7, you may need to start Statistics using Run As Administrator or set some environment variables in order to install the command. See the FAQ on this site or the help for details. Restart Statistics. PROCESS FILES gives you a tool to iterate a syntax file over all the files. Use Utilities > Process Data Files to construct the syntax.

    You will wind up doing an ADD FILES for each text file (after the first) even though ADD FILES can handle 50 files at a time, but that's probably not going to be an issue unless you have to do this several times per second.

    Here are a few tips.

    Getting this process started is a little bit tricky, since ADD FILES requires that you already have a data file open.
    Move one of your text files to a different directory and open it with the syntax you constructed for reading a file. Give it a dataset name, say, ACTIVE, so that it will remain open and referenceable as other files are read.

    Your syntax file to be applied to each dataset by PROCESS FILES would just have statements like
    GET DATA /TYPE=TXT .../FILE="JOB_INPUTFILE" ...
    DATASET NAME FRED.
    ADD FILES /FILE=ACTIVE /FILE=*.
    DATASET NAME ACTIVE.
    JOB_INPUTFILE is defined by PROCESS FILES as a file handle for the name of the current input. It will be redefined each time another file is processed.

    You can construct the PROCESS FILES command from the menus via Utilities > Process Data Files.
    The input filespec would be something like
    c:\mydata\*.txt

    After PROCESS FILES is run, you can save the constructed file in the usual way.

    HTH,
    Jon Peck
  • Btibert3
    1 Post

    Re: Can R do this? Or is it just more efficient to do with SPSS syntax?

    ‏2012-12-06T16:39:09Z  
    Short answer: yes. Here is skeleton R code (not tested, but enough to get you started):

    setwd("your-data-folder-path-goes-here")
    FILES <- list.files(pattern = "\\.txt$")

    # let's see how many files we have - should equal 1680
    length(FILES) == 1680  ## should print TRUE

    # loop through each file and read it with a for loop
    for (F in FILES) {
      # do some actions specific to your files
      # look at the help for ?read.table
      # your separator option will be sep = "|"
      # there is detailed help on how to read complex files
    }
  • SystemAdmin
    396 Posts

    Re: Can R do this? Or is it just more efficient to do with SPSS syntax?

    ‏2012-12-07T11:31:47Z  
    Thank you both for your responses. I'll try both approaches.