We have 1,680 .txt files in one directory. All files are pipe-delimited with 77 variables and no headers. The data is not rectangular: each case spans 7 lines, for a total of 77 variables per case. There is missing data in the files, appearing as blanks. We want R (or Python) to:
loop through the directory, open and read all the files, make the data rectangular, merge the files, assign variable names, and export the result to SPSS.
We are rebuilding a dataset. Normally we get 14 files a month, open each one by point-and-click, run our syntax, then use Merge Files > Add Cases to combine the 14 into one SPSS file. I need to do this for 10 years' worth of data. We have some R and some Python experience. Is R more efficient for this, or should we just use manual labor?
Pinned topic: Can R do this? Or is it just more efficient to do with SPSS syntax?
Re: Can R do this? Or is it just more efficient to do with SPSS syntax? (2012-12-06T15:31:03Z)
This is the accepted answer.
You can probably do all of this reasonably easily using Statistics plus one extension command.
Here's the approach I would suggest.
First, take one file and get it read correctly. You can probably use File > Read Text Data to construct the syntax.
Then, with the Python Essentials for your version installed, download and install the SPSSINC PROCESS FILES extension command. If you are on Win7, you may need to start Statistics using Run As Administrator or set some environment variables in order to install the command. See the FAQ on this site or the help for details. Restart Statistics. PROCESS FILES gives you a tool to iterate a syntax file over all the files. Use Utilities > Process Data Files to construct the syntax.
You will wind up doing an ADD FILES for each text file (after the first) even though ADD FILES can handle 50 files at a time, but that's probably not going to be an issue unless you have to do this several times per second.
Here are a few tips.
Getting this process started is a little bit tricky, since ADD FILES requires that you already have a data file open.
Move one of your text files to a different directory and open it with the syntax you constructed for reading a file. Give it a dataset name, say, ACTIVE, so that it will remain open and referenceable as other files are read.
Your syntax file to be applied to each dataset by PROCESS FILES would just have statements like
GET DATA /TYPE=TXT /FILE="JOB_INPUTFILE" ... .
DATASET NAME FRED.
DATASET ACTIVATE ACTIVE.
ADD FILES /FILE=* /FILE=FRED.
DATASET CLOSE FRED.
JOB_INPUTFILE is defined by PROCESS FILES as a file handle for the name of the current input. It will be redefined each time another file is processed.
You can construct the PROCESS FILES command from the menus via Utilities > Process Data Files.
The input filespec would be a wildcard pattern covering all of your .txt files, e.g., *.txt in the data directory.
After PROCESS FILES is run, you can save the constructed file in the usual way.
Btibert3 posted:
Re: Can R do this? Or is it just more efficient to do with SPSS syntax? (2012-12-06T16:39:09Z)
This is the accepted answer.
Short answer: yes. Here is skeleton R code (not tested, but enough to get you started):
FILES <- list.files(pattern = "\\.txt$")
# let's see how many files we have - should equal 1680
length(FILES)
# loop through each file and read with a for loop
for (f in FILES) {
  # do some actions specific to your files
  # look at the help for ?read.table
  # your separator option will be "|"
  # there is detailed help there on how to read complex files
  dat <- read.table(f, sep = "|", header = FALSE)
}
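The step the skeleton glosses over is making the data rectangular. As a minimal, stdlib-only sketch of that grouping logic (the function name and the tiny demo inputs are hypothetical): slice the stream of lines into blocks of 7 and flatten each block's fields into one record.

```python
from itertools import islice

def cases_from_lines(lines, lines_per_case=7):
    """Yield one flat record per case from pipe-delimited lines."""
    it = iter(lines)
    while True:
        block = list(islice(it, lines_per_case))
        if not block:
            break
        # flatten the block's fields into a single record
        yield [field for line in block for field in line.rstrip("\n").split("|")]

# demo: one case spread over 5 lines collapses to a single 11-field record
demo = ["a|b|c", "d|e", "f", "g|h|i", "j|k"]
records = list(cases_from_lines(demo, lines_per_case=5))
# records == [["a","b","c","d","e","f","g","h","i","j","k"]]
```

The same idea works in R (read the lines, split on "|", and index in blocks of 7); either way, verify against a real file that the 77 values really do fall evenly across the 7 lines.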