Topic
  • 8 replies
  • Latest Post - ‏2007-01-23T21:54:59Z by SystemAdmin
SystemAdmin
SystemAdmin
7754 Posts

Pinned topic How To Split Single Sequential File into Multiple Sequential Files?

‏2007-01-04T14:02:26Z |
I am working on a design for an ETL job that must accept a sequential file as input. The input file contains data for a number of "locations", typically 10-15 or so. We do not know the locations in advance, only that the file will contain multiple records for each location. The job(s) must split the input file into one file per location, so if there are 15 locations in the input file we will end up with 15 individual output files. Each output file must include the location name in its filename.

My initial thought was to first run a job that produces a dataset containing the unique list of locations in the input file, then run another job that filters the input file by each item in that dataset and creates an output file named after the location code. I'm not sure how to do this, or whether it is even possible. It needs to be driven from the unique list of locations. I'm thinking the Job Sequence would need to loop through the location dataset and pass each row value to the main processing job as a parameter. Is this possible? Can you suggest how I should approach this problem?

Thanks
Ken R.
Updated on 2007-01-23T21:54:59Z by SystemAdmin
  • SystemAdmin
    SystemAdmin
    7754 Posts

    Re: How To Split Single Sequential File into Multiple Sequential Files?

    ‏2007-01-04T19:10:21Z  
    If you're on Unix, this can be done easily with a script. If you can post sample data, maybe I can help you write one. I posted a script on dsxchange several months ago that does exactly this, but I need an idea of what your input looks like and what the output should look like.
    [i]Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.[/i]
    • Albert Einstein
  • SystemAdmin
    SystemAdmin
    7754 Posts

    Re: How To Split Single Sequential File into Multiple Sequential Files?

    ‏2007-01-04T19:36:00Z  
    Yes, this will be processed on AIX servers. I have attached some sample test data. Here's what I'm trying to do with it:

    1. The file needs to be split based on the first column. Columns are delimited with semi-colons.

    2. The first and last rows of the source (header/trailer) must exist in all output files as well.

    3. Output files must be named as the source file with the value of the first column appended.

    A shell script may be the way to go. If I can ever figure out how to set the UserStatus in a parallel job (or server job) then I can probably get DataStage to do it. For some reason the DSSetUserStatus routine is not available in our DataStage!

    Thanks for the help
    Updated on 2007-01-04T19:36:00Z by SystemAdmin
  • SystemAdmin
    SystemAdmin
    7754 Posts

    Re: How To Split Single Sequential File into Multiple Sequential Files?

    ‏2007-01-04T20:04:08Z  
    Use this:
    [code]
    #!/usr/bin/ksh
    # Change the following four lines to fit your environment
    export filepath=/Data/SFDCDEV/temp/export_t_entry.out
    export tempFile1=/Data/SFDCDEV/scripts/my1.tmp
    export tempFile2=/Data/SFDCDEV/scripts/my2.tmp
    export newFileDir=/Data/SFDCDEV/scripts

    # Strip out the header (first line) and footer (last line)
    sed 1d "$filepath" | sed -e '$d' > "$tempFile1"

    # Main processing: build the unique list of first-column values,
    # then write the rows matching each value to a file named after it
    awk -F';' '{print $1}' "$tempFile1" | sort -u > "$tempFile2"
    while read -r filename
    do
        grep -w "$filename" "$filepath" > "$newFileDir/$filename.txt"
    done < "$tempFile2"

    # Remove temp files
    rm -f "$tempFile1"
    rm -f "$tempFile2"
    echo "All done"
    [/code]
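
The script above writes only the matching detail rows, so the header and trailer from requirement 2 are not copied into each output, and the output names do not carry the source file name from requirement 3. Below is a minimal sketch of one way to cover all three requirements in a single awk pass. The demo directory, the sample rows, and the "." separator between the source name and the key are assumptions for illustration, not details from the thread.

```shell
#!/bin/sh
# Sketch only: split a semicolon-delimited file on its first column and copy
# the header (first line) and trailer (last line) into every output file.
# The demo directory and sample data are hypothetical.

demo_dir="${TMPDIR:-/tmp}/split_demo"
mkdir -p "$demo_dir"
src="$demo_dir/export_t_entry.out"

# Hypothetical input: header, detail rows for two locations, trailer
cat > "$src" <<'EOF'
HDR;20070104
NYC;100;widget
LAX;200;gadget
NYC;101;sprocket
TRL;3
EOF

# Output name = source file name + "." + first-column value (separator assumed)
awk -F';' -v src="$src" '
NR == 1 { hdr = $0; next }      # remember the header, do not route it
NR > 2  {                       # prev is a detail row, never the trailer
    out = src "." prevkey
    if (!(out in seen)) {       # first row for this key: start file with header
        print hdr > out
        close(out)
        seen[out] = 1
    }
    print prev >> out
    close(out)                  # keep at most one output file open at a time
}
{ prev = $0; prevkey = $1 }     # hold each line back by one record
END {
    # prev now holds the last line of the input, i.e. the trailer
    for (out in seen) { print prev >> out; close(out) }
}' "$src"
```

Holding each line back by one record is what lets awk recognize the trailer without reading the file twice. Closing the output after every write is slower, but it means the number of distinct keys is not limited by how many files awk can keep open.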

    UserStatus cannot be set in a PX (parallel) job. In a server job you can set it by calling DSSetUserStatus() with the status value, and the calling job can retrieve it with DSGetJobInfo() using the DSJ.USERSTATUS info type. Read about them in the DataStage help.
    [i]Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.[/i]
    • Albert Einstein
  • SystemAdmin
    SystemAdmin
    7754 Posts

    Re: How To Split Single Sequential File into Multiple Sequential Files?

    ‏2007-01-04T20:37:55Z  
    Elegant and simple! Thanks! You have saved me a bunch of work.

    I don't have much experience with regular expressions. Is it possible to code the grep to only select words in the first field location? The production data will contain many records and some of the key values will be numeric. It is possible a match could be found in another field of the record.

    Thanks, I appreciate your help.

    Ken
  • SystemAdmin
    SystemAdmin
    7754 Posts

    Re: How To Split Single Sequential File into Multiple Sequential Files?

    ‏2007-01-04T21:50:58Z  
    Oh, I forgot to add a caret. Change the code between do and done to:
    [code]
    grep -w "^$filename" "$filepath" > "$newFileDir/$filename.txt"
    [/code]
    That is it, you're done. :)

    [i]Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.[/i]
    • Albert Einstein
  • SystemAdmin
    SystemAdmin
    7754 Posts

    Re: How To Split Single Sequential File into Multiple Sequential Files?

    ‏2007-01-04T21:53:12Z  
    Change the code between 'do' and 'done' to the following. The only difference is that I added a caret (^) before $filename. The caret anchors the match to the start of the line, so grep selects only the rows where the value appears in the first column.
    [code]
    grep -w "^$filename" "$filepath" > "$newFileDir/$filename.txt"
    [/code]
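
One caveat with grep -w "^$filename": the word match ends at any non-word character, so key 10 also selects a row whose first field is 10-A, and regex characters in a key (a dot, for example) are treated as patterns. A hedged alternative is to compare the whole first field as a string in awk. The sample rows and demo paths below are made up for illustration.

```shell
#!/bin/sh
# Sketch only: select rows by exact first-field value rather than a regex.
# The demo directory and sample rows are hypothetical.

demo_dir="${TMPDIR:-/tmp}/match_demo"
mkdir -p "$demo_dir"
filepath="$demo_dir/sample.txt"

# Key "10" also appears inside other keys and in other fields
cat > "$filepath" <<'EOF'
10;alpha;7
10-A;beta;10
100;gamma;10
20;delta;10
EOF

filename=10

# Anchored word match: rejects 100 and 20, but still accepts 10-A
grep -w "^$filename" "$filepath" > "$demo_dir/grep_$filename.txt"

# Exact match on field 1. Appending "" forces a string comparison, so a
# numeric-looking key such as 10 does not also match 010 or 10.0.
awk -F';' -v key="$filename" '($1 "") == (key "")' "$filepath" \
    > "$demo_dir/awk_$filename.txt"
```

In this sample the grep output holds two rows (10 and 10-A) while the awk output holds only the row for key 10. Note that awk -v interprets backslash escapes in the value, so keys containing backslashes would need different handling.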
    [i]Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.[/i]
    • Albert Einstein
  • SystemAdmin
    SystemAdmin
    7754 Posts

    Re: How To Split Single Sequential File into Multiple Sequential Files?

    ‏2007-01-23T11:26:08Z  

    You can build a regular DataStage job and split the input into the number of sequential files you need (assuming you know the number) using [b]constraints[/b]. The names of the sequential files will be fixed (for example File_1, File_2, ...).
    At the end of the job, in the [b]after-job routine[/b], you can [b]rename[/b] the sequential files according to the first column of the input file (for example File_1 ==> File_(Input.location1), File_2 ==> File_(Input.location2), ...).

    Better late than never.
  • SystemAdmin
    SystemAdmin
    7754 Posts

    Re: How To Split Single Sequential File into Multiple Sequential Files?

    ‏2007-01-23T21:54:59Z  

    The only problem with that approach is that the number of splits needs to be known beforehand and must remain constant. The more dynamic the code, the better.

    [i]Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.[/i]
    • Albert Einstein