Examples of extracting data from Microsoft Excel files (DataStage®)
You can build sample jobs that extract data from Microsoft Excel files. In this example, Sequential File stage is used as output stage. You can use any other output stage for creating the output.
To get the files for the examples, extract the IS_install\Clients\Samples\Connectors\UnstructuredData_Samples.zip file.
Example 1: Extracting data from a range in a Microsoft Excel file
Create a job that uses the Excel stage to retrieve data from a range in a Microsoft Excel spread sheet.
This example uses the sample Microsoft Excel file Employee1.xls, which contains details of employees working in an organization. This sample file has three spread sheets: Sheet1, Sheet2, and Sheet3.Sheet1 contains information about the employees in every department in the organization. Sheet2 and Sheet3 are blank. In this example, you extract business information about only the employees who work for department B01.
- To extract data from a range in an Excel file:
-
- Create a job that includes one Excel stage and one Sequential File stage.
- Double-click the Excel stage.
- In the Configuration window, specify the full file path of the Microsoft Excel input file Employee1.xls.
- From the Range option list, select Specify the entire data range to extract the data in a specific range.
- In the Range expression field, specify Sheet1!A16:K28.
- From the Column header field, select the First row of data ranges. When the First row of data ranges is selected, the first row is regarded as the header and the Excel stage starts extracting from the second row.
- Clear the checkbox next to the Microsoft Excel columns E (PHONE NO) and I (BIRTH DATE).
- Click Import and OK. When you click Import, the Map pane at the lower right of the Configuration window updates.
- In the EMP_NO column to Integer, and then click OK. page, change the type of the
- Double-click the Sequential file stage. On the Properties page, specify the path where you want the output file to be created, followed by the file name OutputOfExample1.txt, and click OK.
- Save the job. Now, you can compile and run the job.
The following table displays the information in a Microsoft Excel input file that contains the employee information for different departments.
EMP NO | FIRST NAME | MID INIT | LAST NAME | PHONE NO | HIRE DATE | JOB | SEX | BIRTH DATE |
---|---|---|---|---|---|---|---|---|
Employees in DEPT_A00 | ||||||||
10 | CHRISTINE | I | HAAS | 3978 | 1/1/1995 | PRES | F | 8/24/1963 |
20 | MICHAEL | L | THOMSON | 3476 | 10/10/2003 | MANAGER | M | 2/2/1976 |
30 | SALLY | A | KWAN | 4738 | 4/5/2005 | MANAGER | F | 5/11/1971 |
50 | JOHN | B | GEYER | 6789 | 8/17/1979 | MANAGER | M | 9/15/1955 |
Employees in DEPT_B01 | ||||||||
60 | IRVING | F | STERN | 6423 | 9/14/2003 | MANAGER | M | 7/7/1975 |
70 | EVA | D | PULASKI | 7831 | 9/30/2003 | MANAGER | F | 5/26/2003 |
90 | EILEEN | W | HENDERSON | 5498 | 8/15/2000 | MANAGER | F | 5/15/1971 |
100 | THEODORE | Q | SPENSER | 742 | 6/19/2000 | MANAGER | M | 12/18/1980 |
110 | VINCENZO | G | LUCCHESSI | 3490 | 5/16/1988 | SALESREP | M | 11/5/1958 |
120 | SEAN | O'CONNELL | 2167 | 12/51993 | CLERK | M | 10/18/1972 | |
130 | DELORES | M | QUINTANA | 4578 | 7/28/2001 | ANALYST | F | 9/15/1955 |
140 | HEATHER | A | NICHOLLS | 1793 | 12/15/2006 | ANALYST | F | 1/19/1976 |
150 | BRUCE | ADAMSON | 4510 | 2/12/2002 | DESIGNER | M | 5/17/1972 | |
160 | ELIZABETH | R | PIANKA | 3782 | 10/11/2006 | DESIGNER | F | 4/12/1980 |
1770 | MASATOSHI | J | YOSHIMURA | 2890 | 9/15/1999 | DESIGNER | M | 1/5/1981 |
180 | MARILYN | S | SCOUTTEN | 1682 | 7/7/2003 | DESIGNER | F | 2/21/1978 |
After the job runs, you can open the OutputOfExample1.txt file to view the result.
"60","IRVING","F","STERN","2003-09-14","MANAGER ","M","72250","500"
"70","EVA","D","PULASKI","2005-09-30","MANAGER ","F","96170","700"
"90","EILEEN","W","HENDERSON","2000-08-15","MANAGER ","F","89750","600"
"100","THEODORE","Q","SPENSER","2000-06-19","MANAGER ","M","86150","500"
"110","VINCENZO","G","LUCCHESSI","1988-05-16","SALESREP","M","66500","900"
"120","SEAN"," ","O'CONNELL","1993-12-05","CLERK ","M","49250","600"
"130","DELORES","M","QUINTANA","2001-07-28","ANALYST ","F","73800","500"
"140","HEATHER","A","NICHOLLS","2006-12-15","ANALYST ","F","68420","600"
"150","BRUCE"," ","ADAMSON","2002-02-12","DESIGNER","M","55280","500"
"160","ELIZABETH","R","PIANKA","2006-10-11","DESIGNER","F","62250","400"
"170","MASATOSHI","J","YOSHIMURA","1999-09-15","DESIGNER","M","44680","500"
"180","MARILYN","S","SCOUTTEN","2003-07-07","DESIGNER","F","51340","500"
Example 2: Extracting data from multiple Microsoft Excel sheets
This example uses the sample Microsoft Excel file Employee2.xls. This sample file has the following sheets: DEPT A00, DEPT B01, DEPT C01, and DEPT D01. Each sheet contains information about the employees in the department.
The data structure of each sheet is similar. Each sheet has the EMP NO, FIRST NAME, MID INIT, LAST NAME, PHONE NO, HIRE DATE, JOB, and ADDRESS columns, and the third row is the header. But each sheet has a different number of rows.
- To extract data from multiple files:
-
- In the Configuration window, specify the full file path of the Microsoft Excel input file Employee2.xls.
- From the Range option, select Specify the start row.
- In the Range expression field, specify A3:H3 . When the stage runs with Specify the first row option and no specific sheet name is specified in the Range expression, the job finds the last row and extracts rows to the last row at a runtime.
- In the Column header, select the First row of data ranges.
- On the Property tab, select the checkbox next to the property to extract the property value. In this example, select the Sheetname as the property.
- Click Import. The stage generates the column mappings.
- To make the SheetNname column the first column in the list, select the SheetName column and click Up until the SheetName column is the first column in the list.
- In the mapping table, insert a row for ADDRESS column in the input file that contains hyperlink.
- Click Insert
- In the Excel item option, select Column ADDRESS.
- In the Import option cell in the new row, select the Hyper link address.
- Specify the DataStage column name EMAIL_ADDRESS for the new row. Then, click OK.
- Click tab to change the data type or other attributes.
- Change the type of EMP_NO column to Integer. Click OK.
- On the Properties page, specify the path where you want the output file to be created, followed by the file name OutputOfExample2.txt. Click OK.
- Save the job. Now, you can compile and run the job.
An example input Microsoft Excel files that contain the employee information for each department in the different sheets. The job extracts of employee data from all sheets are displayed in the form of the following tables:
EMP NO | FIRST NAME | MID INIT | LAST NAME | PHONE NO | HIRE DATE | JOB | SEX | BIRTH DATE |
---|---|---|---|---|---|---|---|---|
10 | CHRISTINE | I | HAAS | 3978 | 1/1/1995 | PRES | F | 8/24/1963 |
20 | MICHAEL | L | THOMSON | 3476 | 10/10/2003 | MANAGER | M | 2/2/1976 |
30 | SALLY | A | KWAN | 4738 | 4/5/2005 | MANAGER | F | 5/11/1971 |
50 | JOHN | B | GEYER | 6789 | 8/17/1979 | MANAGER | M | 9/15/1955 |
EMP NO | FIRST NAME | MIDI NIT | LAST NAME | PHONE NO | HIRE DATE | JOB | SEX | BIRTH DATE |
---|---|---|---|---|---|---|---|---|
60 | IRVING | F | STERN | 6423 | 9/14/2003 | MANAGER | M | 7/7/1975 |
70 | EVA | D | PULASKI | 7831 | 9/30/2003 | MANAGER | F | 5/26/2003 |
90 | EILEEN | W | HENDERSON | 5498 | 8/15/2000 | MANAGER | F | 5/15/1971 |
100 | THEODORE | Q | SPENSER | 742 | 6/19/2000 | MANAGER | M | 12/18/1980 |
110 | VINCENZO | G | LUCCHESSI | 3490 | 5/16/1988 | SALESREP | M | 11/5/1958 |
120 | SEAN | O'CONNELL | 2167 | 12/51993 | CLERK | M | 10/18/1972 | |
130 | DELORES | M | QUINTANA | 4578 | 7/28/2001 | ANALYST | F | 9/15/1955 |
140 | HEATHER | A | NICHOLLS | 1793 | 12/15/2006 | ANALYST | F | 1/19/1976 |
150 | BRUCE | ADAMSON | 4510 | 2/12/2002 | DESIGNER | M | 5/17/1972 | |
160 | ELIZABETH | R | PIANKA | 3782 | 10/11/2006 | DESIGNER | F | 4/12/1980 |
1770 | MASATOSHI | J | YOSHIMURA | 2890 | 9/15/1999 | DESIGNER | M | 1/5/1981 |
180 | MARILYN | S | SCOUTTEN | 1682 | 7/7/2003 | DESIGNER | F | 2/21/1978 |
After the job runs, you can open the OutputOfExample2.txt file that contains the following result.
"DEPT A00","10","CHRISTINE","I","HAAS","3978","1995-01-01","PRES ","CHRISTINE HAAS","mailto:CHRISTINE%20HAAS@abc.com"
"DEPT A00","20","MICHAEL","L","THOMPSON","3476","2003-10-10","MANAGER ","MICHAEL THOMPSON","mailto:MICHAEL%20THOMPSON@abc.com"
"DEPT A00","30","SALLY","A","KWAN","4738","2005-04-05","MANAGER ","SALLY KWAN","mailto:SALLY%20KWAN@abc.com"
"DEPT A00","50","JOHN","B","GEYER","6789","1979-08-17","MANAGER ","JOHN GEYER","mailto:JOHN%20GEYER@abc.com"
"DEPT B01","60","IRVING","F","STERN","6423","2003-09-14","MANAGER ","IRVING STERN","mailto:IRVING%20STERN@abc.com"
"DEPT B01","70","EVA","D","PULASKI","7831","2005-09-30","MANAGER ","EVA PULASKI","mailto:EVA%20PULASKI@abc.com"
"DEPT B01","90","EILEEN","W","HENDERSON","5498","2000-08-15","MANAGER ","EILEEN HENDERSON","mailto:EILEEN%20HENDERSON@abc.com"
"DEPT B01","100","THEODORE","Q","SPENSER","972","2000-06-19","MANAGER ","THEODORE SPENSER","mailto:THEODORE%20SPENSER@abc.com"
"DEPT B01","110","VINCENZO","G","LUCCHESSI","3490","1988-05-16","SALESREP","VINCENZO LUCCHESSI","mailto:VINCENZO%20LUCCHESSI@abc.com"
"DEPT B01","120","SEAN"," ","O'CONNELL","2167","1993-12-05","CLERK ","SEAN O'CONNELL","mailto:SEAN%20O'CONNELL@abc.com"
"DEPT B01","130","DELORES","M","QUINTANA","4578","2001-07-28","ANALYST ","DELORES QUINTANA","mailto:DELORES%20QUINTANA@abc.com"
"DEPT B01","140","HEATHER","A","NICHOLLS","1793","2006-12-15","ANALYST ","HEATHER NICHOLLS","mailto:HEATHER%20NICHOLLS@abc.com"
"DEPT B01","150","BRUCE"," ","ADAMSON","4510","2002-02-12","DESIGNER","BRUCE ADAMSON","mailto:BRUCE%20ADAMSON@abc.com"
"DEPT B01","160","ELIZABETH","R","PIANKA","3782","2006-10-11","DESIGNER","ELIZABETH PIANKA","mailto:ELIZABETH%20PIANKA@abc.com"
"DEPT B01","170","MASATOSHI","J","YOSHIMURA","2890","1999-09-15","DESIGNER","MASATOSHI YOSHIMURA","mailto:MASATOSHI%20YOSHIMURA@abc.com"
"DEPT B01","180","MARILYN","S","SCOUTTEN","1682","2003-07-07","DESIGNER","MARILYN SCOUTTEN","mailto:MARILYN%20SCOUTTEN@abc.com"
"DEPT C01","190","JAMES","H","WALKER","2986","2004-07-26","DESIGNER","JAMES WALKER","mailto:JAMES%20WALKER@abc.com"
"DEPT C01","200","DAVID"," ","BROWN","4501","2002-03-03","DESIGNER","DAVID BROWN","mailto:DAVID%20BROWN@abc.com"
"DEPT C01","210","WILLIAM","T","JONES","942","1998-04-11","DESIGNER","WILLIAM JONES","mailto:WILLIAM%20JONES@abc.com"
"DEPT C01","220","JENNIFER","K","LUTZ","672","1998-08-29","DESIGNER","JENNIFER LUTZ","mailto:JENNIFER%20LUTZ@abc.com"
"DEPT C01","230","JAMES","J","JEFFERSON","2094","1996-11-21","CLERK ","JAMES JEFFERSON","mailto:JAMES%20JEFFERSON@abc.com"
"DEPT C01","240","SALVATORE","M","MARINO","3780","2004-12-05","CLERK ","SALVATORE MARINO","mailto:SALVATORE%20MARINO@abc.com"
"DEPT C01","250","DANIEL","S","SMITH","961","1999-10-30","CLERK ","DANIEL SMITH","mailto:DANIEL%20SMITH@abc.com"
"DEPT C01","260","SYBIL","P","JOHNSON","8953","2005-09-11","CLERK ","SYBIL JOHNSON","mailto:SYBIL%20JOHNSON@abc.com"
"DEPT D01","270","MARIA","L","PEREZ","9001","2006-09-30","CLERK ","MARIA PEREZ","mailto:MARIA%20PEREZ@abc.com"
"DEPT D01","280","ETHEL","R","SCHNEIDER","8997","1997-03-24","OPERATOR","ETHEL SCHNEIDER","mailto:ETHEL%20SCHNEIDER@abc.com"
"DEPT D01","290","JOHN","R","PARKER","4502","2006-05-30","OPERATOR","JOHN PARKER","mailto:JOHN%20PARKER@abc.com"
"DEPT D01","300","PHILIP","X","SMITH","2095","2002-06-19","OPERATOR","PHILIP SMITH","mailto:PHILIP%20SMITH@abc.com"
"DEPT D01","310","MAUDE","F","SETRIGHT","3332","1994-09-12","OPERATOR","MAUDE SETRIGHT","mailto:MAUDE%20SETRIGHT@abc.com"
"DEPT D01","320","RAMLAL","V","MEHTA","9990","1995-07-07","FIELDREP","RAMLAL MEHTA","mailto:RAMLAL%20MEHTA@abc.com"
"DEPT D01","330","WING"," ","LEE","2103","2006-02-23","FIELDREP","WING LEE","mailto:WING%20LEE@abc.com"
"DEPT D01","340","JASON","R","GOUNOT","5698","1977-05-05","FIELDREP","JASON GOUNOT","mailto:JASON%20GOUNOT@abc.com"
Example 3: Extracting data from multiple ranges that have different data structures in a Microsoft Excel file
This example uses the sample Microsoft Excel file Employee3.xls. This sample file has two spreadsheets, Departments and Employees, which have different data structures.
In this example, the Excel stage has two output links. You extract data from the Departments sheet to the first link and from the Employees sheet to the second link.
- To extract different data structures:
-
- Create a job that includes one Excel stage and two Sequential File stages. You can rename your links as Departments and Employees.
- Double-click the Excel stage. In the Configuration window, specify the full file path of the Microsoft Excel input file Employee3.xls.
- Specify the data to extract from the Departments spreadsheet and complete
the following sub steps to generate the column mappings.
- From the Link list box, select Departments.
- From the Range option list, select Specify the entire range.
- From the Range expression field, specify Departments!A2:C6 .
- From the Column header, select the First row of data ranges.
- Click Import, and then click OK.
- Specify the data to extract from the Employees spreadsheet and complete
the below sub steps to generate the column mappings.
- From the Link list box, select Employees.
- From the Range Option list, Specify the entire range.
- From the Range expression field, specify Employees!A2:L34.
- From the Column header, select First row of data ranges.
- Click Import, and then click OK.
- On the Output page, select the Employees link as the Output name.
- On the Columns page, change the data type of the EMP_NO column to an integer, and then click OK.
- Double-click the first Sequential File stage Output_1 and specify the path to create the output file, followed by the file name OutputOfExample3_1.txt.
- Double-click the second Sequential File stage Output_2, and specify the path to create the output file, followed by the file name OutputOfExample3_2.txt
- Save the job. Now, you can compile and run the job.
An example input Microsoft Excel file Employee3.xls contains department information in the Departments sheet and employee information in the Employees sheet. The job extracts department data to OutputOfExample3_1.txt file and employee data to OutputOfExample3_2.txt.
After the job runs, open the OutputOfExample3_1.txt file and OutputOfExample3_2.txt file. The OutputOfExample3_1.txt file should match the Departments sheet and OutputOfExample3_2.txt file should match the Employees sheet from the Employee.xls file.