Joining records

Often, you have two data sets with common key fields and different data fields, and you want to join their records, that is, for records with matching keys, you want to create output records with some fields from one data set and some fields from the other data set. You might want a one record to one record join, a one record to many records join, a many records to one record join, or even a many records to many records join.

A JOINKEYS application helps you to perform various "join" applications on two data sets by one or more keys. You can do an inner join, full outer join, left outer join, right outer join and unpaired combinations. The two data sets can be of different types (fixed, variable, VSAM, and so on) and lengths, and have keys in different locations.

The records from the input data sets can be processed in a variety of ways before and after they are joined using most of the DFSORT control statements you learned about previously including SORT or COPY, INCLUDE or OMIT, INREC, OUTREC and OUTFIL.

Suppose you have two input data sets, REGION.IN1 and REGION.IN2 as shown in Table 1 and Table 2. REGION.IN1 has RECFM=FB and LRECL=35 and REGION.IN2 has RECFM=FB and LRECL=27, so their LRECLs are different.

Table 1. REGION.IN1 data set for JOINKEYS application
Region	Headquarters	Regional Director
1 5	6 20	21 35
East West North South	Philadelphia San Jose Boston Charlotte	C. Kent B. Wayne P. Parker D. Prince

Table 2. REGION.IN2 data set for JOINKEYS application
Office	Region	Employees	Evaluation	Established
1 4	5 9	10 13	14 23	24 27
0001 0001 0002 0002 0001 0003 0002 0003 0001 0004	East South South East West East West South North South	0050 0023 0068 0125 0052 0028 0105 0054 0200 0070	Fair Good Fair Excellent Good Good Excellent Fair Fair Good	1983 1976 1978 1986 1995 1994 2001 1992 1991 2002

From these two input data sets, you want to create an output data set, REGION.OUT. For each record in REGION.IN2, you want to look up the corresponding Region in REGION.IN1, and combine fields from the two records into one output record in REGION.OUT, as shown in Table 3.

Table 3. REGION.OUT data set for JOINKEYS application
Office	Region	Regional Director	Employees	Evaluation	Headquarters
1 4	5 9	10 24	25 28	29 38	39 53
0001 0002 0003 0001 0001 0002 0003 0004 0001 0002	East East East North South South South South West West	C. Kent C. Kent C. Kent P. Parker D. Prince D. Prince D. Prince D. Prince B. Wayne B. Wayne	0050 0125 0028 0200 0023 0068 0054 0070 0052 0105	Fair Excellent Good Fair Good Fair Fair Good Good Excellent	Philadelphia Philadelphia Philadelphia Boston Charlotte Charlotte Charlotte Charlotte San Jose San Jose

Write the following DFSORT JCL and control statements to use a JOINKEYS application to create REGION.OUT from REGION.IN1 and REGION.IN2.

//JN1 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//IN1 DD DSN=DSN=A123456.REGION.IN1,DISP=SHR
//IN2 DD DSN=DSN=A123456.REGION.IN2,DISP=SHR
//SORTOUT DD DSN=A123456.REGION.OUT,DISP=(NEW,CATLG,DELETE),UNIT=3390,
//  SPACE=(CYL,(5,5))
//SYSIN DD *
  JOINKEYS F1=IN1,FIELDS=(1,5,A)  F1 has ddname IN1 and key in 1-5
  JOINKEYS F2=IN2,FIELDS=(5,5,A)  F2 has ddname IN2 and key in 5-9
  REFORMAT FIELDS=(F2:1,4,  Office from F2
      F1:1,5,21,15,         Region and Regional Director from F1
      F2:10,4,14,10,        Employees and Evaluation from F2
      F1:6,15)              Headquarters from F1
  OPTION COPY               Copy joined records
/*

Two JOINKEYS statements are required: one for the F1 data set and another for the F2 data set. In this case, the first JOINKEYS statement identifies IN1 as the ddname for the F1 data set and indicates an ascending key (Region) in positions 1-5 of that data set. The second JOINKEYS statement identifies IN2 as the ddname for the F2 data set and indicates an ascending key (Region) is at positions 5-9 of that data set. Each key in the F1 data set must be of the same length and order (ascending or descending) as the corresponding key in the F2 data set, but does not have to be in the same location. The keys are always treated as unsigned binary (INREC can be used to "normalize" the keys in each data set before the records are joined, if necessary).

The F1 data set will be sorted by the key in positions 1-5. The F2 data set will be sorted by the key in positions 5-9. If the records in a data set are already sorted by the key, you can specify SORTED on the JOINKEYS statement to tell DFSORT to copy the records of that data set rather than sorting them.

Records with the same key in both data sets are joined and constructed as directed by the REFORMAT statement using F1: for fields from the F1 record and F2: for fields from the F2 record. This REFORMAT statement creates joined records from the following fields:

Output positions 1-4: Office from F2
Output positions 5-9: Region from F1
Output positions 10-24: Regional Director from F1
Output positions 25-28: Employees from F2
Output positions 29-38: Evaluation from F2
Output positions 39-53: Headquarters from F1

The resulting joined records are 53 bytes long and are copied to the SORTOUT data set (REGION.OUT).

If we wanted to sort the resulting joined records on the Headquarters and Office fields, we could replace the OPTION COPY statement with the following SORT statement:

  SORT FIELDS=(39,15,CH,A,1,4,CH,A)

The resulting sorted output records in REGION.OUT are shown in Table 4.

Table 4. REGION.OUT data set Sorted by Headquarters and Office
Office	Region	Regional Director	Employees	Evaluation	Headquarters
1 4	5 9	10 24	25 28	29 38	39 53
0001 0001 0002 0003 0004 0001 0002 0003 0001 0002	North South South South South East East East West West	P. Parker D. Prince D. Prince D. Prince D. Prince C. Kent C. Kent C. Kent B. Wayne B. Wayne	0200 0023 0068 0054 0070 0050 0125 0028 0052 0105	Fair Good Fair Fair Good Fair Excellent Good Good Excellent	Boston Charlotte Charlotte Charlotte Charlotte Philadelphia Philadelphia Philadelphia San Jose San Jose

You can also use a JOINKEYS application to match records from two different input data sets in various ways. Suppose you have two input data sets, CITIES.IN1 and CITIES.IN2 as shown in Table 5 and Table 6.

Table 5. CITIES.IN1 data set
City	State	District
1 20	21 35	36 37
GILROY GILROY MORGAN HILL PALO ALTO PALO ALTO PALO ALTO SACRAMENTO SAN JOSE SAN JOSE SAN MARTIN AUSTIN DALLAS DALLAS BARRE BARRE STOWE	CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA TEXAS TEXAS TEXAS VERMONT VERMONT VERMONT	05 10 03 15 08 21 05 02 10 12 21 15 25 07 12 09

Table 6. CITIES.IN2 data set
State	City
1 15	16 35
TEXAS CALIFORNIA CALIFORNIA VERMONT CALIFORNIA TEXAS VERMONT CALIFORNIA CALIFORNIA	AUSTIN SACRAMENTO GILROY BURLINGTON MODESTO LAREDO BARRE LOS ANGELES SAN JOSE

From these two data sets, you want to create a CITIES.OUT data set with the records for Cities that appear in CITIES.IN1, but not in CITIES.IN2, as shown in Table 7.

Table 7. CITIES.OUT data set
City	State	District
1 20	21 35	36 37
MORGAN HILL PALO ALTO PALO ALTO PALO ALTO SAN MARTIN DALLAS DALLAS STOWE	CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA TEXAS TEXAS VERMONT	03 15 08 21 12 15 25 09

Write the following DFSORT JCL and control statements to use a JOINKEYS application to create CITIES.OUT from CITIES.IN1 and CITIES.IN2.

//JN2 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTJNF1 DD DSN=DSN=A123456.CITIES.IN1,DISP=SHR
//SORTJNF2 DD DSN=DSN=A123456.CITIES.IN2,DISP=SHR
//SORTOUT DD DSN=A123456.CITIES.OUT,DISP=OLD
//SYSIN DD *
* F1 keys are State and City - F1 is already sorted
* by those keys.
  JOINKEYS FILE=F1,FIELDS=(21,15,A,1,20,A),SORTED
* F2 keys are State and City 
  JOINKEYS FILE=F2,FIELDS=(1,15,A,16,20,A)
* Keep the records in F1 that do not have a match in F2
* for the State and City.
  JOIN UNPAIRED,F1,ONLY
  OPTION COPY
/*

The first JOINKEYS statement identifies SORTJNF1 as the ddname for the F1 data set and indicates ascending keys (State and City) in positions 21-35 and 1-20 of that data set. Since the F1 records are already in order by the State and City fields, SORTED is used to do a Copy of the F1 records rather than a Sort.

The second JOINKEYS statement identifies SORTJNF2 as the ddname for the F2 data set and indicates ascending keys (State and City) in positions 1-15 and 16-35 of that data set. Since the F2 records are not already in order by the State and City fields, SORTED is not used and a Sort in performed for the F2 records on the indicated keys.

JOIN UNPAIRED,F1,ONLY is used to restrict the output (SORTOUT) to the records in F1 that do not have matching keys in F2. Since we want the entire F1 record, we do not need a REFORMAT statement.

Now suppose we want to use the CITIES.IN1 and CITIES.IN2 data sets again, but this time we want to produce the following output from these two data sets:

BOTH.OUT: CALIFORNIA and TEXAS cities that appear in both CITIES.IN1 and CITIES.IN2.
F1ONLY.OUT: CALIFORNIA and TEXAS cities that appear in CITIES.IN1, but not in CITIES.IN2.
F2ONLY.OUT: CALIFORNIA and TEXAS cities that appear in CITIES.IN2, but not in CITIES.IN1.

Write the following DFSORT JCL and control statements to use a JOINKEYS application to create BOTH.OUT, F1ONLY.OUT and F2ONLY.OUT from CITIES.IN1 and CITIES.IN2.

//JN3 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//CITIES1 DD DSN=A123456.CITIES.IN1,DISP=SHR
//CITIES2 DD DSN=A123456.CITIES.IN2,DISP=SHR
//JNF1CNTL DD *                                            
  OMIT COND=(21,15,CH,EQ,C'VERMONT')                       
//JNF2CNTL DD *                                            
  OMIT COND=(1,15,CH,EQ,C'VERMONT') 
//BOTH DD DSN=A123456.BOTH.OUT,DISP=OLD
//F1ONLY DD DSN=A123456.F1ONLY.OUT,DISP=OLD
//F2ONLY DD DSN=A123456.F2ONLY.OUT,DISP=OLD
//SYSIN DD * 
  JOINKEYS F1=CITIES1,FIELDS=(21,15,A,1,20,A),SORTED          
  JOINKEYS F2=CITIES2,FIELDS=(1,15,A,16,20,A)                 
  JOIN UNPAIRED,F1,F2                                      
  REFORMAT FIELDS=(?,F1:1,37,F2:1,35)                      
  OPTION COPY                                              
  OUTFIL FNAMES=BOTH,INCLUDE=(1,1,CH,EQ,C'B'),     
    BUILD=(2,37)                                   
  OUTFIL FNAMES=F1ONLY,INCLUDE=(1,1,CH,EQ,C'1'),   
    BUILD=(2,37)                                   
  OUTFIL FNAMES=F2ONLY,INCLUDE=(1,1,CH,EQ,C'2'),   
    BUILD=(54,20,39,15)                            
/*

The first JOINKEYS statement identifies CITIES1 as the ddname for the F1 data set and indicates ascending keys (State and City) in positions 21-35 and 1-20 of that data set. The JNF1CNTL data set contains an OMIT statement to remove the VERMONT records from the F1 data set so they will not be joined. Since the F1 records are already in order by the State and City fields, SORTED is used to do a Copy rather than a Sort for the F1 data set.

The second JOINKEYS statement identifies CITIES2 as the ddname for the F2 data set and indicates ascending keys (State and City) in positions 1-15 and 16-35 of that data set. The JNF2CNTL data set contains an OMIT statement to remove the VERMONT records from the F2 data set so they will not be joined. Since the F2 records are not already in order by the State and City fields, SORTED is not used and a Sort in performed for the F2 data set.

Since we want to separate out the BOTH, F1ONLY and F2ONLY joined records, we use a JOIN statement with UNPAIRED,F1,F2 to keep the unpaired joined records as well as the paired join records. In the REFORMAT statement, we use ? as the first field to give us an indicator of whether each key was found in both records ('B' indicator), only in F1 ('1' indicator) or only in F2 ('2' indicator). After the indicator, we put positions 1-37 (City, State, District) from F1 and 1-35 (State, City) from F2. After the records are joined, they will look as shown in Table 8.

Table 8. Joined City records
Indicator	F1 City	F1 State	F1 District	F2 State	F2 City
1 1	2 21	22 36	37 38	39 53	54 73
B B 2 2 1 1 1 1 B B B 1 B 1 1 2	GILROY GILROY MORGAN HILL PALO ALTO PALO ALTO PALO ALTO SACRAMENTO SAN JOSE SAN JOSE SAN MARTIN AUSTIN DALLAS DALLAS	CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA TEXAS TEXAS TEXAS	05 10 03 15 08 21 05 02 10 12 21 15 25	CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA TEXAS TEXAS	GILROY GILROY LOS ANGELES MODESTO SACRAMENTO SAN JOSE SAN JOSE AUSTIN LAREDO

Now we can use OUTFIL statements to create our three different output data sets. We write positions 2-38 (F1 City, F1 State, F1 District) of the joined records with an indicator of 'B' to positions 1-37 of the BOTH.OUT data set. We write positions 2-38 (F1 City, F1 State, F1 District) of the joined records with an indicator of '1' to positions 1-37 of the F1ONLY.OUT data set. We write positions 54-73 (F2 City) and 39-53 (F2 State) of the joined records with an indicator of '2' to positions 1-35 of the F2ONLY.OUT data set. The resulting records are shown in Table 9, Table 10 and Table 11.

Table 9. BOTH.OUT data set
City	State	District
1 20	21 35	36 37
GILROY GILROY SACRAMENTO SAN JOSE SAN JOSE AUSTIN	CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA TEXAS	05 10 05 02 10 21

Table 10. F1ONLY.OUT data set
City	State	District
1 20	21 35	36 37
MORGAN HILL PALO ALTO PALO ALTO PALO ALTO SAN MARTIN DALLAS DALLAS	CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA CALIFORNIA TEXAS TEXAS	03 15 08 21 12 15 25

Table 11. F2ONLY.OUT data set
City	State
1 20	21 35
LOS ANGELES MODESTO LAREDO	CALIFORNIA CALIFORNIA TEXAS

These are just a few of the many types of join operations you can do with JOINKEYS. See z/OS DFSORT Application Programming Guide for complete details of using the JOINKEYS, JOIN and REFORMAT statements along with the other DFSORT statements to perform different JOINKEYS applications.

Summary

This chapter covered the use of JOINKEYS, JOIN and REFORMAT to join records from two data sets.

The next chapter will cover methods of calling DFSORT from a program.