- Validate prerequisites
- Define the delimited flat file and the positional flat file to Optim using Optim Connect Studio file-table mapping
- Define ODBC sources
- Define the data sources to InfoSphere Federation Server
- Extract and de-identify data
- Downloadable resources
- Related topics
De-identify flat files and delimited files using InfoSphere Optim Data Privacy Solution and InfoSphere Federation Server
Maintain regulatory compliance for testing scenarios
IBM InfoSphere Optim provides a solution to streamline test data management, making it easier and faster to create right-sized test databases that accurately reflect end-to-end business processes. The solution can be extended with the Data Privacy Solution, which delivers powerful data transformation capabilities to mask confidential corporate data so that you can use it safely for application testing. These transformations include everything from simple data masking to pre-packaged transformation algorithms for complex data elements like credit card numbers, e-mail addresses, and national identifiers.
In an environment of data sensitivity, there is a need to mask data not only in traditional relational databases, but also in flat positional and flat delimited files that could be part of a testing scenario. Flat positional files are the same as flat delimited files except that the carriage return is not explicit and there is no delimiter between columns.
If you send files to vendors, off-shore testers, or even internal testers, the data in those files could be subject to regulations. This article shows you how to use the InfoSphere Optim Test Data Management with the Data Privacy Option and the InfoSphere Federation Server to mask data appropriately. You will learn to map the flat file data to tables using Optim Connect Studio, which are then defined as ODBC data sources and made accessible to InfoSphere Federation Server using the ODBC wrapper, as shown in Figure 1.
Figure 1. Flat files accessed using InfoSphere Federation Server and Optim Connect
(View a larger version of Figure 1.)
The high-level steps described in this article include:
- Validate that you have the proper prerequisites. You need InfoSphere Optim Test Data Management with Data Privacy Option at a minimum of Version 7.3.1. The Optim Connect software needs to be at Version 5.3.2 or later, and the InfoSphere Federation Server should be at Version 9.5 or later. You also need an Optim generic license, which is included with the product.
- Define the flat delimited file that needs masking (delim1.txt) and the subsequent output file (delim2.txt) so Optim can access them using Optim Connect. Also define the flat positional file that needs masking (posit1.txt) and the subsequent output file (posit2.txt) so they can be accessed by using Optim Connect. Optim Connect Studio is the tool used to define them.
- Define the files as ODBC data sources so that InfoSphere Federation Server can access them.
- Create appropriate definitions (ODBC wrapper, nicknames, and user mappings) for the data sources in InfoSphere Federation Server.
- Extract the data, define the masking algorithm, and insert with masking into the new output file (delim2.txt, posit2.txt). The masking in this case uses one of the simpler functions within Optim (substring), but the entire set of masking functionality is available for these files just as it is with relational tables.
The instructions in this article are based on the assumption that Optim 7.3.1, Optim Connect 5.3.2, and InfoSphere Federated Server 9.5 or later have been successfully installed on a Windows® platform.
Optim Test Data Management and Privacy needs to be at Version 7.3.1. Optim Connect needs to be at Version 5.3.1. Optim Connect must be at this level to be able to mask positional flat files. InfoSphere Federation Server needs to be at Version 9.5 or later. Complete the following steps to validate your software levels.
- Start Optim, and click Help > About
Optim, as shown in Figure 2.
Figure 2. Optim version level validation
The version and level are displayed, as shown in Figure 3.
Figure 3. Optim version level
- Start Optim Connect Studio, and click Help > About
Optim Connect Studio, as shown in Figure 4.
Figure 4. Verify Optim Connect version level
The version and level are displayed, as shown in Figure 5.
Figure 5. Optim Connect version level
Optim Connect must be at this level to work with positional flat files.
- To check the level of InfoSphere Federation server, enter the command
db2licm –lfrom a DOS prompt, which returns the result shown in Listing 1.
Listing 1. Results of server level check
Product name: "InfoSphere Federation Server" License type: "CPU" Expiry date: "Permanent" Product identifier: "wsfs" Version information: "9.5"
Define the delimited flat file and the positional flat file to Optim using Optim Connect Studio file-table mapping
There are many cases where you might need to mask flat file data. To take advantage of the extensive algorithms available in Optim Data Privacy Solution, the file must be accessed as if it were a relational table with columns. You can then select the columns to mask within the Optim tool and take advantage of the functions and algorithms within Optim. To mask flat files, you need two files: one for input to Optim and one for output from Optim as masked data.
Follow these steps to define the flat file that needs masking (DELIM1.txt) and the output file (DELIM2.txt) using Optim Connect Studio.
- Create a flat file for this example that includes two rows, as shown in
Listing 2. Example flat file
- Define a positional flat file that needs masking POSIT1.txt
and an output file POSIT2.txt, using Optim Connect Studio. The
flat positional file is shown in Listing 3.
Listing 3. Example flat positional file
- Open Optim Connect Studio, and begin the process of defining
the flat file definition and the positional flat file definition. The
default view is Configuration, but if it doesn't appear, click the
menu option Window > Show view >
Configuration, as shown in Figure 6.
Figure 6. Optim Connect configuration screen
- Expand localhost, then Bindings, and then NAV.
- Right-click Data Sources, and click New
Data Source, as shown in Figure 7.
Figure 7. Start creating a new data source
- To add a delimited type file to the data sources, enter
DELIM in the name field, highlight
Delimited Text Files, and click
Next, as shown in Figure 8.
Figure 8. Create new data sources
- Enter the file system where the files are located, and
click Finish, as shown in Figure 9.
Figure 9. Fill in file location
- Expand the DELIM data source, right-click Show
Metadata View, and expand DELIM, as
shown in Figure 10.
Figure 10. Create table
Create a new table
Follow these steps to create a new table.
- Right-click the table, and click New table, as shown
in Figure 11.
Figure 11. New table wizard
- Enter the table name (LIM1), and click Finish.
- On the right
pane, enter the data file location (don't use the browse button),
change the organization to sequential, and press
CTRL+S to save, as shown in Figure 12. Check
to make sure your changes are saved.
Figure 12. Table parameter wizard
- Click the Columns tab at the bottom.
- Click Insert in the upper right-hand
corner. The Insert Column wizard appears, as shown in Figure 13.
Figure 13. Table column wizard
- Fill in the first column name that you want for the table (NAME), and click OK.
- Pull down the data type, and scroll until you get to the
type you want (varstring) as a data type for the first column, which
DAN, as shown in Figure 14.
Figure 14. Table attribute wizard
- Enter the size 10 in the next column. Repeat this for
all the columns in your file, using
the Insert button with Insert after as the location of
each new column.
The final table and attributes are shown in Figure 15.
Figure 15. Results of table attribute wizard
- Press CTRL+S, or click the X next to *LIM1 to
close the screen and to save the definition. To display the XML,
right-click the table name, and click Open as XML.The XML representation
is shown in Listing 4.
Listing 4. XML representation of table attribute
<table datasource="DELIM" description="TEST" delimited="," fileName="C:\Flatfiles\delim1.txt" name="LIM1" organization="sequential" quoteChar="0"> <fields> <field datatype="varstring" name="NAME" size="10"/> <field datatype="varstring" name="IDNUM" offset="12" size="10"/> <field datatype="varstring" name="WORK" size="10"/> </fields> <keys/> </table>
- Make sure that the
delimited=”,”shows up in the XML. If it is not there, add it and save the file.
- Repeat the steps to create a new table for a second table called LIM2. Check to make sure the location and sequential attributes are saved.
Validate the new tables
Follow these steps to validate the new tables.
- Validate that you defined the table on the flat file
correctly by right-clicking LIM1, choosing
Test, and clicking Next and Finish.
If the table is defined correctly, you will see a sampling of the
table data, as shown in Figure 16. You can also use the query tool to select the data in the
file as if it were a relational table.
Figure 16. Table validate (test) wizard
- Repeat the validation for LIM2.
- Close all the delimited file definitions.
Create a definition for the positional flat files
Follow these steps to create a definition for the positional flat files.
- Click the Configuration tab on the left, and right-click Data Sources.
- Click New Data Source, and fill in a name (POSIT1).
- Highlight Flat files, and click
Next, as shown in Figure 17.
Figure 17. Data source wizard
- Enter the location, and click Finish, as shown in
Figure 18. New positional flat file
- Repeat the steps to create a definition for the positional flat files for a second data source named POSIT2.
Complete the data table
Follow these steps to complete the data table.
- Right-click POSIT1,
and click Show Metadata View,as shown in Figure 19.
Figure 19. Optim Connect Studio
- Click to expand POSIT1, and right-click on Table.
- Click New Table, enter POSIT1 for the table name, and click Finish.
- Enter the complete data file location and name (don’t browse; type the location).
- Change the organization to sequential, and press
shown in Figure 20.
Figure 20. Table definition
- Click the Columns tab on the bottom of the pane.
- Add the column names and characteristics: NAME string 7, IDNUM
string 3, WORK string 10, and CRLF string 2,
as shown in Figure 21. Do not change any other
Figure 21. Add columns
- Click the X next to
- To ensure that
the file and location are saved correctly, right-click
POSIT1, click Open as XML, and check to see that the entire
file name and location is there, as shown in Listing 5. If it is not, edit the XML, and make
Listing 5. XML representation of POSIT1 datasource
<table datasource="POSIT1" fileName="C:\flatfiles\posit1.txt" name="POSIT1" organization="sequential"> <fields> <field datatype="string" name="NAME" size="7"/> <field datatype="string" name="IDNUM" size="3"/> <field datatype="string" name="WORK" size="10"/> <field datatype="string" name="CRLF" size="2"/> </fields> <keys/> </table>
- Repeat the steps to complete the data table for a POSIT2 table definition. The resulting
metadata view should look like Figure 22.
Figure 22. Final metadata view output
Use nav_util to test the connection
Test the connection to the primary data source with nav_util before proceeding. Make sure you can select from the primary LIM1 and secondary table. LIM2 is empty, but you should be able to see that it exists. The following steps describe how to use the nav_util tool to test.
- Go to
C:\IBM Optim\Connect\Server\bin(or wherever you have it installed).
- Enter nav_util execute DELIM. The resulting screen is
shown in Figure 23.
Figure 23. List tables output
- Enter select * from LIM1; and the resulting screen is shown in Figure 24.
Figure 24. Select LIM1 output
- Repeat the steps to test the connection for the positional files.
Define ODBC sources
Follow these steps to define the ODBC data sources so that the InfoSphere Federation Server can access them.
- Go to Start > Control Panel > Administrative Tools > Data Sources (ODBC).
- Click System DSN, and click Add,
as shown in Figure 25.
Figure 25. ODBC Data Source Administrator
- Highlight Optim Connect Driver for OBDC 2.5, and click
Finish, as shown in Figure 26.
Figure 26. Optim Connect Driver
The Optim Connect Driver for ODBC 2.5 setup wizard appears, as shown in Figure 27.
Figure 27. Optim Connect DSN properties
- Enter the name on the first screen (DELIM), leave the server location blank, and click Next.
- Pull down the User profile field, and click NAV.
- Enter nav (lower case) in the
password field, and check Save password (unsafe), as
shown in Figure 28.
Figure 28. Optim Connect 2.5 setup
- Test the login, and click Next.
- For the binding, leave it as NAV. Pull down the Default datasource (catalog) field, and select your data source (DELIM).
- Check Single datasource (for schema), test the
datasource, and click Next, as shown in Figure 29.
Figure 29. Optim Connect binding properties
- Take all the defaults for the tracing and debugging, click
Next, and note the entries, as shown in Figure 30.
Figure 30. Final entry for data source DELIM
- Click Finish.
- Repeat the steps to define ODBC sources for both POSIT1 and POSIT2, keeping in mind that you grouped the two files with DELIM and separated the two files for POSIT1 and POSIT2 to show an alternative way of defining the files.
Define the data sources to InfoSphere Federation Server
Now you will define the two delimited files and two positional files to InfoSphere Federation Server. (This article assumes that the InfoSphere Federation Server is installed and working before proceeding with the next steps.) You'll then associate those files with the ODBC wrapper so that InfoSphere Federation Server will use ODBC to extract from and insert to those files. It is very important to be consistent and use the same AUTHID throughout the federated definitions and the Optim definitions. This article uses db2work for all connections in the federated hub and for all Optim directory and aliases.
Follow these steps to define the data sources to InfoSphere Federation Server.
- From the DB2 Control Center, expand your database and your federated
objects, right-click Federated Database Objects, and
click Create Wrapper,
as shown in Figure 31.
Figure 31. Control Center base
- Choose the ODBC as the data source for the wrapper, and enter the name for your wrapper
(DELIM), as shown in Figure 32.
Figure 32. Create wrapper
- Click the Settings tab.
- Change the value of DB2_FENCED to Y, and click OK to
save, as shown in Figure 33.
Figure 33. Wrapper settings
- Expand the DELIM wrapper, right-click Server Definitions,
and click Create, as shown in Figure 34.
Figure 34. Create server definitions
- Click Discover, as shown in Figure 35.
Figure 35. Create server definitions
- De-select all the check boxes for the objects currently selected, and
select the check box for the data source you want (DELIM), as shown in Figure 36.
Figure 36. Server selection
- Click the Properties button, and click the Settings tab.
- Be sure the check boxes for PUSHDOWN and
DB2_MAXIMAL_PUSHDOWN are selected as the server,
and change those values of
N to Y, as shown in Figure 37.
Figure 37. Server settings
Map the local user IDs with the NAV user ID
Follow these steps to map the local user IDs with the NAV user ID used in Optim Connect Studio.
- From the Object View, right-click User
Mapping, and click Create, as shown in
Figure 38. Create user mapping
- Click the user you want for the local login (DB2WORK), and move it to
the second pane by clicking on the right arrow (>) button, as shown
in Figure 39.
Figure 39. User definition
- Click the Settings tab, and enter the user ID and password that you
will use for Optim Connect. For the example, the user ID and the password
are both nav, as shown in Figure 40.
Figure 40. User settings
- Click OK to finish.
Create a nickname
Follow these steps to create a nickname. Note that you can instead skip ahead to create nicknames using an alternative method.
- Right-click Nicknames,
and select Create, as shown in Figure 41.
Figure 41. Create nicknames
- Click Add, enter the remote schema
(public) and remote
table name (DELIM1), and click OK,
as shown in Figure 42.
Figure 42. Add first nickname
- Repeat the steps to create a nickname
for DELIM2. The resulting screen is shown in Figure 43.
Figure 43. Add second nickname
Create nicknames using an alternative method
You could instead use discovery to find and create nicknames. Follow these steps to create nicknames using discovery.
- Right-click Nicknames, and click Create.
- Click Discover on the right side.
- Enter the user ID (nav) and password
(nav), as shown in Figure 44.
Figure 44. Discover nicknames
- Click OK.
- Check the nicknames you want to define, and click OK,
as shown in Figure 45.
Figure 45. Select nicknames
Define data sources for POSIT1 and POSIT2
Repeat the steps to define the data sources to InfoSphere Federation Server for both POSIT1 and POSIT2.
Extract and de-identify data
This section describes how to use IBM Optim to define an extract (using an access definition file), then extract the data from DELIM1 and the insert from the extract file into DELIM2 using the masking algorithm you define. You will also do the same for POSIT1 and POSIT2 (non-delimited flat files).
- Start Optim.
- Select Options > Product, as
shown in Figure 46.
Figure 46. Start Optim
- Enter your Optim password (optim), and select
OK, as shown in Figure 47.
Figure 47. Optim password
- Click Database.
- Select the check box for Allow OPTIM PK/Rels on DB
Views (Allow Primary Keys and Relationships
on Database Views) to enable Optim to access views in the
click OK, as shown in Figure 48.
Figure 48. Optim database options
Complete the extract process
Follow these steps to complete the extract process.
- To start the extract process, select File > New >
Extract, as shown in Figure 49.
Figure 49. Start extract
- Enter a description, which is useful for remembering what this is used for, in case you want to do it again.
- Enter a name for the new extract file (DELIM1.XF), and select the Named radio button for the access definition.
- Enter an access definition name (DELIM1.AD), and click the
icon for the access definition editor, which is located two icons to the right of the running
man icon, as shown in Figure 50.
Figure 50. Extract Request Editor
- Click the Start Table browse button, as shown in the
middle of Figure 51.
Figure 51. Browse start table
- Highlight your primary delimited file table (LIM1), click
Select, close the access definition, and save the
file, as shown in Figure 52.
Figure 52. Select start table
- Click the running man icon to extract the data.
- Click Proceed if the NO RUNSTATS message screen appears. You
should see a screen similar to Figure 53 that has the statistics from
Figure 53. Statistics report
- Close the extract window, and save the extract name as DELIM1.
Complete the insert with masking
Follow these steps to do the insert with masking.
- Select File > New >
Insert, and enter the following fields of the
insert, as shown in Figure 54.
- Description is Insert DELIM2
- Source File is DELIM1.XF (the source file is what you just extracted)
- Control File is DELIM1.cf
- Table Map Name is DB2WORK.DELIM2 (the table map is new, and you will be editing it next)
Figure 54. Insert Request Editor
- Edit the table map by clicking the icon for the table map editor (two to the right of the running man icon), as shown in Figure 54. The table map is used to define how you will mask the columns. You can pick from the base functions within Optim (SUBSTRING, for example), or you can use the algorithms for more complex masking of credit card numbers.
- Enter the qualifier (FEDERATE.DB2WORK) and a new column map in the table map.
- Enter the Column Map ID (DELIM2) and a column map name under Column Map or LOCAL (DELIM2).
- Right-click the column map, and click Open Column
Map, as shown in Figure 55. Make sure you change the destination table to your new
Figure 55. Table Map Editor
- Mask column 3 by entering SUBSTR(WORK1,3) in the
Source Column field that contains
WORK, as shown in Figure 56.
Figure 56. Mask column
- Close and save the column editor, and close the table map editor.
- Click the running man icon to do the insert to the second delimited
file, as shown in Figure 57.
Figure 57. Run insert request
The statistics screen appears, detailing what was inserted, as shown in Figure 58.
Figure 58. Insert statistics
- Select from the second delimited file, or browse for the file, to see
if the masking has occurred. Note the truncation due to the substring
in the column map, as shown in Listing 6.
Listing 6. Check for masking
C:\flatfiles>type delim2.txt DAN,222,PRI
High-quality test data based on production data is an important component of a test strategy. However, the consequences of using private data for testing purposes, even inadvertently, can be significant. Optim Test Data Management with Data Privacy Option provides an excellent solution to produce right-sized test databases that reflect business objects and can be de-identified using a variety of algorithms.
This article shows how you can extend those capabilities to flat files by using InfoSphere Federation Server to map flat files to tables so that any private data in those files can be extracted and de-identified using the powerful capabilities of the Optim Test Data Management and Data Privacy Solution.
The nav_util program in the directory of c:\ibm optim\connect\server\bin is a great diagnostic tool to use if you run into errors during the definition of the tables. Some helpful functions include the following:
Nav_util check license
Nav_util execute delim
Once you are in the tool, there are many different utilities to check your definitions and test.
NavSQL> > describe <ds_name:>table_name <full> <index> to describe a table or a view > describe <ds_name:>@procedure_name to describe a procedure > help > list catalogs <catalog-mask> > list columns <table-mask> <column-mask> <case-sensitive> > list procedure_col <procedure-mask> <column-mask> <case-sensitive> > list procedures <procedure-mask> > list statistics <table-mask> > list synonyms <synonym-mask> > list tables <table-mask> > list views <view-mask> > native_describe <ds_name:>table_name <full> <ndex> to get a native description of a table or a view > query_describe sql_statement to describe an sql statement > rollback > any select, call or update statement > show datatype datatype_name > show driver driver_name > tdp-default ds_name > explain ['<file_name>'] <sql statement> NavSQL >
A special thanks to Tim Smith for technical guidance.
- See the Optim Test Data Management product page to learn more about Optim Test Data Management.
- Peek at the Optim Data Privacy product page to learn more about Optim Data Privacy.
- Join the Integrated Data Management community space to find information and conversation about Integrated Data Management products, technologies, and best practices.