Mask archived and extracted data directly into CSV, XML and ECM formats using InfoSphere Optim Data Masking Solution

Increase the interoperability of your data

Learn how file format types that are now supported by IBM® InfoSphere® Optim® Data Masking Solution will help you to increase the interoperability of your data and support data privacy and security requirements. This article describes options to convert your data to XML, CSV, or ECM format types, and explains how various threshold limits can be applied. In addition, you'll learn how to apply data privacy modules for these file format types and how to manage LOB and XML data types during this process.

Share:

Gaurav Rohila (garohila@in.ibm.com), Software Test Specialist, IBM

Gaurav Rohila photoGaurav Rohila has more than eight years of experience in software testing. For the last four year, he has worked as a software test specialist on the Optim QA team at the India Software Lab in Bangalore. In the past, Gaurav has participated in testing financial and insurance domain applications, such as Peoplesoft AP and GL modules.



10 January 2013

Also available in Chinese

Introduction

InfoSphere Optim Data Masking Solution provides a way to mask personal information used in a data source. It gives you a methodology for using realistic but fictional data for testing purposes. In previous releases of InfoSphere Optim, you could convert or mask extracted data (.XF) into another data set or extract to a Business Objects (CSV) file. (The Business Objects (CSV) file format enables all tables data to be stored as a single record in the CSV file.) Starting with Optim 8.1 and later, not only can you convert extracted data into CSV, XML, or enterprise content management (ECM) data formats, but you can also mask the data when converting to these formats. This new capability benefits users who are considering CSV or XML as a source of data for their various needs including directly accessing these data sources through their current applications.

These data formats (CSV, XML and ECM) come with several easily configurable parameters. In the sections below, you'll learn more about how to configure these parameters.

The following scenario illustrates how to use this new masking capability.


Scenario

In our example scenario, the "ABC" company wants to send data to another company, "XYZ." Company ABC has an Optim environment, but company XYZ does not. In such a scenario, company ABC will not be able to send data in the form of extract/archive files, as company XYZ does not have the same environment. To solve this problem, company ABC can transform the data stored in extract or archive files to CSV, XML, or ECM formats using the database load utility. It can then be imported to a database at XYZ. In addition, using these formats, the data can be masked to meet data privacy requirements and to secure the data.


Differences between Business Objects (CSV) file and CSV file formats

  • When the source file is converted to a Business Objects (CSV) file, it can be opened in a text editor with all selected columns listed as a single table and single file, whereas under comma separated value (CSV) format, each table within the source file is converted to a separate CSV file.
  • Using the Business Objects (CSV) file format, the source file cannot be masked, whereas a CSV source file can be masked.
  • Under Business Object (CSV) file format, you do not have the option to control the destination file data with threshold limits such as file size, number of files and row count, whereas under CSV file format, you can control the destination file data with these thresholds limits.
  • Using Business Object (CSV) file format, you do not have option to control the destination file data with metadata options such as beginning/end label, and header/field/string delimiter, or the option to create separate destination files for XML/LOB data types if they exist in the source data. Under CSV file format, you can create destination files with metadata options such as beginning/end label, header/field/string delimiter, and also creating separate destination files for XML/LOB data types if present in source data.

Differences between Business Object (XML) and XML file format

  • Under Business Object (XML) file format, the source file is converted to an XML file, which can be opened in XML format with all selected columns listed as a single table and single file. Under XML file format, all the tables under the source file can be converted to separate XML files for each table under the source file.
  • Under Business Object (XML) file format, the source file cannot be masked, whereas under XML file format, the source file which is converted to an XML file can be masked.
  • Under Business Object (XML) file format, you do not have the option to control the destination file data with thresholds limits such as file size, number of files, and row count, whereas under XML format you have the option to control the destination file data with these thresholds limits.
  • Under Business Object (XML) file format, you do not have the option to control the destination file data with creation of separate destination files for XML/LOB data types if present in the source data, whereas under XML file format you have the option to create separate destination files for XML/LOB data types if present in source data.

Understanding the ECM file format

Enterprise Content Management (ECM) supports only archive files as a file while converting data, whereas the CSV file format supports both archive and extract files as source files while converting data. The ECM support mimics the external features of CSV file format with the addition of the following metadata values of archive files:

  • Archive file name
  • Archive file GUID
  • Archive request description
  • Creation time

ECM format has the following features:

  • Under ECM format all the tables under the source file can be converted into separate ECM files for each table under the source file
  • ECM file format supports data masking.
  • Under ECM format, users have the option to control the destination file data with thresholds limits including file size, number of files, and row count.
  • Under ECM file format, you have the option to create destination files with meta data options such as beginning/end label, header/field/string delimiter, archive file name, archive file GUID, archive request description,creation time, and also creating the separate destination files for XML/LOB data types if present in source file.

Use case #1

In the following example, the source data extract/archive file has 5 tables. The customer wants to convert this data to XML/CSV/ECM formats, where each table is to be created in separate CSV/XML/ECM files or else into a single file.

In the steps below, you can see how to convert a source file called convert_file_formats.xf into CSV file format. The source file has five tables called optim_details, optim_customers, optim_items, optim_sales, and optim_orders. The user wants to create five separate CSV files for each of these tables. Also there is a need to mask the EMAIL_ADDRESS column of the optim_customers table.

Steps to follow for use case #1

  1. Navigate to File > New > Convert under the Convert Request Editor window. Select Comma Separated Value (CSV) option under the File Format Types drop down, and enter valid paths for control, source, and destination file as shown in Figure 1.
    Figure 1. Convert Request Editor window
    This figure shows convert request editor window which can be navigated by File < New < Convert
  2. Under the Convert Request Editor window, navigate to Tools > Edit Table Map.
  3. Under Table Map Editor window, enter a valid destination qualifier as shown in Figure 2 (in the Destination box) and enter Local keyword for optim_customers table. Right click and select Open Column Map as as shown in Figure 2.
    Figure 2. Table Map Editor window
    This figure shows Table map editor window which can be navigated by convert request editor window < Tools < Edit Table Map.On this window all the Tables available in source file are displayed
  4. Under the Column Map Editor window enter data privacy functions for the email_address column of the optim_customers table. Enter TRANS EML function under source column email_address as shown in Figure 3.
  5. Under Table Map Editor window click File > Update and Return.
    Figure 3. Column Map Editor window
    This figure shows Column map editor window and this window shows all the columns available under respective Table
  6. Now under the Convert Request Editor window, click on Comma Separated Value (CSV) tab and enter the destination paths for all the five tables under the "File Name" field as shown in Figure 4. After entering the paths for all the tables, click Run now button.
    Figure 4. Convert Request Editor window showing values under CSV tab
    Under this window which is displayed by clicking the csv tab, user can enter threshold limits for each table or default thresholds limits for all the tables and also can enter various file names for all the tables
  7. Review the convert.csv request results as shown in Figure 5. A CSV file for each table is created under data_file folder located in C:\. Also, a collective_data.csv file is created, which is mentioned under destination file path under Convert Request Editor window of convert.csv request. But this file is 0KB in size, as all the source data was converted into five different destination CSV files, leaving this file (collective_data.csv file) empty.
    Figure 5. Results "data_file" folder showing CSV files
    This figure displays all the csv files generated after the convert request is executed
  8. Also shown in Figure 6, the email_address column is masked for the optim_customers table.
    Figure 6. Customers CSV file details
    This figure displays all the csv values generated in one csv file called customers.csv after the convert request is executed

Notes for use case #1

  • If you want to have the converted data in a single csv file for all the source data, then the File Name column as shown in Figure 4 should remain empty and no path should be given for any of tables. In this case, all the data for all the five tables will be copied into single a csv file called collective_data.csv file mentioned under the destination path of convert.csv request.
  • Let's say you want to have the converted data in the file mentioned in the destination path under Convert Request Editor window of convert.csv request for some of tables, say three tables. For the remaining two tables you want the data to be in separate csv files. Then in that case, in the File Name column as shown in Figure 4, paths should be entered for these two tables only. For the other three tables, the File Name column field should be left as empty, and no path should be listed for these three tables. In this case source data for two tables will be converted and copied into two separate csv files, and for the other three tables for which path is NOT listed in the File Name column, the converted data is copied to a single file which is listed under "Destination path" of the "Convert Request Editor" window.
  • Using the same steps, you can also create XML and ECM file format types, just by selecting the XML and ECM file formats from File Format Types drop down under the Convert Request Editor window.

Use case #2

Perhaps you want to control the converted or destination data for CSV/XML/ECM file formats, based on file size or number of files or row counts. Follow the steps below to learn what parameters need to be set.

In this example, the source file has five tables as follows:

  • Optim_details with 21427 rows
  • Optim_customers with 3520 rows
  • Optim_items with 510 rows
  • Optim_sales with 110 rows
  • Optim_orders with 9321 rows

You want to create an XML file for each table separately, but do not want to have more than 10,000 records in each file also do not want to have number of files more than 5 for each table with files size limit set to 5MB.

Steps to follow for use case #2

  1. Navigate to File > New > Convert. Under the Convert Request Editor window select XML option under File Format Types drop down. Enter valid paths for control, source, and destination file as shown in Figure 7.
  2. Under the Convert Request Editor window navigate to Tools > Edit Table Map.
  3. Under Table Map Editor window, enter valid destination qualifier and click File > Update and Return.
    Figure 7. Convert request editor window showing convert.xml request
    This figure shows convert request editor window which can be navigated by File < New < Convert
  4. Select the XML tab under the Convert Request Editor window and under the Default Threshold grid, enter the number of files as 5, row count as 10,000, and file size as 5 as shown in Figure 8. Also enter file names for each table as shown in Figure 8 and then click the Run now button.
    Figure 8. Convert request editor window showing options under "XML" tab
    This figure shows all the values such as threshold limits and file names which can be entered for one or more tables under xml tab of convert request editor window
  5. Review the output under data_file directory as shown in Figure 9:
    • Optim_customers has 3520 rows, so only 1 file is created as CUSTOMERS. XML.
    • Optim_items has 510 rows, so only 1 file is created as ITEMS.XML.
    • Optim_sales has 110 rows, so only 1 file is created as SALES.XML.
    • Optim_orders has 9321 rows, so only 1 file is created as ORDERS.XM.L
    • Optim_details has 21,427 rows, so three files are created: DETAILS.XML with 10,000 rows, DETAILS_1.XML with 10,000 rows and DETAILS_2.XML with the remaining 1427 rows.

Note: The collective_data.xml file name given under the destination file path under Convert Request Editor window will have 0 rows after the convert.xml request is executed. As for all the tables under the Table Map Editor, a path has been assigned with a distinct file name for each table under File Name data grid which is under the XML tab of Convert Request Editor window of convert.xml request.

Figure 9. Result of convert.xml request showing XML files
This figure displays all the xml files displayed under the data_file folder after the convert.xml convert request is executed

Possible error message in use case #2

If there is a contradiction in the values given for threshold limits, then the behavior of Optim will be as follows.

For example, let's say you have given the same threshold values for file size and row count as given in above example, but the number of files value is given as 2 instead of 5 as shown in Figure 10.

Figure 10. Default threshold limits
This figure shows all the values such as threshold limits and file names which can be entered for one or more tables under xml tab of convert request editor window

Figure 11 shows the convert.xml process report.

Figure 11. Convert.xml process report
This figure displays convert report generated after convert.xml is executed

Scenario 1

In this scenario, the threshold limits are provided for any of tables or all the tables under the data grid. For example, let's say the optim_customers table value is entered as 1000 for row count, 4 for number of files, and 4 for file size, as shown in Figure 12. In this case, the threshold limit provided in the default threshold grid will be ignored for that table. While converting, Optim will entertain only the threshold limits that are provided in the data grid which is besides the File Name field for that table or tables.

Figure 12. Screen showing threshold limits
This figure shows all the values such as threshold limits and file names which can be entered for one or more tables under xml tab of convert request editor window

So based on Figure 12, optim_customers table will now have four files created, as this table has 3520 rows, with three files using row limit as 1000 and the fourth file with the remaining 520 rows, while other tables with follow default threshold.

Figure 13. Figure showing XML files under data_file result folder
This figure displays all the xml files generated after convert.xml is executed

Scenario 2

Threshold values are taken into account by Optim while converting data only when file names are provided under File Name field for the respective tables. If these fields are kept as empty as shown Figure 14, but values are provided in default threshold grid, in this case the default threshold values are totally ignored when Optim converts data.

Figure 14. Figure showing "File name" and "Threshold values" column
This figure shows all the values such as threshold limits and file names which can be entered for one or more tables under xml tab of convert request editor window

Use case #3

This use case explains the creatiion of CSV, XML, and ECM convert file types, and shows the importance of Include LOB/XML columns field. The example below example demonstrates step-by-step how values for LOB data type column called credit card_history (which exists in the optim_customers table under convert_file_formats.XF source file) can be transferred to a location that is different from the destination file location given under Convert Request Editor window.

Steps to follow for use case #3

  1. Navigate to File > New > Convert. Under the Convert Request Editor window, select the Comma Separated Value (CSV) option under the File Format Types drop down, and enter valid paths for control, source and destination file as shown in Figure 1.
  2. Under the Convert Request Editor window navigate to Tools > Edit Table Map.
  3. Under Table Map Editor window, enter a valid destination qualifier and click Update and return.
  4. Click on Comma Separated Value(CSV) tab and then click CSV options button as displayed in Figure 15.
    Figure 15. Convert requert editor window showing "csv options" button under "csv" tab
    This figure shows convert request editor window which can be navigated by File < New < Convert and also displays the csv options push button
  5. On clicking CSV options button, a window will be displayed as shown above in Figure 16.
    Figure 16. Comma separated value options window
    The figure displays 'comma separated value options' window where all the meta data options, xml/lob options and null field options can be entered
  6. Check the Include LOB/XML Columns option and enter the respective path under the LOB/XML Columns field.
  7. Click OK and run the Convert.CSV request.
  8. Review the Convert.CSV results under C:\data_file directory, as shown in Figure 17. Under the LOB data type column called credit card_history in the optim_customers table, a path is displayed where LOB/XML data will be stored instead of actually displaying the XML/LOB data.
    Figure 17. Results under "data_file" folder showing Comma separated value file
    This figure displays values under 'collective_data.csv' file which is generated after convert.csv request is executed
  9. Review Figure 18 with Convert.CSV results under C:\XML_LOB_DATA dir. Each value of LOB data type column for every row is displayed in a single file. As there are 3520 rows in "optim_customers," so 3520 files will be created for each value of LOB data type column called credit card_history of the optim_customers table.
    Figure 18. Results under "xml_lob_data" folder
    This figure displays all the xml and lob data types files displayed under xml_lob_data folder after convert.csv request is executed

Use Case #4

This use case explains creating the CSV and ECM convert file types, showing the importance of various CSV/ECM options such as field delimiter, escape character, string delimiter, and header delimiter, and various metadata options such as beginning label and end label.

Steps to follow for use case #4

  1. Navigate to File > New > Convert. Under the Convert Request Editor window select Enterprise Content Management (CSV) option under File Format Types drop down, and enter valid paths for control, source, and destination files as shown in Figure 19.
    Figure 19. Convert request editor window showing Convert.Ecm request
    This figure shows convert request editor window which can be navigated by File < New < Convert
  2. Under the Convert Request Editor window, navigate to Tools > Edit Table Map.
  3. Under Table Map Editor window, enter valid destination qualifier and click Update and return.
  4. Click on Enterprise Content Management (CSV) tab and enter respective file names for each table under File name and also threshold limits as shown in Figure 20.
    Figure 20. Convert request editor window showing "Threshold" and "File name" fields
    This figure displays values such as file name and threshold limts which can be entered by clicking the 'enterprise content managemant 'csv'' tab under 'convert request editor' window
  5. Now click on ECM options button. The Enterprise Content Management (CSV) options window will be displayed as shown in Figure 21. Enter valid values for all the fields this window. Click OK and run the ECM Convert request.
    Figure 21. Enterprise Content Management(csv) options window
    This figure displays 'Enterprise content Managemant 'csv' options' window where all the meta data options,archive meta data options,null field options, xml/lob options can be entered
  6. The Enterprise Content Management (CSV) options are as follows:
    1. Generate Meta Data: This will write column names at the beginning of the first record for a given table.
    2. Beginning Label: This would prefix the specified label before first column name. This field will be used only when "Generate Meta Data" option is enabled else it will be ignored. The maximum length of the label string is 10 (ten) ASCII characters.
    3. End Label: This would suffix the specified label after last column name. This field will be used only when "Generate Meta Data" option is enabled else it will be ignored. The maximum length of the label string is 10 (ten) ASCII characters.
    4. Header Delimiter: This would separate column names with the value specified. This field will be used only when "Generate Meta Data" option is enabled else it will be ignored. This should be ASCII printable character and maximum of one character.
    5. Field Delimiter: This would separate column values. This should be ASCII printable character and maximum of one character.
    6. String Delimiter: This would put specified delimiter around string (not numeric) column values. This should be ASCII character and maximum of one character.
    7. Escape Char: If specified and the escape char specified is encountered as a part of string data then Escape Char would be pre-pended. This should be ASCII character and maximum of one character.
    8. Value for Null Columns: If specified, the string will appear in the column(s) where value is NULL. This should be ASCII character and maximum of 15 (Fifteen) characters.
  7. Review the results as displayed in Figure 22. As threshold limit for number of rows for each table is 2000, so each file for each table will have 2000 rows. For example "optim_details" table has 21427 rows, so ten files are created with 2000 rows each and last file "details_10" will have remaining 1427 rows. Also review the data for "sales.ecm" file in Figure 22 displaying all the meta data values. The first line has "Beginning Label" called "Begin" and "End Label" called "End" with all the column names displayed and each column name is separated by "Header Delimiter" called "%". The values for columns are separated by "Field Delimiter" called "," and "String Delimiter" called " " " around each column value.
    Figure 22. Data_File folder showing results of ecm files
    This figure displays all the .ECM files displayed after convert.ecm request is executed
  8. Review the convert report displayed (Figure 23 and Figure 24) after "convert.ecm" request is executed.
    Figure 23. "Convert.Ecm" process report
    This figure displays the convert report generated after convert.ecm is executed
    Figure 24. "Convert.Ecm" process report
    This figure displays the convert report generated after convert.ecm is executed

Conclusion

This article has described the various convert file format types which can be generated with the same request called "Convert". Also besides converting source files such as archive and extract file into XML, ECM and CSV file formats, data privacy modules can be applied while converting these source files and in this way the converted data in either XML, CSV, or ECM formats, can also be masked.

Resources

Learn

Discuss

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Information management on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Information Management
ArticleID=854402
ArticleTitle=Mask archived and extracted data directly into CSV, XML and ECM formats using InfoSphere Optim Data Masking Solution
publish-date=01102013