John_Haldeman 2700048G9H 572 Views
This blog's author has moved to a new home:
From there you can learn about Optim and Guardium not just from me but from all of the experts at Information Insights. If you subscribe to this blog, I appreciate the support and hope you will subscribe to the new one as well.
Let's say you have been tasked with implementing entitlement reports for all of the databases in your environment. You have 20 DB2 and 10 Oracle data sources, and you would like to associate them with the standard Guardium entitlement report definitions.
Guardium has about 10 tables in its database related to Oracle entitlement reports. It also has about 6 tables for the same purpose for DB2. This means that you will need to create (20 * 6) + (10 * 10) = 220 data source associations in the Guardium GUI. That's a lot of clicking!
A well known and less infuriating way to do this is to use the grdAPI to create the data sources. That involves a lot less clicking and, if you have a list of data sources, it should be easy to generate the required grdAPI calls. Those grdAPI calls look like the following:
grdapi create_datasourceRef_by_name datasourceName=<datasource_name> objName=<guardium_def_name> application=<CustomTables/SecurityAssessment/Classifier>
Here, <datasource_name> would be the data source you created in Guardium and you want to get entitlement information for. You can get a list of them by exporting the DataSources report in the Daily Monitor tab of the admin user interface. The <guardium_def_name> parameter would be the name of the table (or vulnerability assessment, or classifier process) you want to associate the DataSource with. So, for example, one DB2 entitlement report table is "DB2 Column Level Privs" and one Oracle entitlement report table is "ORA Accnts of ALTER SYSTEM". The <CustomTables/SecurityAssessment/Classifier> indicates if this is for entitlement reports (or another custom table), vulnerability assessments, or a classifier process.
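For example, to associate a (hypothetical) data source named prod_db2_finance with the DB2 entitlement table mentioned above, the call would look like this:

grdapi create_datasourceRef_by_name datasourceName=prod_db2_finance objName="DB2 Column Level Privs" application=CustomTables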
I recently had to do this at a customer with quite a few data sources, so I created a Ruby script that takes as input a list of data sources, a list of Guardium definitions, and an application name. It then outputs the grdAPI calls. You can find it with an example here. Below is a brief explanation of how to use it:
1) Download and install a ruby interpreter from http://www.ruby-lang.org
2) Use the script as follows:
Once you have the grdAPI calls created, you can run them through the Guardium CLI to associate the data sources with the Guardium definitions.
In a previous post I discussed an alternative to the traditional method of having Guardium monitor Windows system events with CAS and a home-grown script. The alternative was to build out a lightweight syslog infrastructure that would log the Windows events to be imported into Guardium. It used two open source products called Snare for Windows and Backlog. I still maintain that, if you can afford it, a better practice is to couple Guardium with an enterprise-grade SIEM system like QRadar. Having said that, let's look at how to configure Snare, Backlog, and Guardium to work together.
Step 1: Install Snare and Backlog
You can find the Snare for Windows installation package here and the Backlog installation package here (scroll down to the Download sections). The installations are relatively straightforward, so let's get right to the configuration. If you like, you can install the components on different servers so that you do not have to turn the Windows server you are monitoring into a syslog receiver.
There is a blog post here that explains the installation and configuration process a lot more thoroughly. It is well worth a read if you're interested in trying this yourself.
Step 2: Configure Snare for Windows
Snare for Windows' configuration utility is a small web interface hosted by the same service that gathers and sends the event log information. You can access it using the Start menu. Log in with the username and password you configured during installation. From there, switch to the "Network Configuration" menu. In that menu there is a field called "Destination Snare Server address", which controls where Snare sends the syslog messages. You need to direct the syslog messages to where you have installed Backlog. I have it all installed on one server in my example, so my destination address is localhost (127.0.0.1):
Snare Network Configuration
That's all there is to it. You can change the Objectives (similar to logging policies in Guardium) to send less information to syslog. By default Snare is going to send everything, which is probably more than you need.
Step 3: Configure Backlog
The next step is to configure Backlog to receive the syslog messages sent by Snare and write them to disk. To do this, open the Backlog program in the start menu, and click on Setup then Snare Backlog Configuration. Specify a log directory to save the data to disk and then choose how you want to split up the log files.
You should be aware that Backlog is a very simple syslog receiver, so you'll need to manage your own logs and ensure that you delete older ones so that you don't run out of space on the syslog server. Once you have finished configuring Backlog and made sure both Snare and Backlog are started, you will start to see a tab-separated output file in the directory you specified earlier.
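Since Backlog won't rotate or delete anything for you, a small scheduled cleanup job is worth setting up. Here is a minimal sketch in Ruby (the directory path and 30-day retention are my assumptions; adjust for your environment):

```ruby
require 'fileutils'

# Delete Backlog output files older than the retention window.
def prune_old_logs(dir, max_age_days: 30, now: Time.now)
  cutoff = now - max_age_days * 24 * 60 * 60
  Dir.glob(File.join(dir, '*.log')).each do |f|
    FileUtils.rm(f) if File.mtime(f) < cutoff
  end
end

# Hypothetical log directory; point this at whatever you configured in Backlog.
prune_old_logs('C:/backlog/logs') if __FILE__ == $PROGRAM_NAME
```

Schedule it with the Windows Task Scheduler (or cron, if your receiver is elsewhere) to run daily.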
Tab-Separated Syslog Data Saved by Backlog
Step 4: Import the data into Guardium
Importing external data into the Guardium appliance is a fairly well-known process. Many people use it for change ticketing or for importing database catalog information into the appliance (for more information, see "How to correlate data from external databases" in the How-To Manual). I don't want to rewrite the manuals here, so I will instead describe the specifics required to import the Windows system event data being captured by Snare and Backlog.
When you import data into Guardium, the first thing you define is a custom table. Unfortunately there is no database to connect to in order to get the table definition, so you will have to create a definition manually. You can use mine if you like. The definition is in the screen shot below.
Guardium/Snare Custom Table Definition
I did not have to figure out the table definition all by myself. There's an rsyslog developers wiki entry that explains the format fairly well.
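To give a feel for the record layout, here is a sketch of how one Backlog line splits into columns. The field names and their order below are my reading of that wiki entry, not an authoritative definition; verify them against your own Backlog output before finalizing the custom table:

```ruby
# Assumed field order for Snare's tab-delimited Windows event records.
SNARE_FIELDS = %i[hostname log_label criticality event_log counter submit_time
                  event_id source_name user_name sid_type event_type
                  computer_name category_string data_string].freeze

# Turn one tab-separated Backlog line into a hash keyed by column name.
def parse_snare_line(line)
  SNARE_FIELDS.zip(line.chomp.split("\t", SNARE_FIELDS.size)).to_h
end
```

The split limit keeps any stray tabs in the final description field from shifting the columns.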
Once you have the table created in Guardium, you can import data into it. You do this by first creating a data source. I chose a SAMBA data source (a Windows share) because it was convenient, but it could also be a FTP, SCP, or even an HTTP file server.
A SAMBA Text Data Source in Guardium to Import Syslog Data
An Import Definition to Pull in Data from the SAMBA Data Source
Finally, all you need to do is add the table to its own custom domain and build a custom query on top of that (again, for more information on the general process see "How to correlate data from external databases" in the How-To Manual).
A Windows Event Log Report in Guardium Highlighting User Login Failures
With those steps you can get nice, clean Windows system events in Guardium using Snare and Backlog. Remember, this isn't limited to Windows system events. There are a variety of Snare agents, not to mention the possibility of pointing any device that uses syslog at the Backlog server and importing the events into Guardium for a consolidated view of your environment's security events. This could get unmanageable very quickly for large environments, but for our small non-profit customer with Guardium and no enterprise SIEM, it seems like a good alternative.
I ran into a problem that I think isn't documented anywhere else online, so I thought I would post it here with the resolution.
When running a delete request on DB2 for System z using Optim Distributed V7.3.1, the following occurs:
1) The process terminates saying no errors, no warnings, but no data is deleted
2) When you inspect the Optim trace file, the following error appears:
SQLText: delete from <table_name> where <where_clause>
ErrData: SQLCode=-104 SQLErrd3=-1
SQLCA Error Tokens=ErrTxt1: SQLState=42601 NativeError=-104 Row=-1 Col=-2
ErrMsg1: [IBM][CLI Driver][DB2] SQL0104N An unexpected token " " was found following "".
Expected tokens may include: "<END-OF-STATEMENT> ". SQLSTATE=42601
<table_name> will be the table Optim attempts to delete from, and the <where_clause> corresponds to the primary key you have set up for that table.
The problem is with Optim and can be fixed by applying Optim 7.3.1, patch 29. This patch is currently not available on Fix Central, so you'll have to call up IBM Support to get it. I am going to try to convince them to post it to Fix Central and will update this post if I am successful.
We recently had a customer ask us if Guardium can import Windows system events. Guardium can do this. The traditional method is to use the Configuration Audit System (CAS): you build a script and let the CAS call it. One of the Guardium Forum posts here has a PDF attachment that explains how to do it.
This does fulfill the requirement, but there are a couple of problems with this idea. First, it requires you to build a custom script that you now have to maintain. Second, the CAS will send the complete output of the script to the collector, where it is stored and can be pretty awkward to query. It’s not that the CAS is bad per se; this use case is just outside of its original intent. The CAS' primary purpose is to monitor database configuration changes that don’t involve client/server communication (eg: a change to Oracle’s listener.ora configuration file).
Essentially you are telling Guardium to do something it was not specifically designed to do. Instead, you should consider using a Security Information and Event Management (SIEM) system or a log aggregator to bring in these events, as well as all the other events that occur at the operating system and application level. This is, after all, why they were built. For the same reason people prefer to use Guardium for database activity monitoring, it’s probably a better idea to let these other systems manage your server logs. You can even stick with IBM software and pick up QRadar or Tivoli’s SIEM to help you do it.
That’s nice to say but what if you have Guardium, you don’t have a SIEM, and cannot purchase a SIEM? Our customer is a small non-profit and has this problem. If you are in that situation, by all means, use CAS. But I want to present an alternative. If you are a small company like our customer, you might be open to using an open source product called the Snare Agent for Windows by a company called InterSect Alliance. Snare, which is licensed under GPL, takes the Windows System events and publishes them to a syslog feed. From there you capture the feed using the (also free) Backlog syslog receiver from InterSect and dump the data to a tab separated text file. At that point you can import the event files into Guardium on a regular basis to get consolidated reporting in one environment.
I know what you are thinking. That sounds even more complicated than custom scripting for CAS. But this approach has some advantages:
In the next post I will explain how you can configure Snare and Backlog to put the data in a format that Guardium can easily consume.
I had a very interesting conversation with a potential customer of ours today. I don't want to give away the customer or what specifically they are trying to do, so let me talk about an equivalent hypothetical to our customer's situation:
Imagine that you are in charge of security at a high-profile hospital in California. You have heard about Guardium being able to help you with compliance, and in the process you heard that Guardium can even block access to the database using a feature called Data Level Access Control (or, if you're speaking to someone who's been working with Guardium for a while, they might call this S-GATE termination).
You are interested in blocking access to a very particular kind of data in a very particular situation. You have hundreds of health care professionals accessing your databases every day through an application. Some of your patients are celebrities. The fear is that some celebrity health care records may be looked up unnecessarily and abused similarly to how they were at UCLA a few years ago. Now, accessing a celebrity's health care records is part of the job for many users of the system at the hospital. What you want to detect is abuse. One way to do this is by detecting an unusual amount of access to celebrity health information. So, while it might be normal to look up the patient records of 2 or even 3 celebrities in a day, anything more would be highly unusual and should be investigated. In fact, it's so unusual you would like to block access to the application if it occurs in order to give you time to investigate.
So, can Guardium do this? Let's break apart the requirements:
R1) Detect the username of the end user and correlate it with the SQL statements that result from accessing the application. We want the username of the person logging into the application, not the database user name.
R2) Detect access to four or more celebrity patient health care records within a one day time period by an application end user
R3) Block access to the data for the user when the situation in R2 occurs
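To make R2 concrete, the counting logic we want Guardium to approximate looks something like this. This is an illustrative sketch only; in Guardium you would model it with application user translation and a correlation alert, not hand-written code:

```ruby
require 'set'
require 'date'

# Track distinct celebrity records viewed per application user per day and
# flag the user once the count reaches the threshold described in R2.
class CelebrityAccessMonitor
  THRESHOLD = 4  # four or more distinct records in a day is suspicious

  def initialize
    @seen = Hash.new { |h, k| h[k] = Set.new }  # [user, date] => record ids
  end

  # Returns :block when the user crosses the threshold, :allow otherwise.
  def record_access(app_user, celebrity_record_id, date: Date.today)
    records = @seen[[app_user, date]]
    records << celebrity_record_id
    records.size >= THRESHOLD ? :block : :allow
  end
end
```

Note that looking at the same record repeatedly does not push a user over the threshold; only distinct records count, which matches the "2 or even 3 celebrities in a day is normal" framing above.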
Detection of the end user name accessing the data through the application, requirement R1, should be possible in Guardium using one of the five methods of application user translation. A creative use of correlation alerts could be used to fulfill requirement R2.
On the surface, R3 sounds just as straightforward. If you detect that the situation is occurring, block it. If you are using correlation alerts, your best bet is to use a quarantine. There is even an example of how to do this in the Guardium How-To guide (see "How to use threshold alerts to terminate connections"). The problem is that if you do this, you are going to block access to the entire application, because Quarantine acts based on a database user and IP. Your entire application is going to get blocked, not just the end user who is causing mischief.
Okay! So what about real-time termination (S-GATE and S-TAP terminate)? Well, it would be difficult to devise a real-time rule that models requirement R2, but even if you could, you will probably run into trouble. There are two kinds of termination: open mode and closed mode. Open mode (sometimes called S-TAP terminate) has no latency associated with it, but only prevents further execution once it has been triggered. The problem with this is that if the application handles its connections gracefully, it will just reconnect the disconnected session and retry the transaction. You could see a situation where the application encounters a hiccup but the user's session continues.
So, how do we fulfill sticky requirement R3? The big problem with the methods described earlier is that they are not blocking at the right place. Guardium's blocking occurs at the database, which is great for use cases where you block database administrators from going in and running queries to view the sensitive health care information, but less useful when trying to block an individual application user accessing the application and looking at celebrity health care records. So, the idea we have come up with is to work with the application instead of against it. We let Guardium track and detect potential abuse of the system, and let the application disconnect the session and lock out the application user.
To do this, we would use Guardium's ability to execute custom alerts. Custom alerts allow you to build a Java class which contains a method that is called every time a specific alert is triggered in Guardium. We would write some code to access the application's API (probably through a web service), tell it to invalidate the web session, and then lock out the user until their activity could be investigated.
We haven't built it yet, but we are excited enough about it to build a prototype. Hopefully it works as well in practice as it does in theory!
Happy Belated New Year! My resolution is to make the time to write more. Maybe planes and airports are the right venues for that.
Anyway, our team at Information Insights was recently made aware of HASH_LOOKUP’s algorithm for determining a value to retrieve from a lookup table. The algorithm is proprietary so I do not wish to share it, but we can explore it using experiments that anyone with Optim can run. We were made aware of the algorithm because an Optim customer (not one of ours, but we heard the story anyway) was getting some strange results when looking up values with HASH_LOOKUP.
The situation was that the customer had a table they were masking whose number of records was lower than that of the lookup table. Even so, they were getting a great deal of repeated values from the lookup table. This was counterintuitive, because the natural instinct is to hope that Optim would attempt to maximize the likelihood of values not repeating. If the lookup table is greater in size than the input table, uniqueness is at least possible.
As you may have guessed by the name, HASH_LOOKUP looks up a value in a lookup table by hashing an input value. The result of the hash is a very large number. That number is then normalized to the number of records in the lookup table so that it always falls in the lookup table’s domain (we weren’t told how, but the easiest way to do that would be to apply a modulus function to the result of the hash). If the lookup table and the input table were at least the same size, theoretically, if you have a good enough hash algorithm, you could get a unique value for all your inputs.
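As a concrete illustration of the hash-then-normalize idea, here is a sketch using MD5 as a stand-in hash (Optim's actual algorithm is proprietary and not shown here; the modulus step is my guess, as noted above):

```ruby
require 'digest'

# Hash the input value to a very large number, then reduce it modulo the
# lookup table size so the result always falls in the table's domain.
def hash_lookup_index(value, lookup_table_size)
  big = Digest::MD5.hexdigest(value.to_s).to_i(16)
  (big % lookup_table_size) + 1  # 1-based row index
end
```

The same input always maps to the same row, which is what gives lookup masking its consistency across runs.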
Uniqueness like that doesn’t happen in practice. Hash algorithms are not unique and are instead built under the assumption that you have a very large input domain (eg: all the possible names for people) and a very small output domain (eg: a list of 10,000 names from the US Census). Even so, collision rates for the output are an interesting measure of a hash algorithm’s quality. So, let’s put HASH_LOOKUP’s hash algorithm through its paces.
The idea of this experiment was to understand how many collisions result from HASH_LOOKUP. The experiment was run with Optim for Distributed Systems V7.3.1 GA (we're using the traditional non-Java-based solution for this).
The Input Table:
To set up these experiments with HASH_LOOKUP, I took a contrived input table that had two columns. One column was a simple integer column with a sequence number starting at 1 and incrementing to just less than 130,000 (the number of rows in the table). The second column, which held the result of the lookup, was empty. It would be replaced with the index of the value that would be looked up from the lookup table.
The Lookup Table:
This table was a simple table of incrementing indexes. It varied in size but started with the same number of rows as the input table. Six separate runs were performed keeping everything in the experiment constant, but halving the size of the lookup table each time. So, the first time the lookup table was of equal size to the input table. The second time, the lookup table was half the size, and so on until the sixth run where the lookup table was 1/32nd the size.
I used HASH_LOOKUP which took in the sequence number of the input table as the value to be hashed. The function replaced the value in the result column with the index value that was looked up in the lookup table. So, the result column’s value had the following function definition:
HASH_LOOKUP(SEQUENCE_COL, LOOKUP_TABLE(LOOKUP_INDEX, LOOKUP_INDEX))
You would probably never use this definition in real scenario. This was just to explore which row index results from the hash algorithm that HASH_LOOKUP uses.
Two measures were taken. The first was to determine how good HASH_LOOKUP is at keeping values unique. The second measure was to graph the resulting values in order to get an idea of distribution. In the graphs, the x axis is the input value to be hashed. The y axis is the index number that results from HASH_LOOKUP.
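The first measure was computed as the percentage of input rows whose returned index appeared exactly once, roughly like this:

```ruby
# Percentage of looked-up indexes that no other input row also received.
def percent_unique(indexes)
  counts = Hash.new(0)
  indexes.each { |i| counts[i] += 1 }
  100.0 * indexes.count { |i| counts[i] == 1 } / indexes.size
end
```

So a run where every row collided with at least one other row would score 0%, and a perfect hash over the inputs would score 100%.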
Input Table Size = Lookup Table Size:
Percentage of unique values: 8.1%
Input Table Size = 2 Times the Size of the Lookup Table:
Percentage of unique values: 8.1%
Input Table Size = 4 Times the Size of the Lookup Table:
Percentage of unique values: 8.1%
Input Table Size = 8 Times the Size of the Lookup Table:
Percentage of unique values: 7.1%
Input Table Size = 16 Times the Size of the Lookup Table:
Percentage of unique values: 5.0%
Input Table Size = 32 Times the Size of the Lookup Table:
Percentage of unique values: 3.1%
Analysis and Suggestions:
The good news is that Optim covers the lookup table very well when the lookup table is several times smaller than the input domain. But note that a fairly consistent, increasing sawtooth pattern appears in the data. Also note that two values relatively close together may return the same result because of the jagged sawtooth distribution. As such, here are some suggestions for using HASH_LOOKUP:
1) Clearly it’s inappropriate to use HASH_LOOKUP on fields that require unique values: HASH_LOOKUP is not unique. In fact, you are quite likely to get at least some repeat values in large enough data sets. The good news is that lookup algorithms are only really appropriate for sensitive information that doesn’t require uniqueness anyway. Fields like names and addresses which are most often used in conjunction with lookup masking functions are also hardly ever required to be unique
2) Choose a key for your HASH_LOOKUP that has a non-incrementing and large input domain. If you do not do this, you might not only see repeating values, but repeating values near each other, decreasing the realism of your test data set
3) Consider using RAND_LOOKUP instead if you don’t require consistency in your data. Watch out though, RAND_LOOKUP will not be unique either
Could this be improved?
It’s an interesting question to consider whether this could be improved. Let’s look at what would happen if you used a well-known hash algorithm like MD5 instead. Say you created your own HASH_LOOKUP using MD5 in place of the algorithm that Optim uses. Here are the results:
Percentage of unique values: 63%
If you compare that to the traditional HASH_LOOKUP’s distribution, you could expect more random-feeling returned values and better coverage for large lookup tables. So why didn’t they use MD5 instead? Aside from the MD5 algorithm being more difficult to maintain, it’s a lot more expensive in terms of computation. Besides, if you understand how it works and its modest limitations, HASH_LOOKUP is still very effective at masking data. If you choose the right column to hash (not a sequence number), the data still feels relatively random and the coverage is pretty good. We use it every day to help our customers do just that. So why would you spend the cycles on a more expensive algorithm when the cheaper one works well enough? This especially makes sense when you take into account the fact that Optim’s heritage is on the mainframe, where CPU cycles are a matter of direct cost.
We had a customer recently who was very interested in Guardium's Application User Translation capabilities. This is where instead of just showing the database user name, Guardium also reports the application user name that the end user logs in with. This username is probably going to be different from the database user name because, for performance reasons, the application logs in with a single database user and then accesses the database using a connection pool.
I wrote an article on this with Benjamin Leonhardi at the beginning of this year. It was published in May:
I wanted to supplement this article with some of the questions I asked the customer in order to help them make the right decision for which of the five methods to use. Here is what I asked and why I asked it. These are a little biased to DB2 and WebSphere as that is what our client is using:
- Is this an application built in house?
- Are you using any application frameworks in your code other than the standard ones provided by Java and the application server?
You could use built-in application user translation if they are using a well-known and supported app (Siebel, JD Edwards, EBS, PeopleSoft, etc.). Also, if it’s not custom made, many application frameworks that aren’t mentioned as “supported” have had this kind of customer requirement before, and will have areas where you can configure them to make identity propagation calls (eg: Cognos and Business Objects).
Some application frameworks even do this transparently without the customer ever knowing it happens.
Also, watch out for applications that use the same framework as an application that comes supported out of the box. For example, an application written on Oracle Forms has a decent chance of working with Oracle EBS application user translation (because Oracle EBS is built on top of Oracle Forms).
More details on this method are here.
- Do you do any database auditing at this time?
- Do you track your application users through that auditing framework?
- When a user switches, do you use the standard JDBC setClientInfo API to indicate a user switch?
- Does your application use some other form of native DB2 identity propagation, for example DB2 Trusted Contexts?
Guardium tends to pick up and support the standard auditing calls built into databases. DB2’s setClientInfo API and Trusted Contexts are examples.
More details on this method are here.
- Are your connection pools managed by WebSphere and referenced via JNDI?
- Is Authentication of your users managed by WebSphere (and JAAS)?
If the application satisfies both of these requirements, they can use WebSphere's user propagation functions with DataStoreHelpers. I recently built a prototype Guardium DataStoreHelper for WebSphere and it seems to work quite well. WebLogic has an equivalent, I think, but I am much less well versed in it.
More details on this method are here.
- Is your application structured in a way that a single class manages the interface to the Connection Pool?
Our customer picked up right away that this effectively means overloading the JDBC interfaces. If you overload the get-connection calls instead of interacting directly with JNDI, application user translation with the Guardium Application Event API is a cinch. With WebSphere DataStoreHelpers you are effectively doing the same thing; the application just doesn't know about it.
More details on this method are here.
- Does your application use a large number of stored procedures?
- Can you identify a stored procedure that is always called when a user switches?
I asked these to see if Stored Procedure Identification could be used.
More details on this method are here.
- Is your application externally or internally facing?
- When your users log in, is it done through an SSL encrypted page?
- Is your web server separated from your application server or are they on the same machine(s)?
- When your web server passes messages to the application server, are those SSL encrypted?
I asked these to see if an Application Server Based STAP is possible. Application Server Based STAPs hate encrypted connections. You might be able to get away with it though if your web server forwards to an application server and those calls are made through unencrypted connection.
More details on this method are here.
This month has been crazy with a new project starting up, so instead of providing new content, I will instead point to two new Optim references that have recently arrived on the scene.
The first is a fantastic Redbook written by a team of Optim experts on the Data Growth Solution. It goes into a great deal of detail and is a very good read:
The second is part one of a two article series that I have written on the Optim Complete Business Object. Part 1 introduces what complete business objects are and a couple of problems with them. Part 2 does a survey of the IBM tools developed for definition of business objects. Part 1 can be found here:
I will update this posting with the link to part 2 when it is published. It's being edited right now.
Update: Part 2 is now published:
The topic of this month's post is how to use a little known, but extremely useful, feature of Optim. That is the ability to create views using the Open Data Manager (which uses the Optim Connect infrastructure, which until recently was named Attunity Integration Server). I decided to write this post after an Information Insights customer asked me if there was a way to manipulate data coming in from one of Optim LUW's extended data sources, VSAM. You can use this same concept for that use case as well.
Normally when you try to create a view for an Optim Connect data source, it will tell you that the data source type does not support "CREATE VIEW". What it is really telling you is that it cannot run the CREATE VIEW statement on the underlying data source. You can create virtual views in the Optim Connect Middleware layer by creating a special data source called a virtual data source.
Once you have a virtual data source, you can switch to the metadata explorer and add views to it. The Virtual Data Source is aware of all the other data sources and data source shortcuts located on the Optim Connect Server. You can reference those external data sources by adding a prefix to the table names in your SQL statements. So, to select all the records in the OPTIM_CUSTOMERS table located in the ARCHIVE data source, you would connect to your virtual data source and execute:
SELECT * FROM ARCHIVE:OPTIM_CUSTOMERS
Now, the same concept applies for views, except you wrap your SQL statement inside a CREATE VIEW statement. You can either create the views inside Optim Connect Studio, or by connecting to the data source with the command line utility, nav_util and executing the CREATE VIEW statement that way.
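So, continuing the example above, the statement you would run against the virtual data source looks something like this (the view name is illustrative):

CREATE VIEW ACTIVE_CUSTOMERS AS SELECT * FROM ARCHIVE:OPTIM_CUSTOMERS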
There is also space to create synonyms and even stored procedures. That's all there is to it. Any user or application that wants to take advantage of the new views on the data can do so by connecting to the virtual data source instead of the original data sources. It adds some flexibility to Optim Connect, and can help prevent you from needing to add an additional layer of middleware (like InfoSphere Federation Server).
In the news this week, InfoSphere Discovery came out with a new Fixpack. We are now at InfoSphere Discovery 4.5.1 Fixpack 1. Unlike some other IBM products, Discovery incorporates new features into fixpacks. You can find out what's new in this fixpack release by taking a look at the release notes here. The release notes are always pretty good for Discovery. For instance, we were recently able to trace back the entire feature history for Discovery since version 4.1.0 for an upcoming Information Insights engagement using them.
One new feature that caught my eye in 4.5.1 Fixpack 1 was:
"You can now export part or all of a project to Guardium® and import part or all of a project from Guardium"
Looking through the updated manual, there's almost no information on what this actually means, so let's take a closer look.
UPDATE: There is some information in the manual now! In Guardium V8.2, a chapter on this is also included in the HOW-TO guide.
If you apply the fixpack, you will see this new menu item:
Before you can export this data, you will need to make sure that some columns in Discovery are classified. This makes a lot of sense in the Guardium context because you are likely to be using Discovery to classify sensitive data (rather than to discover relationships, build transformation rules, or something like that). Some of the Guardium people reading this might ask: "Wait a minute, the Guardium Classifier already finds sensitive data. How does Discovery provide any additional value to Guardium?" That is a fair point, but there are some things that Discovery can detect more accurately than Guardium's Classifier. Here are three examples:
Classified Columns in Discovery: this can either be done manually (as in this example), or Discovery can attempt to classify the data automatically based on common formats
Once you have some columns classified, you can export this data in the following formats:
When you export the data as something delimited (the most likely case), this database object list can then be very easily imported into Guardium. All you do is upload the file to an FTP or HTTP server and create a custom domain, table, and then query in Guardium:
The InfoSphere Discovery Data Imported into Guardium
Once the query is built, you can populate your groups with the data from that query:
The Imported Members to be Added to the Group
Going the other way, from Guardium to Discovery, is even easier. The best use case I can think of for that is to help you build those sometimes elusive Sensitive Data Repository (SDR) databases using results that come from the Guardium classifier. If you don't know where at least some of your sensitive data is to start with, this can be very helpful.
On a recent Information Insights engagement we encountered a requirement to mask Canadian Social Insurance Numbers (SIN). For those unfamiliar with it, the Canadian SIN acts as Canada’s national ID. It can be thought of as similar to the United States’ Social Security Number (SSN) and Brazil’s Cadastro de Pessoas Físicas (CPF). United States customers are lucky; they have a Social Security Number transformation function that comes with Optim. These customers can generate random but valid social security numbers by executing TRANS SSN inside their column maps.
For Canadian SIN numbers, until recently, we gave the standard answer we give for any data element involving an unimplemented data privacy function. That is, we suggested doing one of the following depending on your requirements, the sensitivity of the data, and the customer environment:
1) Understand your application’s checksums, and try to incorporate RAND or SEQ functions instead. This may be enough if your application is relatively lenient in its checks
I said that this was our answer for Canadian SINs until recently because we found an undocumented data privacy function inside Optim that generates social insurance numbers. You can access it by placing EXIT PX0SIN in your column map:
This will generate valid but fictional Canadian SINs to replace the real ones. It behaves very similarly to TRANS SSN. Take care to note, though, that this function is undocumented. It was probably left out of the documentation for a reason, and I would not expect it to have been put through the same rigorous testing as the documented data privacy functions. You should fill that gap by testing the function more rigorously in your specific customer environment.
What about if you are not this lucky? What if you need a transformation function, but none is provided by Optim? I will spend the rest of this post explaining what the EXIT call shown above means and how you can build your own exits.
Optim Exits, written in C or C++, effectively provide the same functionality as Optim Column Map Procedures, which are written in BASIC and coded within the Optim environment. Optim Exits have a couple of main advantages that make them my personal preference. The first is that you can code them in any development environment instead of being limited to Optim’s column map procedure IDE. The second is that they are more extensible and flexible than column map procedures, because it is very easy to link to the numerous external C/C++ libraries and frameworks; doing the same from a BASIC column map procedure inside the Optim environment is possible but more awkward.
Let’s take a look at how Exits work. The first step is to set up your environment. You will need a C/C++ compiler. I recommend downloading something like Visual C++ and using the Microsoft compiler that comes with it, because the Optim team seems to use the Microsoft compilers to build Optim. GNU-based compilers like MinGW also work, but I find I get strange conflicts when calling certain external libraries inside my Optim Exit. Once you have your compiler, you will need the header files located in <OPTIM_HOME>\RT\SAMPLES\CMExit. Now, let’s examine one of the simplest exits you can write below:
// #include the Optim exit headers shipped in <OPTIM_HOME>\RT\SAMPLES\CMExit here

// Replaces a value in Optim. Simple convenience function for the pPSTPutColValue callback
void replaceValueInOptim(PST_STRUCT_CM_EXIT_PARM * pInputParms, void * value, int type, short precision, short scale, long size)
{
    int rc = 0;
    rc = (*pInputParms->pPSTPutColValue)(pInputParms, value, type, precision, scale, size);
    // rc should be checked here: handle errors and truncation as needed
}

// The function that Optim calls: where the magic happens
__declspec(dllexport) short PSTColMapExit ( PST_STRUCT_CM_EXIT_PARM * pInputParms, PST_STRUCT_CM_EXIT_COL_LIST * pSrcColList, PST_STRUCT_CM_EXIT_COL_LIST * pDstColList )
{
    char hello[] = "Hello World!";
    replaceValueInOptim(pInputParms, hello, PST_C_CHAR_SZ, 0, 0, sizeof(hello));
    return( PST_CM_EXIT_SUCCESS );
}
This exit writes “Hello World!” repeatedly in the column it is referenced in. The #include lines import the header files you need to interact with Optim. The PSTColMapExit function is the function that the Optim Convert process looks for to call each time it needs to mask a value. In order for Optim to call the function, it needs to be visible to Optim. You can do this by including the __declspec(dllexport) indicator so that it becomes a DLL export. The replaceValueInOptim function is a convenience function to make it easier to call pInputParms->pPSTPutColValue. pPSTPutColValue is a callback used to replace values inside the column. Note that the function is incomplete as you will likely want to take different actions based on what the return code (rc) is after the callback. For instance you could do some error handling if Optim was unable to replace the value in the column or if the value was truncated.
That’s all there is to it. You can use this code as a skeleton to build your own exits. Just remember to compile it as a DLL, including the headers, and drop the resulting DLL file into <optim_home>\RT\BIN. Once the DLL is in that directory, you can call it very easily by referencing EXIT <FILENAME> in your column maps, where <FILENAME> is the name of the DLL without the .DLL extension. For example, if you create TEST.DLL you would enter EXIT TEST in your column map.
The header files, along with the sample exit the Optim team provides in the Optim installation location and/or installation media, are a great reference for extended/advanced functionality. A lot of people are hesitant to implement C/C++ based Optim Exits. I see no reason to be, as long as you are careful about what you are doing and keep track of memory allocation. Happy coding!