Integrated Data Management: Managing data across its lifecycle

Featuring Optim solutions for Integrated Data Management

Welcome to Integrated Data Management by IBM

IBM has embarked on a strategic initiative to deliver an integrated, yet modular, data management environment to design, develop, deploy, operate, optimize and govern data, databases, and data-driven applications throughout the entire data management lifecycle. IBM calls this Integrated Data Management. By focusing across the lifecycle and enabling different roles to collaborate, you can increase organizational productivity, agility, and effectiveness, while improving the quality of service, cost of ownership, and governance of diverse data, databases, and data-driven applications.

Individual products provide powerful capabilities that target specific data management roles and tasks; more importantly, the components interoperate, enabling cross-role collaboration and cross-product synergy. And to deliver a true lifecycle solution, integration extends beyond Optim to include a broad range of IBM offerings.

This article takes a look across phases and roles to articulate how IBM’s solutions for Integrated Data Management can help you get more value from your information and help your team be more aligned, productive and effective.

Diverse, distributed, and interrelated environments

Have you noticed that it’s hard to find a standalone application anymore? Everything is interrelated. One system feeds another and all must come together to present a common view to the customer, whoever your customer might be. Thus it is increasingly important to look at how to manage your information assets more holistically and strategically. Organizations need help to inventory and understand what assets they have and how they are related. A common definition of customer or patient or citizen or supplier across the organization is required. You need to understand data movement and data lineage. And that’s going to mean discovering and sharing the information that you have about your data assets cross-role, cross-solution, and cross-lifecycle.

End-to-end management of the data lifecycle

Today most organizations have myriad products in-house from many vendors supporting different roles and tasks. Each focuses on providing rich task-specific value, but puts little emphasis on linkages with the preceding or next phase in the lifecycle. Wouldn’t it make life easier to define access or retention policies when the data is first designed and let the tools propagate that information from phase to phase and tool to tool? IBM software can support each phase of the lifecycle with robust offerings for data-centric tasks and roles, as well as provide support for designing and implementing key cross-phase linkages. The following list defines the key phases in a data-centric software lifecycle:

Discover and design
Discover, harvest, model, and relate information to drive a common semantic understanding of the business and to identify key information that must be protected.
Develop and test
Code, generate, test, tune, and package data access layers, database routines, and data services. Configure, change, and promote applications, services, and databases into production.
Manage performance
Administer databases to meet service level agreements and security requirements while providing responsive service to emerging issues. Provide pro-active planning and optimization for applications and workloads, including trend analysis, capacity planning, and growth planning.
Archive and retire
Separate historical data from current transactions, and safely remove historical data to a secure archive to lower costs. Improve performance of current transactional systems, and comply with regulatory requirements.
Audit and protect
Establish, communicate, execute, and audit policies and practices to standardize, protect, and retain data in compliance with government, industry, or organizational requirements and regulations. Not limited to a single phase, these data governance concerns must infuse the entire lifecycle.

Cross-organizational collaboration

Maintaining alignment is about communication, collaboration, and clarity across organizational roles. Users and business analysts need to articulate requirements. Architects are responsible for designing the process, application, and data models. Developers must produce effective and efficient code using those models. Administrators must understand security and retention policies established by compliance officers and work with their network and systems administration colleagues to achieve compliance and service objectives. It’s critical to the agility, effectiveness and alignment of the organization as a whole that the role-specific capabilities can adapt to multi-role contributors as well as to distributed global teams.

Comprehensive portfolio, emerging integration

Supporting Integrated Data Management is, and will always be, a multi-branded proposition. Today the IBM portfolio encompasses Rational®, Information Management, Tivoli, and WebSphere offerings. IBM offers broad and deep capabilities for every phase in the lifecycle. But over time what will increasingly differentiate IBM offerings is the value-added integration across the portfolio (either in current products or in roadmaps) with common user interfaces, common components and services, and shared artifacts.

Common user interfaces
Whether Eclipse-based or Web-based, IBM is adopting a standard and integrated approach to user interfaces that makes moving between roles easy and intuitive. The portfolio includes an Eclipse-based user interface for tasks requiring rich object manipulation, such as design and development. Here, the offerings complement and extend the IBM Rational Software Delivery Platform. The integrated nature of IBM Optim and Rational software simplifies collaboration among business analysts, architects, developers, and administrators. Users can combine products within the same Eclipse instance, providing seamless movement between tasks, or can share objects across geographically distributed teams to make it easier to maintain alignment and work more efficiently.

Operations support requires the ability to monitor and respond from anywhere at any time. The Web-based user interface supports operations-oriented administration. Adopting a common approach with Tivoli software for Web delivered dashboards and portlets provides the greatest flexibility for monitoring, management, and coherent information across the operational stack to improve an organization’s ability to meet service level agreements. And sharing all of these capabilities across data servers reduces overall skills requirements and costs. For z/OS users, existing 3270 interfaces continue to be supported and extended.

Common components and services
Sharing components and services across offerings helps organizations achieve cost, productivity, and consistency objectives. When the products share components, such as the Data Source Explorer and database connections, learning new tools becomes easier. For example, sharing a common connections repository saves time for team members. Shared services, such as data privacy policies, mean personal identification numbers are handled consistently whether creating test data or sharing research data.

Shared policies, models, and metadata
This is the glue that truly holds everything together. The ability to express policies for machine interpretation, to associate policies with data models or data workloads, and to communicate both through shared metadata is the crux of the challenge as well as the linchpin for delivering the greatest value. For example, shared configuration information between database administrators and application server administrators can vastly reduce deployment costs while improving quality of service. Shared privacy policies together with the services that implement them can improve security and compliance.

Heterogeneous flexibility

Recognizing the heterogeneity of most organizations, the vision spans IBM and non-IBM databases. While the strategy is to deliver first on DB2® and Informix® Dynamic Server databases, most integrated data management tasks are already supported across a range of heterogeneous databases. The roadmap includes expanding performance management offerings in that direction as well.

Data-centric roles

In the following sections, explore some of the key offerings and their value to various key roles that Integrated Data Management supports.

Data architect benefits: better data quality and enterprise consistency

Most projects don’t get to start from scratch. They need to leverage data already resident in the enterprise, which is seldom well documented. Discovering what data is available and how they relate to one another is a common and cumbersome task. InfoSphere® Discovery helps architects and DBAs get an understanding of the data from the inside out. By interrogating the metadata as well as the data itself, InfoSphere Discovery creates a model of the data, gives designers statistical profiles of the data, and derives relationships: not only direct primary and foreign key relationships, but also business object relationships central to proper functioning of test data generation and data archival processes.
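To make the idea of data-driven relationship discovery concrete, here is a minimal sketch in Python. It is not how InfoSphere Discovery works internally; it only illustrates one common heuristic, assuming that a column is a foreign key candidate when all of its non-null values appear in a unique, non-null column of another table. The function name `candidate_foreign_keys` and the sample tables are hypothetical.

```python
from itertools import permutations


def candidate_foreign_keys(tables):
    """Return (child_table, child_col, parent_table, parent_col) tuples.

    `tables` maps a table name to a list of row dictionaries. A pair is a
    candidate when every non-null child value appears in a parent column
    that is unique and contains no nulls (that is, key-like).
    """
    # Collect the values of every column, keyed by (table, column).
    columns = {}
    for table, rows in tables.items():
        for row in rows:
            for col, value in row.items():
                columns.setdefault((table, col), []).append(value)

    candidates = []
    for child, parent in permutations(columns, 2):
        if child[0] == parent[0]:
            continue  # only look for relationships across tables
        parent_values = columns[parent]
        # The parent column must look like a key: unique and non-null.
        if None in parent_values or len(set(parent_values)) != len(parent_values):
            continue
        child_values = {v for v in columns[child] if v is not None}
        if child_values and child_values <= set(parent_values):
            candidates.append((child[0], child[1], parent[0], parent[1]))
    return sorted(candidates)


sample = {
    "customer": [
        {"id": 1, "name": "Ann"},
        {"id": 2, "name": "Bob"},
        {"id": 3, "name": "Cy"},
    ],
    "orders": [
        {"order_id": 10, "cust": 1},
        {"order_id": 11, "cust": 1},
        {"order_id": 12, "cust": 3},
    ],
}
print(candidate_foreign_keys(sample))
```

Value containment alone produces false positives on real data (for example, two unrelated columns of small integers), so a heuristic like this only proposes candidates for a person or a statistical profile to confirm.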

InfoSphere Data Architect is a key tool for data architects to model, relate, and standardize data. Like any good data-modeling offering, InfoSphere Data Architect supports logical modeling, physical modeling, and automation features for diverse databases that simplify tasks. Such tasks include reverse engineering from existing databases, generating physical models from logical models, generating DDL from physical models, and visualizing the impact of changes.

Figure 1. InfoSphere Data Architect for modeling
InfoSphere Data Architect screenshot

But beyond core data modeling, InfoSphere Data Architect also helps data architects:

  • Integrate information by discovering and identifying mappings between models. InfoSphere Data Architect’s metadata-driven discovery is complemented by data-driven capabilities in InfoSphere Discovery and InfoSphere Information Analyzer. Data models can then be delivered to InfoSphere Information Server or InfoSphere Warehouse.
  • Implement best practices based on naming standards enforcement, business glossary integration, and industry model integration.
  • Achieve architectural alignment across process, service, application and data models with built-in transformation between models and clear linkage to business requirements. Built-in integration with the Rational portfolio offerings simplifies model interchange and alignment.
  • Facilitate governance practices regarding privacy standards for test data generation by capturing privacy policies and business objects for downstream tasks. Share privacy policies with developers and publish extract scripts for the Optim Test Data Management Solution and the Optim Data Privacy Solution.

InfoSphere Data Architect represents a key integration point among the Rational, InfoSphere, and Optim portfolios. For example, it serves as the foundation for the Optim Designer component, which provides a common design interface that enables end users to design, deploy, and manage Optim data privacy, test data management, data growth, and application retirement processes independently from their runtime environments.

Developer benefits: better productivity and better application performance

Optim Development Studio, Optim Query Tuner, and Optim pureQuery Runtime offerings target data-centric developers and application DBAs.

Optim Development Studio provides an Eclipse-based integrated development environment to speed data-centric development targeting DB2, Informix, and Oracle databases. Customers and partners have reported a 25% to 50% productivity improvement with the product. And the data-centric development capability seamlessly extends functionality within the Rational Software Delivery Platform, such as Rational Application Developer for WebSphere Software. In particular, Optim Development Studio delivers:

  • SQL content-assist integrated with the Java editor
  • Stored procedure development (both SQL/PL and PL/SQL)
  • Data access layer generation
  • Web services tooling
  • SQL hot spot analysis, including before and after performance comparisons
  • Impact analysis
  • Tooling to bind packages
  • Many more capabilities to help DBAs and developers collaborate effectively

A key design point for Optim Development Studio and pureQuery Runtime is to help move performance sensitivity earlier in the development cycle where it is easier and less costly to fix. For example, developers can visualize SQL hot spots within the application during development. Adding Optim Query Tuner helps developers tune SQL for DB2 based on expert guidance to build skills and to reduce query tuning needs in production where risks and costs are much higher.

And Optim Development Studio makes it easy to compare performance data before and after making a change. When performance problems are found in production, developers and DBAs can spend considerable time isolating performance issues: first to a specific SQL statement, then to the source application, then to the originating code. Three-tier architectures and popular frameworks make this isolation more difficult, because the developer might never see the SQL generated by the framework. Optim Development Studio makes it easier to isolate problems by providing an outline that traces SQL statements back to the originating line in the source application, even when using Java frameworks such as Hibernate, OpenJPA, Spring, and others.

Figure 2. Outline view in Optim Development Studio for impact analysis and traceback
Optim Development studio screenshot

For those developers who want to use the data access layer generation capabilities pureQuery provides, pureQuery includes support for the standard Data Access Object (DAO) pattern. The data access layer leverages the pureQuery API, which is an intuitive and simple API that balances the productivity boost from object-relational mapping with the control of customized SQL generation. The layer also simplifies the use of best practices for enhanced database performance. Optim pureQuery Runtime is used with pureQuery data access layers.

Portfolio integration helps developers to be conscious of sensitive data. Developers can readily identify sensitive data based on the privacy metadata captured in InfoSphere Data Architect. Developers can provision test databases directly from fictional data, or they can generate extract definitions for Optim Test Data Management and Data Privacy to create customized test databases.

As more and more abstraction is introduced into the application architecture, developers and DBAs have become increasingly isolated from one another. And developers have less and less involvement, or even control, over the SQL that is executed to manage database access and persistence. Optim Development Studio supports collaboration between the developer and DBA, giving them an easy way to capture, share, review, optimize, and restrict SQL that will be put into production.

Tester benefits: better-quality test data without revealing sensitive information

The key role of the tester is to assure application quality. Historically, testers have used clones or extracts of live customer data to attempt to provide contextually accurate data, but a simple extract may not be sufficient and full clones can quickly break the budget. The test data needs to be reflective of application processing constraints as well as error and boundary conditions. IT staff are also challenged to protect confidential data and personally identifiable information ("PII") like bank account numbers and national identifiers.

The Optim Test Data Management Solution together with the Optim Data Privacy Solution create a right-sized, production-like test environment that accurately reflects end-to-end business processes while de-identifying sensitive information, which provides the perfect option for test data creation. The two Optim solutions offer built-in knowledge of packaged application business objects.

The Optim solutions support an iterative testing model that simplifies specification of error and boundary conditions and simplifies the comparison of test results to baseline data. Errors are difficult to pinpoint, especially when you don’t know whether the data has changed, who changed it, or how it changed. The Optim Test Data Management Solution enables comparison between data before and after the test to determine data inconsistencies and to identify errors earlier in the lifecycle. The Optim Test Data Management Solution offers built-in knowledge of packaged application business objects and pre-defined masking algorithms for common, sensitive information. Privacy attributes can be defined and managed consistently in InfoSphere Data Architect and can be used to generate test definitions directly from the Data Architect’s desktop or from Optim Development Studio, which helps organizations to assure compliance.
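The before-and-after comparison described above can be pictured with a small Python sketch. This is an illustration only, not the comparison facility in the Optim Test Data Management Solution; it assumes each snapshot is a list of row dictionaries that share a primary key column, and the function name `compare_snapshots` is hypothetical.

```python
def compare_snapshots(before, after, key):
    """Report rows inserted, deleted, and updated between two snapshots.

    `before` and `after` are lists of row dictionaries; `key` names the
    primary key column used to match rows across the two snapshots.
    """
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}

    inserted = sorted(after_by_key.keys() - before_by_key.keys())
    deleted = sorted(before_by_key.keys() - after_by_key.keys())

    updated = {}
    for k in before_by_key.keys() & after_by_key.keys():
        old, new = before_by_key[k], after_by_key[k]
        # Record (old, new) for every column whose value differs.
        changed = {
            col: (old.get(col), new.get(col))
            for col in sorted(old.keys() | new.keys())
            if old.get(col) != new.get(col)
        }
        if changed:
            updated[k] = changed
    return {"inserted": inserted, "deleted": deleted, "updated": updated}


baseline = [{"id": 1, "balance": 100}, {"id": 2, "balance": 50}]
after_test = [{"id": 1, "balance": 80}, {"id": 3, "balance": 10}]
print(compare_snapshots(baseline, after_test, key="id"))
```

Comparing against a saved baseline in this way shows which rows a test run touched, which is the information a tester needs to decide whether a change was expected.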

Database administrator benefits: more control and efficient problem isolation

The products supporting the DBA are too numerous to mention individually, but you can find more information at Tools for z/OS and Tools for DB2 for Linux®, UNIX®, and Windows®. So rather than covering individual offerings, this section describes the strategic priorities and looks at examples of particular tools that illustrate those priorities.

Give the DBA more control

Over time, the DBA's ability to control database performance has eroded, or at least become much harder to exercise, as additional layers emerge in the application stack. SQL is generated by frameworks, not programmers; database connections are managed by systems administrators, not DBAs; and dynamic SQL complicates security management.

Many DBAs like the added control they can gain from using static SQL, and now it is possible to gain that control easily over existing Java and .NET applications by using client optimization technology delivered in Optim pureQuery Runtime. This is an innovative approach to performance optimization that focuses on how to optimize database access from the database client rather than only looking within the database engine.

Client optimization captures SQL from executing applications and enables administrators to bind the SQL to DB2 for static execution without changing a single line of application code. You get all of the gain of static SQL (stable response time, reduced security risk, increased throughput, and improved manageability) and none of the pain. What’s more, pureQuery can alleviate novice programming errors, for example, by consolidating common SQL statements that use literals and converting them to parameter markers or by enabling DBAs to replace poor SQL generation by frameworks with optimized SQL. Now frameworks are a little less scary for conservative DBAs.
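The consolidation of literal-bearing statements mentioned above can be sketched in a few lines of Python. This is not pureQuery code and makes no claim about how pureQuery parses SQL; it is a deliberately naive, regex-based illustration that assumes only single-quoted string literals and plain numeric literals. The names `parameterize` and `consolidate` are hypothetical.

```python
import re
from collections import defaultdict

# A single-quoted string literal ('' is an escaped quote), or a numeric
# literal that is not part of an identifier such as col1 or t1.id.
_LITERAL = re.compile(r"'(?:[^']|'')*'|(?<![\w.])\d+(?:\.\d+)?(?![\w.])")


def parameterize(sql):
    """Replace literals with ? markers; return (template, literal_values)."""
    values = []

    def _swap(match):
        values.append(match.group(0))
        return "?"

    template = _LITERAL.sub(_swap, sql)
    template = " ".join(template.split())  # normalize whitespace
    return template, values


def consolidate(statements):
    """Group statements that differ only in their literal values."""
    groups = defaultdict(list)
    for sql in statements:
        template, values = parameterize(sql)
        groups[template].append(values)
    return dict(groups)


captured = [
    "SELECT name FROM emp WHERE id = 42",
    "SELECT name FROM emp WHERE id = 97",
    "SELECT name FROM emp WHERE dept = 'A1'",
]
for template, value_sets in consolidate(captured).items():
    print(template, value_sets)
```

In this sample the three captured statements collapse into two templates. Fewer distinct statements means fewer entries to prepare, cache, or bind, which is the reason consolidation helps; a production implementation would need a real SQL parser rather than a regular expression.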

Future enhancements include plans to give DBAs control over performance knobs in the application server and to make client configuration manageable.

Bring the information together

Ever spend three or four days just isolating a performance problem to a particular query, and then spend another few days isolating it to the application? Performance issues are particularly difficult to isolate given that the problem could be in the application, the application server, the database client, the network, the database server, or the operating system. Each of these layers has performance information, but none have the information in aggregate. A key objective is to give administrators the ability to aggregate and correlate information enabling fast problem isolation not only to the offending SQL statement, but also to the originating application source.

With the Extended Insight capability in Optim Performance Manager Extended Edition (OPM EE) for both Java and DB2 Call Level Interface (CLI) applications, database monitoring is extended across the database client, the application server, and the network. DBAs have immediate insight into where database workloads, transactions, and SQL requests are spending their time. DBAs can readily identify the SQL statement and the application that is nearing or exceeding an alert threshold. OPM EE provides predefined application views for WebSphere® Application Server, SAP, Cognos®, InfoSphere DataStage®, and InfoSphere SQL Warehouse.

When the Optim pureQuery Runtime is used, Java application-development metadata is also available from the OPM EE console, making it easier to provide developers with the source code line number of a problematic SQL statement. The DBA can even tune the query immediately by launching into Optim Query Tuner directly from the OPM Web client.

OPM EE also integrates with Tivoli monitoring solutions, extending the existing Tivoli application monitoring with deep database drilldown, all from a single point of control.

Roadmap enhancements include bringing extended insight capability to the z/OS database monitoring environment.

Figure 3. SQL end-to-end response time factors and pureQuery application metadata in Optim Performance Manager Extended Edition
shows graph of avg end to end response time, selected SQL statement and associated java source code information for selected statement

Provide task-specific flows and context

Contextual information is a critical aspect of Integrated Data Management. Through contextual information, problem resolution is streamlined, and standards and policies can be enforced. Explore the following DBA scenarios that are enhanced with the contextual information available in Optim solutions.

The Health Summary in Optim Performance Manager provides an example of task-specific flows. (This Health Summary is very similar in appearance to the Health Summary included in the Data Studio Administration Console download.) Administrators need to be able to establish objectives, then leave the system to alert them when something is awry and provide relevant context to manage the alerted condition. Click on any alert to get more details about the alert condition and to go into relevant diagnostic dashboards that contain more details about sorting, transaction throughput, and much more.

Figure 4. Optim Performance Manager Health Summary
Health Summary of Optim Performance Manager with locking alert opened

Similarly, context-specific information is provided for DBAs and other operational staff. For installations that have both IBM Tivoli Monitoring for Composite Applications (ITCAMS) and Optim Performance Manager Extended Edition, operators can now access detailed database performance metrics from within the Tivoli Enterprise Portal.

DBAs can go back up the stack to see contextual information about the application environment (transaction topology) or about operating system and computer details by launching into the relevant ITCAMS workspace, which opens in the context of the computer and operating system that is running the data server.

DBAs can configure the DB2 workload manager (WLM) feature from the Optim Performance Manager Extended Edition. Workload management enables DBAs to prioritize activities according to business priority, to achieve performance objectives for key applications, and to protect against rogue queries that could cause delays to business-critical applications.

To make it easier for DBAs to perform WLM configuration, only the metrics that are most useful for configuring WLM are presented. The metrics are presented only in the context of where they will be used. For example, if you are defining workload connection attributes, in the context of that task you can also see the connection attributes for all currently running activities and the workload to which the activities are assigned.

Make tools smarter

The journey toward autonomic operations continues by integrating best practices and advisory functions into the products. Optim Database Administrator increases productivity and reduces application outages through task automation. Optim Database Administrator does the following:

  • Facilitates impact and dependency analysis to mitigate risk
  • Generates customizable deployment scripts to automate and accelerate changes
  • Supports object, data, and authorization migration in support of database migration scenarios

Figure 5. Identifying dependencies with Optim Database Administrator
Optim Database Administrator screenshot

Another example is found in the products that make up the Optim Query Tuning solution. The solution offers a comprehensive set of tools and expert advisors that can help identify and improve problematic queries for DB2, and it supports single-query tuning as well as workload tuning. (Workload tuning is available only for DB2 for z/OS.) The advisors recommend the statistics needed to improve performance, new indexes to improve query response time, and query and access path changes. The Optim Query Tuning solution can shell share with Optim Development Studio, providing a single workspace for DBAs to optimize and revise application SQL without changing the application.

Figure 6. Optim Query Workload Tuner for DB2 for z/OS
Optim Query Workload Tuner screenshot

Planning for strategic growth

Overgrown databases can impair the performance of your mission-critical ERP, CRM and custom applications. Optim Data Growth Solutions solve the data growth problem at the source by managing your enterprise application data. Optim enables you to archive historical transaction records, storing them securely and cost-effectively. With less data to sift through, you speed reporting and improve responsiveness of mission-critical business processes.

But data archiving isn’t just about performance and cost improvements. Data archiving also facilitates application upgrades, consolidations, and retirement. Why consolidate all the data when only 20% is actively used? Archiving before an upgrade or consolidation speeds up the process, reduces risk, and reduces cost. If you find that some of the archived data should have been active, you can easily restore it to active status.
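The archive-and-restore cycle described above can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch, not how Optim Data Growth Solutions work: it assumes a single hypothetical `orders` table with ISO-formatted dates and an `orders_archive` table of the same shape, whereas a real archive would move complete business objects (a row together with its related rows) to separate storage.

```python
import sqlite3


def archive_orders(conn, cutoff_date):
    """Move orders placed before cutoff_date into orders_archive.

    Returns the number of rows archived. Copy and delete run in one
    transaction, so they succeed or fail together.
    """
    with conn:
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff_date,),
        )
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff_date,))
        return cur.rowcount


def restore_order(conn, order_id):
    """Return one archived order to active status."""
    with conn:
        conn.execute(
            "INSERT INTO orders SELECT * FROM orders_archive WHERE id = ?",
            (order_id,),
        )
        conn.execute("DELETE FROM orders_archive WHERE id = ?", (order_id,))


def demo_connection():
    """Build an in-memory database with three sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, total REAL)")
    conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, order_date TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "2005-03-01", 10.0), (2, "2008-07-15", 20.0), (3, "2009-11-30", 30.0)],
    )
    conn.commit()
    return conn


conn = demo_connection()
print(archive_orders(conn, "2009-01-01"))  # number of rows moved to the archive
restore_order(conn, 2)                     # bring one order back to active status
```

Keeping the copy and the delete in a single transaction is the design point worth noting: a failure partway through must never leave a row in both places or in neither.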

Have you been afraid to retire an application because you fear you may someday need the underlying data? Optim makes application retirement easier and safer by providing the capability to archive data from decommissioned applications while providing ongoing access to the data for query and reporting. You reduce risk and cut cost, without jeopardizing data retention compliance.

Data steward (or someone by any other name) benefits: greater consistency for reduced risk

Data stewardship is often a line-of-business role reporting directly to senior executives, but the implementation of data steward functions typically comes down to a Security Administrator, Compliance Administrator, or Database Administrator.

Data governance has many facets: availability, security, privacy, quality, audit, and retention to name a few. These tasks are split across many roles with few offerings that really aggregate the compliance story. IBM has a portfolio of robust data governance offerings that span the facets mentioned above. Key portfolio goals here are:

Compliance-savvy tools

Beyond providing the brute force to implement compliance initiatives, the tools themselves should provide intelligence about how best to comply with specific regulatory requirements. An example is the Optim Data Privacy Solution, which comes with prepackaged intelligent data masking routines that transform complex data elements such as credit card numbers, email addresses, and national identifiers, as required to comply with HIPAA, GLBA, DDP, PIPEDA, Safe Harbour, PCI DSS, and others.
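As an illustration of what an "intelligent" masking routine has to do for a credit card number, the following Python sketch replaces the account digits while keeping the result usable as test data. It is not the Optim Data Privacy Solution's algorithm; it assumes a simple scheme in which the first six digits are kept, the middle digits are derived from a keyed hash so that the same input always masks to the same output, and the final Luhn check digit is recomputed. The function names and the `secret` parameter are hypothetical.

```python
import hashlib


def luhn_check_digit(payload):
    """Return the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left, doubling every second digit starting at the right.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number):
    """True when a digit string ends in the correct Luhn check digit."""
    return number.isdigit() and luhn_check_digit(number[:-1]) == number[-1]


def mask_card_number(card, secret="demo-secret"):
    """Mask a card number, preserving length, separators, and Luhn validity."""
    digits = [c for c in card if c.isdigit()]
    if len(digits) < 8:
        raise ValueError("expected at least 8 digits")
    prefix = digits[:6]             # keep the leading six digits
    n_middle = len(digits) - 7      # everything between prefix and check digit
    digest = hashlib.sha256((secret + "".join(digits)).encode()).hexdigest()
    middle = [str(int(h, 16) % 10) for h in digest[:n_middle]]
    payload = "".join(prefix + middle)
    masked = iter(payload + luhn_check_digit(payload))
    # Re-insert the original separators (spaces, dashes) in the same places.
    return "".join(next(masked) if c.isdigit() else c for c in card)


print(mask_card_number("4111-1111-1111-1111"))
```

Because the output still passes the Luhn check and keeps its format, application validation logic behaves the same on masked test data as it does on production data, which is the property that makes masking preferable to blanking the field.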

A significant step in this direction was taken with IBM's acquisition of Guardium® in 2009. The Guardium Database Activity Monitoring offering monitors access to any data that is subject to regulatory compliance oversight, such as the fields that contain financial information subject to Sarbanes-Oxley (SOX) compliance or fields that contain credit card information of interest to Payment Card Industry (PCI) rules. Guardium can automatically generate reports customized to these regulatory compliance mandates, offering you insight into your compliance status at any time, and offering you quick visibility into any compliance exceptions that might have occurred.

Cross life-cycle consistency

You want the ability to define governance policies once and have them implemented across the portfolio stack where appropriate. The first step in this direction is model-driven governance, mentioned above. With the data model as a key architectural hub, privacy and retention attributes should propagate to other model-based tools such as Optim Data Privacy Solution or Optim Data Growth Solution.

Protection from threats

The use of advanced access control techniques within databases, such as label-based access control, multilevel security, and trusted context, is fundamental to protecting the data from misuse within the database. However, there are more and more threats to sensitive data from outside the database. These include attacks by outsiders, misuse by internal privileged users, and inadvertent data loss.

Guardium solutions monitor and protect high-value database assets from internal and external threats. Guardium solutions use policy-based controls to quickly detect and report unauthorized or suspicious activity. The controls also provide standards-based vulnerability assessments that can help you identify potential weaknesses.

Guardium Real Time Database Monitoring Platform’s non-invasive technology works with a variety of database and application environments. And it complements and extends other offerings. Guardium Real Time Database Monitoring Platform does the following:

  • Extends Test Data Management solutions by monitoring sensitive data access in test environments
  • Extends Data Privacy and encryption protection solutions, enabling consistent governance and compliance with regulatory mandates such as PCI, HIPAA, DPP
  • Offers greater levels of data protection through defense in depth
  • Extends capabilities to automatically locate all databases, in both production and test environments, for monitoring and protection

To ensure that sensitive data is protected, one best practice is to encrypt all sensitive data. IBM Database Encryption Expert and IBM Database Encryption for IMS and DB2 for z/OS provide robust and application-transparent encryption to ensure data is secure. The products enable compliance with many industry and government regulations governing the protection of sensitive data.

Coherent auditability

Gathering audit data is largely a manual process across most enterprises. DB2 and Informix Dynamic Server (IDS) databases offer comprehensive audit facilities to capture all the information your auditors might need to ensure compliance with business controls.

But auditors are interested in more than what can be provided by traditional logging solutions in the database. The Guardium solution for audit and compliance provides automated auditing that secures high-value databases and enables consistent execution of governance policies. The Guardium solution provides the granular detail that auditors require, without slowing down performance.

Plus, the solution supports separation of duties, because it operates independently of database-resident utilities that are managed by DBAs.

The Guardium solution automatically generates compliance reports on a scheduled basis and distributes them to stakeholders for electronic approval. These reports—including escalations and sign-off reports—enable organizations to demonstrate the existence of an oversight process.

Something for everyone, but more together

Whether you are a data architect, developer, tester, administrator, or steward, the Integrated Data Management portfolio by IBM has capabilities that can help you be more effective and efficient. But more importantly, the portfolio and roadmap are delivering a collaborative environment that improves organizational productivity and efficiency, making your organization more responsive to opportunities while improving quality of service, mitigating risk, and reducing costs for diverse data, databases, and data-driven applications.
