Want to know how many ways there are to move data? Tony Curcio explains
the different integration styles and how InfoSphere Information
Server’s support for different integration styles gives the IBM offering a distinctive
edge over competitor products.
Click here to read the blog.
IT systems can be broadly classified into OLTP (On-line Transaction Processing)
and OLAP (On-line Analytical Processing) environments. ETL tools are predominantly
used to move data from an OLTP system to an OLAP system. Why do we need to move the
data between them?
Namit Kabra blogs about the finer details of these systems
with a good number of examples. Read the blog post to get an overview of
OLTP and OLAP and the key differences.
Intrigued by all the hype around Big Data? Well, Big Data is
for real. Just consider these two examples:
- Facebook handles more than 250 million photo
uploads and the interactions of 800 million active users with more than 900
million objects (pages, groups, etc.) – each day.
- More than 5 billion people are calling, texting,
tweeting and browsing on
mobile phones worldwide.
Do you get an idea of the opportunities that Big Data ushers in? Read
Namit’s blog on Big Data, where he introduces the concept with some great examples.
On November 2, 2013, IBM’s Information on Demand General Conference kicks off in Las Vegas, NV. Whether you are new to data integration and quality or have been working with the software for years, there will be something there for you. For beginners, there are sessions highlighting the basic features and planned future enhancements, including:
IIG 1073: Enterprise Information Integration in the Cloud Era
IIG 2170: What’s New in Data Integration – InfoSphere DataStage’s Latest Features
IIG 1432: Getting Started with InfoSphere Information Server for Data Quality
More experienced users might enjoy deeper dive content such as:
IIG 3442: InfoSphere DataStage Master Class – Dynamic ETL with end-to-end Data Lineage
IIG 1144: Information Virtualization in the Big Data Fabric
IIG 3246: Continuous Availability Using Replication and GDPS Active-Active Sites
In all, there are over 40 presentations, hands-on labs, expert exchanges, and birds-of-a-feather sessions touching on all aspects of IBM’s data integration and quality offerings. There is truly something for everyone! Visit the IOD site for more information.
There is no justification for delaying information delivery to the end user!
Speed is critical in today’s ever-growing competitive market. Any organization that cannot match its competition’s pace runs the risk of missing opportunities and will struggle to survive and grow. This has created the need for data warehouses that are dynamic and real-time or near-real-time in nature, delivering updates at the reporting front almost in real time.
Read the article @
A Data Quality Management (DQM) framework is one of the most important factors in overall Enterprise Data Management (EDM)!
The data cleansing and standardization process ensures that systems deliver accurate, complete information to business users across the enterprise. A comprehensive DQ framework guarantees quality data to the business user.
DQ checks can be applied at multiple points in the overall architecture. This article will help you tactically choose the correct component for applying DQ checks.
Read the complete article @
"Netezza is gaining popularity in Datawarhouse appliance and analytics arena.Besides Big data and analytics some of other most popular use cases are
Database migration,application migration to Netezza,building new datamarts in Netezza.
But Can one really get away from ETL tool,re-architetcure the application and recode it in Netezza and unix script which could result in better performance and can save on ETL cost?Would it be feasible approach?
Shed your inputs on the approach of converting the functionality of ETL to a Datawarehouse appliance.Does anyone has experience in eliminating entire ETL tool and building the application in netezza database which also includes database migration to Netezza.
Share the success strories,lessons learnt,challenges,approach used in such application migration.Success stories on pure database migration are also welcome."
Point solutions or customized solutions that facilitate daily archival needs have been popular in most enterprises, but how about using IBM Optim as an enterprise archival solution?
When ETL batches and scripts are able to achieve the same goal, what value would IBM Optim bring?
A topic to kick off the Netezza architecture discussion...
Just curious about the rising trend in the data virtualization/federation area.
Is it the newer self-service BI tools trend that is trickling down to increase the demand for federation?
(This blog is posted on behalf of Leslie Wiggins, Product Marketing Manager at IBM. This blog was originally published on LinkedIn and is being re-posted here to ensure wider access.)
On Friday, February 21, 2014, IBM expanded its data integration and governance platform, IBM InfoSphere Information Server v9.1.2, to include a fully supported Hadoop distribution - IBM InfoSphere BigInsights. With this release, IBM helps clients make Big Data projects faster and easier, providing a cost-effective and scalable Big Data Analytics + Data Integration solution.
IBM is the first vendor to provide a market-leading data integration platform combined with the analytics power of a fully supported Hadoop distribution. This new packaging provides tremendous value to clients looking to get the most from big data. Jim Lee, Vice President of Integration and Governance at IBM, notes:
“With this release, IBM customers can reduce the time and cost to delivering business value as part of their modern data warehouse environments. In combining these technologies, organizations can easily offload their data integration staging area with a low cost alternative approach that also jumpstarts analytics discovery early in the information lifecycle.”
The combined capabilities in this release equip clients with a new way to approach data integration, speeding the creation of the Hadoop landing zone, which is a central location to collect and manage a wide variety of data.
Using IBM capabilities, clients have the ability to shift ETL and data integration workloads into a Hadoop infrastructure that is capable of simultaneously handling analytics as well as the staging, preparation, and transformation of data, while using the same simple design interface developers leverage for all other data integration patterns. InfoSphere Information Server manages the ingesting, mapping, metadata, quality, etc., while data scientists and business users can immediately begin exploring and analyzing the landing zone data through InfoSphere BigInsights’ intuitive BigSheets interface. Additionally, BigSheets allows users to visualize both traditional and new data types, such as JSON, that InfoSphere Information Server has seeded into Hadoop. Most importantly, data governance is built right into the platform when using InfoSphere Information Server with InfoSphere BigInsights.
This new offering helps clients quickly and easily deploy a whole, massively scalable solution to power the next generation of analytics. For more information about how IBM supports Big Data Integration requirements check out our ebook, Integrating and Governing Big Data or for more information go to: http://www.ibm.com/software/data/integration/products.html
Zeroing in on the Right Analytics Products for your Enterprise Needs
By – Rupal Mudholkar / Nita Khare, Solution Architects, TCS
CAMS - Cloud, Advanced Analytics with Big Data, Mobility, and Social Collaboration - these are the current market trends…
Clients are exploring cloud-based offerings to understand how they can minimize IT costs and gain productivity, especially for standard analytics like campaign management and regulatory analytics like Basel. Customers understand the importance of advanced analytics models for more accurate decision making and forecasting. They are also realizing the benefits of the mobile channel for anywhere information access, better management productivity, and ongoing collaboration.
In addition to these buzzwords, self-service BI is in demand from business users who want agility in decision making based on sanitized information, along with the flexibility of having that information at their disposal.
In this blog, let us explore how to “Right-Fit” IBM Technology Products for your Business Analytics needs.
Modified on by KWWX_Mary_Monahan
DataStage has had NLS (National Language Support) for a very long time. In the IBM InfoSphere Information Server versions, it is referred to as Globalization Support in the installer, but it is still called NLS in the DataStage clients. For the purpose of this discussion, I will call it NLS, since our legacy users know that term and post-install configuration still uses it. When you are planning an upgrade to a new version of the software, you need to decide what NLS support you had and what you need for the new system.
The first thing you have to know is that you cannot change the NLS configuration of your engine after installation without doing a reinstall. Consequently, it is critical that you know how your current DataStage software is configured and use that information in conjunction with your processing requirements to ensure you configure NLS correctly for your needs.
The second thing you should know is that exporting from a system with NLS "on" to a system with NLS "off" is not supported. The DSX or isx will contain information that a system without NLS will not know how to handle. It might work. It may even work most of the time. But it is not a tested path and we have not put in the code to be sure it does work. Exporting from NLS "off" to NLS "on" is supported.
In general, IBM recommends enabling NLS. It allows you to handle data from and to different character sets. You need NLS enablement to be able to correctly parse multibyte characters. In the current global economy, systems that only have to deal with single byte data are becoming rare.
If you are primarily using the parallel engine, there is no performance impact from NLS enablement. However, if you are primarily using server jobs, going from NLS off to NLS on will impact performance and will usually require job changes. More information about the trade-offs is below.
Determining if NLS is enabled on your source system
There are a number of ways to determine your NLS configuration. The one that works for all versions is to check your <installpath>/Server/DSEngine/uvconfig file. If NLSMODE is set to 1, NLS is enabled. While you are in the file, also note what your NLSOSMAP setting is.
If NLSMODE is in the file and set to 0, it could indicate that someone “turned off” NLS. Turning off NLS by setting NLSMODE to 0 does not work correctly and if you never noticed data corruption, you were lucky. Whether you choose to turn NLS on or off on your target system, you are going down an untested path.
If NLSMODE is not defined in the file, NLS is not enabled.
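If you administer several engines, this check is easy to script. Here is a minimal Python sketch, assuming a typical install path; substitute your own <installpath>:

```python
# Minimal sketch: report the NLS configuration of a DataStage engine by
# parsing its uvconfig file. The install path below is an assumption.
import re

UVCONFIG = "/opt/IBM/InformationServer/Server/DSEngine/uvconfig"  # assumed path

settings = {}
with open(UVCONFIG) as f:
    for line in f:
        m = re.match(r"\s*(NLSMODE|NLSOSMAP|NLSOSDEF)\s+(\S+)", line)
        if m:
            settings[m.group(1)] = m.group(2)

if settings.get("NLSMODE") == "1":
    print("NLS is enabled; NLSOSMAP =", settings.get("NLSOSMAP"))
elif settings.get("NLSMODE") == "0":
    print("NLSMODE is 0: NLS was 'turned off', which is an unsupported state")
else:
    print("NLSMODE is not defined: NLS is not enabled")
```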
NLS is not enabled on your source system
If NLS support is not installed on your source system, extra consideration is needed depending on the type and composition of the existing jobs, and the likelihood of needing NLS features now or in the future.
If your operating system language is not English, installation will automatically enable NLS. If your operating system language is English, you will see a globalization panel with a check box titled “Install globalization support”. The box is checked by default. If you do not want NLS support you must uncheck it.
For existing jobs, the primary effect of enabling NLS support is on server jobs, and on parallel jobs which use server shared containers or the Basic transformer. The main areas of behavior which change in DSEngine with NLS enabled are:
Server jobs run internally in UTF-8 rather than in host character set bytes; this can have a performance impact due to conversion (if used) and due to higher overheads for internal manipulation.
Jobs which process non-ASCII characters may not work the same way. This includes extended ASCII, non-character bytestreams and EBCDIC characters.
Specifically, string handling functions with NLS enabled will interpret data as UTF-8 characters of one to three bytes in length. Jobs that assumed byte-based offsets with NLS disabled will not necessarily work the same way once NLS is enabled, unless they are processing strictly 7-bit ASCII data. This applies to all character-based functions, including string indexing constructs of the form string[start,count] (see the sketch after this list).
Locale behavior changes from the DSEngine defaults to locale-specific functions. This includes date and time representations and conversions (e.g., Iconv and Oconv), alphabetic sort orders, numeric representation, currency, and character classifications, including case handling.
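To make the byte-versus-character point concrete, here is a small illustration in Python rather than DataStage BASIC; the mechanics differ, but the pitfall is the same: once data is UTF-8, byte offsets and character offsets no longer agree for anything beyond 7-bit ASCII.

```python
# "café" is 4 characters but 5 bytes in UTF-8 ("é" encodes as 2 bytes).
text = "café price"

# Character-based slice, analogous to what NLS-enabled string functions do:
print(text[0:4])  # -> café

# Byte-based slice, what a job assuming single-byte data effectively did:
raw = text.encode("utf-8")
print(raw[0:4].decode("utf-8", "replace"))  # -> caf� (é is cut in half)
```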
NLS is enabled on your source system
If NLS support is enabled on your source system, you should also enable it on your destination system.
If you had NLS enabled on your source system, you should check the default character set defined by NLSOSDEF. If you are migrating from 8.0.1 or earlier, or if you are changing platforms, your new system may have a new default character set. If the default character set changes, you may need to change the default character set in the Project or jobs if you do not want the jobs' behavior to change. In addition, since sorts can be done at the OS level, a character set change could result in small order changes. If you decide you want to use the same character set as the source system, you need to set it in your operating system prior to installing. See https://www-01.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.productization.iisinfsv.install.doc/topics/wsisinst_upgrade_utf8.html
Extracting data from one system and loading it into a local data mart or data store definitely takes time to develop and execute, especially when there are hundreds of tables. How do you cut down the development time and load data more quickly?
Read this blog from Lucid Technologies and Solutions to understand how to use the Runtime Column Propagation (RCP) feature in DataStage to quickly load hundreds of tables of data to your local data mart or data store: http://bit.ly/1sTDK8K
Extend your definitions (terms) to be more dynamic or link them to external systems instead of leaving them static in nature. Read this blog from Lucid Technologies and Solutions to learn how to define custom attributes for terms (or categories), which stewards often use to accommodate additional static attributes for a term - http://bit.ly/1pXdjiD
Which architecture is best suited to help manage growing volumes of data and enable self-service data access for everyone in the organization? Is a data lake sufficient for managing and analyzing disparate sources of data?
What are the capabilities needed to enable centralized pools of data to deliver value and provide self-service access to data in a sustainable manner? What are the gaps? Join us in a Twitter chat to get answers to these questions.
Special guests for the chat are R Ray Wang (@rwang0), Principal Analyst, Founder, and Chairman of Constellation Research, Inc.; David Corrigan (@dcorrigan), Director of Product Marketing for IBM InfoSphere; Paula Wiles Sigmon (@paulawilesigmon), Program Director of Product Marketing for IBM InfoSphere; and James Kobielus (@jameskobielus), IBM big data evangelist, speaker and writer. Twitter handle @IBM_InfoSphere will be moderating the chat.
You can join the discussion using the hashtag #makedatawork, on Wednesday, October 22, 1:00 p.m. - 2:00 p.m. ET.
Working on extending the Metadata Workbench product for Basel compliance for non-supported external products...
Interesting to see how extension mappings work ...
This white paper discusses the best practices that can help:
• Teradata workload managers to effectively manage workload.
• DataStage administrators to effectively configure the environment, thereby reducing costly configuration and design changes at a later stage.
• DataStage developers to design jobs with a logically correct set of parameters and configurations.
• Customers across the globe to add the same to their best-practice repositories, so that issues related to workload management are resolved before they become problems, thereby reducing costly changes.
Read the complete White Paper @
Report migration is a common scenario where organizations are looking for more flexibility and functionality to meet today’s complex enterprise reporting requirements.
It is critical for organizations to establish a concrete migration plan and strategy for migrating from one reporting platform to another effectively and smoothly.
Some of the migration options are outlined below, highlighting the key benefits and shortfalls of each approach.
Read the complete article at:
IBM's premier mobile and cloud conference is less than two weeks away. With 20,000+ attendees expected at the conference, ‘This is your opportunity to InterConnect’. With more than 1,500 sessions and hundreds of hands-on labs, building your agenda for the conference is certainly not an easy task.
The biggest challenge facing developers of cloud and mobile applications is access to reliable data. It is difficult for developers to verify the underlying data as they create new cloud-based apps. Therefore, cloud-based data access and refinement services, readily available through APIs, can streamline application development and improve user satisfaction. A cloud-based data refinery provides the critical capabilities that help developers build apps with easy access to relevant data, improve data governance and quality, and automate test data management with best-of-breed data services available through APIs.
If you are considering cloud and mobile programs for your organization, you must attend the sessions mentioned below. These sessions cover a range of topics including data refinement, cloud and hybrid integration. In addition, you’ll also get to know IBM's point of view on solutions and architectures for cloud and hybrid integration. Here’s the list:
Session# 2978 - Making Data Work: Providing Relevant Data to Cloud and Mobile Applications with IBM DataWorks
Cloud and mobile apps require data. But often the biggest challenge facing users of those apps is access to relevant data. In this session, you'll learn about IBM DataWorks and data refinement capabilities offered as data services, like data provisioning, cleansing, shaping, security, and governance. You’ll also hear how others have leveraged a data refinery to unlock the value of their data.
Session# 2184 - Cloud Data Integration 101
Attend this session for an overview and introduction to the current and future state of cloud data integration, data quality, and governance services. In addition, learn about the capabilities of IBM DataWorks for data refinement on the cloud.
Session# 5844 - Cloud Integration Technical Deep Dive
In this session, Steve Cerveny and Tony Curcio of IBM will give an in-depth look at the demands and use cases for cloud integration platforms, and how IBM technology enables you to make the most of your hybrid applications.
Session# 5962 - Bringing Application and Data Integration Together
Application integration and data integration have existed within their own silos in the past. The shift to cloud is challenging that traditional separation. Attend this session to learn what IBM is doing to bring together its integration capabilities across these areas for consumption in cloud scenarios.
Session# 5845 - A Business View on Cloud and Hybrid Integration
The speakers at this session will give you an overview of the market drivers that are making cloud integration one of the critical domains in enterprises today. You’ll also get to know IBM's point of view on solutions and architectures for cloud and hybrid integration.
Developers should also visit the Big Data and Analytics Booth #104 at the Solutions EXPO and get a free demo of IBM DataWorks.
You can register for IBM InterConnect here, if you haven’t already done so. Use the agenda builder on the IBM InterConnect website to mark the above sessions and get ready for the event.
Have a great time at IBM InterConnect 2015!
InfoSphere QualityStage is a powerful tool for finding duplicates in data. But users often lack the following information:
1. The steps involved in finding duplicates
2. How to fine-tune the matching engine to get the best match (in other words, the best practices)
3. A template that can be used as a starting point for finding duplicates in the data
The following developerWorks article on identifying duplicates in your data using InfoSphere QualityStage bridges these gaps. It gives an overview of the matching process, presents industry best practices, and provides templates for four countries to jump-start the creation of jobs that find duplicates in the data. Hope this is helpful.
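If you just want a feel for the general pattern that QualityStage automates at industrial scale — standardize values, block records into candidate groups, then compare within each block — here is a deliberately tiny Python sketch with made-up records. It is an illustration of the concept only, not QualityStage itself or its match algorithms.

```python
import itertools

records = [
    {"id": 1, "name": "Jon  Smith", "zip": "10001"},
    {"id": 2, "name": "John Smith", "zip": "10001"},
    {"id": 3, "name": "Mary Jones", "zip": "94105"},
]

def standardize(name):
    # Collapse whitespace, uppercase, and apply a tiny nickname table.
    aliases = {"JON": "JOHN"}
    return " ".join(aliases.get(t, t) for t in name.upper().split())

# Blocking: only compare records that share a ZIP code.
blocks = {}
for r in records:
    blocks.setdefault(r["zip"], []).append(r)

for block in blocks.values():
    for a, b in itertools.combinations(block, 2):
        if standardize(a["name"]) == standardize(b["name"]):
            print("Possible duplicates:", a["id"], b["id"])  # -> 1 2
```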
New enhancements for InfoSphere Information Server 11.5
On May 3rd, we released our second data governance roll-up, containing enhancements and new functionality in the areas of metadata, data quality, governance, and stewardship for Information Server 11.5.
We’re excited for you to start benefitting from all of the new features we introduced in roll-up 2 (RUP2). Here are a few of our favorite improvements.
InfoSphere Information Analyzer thin client
View, edit, and understand analysis results
A lightweight, browser-based companion to the Information Analyzer workbench, the thin client makes it easy for data analysts to view and edit analysis results for data sets. Use the new data quality score, only available in the thin client, to get quick feedback on a variety of common data quality dimensions, like missing values, duplicate values, or format violations.
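Information Analyzer’s actual scoring is richer than anything shown here, but the underlying idea — the share of values that pass every checked dimension — can be sketched in a few lines of Python. The dimensions and data below are hypothetical:

```python
# Hypothetical illustration of a column-level quality score: the fraction
# of values that pass every dimension checked (missing value, duplicate
# value, format violation). The real formula and dimensions are richer.
import re

values = ["a@x.com", "b@x.com", None, "b@x.com", "not-an-email"]

def dimensions(v, seen):
    yield v is not None                                        # missing value
    yield v not in seen                                        # duplicate value
    yield v is None or bool(re.fullmatch(r"[^@]+@[^@]+", v))   # format check

seen, passed = set(), 0
for v in values:
    if all(dimensions(v, seen)):
        passed += 1
    seen.add(v)

print(f"quality score: {passed / len(values):.0%}")  # -> 40%
```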
Monitor data rules
Even with a robust data quality score, you’ll want to apply data rules to your data sets, so we also allow you to view data rule results at both a workspace and column level.
Onboard and analyze flat files
In addition to support for HDFS files, we’ve also included support for onboarding and analyzing delimited flat files from LUW systems. The thin client simplifies the handling of flat file delimiters and character encoding when onboarding flat files, which saves time and frustration.
InfoSphere Information Governance Catalog
Data lineage reports for compliance
Compliance reporting can be a huge time sink for even the most organized governance teams. In RUP2, we introduced incremental improvements to our new data lineage reports to ease some of that pain. You can design lineage report templates to focus on the critical data attributes and facets that matter to your organization. You can also customize the information contained in your report for an external audience.
Export information asset values
To further simplify your governance work, we added the ability to modify information asset values with a simple export and import. Values associated with your Information Governance Catalog information assets include things like terms, steward assignments, and custom attributes. Export those values, make the necessary changes, and import the new values to update the catalog.
InfoSphere Metadata Asset Manager
We’ve introduced the ability to filter schemas using regular expressions during metadata import, so you can instruct the system to find only the schemas that you’re interested in seeing. For systems with a huge number of schemas in a given database, this saves a lot of time and makes your imports efficient.
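IMAM’s own filter syntax is configured in its import wizard, but the effect is what a regular expression does to a list of names. A trivial Python sketch of the idea, with made-up schema names:

```python
# Illustration of regex-based schema filtering: keep only the schemas an
# import should touch. Schema names below are hypothetical examples.
import re

schemas = ["SALES_2015", "SALES_2016", "HR_STAGING", "TMP_LOAD", "FINANCE"]
pattern = re.compile(r"SALES_\d{4}")  # import only the yearly sales schemas

selected = [s for s in schemas if pattern.fullmatch(s)]
print(selected)  # -> ['SALES_2015', 'SALES_2016']
```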
Find out more about all of the new features we’ve included in RUP2 here, or head here to download the patch.
We are delighted to offer a tech talk on the Information Analyzer thin client.
The Offering Management and Product Development teams will be on hand to respond to questions during the tech talk.
InfoSphere Information Analyzer thin client - Tech Talk
Event Date: June 23, 2016
Event Time: 11:00 AM - 12:00 PM (Eastern Time)
Presented By: Dan Schallenkamp (IBM)
IBM developerWorks Tech Talks
The event is free, but you must register at: https://www.eventbrite.com/e/ibm-tech-talk-is-information-analyzer-thin-client-tickets-25840004148?aff=estw
Password is: Governance
This presentation will provide a comprehensive overview of the Information Analyzer thin client, a new user interface designed with the business user and analyst in mind. We will begin the session by reviewing key features of the thin client, with a focus on what is new compared to 11.5 GA. Then we will discuss details around installing and configuring Information Analyzer, how to onboard data source metadata, analyze data, and work with the analysis results. Along the way we will compare and contrast the differences between the IA thin client and the IA Workbench.
Who should attend this session? All skill levels of current and prospective Information Analyzer users, from both IT and line of business.
This topic will be presented by Dan Schallenkamp, Offering Manager for Data Quality products within Information Server. Dan has extensive experience in data quality and works closely with IBM customers to help them implement their data quality initiatives. Dan is located in Charlotte, NC.
New enhancements for InfoSphere Information Server 11.5
Every eight to ten weeks, we’re releasing new improvements and features to make data quality and governance simpler. On June 30th, we released our third governance roll-up patch (RUP3) for InfoSphere Information Server, focused on providing more flexibility when working with your data. Here are just a few of the highlights.
InfoSphere Information Analyzer thin client
The thin client is a lightweight, browser-based companion to the Information Analyzer workbench that allows data analysts to view and edit analysis results for data sets.
With the first iteration of the thin client, we introduced a comprehensive data quality score that provides quick feedback on a variety of common data quality dimensions like missing values, duplicate values, or format violations. We identify many of these common quality dimensions for you out of the box.
With RUP3, we introduced the quality rule, which allows data quality teams to customize and identify new quality dimensions in data sets. Using pre-existing rule definitions, you can create quality rules and apply them across multiple columns at once. There is no need to name and define output for each individual rule you create, which makes analysis simpler and faster. The results of a quality rule execution are rolled into the quality score calculation and displayed with the data quality results in the thin client interface.
In RUP3, quality rule creation is restricted to single-variable rule definitions. We’re aiming to expand to multi-variable rule definitions in the near future.
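Conceptually, a single-variable rule definition is one predicate with a single free variable that can be bound to many columns without being redefined. A hypothetical Python sketch of that reuse pattern (the rule logic, column names, and data are invented; Information Analyzer’s rule syntax is its own):

```python
def not_blank(value):
    """Single-variable rule definition: value must be present and non-empty."""
    return value is not None and str(value).strip() != ""

table = {
    "first_name": ["Ada", "", "Grace"],
    "last_name":  ["Lovelace", "Hopper", None],
}

# Bind the same definition to multiple columns at once, without naming
# and defining separate output for each individual rule.
for column, values in table.items():
    failures = sum(1 for v in values if not not_blank(v))
    print(f"{column}: {failures} of {len(values)} values fail the rule")
```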
InfoSphere Information Governance Catalog
A key feature of the governance catalog is custom attributes. In RUP3, we introduced a new type of custom attribute that lets you customize relationships: you can use these new custom attributes to define relationships between assets. You can learn more about custom relationships in Information Governance Catalog here.
Query the development glossary
Customers rely upon the query framework for advanced search capabilities. With Rollup 3, customers can run queries against the development glossary. This provides the ability to query for terms that are in a draft or pending-approval state.
InfoSphere Metadata Asset Manager
With Governance Rollup 3, we are refreshing a number of bridges. All non-IBM import bridges have been updated. We introduce two significant updates: support for importing metadata from QlikView, and support for PowerPlay Transformer cubes based on CSV files, Cognos Framework Manager models, or reports using the Cognos Content Manager bridge.
Open event management framework
We have integrated with Apache Kafka® to provide an open event management framework, opening up opportunities to integrate Information Server events into your workflows and portal integrations.
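For example, a downstream workflow could subscribe to the event stream with any standard Kafka client. Here is a minimal sketch using the kafka-python library; the topic name, broker address, and JSON payload format are assumptions to verify against your installation’s event configuration:

```python
# Minimal sketch of consuming Information Server events from Kafka.
# Topic name, broker address, and JSON payload format are assumptions;
# check your installation's event configuration.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "InfosphereEvents",                      # assumed topic name
    bootstrap_servers="services-host:9092",  # assumed broker address
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Route events of interest into your own workflow or portal.
    print(event.get("eventType", "unknown"), event)
```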
Find out more about all of the new features we’ve included in RUP3 here, or head here to download the patch. And don’t forget to follow us on Twitter to stay on top of the latest content @IBMIIG.