5 Things to Know about IBM DataWorks
Data continues to grow at an overwhelming rate; by 2017, data is projected to grow by 800%. Yet data seems to be everywhere except where and when it is needed. Accessing the right data is not only an IT problem. Data consumers are becoming more independent and want the data they need, when they need it.
Today, there is a bottleneck and a delay in getting to information. For example, a business analyst who wants to create a business report has to draw up requirements, take the request to the IT team, and then wait until the IT team can deliver the right reports and data. After performing the analysis, the analyst might need additional details and has to go through the process again. The analyst may also question whether the data is current, so there is always a concern that the analysis was done with stale data. Then there are application developers who develop and support business applications but don’t have access to the data they need to improve and support those applications.
IT teams act as data waiters, serving data to the various teams that ask for it. They get overwhelmed by these requests and have limited time left for IT activities that add real value to the business. It is a pervasive problem that causes a lot of frustration on all sides of the business.
IBM DataWorks tries to provide a solution to some of these challenges. It provides fast, self-service access to relevant, easily consumable data. IBM DataWorks is a collection of services that can be embedded inside business applications, allowing business users to access the best and most relevant data. IBM DataWorks offers capabilities to refine data on the cloud, with services for loading, cleansing, profiling, classifying, matching, and masking data.
Here are five things you should know about IBM DataWorks.
1. Load data – IBM DataWorks provides an API and a user interface to easily move data between cloud data stores, such as SQL Database, Object Storage, and IBM Analytics for Hadoop. IBM DataWorks supports specific source and target combinations for copying and loading data.
2. Provision masked data – IBM DataWorks provides a data load API and a user interface to mask sensitive data at the source while it is moved. This blocks the movement of sensitive data from the source to the target. For example, a retailer may not want its customer and transaction information to be visible to anyone accessing the data once it leaves the source.
3. Securely load on-premises data to the cloud – IBM DataWorks accesses on-premises or cloud data using a Secure Gateway that protects the enterprise from security intrusions. The IBM Secure Gateway service in Bluemix provides a secure way to access on-premises or cloud data from an application running in Bluemix, with the gateway acting as the secure passage.
4. Cleanse addresses – IBM DataWorks helps validate and improve the accuracy of location data by standardizing US addresses. Use this API to enrich partial addresses, such as when ZIP codes or state abbreviations are missing.
5. Profile and classify data – Gain new insights about the data in your application. Use this API to gather information about the data, such as column value distributions or data types. You can also identify higher-value data attributes (for example, e-mail addresses, Social Security numbers, national IDs, or credit card numbers) so that your application can take action, such as masking sensitive data for HIPAA compliance. Additionally, you can classify each field in a data domain to identify fields that contain sensitive data or fields to use for statistical or predictive analysis.
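To make the profiling, classification, and masking ideas above more concrete, here is a minimal, generic Python sketch. It does not use or reproduce the IBM DataWorks API. The function names, regular expressions, and the 80% match threshold are all assumptions made for illustration, and a real service would use far more thorough rules.

```python
import hashlib
import re
from collections import Counter

# Hypothetical patterns for illustration only; not taken from IBM DataWorks.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "credit_card": re.compile(r"^\d{4}([ -]?\d{4}){3}$"),
}


def profile_column(values):
    """Return the value distribution and the Python type names seen in a column."""
    return {
        "distribution": Counter(values),
        "types": Counter(type(v).__name__ for v in values),
    }


def classify_column(values, threshold=0.8):
    """Return the first pattern name matched by at least `threshold` of the values."""
    texts = [str(v) for v in values if v is not None]
    if not texts:
        return None
    for name, pattern in PATTERNS.items():
        hits = sum(1 for text in texts if pattern.match(text))
        if hits / len(texts) >= threshold:
            return name
    return None


def mask_value(value):
    """Replace a value with a truncated SHA-256 digest.

    Sketch only: unsalted hashing is not adequate protection for
    low-entropy values such as Social Security numbers.
    """
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def mask_sensitive(rows):
    """Mask every column classified as sensitive; leave other columns unchanged."""
    columns = list(rows[0].keys()) if rows else []
    sensitive = {c for c in columns if classify_column([r[c] for r in rows])}
    masked = [
        {c: mask_value(v) if c in sensitive else v for c, v in row.items()}
        for row in rows
    ]
    return masked, sensitive


rows = [
    {"name": "Ann", "email": "ann@example.com", "ssn": "123-45-6789", "state": "NY"},
    {"name": "Bob", "email": "bob@example.com", "ssn": "987-65-4321", "state": "NY"},
]
masked, sensitive = mask_sensitive(rows)
print(sorted(sensitive))  # ['email', 'ssn']
print(profile_column([r["state"] for r in rows])["distribution"])  # Counter({'NY': 2})
```

In this sketch, classification drives masking: only columns whose values mostly match a sensitive pattern are replaced, so non-sensitive fields such as the state remain usable for analysis.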
The era of self-service data carries the promise of big benefits. For example, without data access and refinement services, business analysts spend too much time searching for the right data and then validating and matching it, doing everything except performing the high-value analytics they need. Data access and refinement services help reduce the need to search for the right information, leaving more time for analysts to actually analyze it.
Business users also get broader access to more data than before, providing a sound basis for analysis. They can gain easy access to a wide range of information, no matter where it is located: in files, in cloud applications such as Salesforce.com or in social media sources such as Twitter. Instead of basing their analysis on data that is outdated, inaccurate, incomplete or questionable in source and lineage, business analysts can now be assured that their conclusions draw on data that is timely, accurate, consistent, complete and well understood.
If you wish to discuss this blog post further, connect with Rahul Gupta on Twitter at @rahulguptaibm.
See the Hybrid Cloud Data and API Integration: Integrate Your Enterprise and Cloud with Bluemix Integration Services IBM Redbooks here: htt
You can also refer to the following link: http
Rahul Gupta is an Advisory IT Architect with IBM Global Technology Services® (GTS) in the US. He is a Certified Service-Oriented Architecture (SOA) Architect with ten years of professional experience in IBM messaging technologies. At his current assignment, he works as a middleware architect for various clients in North America. His core experiences are in lab testing, performance tuning, and Level 3 development for IBM Integration Bus. Rahul has been a technical speaker for messaging-related topics at various WebSphere conferences. He is a recognized inventor by the IBM innovation community.