Deliver high-quality, real-time data for improved insights
IBM InfoSphere® Information Server Enterprise Edition is an industry-leading, end-to-end data platform that provides a complete suite of capabilities. These capabilities include automated data discovery, policy-driven governance, self-service data preparation, data quality assessment and cleansing for data in flight and at rest, and advanced dynamic or batch data transformation and movement. It helps you deliver trusted, business-ready data to key business initiatives such as big data, data lakes, data warehouse modernization and master data management, whether on premises, in a private or public cloud, or on hyperconverged systems such as IBM Cloud Pak® for Data.
Drive innovation with improved trust
Quickly set up cloud environments for ad hoc development and testing, improving productivity for your IT and business users.
Business glossary capabilities
Reduce the risks and costs of maintaining your data lake by implementing comprehensive data governance, including end-to-end data lineage, for business users.
Deployment and runtime flexibility
Build your DataStage® ETL jobs once and deploy the runtime anywhere: on premises, in a public or private cloud, or on container-based, AI-ready platforms such as IBM Cloud Pak for Data.
Modernize and consolidate your systems
Reduce costs by delivering clean, consistent and timely information to your data lakes, data warehouses or big data projects while consolidating applications and retiring outdated databases.
Machine learning-based in-line data quality
Eliminate "garbage in, garbage out" reporting and analytics by implementing comprehensive, scalable data quality processing.
Fast time to value as a fully managed service
Get started on IBM Cloud® and realize faster time to value by significantly reducing administration and management burdens.
Key features
Intuitive browser-based user interface (UI)
IBM DataStage Flow Designer features automatic schema propagation to speed up job generation, type-ahead search and backward compatibility, and it lets you design once and execute anywhere. Create data integration flows and enforce data governance and quality rules with a cognitive design that recognizes and suggests usage patterns.
Classify unstructured data sources
Classify email messages, word processing documents, audio or video files, collaboration software, or instant messages by integrating IBM Watson® Knowledge Catalog with IBM StoredIQ®.
Supports a wide range of connectors
Connect to your sources with out-of-the-box native connectors for Google Cloud Storage, Azure, Cassandra, HBase, Hive, Kafka, Amazon S3, Cloudera and more.
Integration with Watson Knowledge Catalog
Use Watson Knowledge Catalog to let business users exploit data assets in a governed and secure way.
Supports data integration across multicloud environments
Collect, transform and distribute large volumes of data with built-in DataStage transformation functions that reduce development time, improve scalability and allow flexible design. Deliver data to your business applications in real time or through batch-based delivery styles.
Business glossary and lineage for data governance
Improve visibility and information governance by enabling complete, authoritative views of information with proof of lineage and quality. Views can be made widely available and reusable as shared services, while rules are maintained centrally.
Assess, analyze and monitor data quality
Load cleansed information into analytical views so that you can monitor and maintain data quality with IBM QualityStage®. Reuse these views across the enterprise to establish data quality metrics that align with business objectives, so your organization can quickly uncover and fix data quality issues.
Explore relationships between business assets
Find assets in your enterprise, explore their relationships and collaborate using enterprise search and Watson Knowledge Catalog.
Integrate with Hadoop
Run data integration, data cleansing, and data profiling and analysis workloads on the data nodes of a Hadoop cluster, where your big data is stored, to minimize data movement.