Deliver high-quality, real-time data for improved insights

IBM InfoSphere® Information Server Enterprise Edition is an industry-leading, end-to-end data platform that provides a complete suite of capabilities. These capabilities include automated data discovery, policy-driven governance, self-service data preparation, data quality assessment and cleansing for data in flight and at rest, and advanced dynamic or batch data transformation and movement. It helps you deliver trusted, business-ready data to key business initiatives such as big data, data lakes, data warehouse modernization and master data management, whether on premises, in a private cloud, in a public cloud or on hyperconverged systems like IBM Cloud Pak® for Data.

Drive innovation with improved trust

Set up cloud environments quickly for ad hoc development, testing and productivity for your IT and business users.

Business glossary capabilities

Reduce the risks and costs of maintaining your data lake by implementing comprehensive data governance, including end-to-end data lineage, for business users.

Deployment and runtime flexibility

Build your DataStage® ETL jobs once and deploy your runtime anywhere: on premises, public cloud, private cloud or AI-ready platforms such as IBM Cloud Pak for Data using containers.

Modernize and consolidate your systems

Improve cost savings by delivering clean, consistent and timely information for your data lakes, data warehouses or big data projects, while consolidating applications and retiring outdated databases.

Machine learning-based in-line quality

Eliminate garbage-in, garbage-out reporting and analytics by implementing comprehensive and scalable data quality processing.

Fast time to value as a fully managed service

Get started on the IBM Cloud® and realize faster time-to-value by significantly reducing administration and management burdens.

Key features

Intuitive browser-based user interface (UI)

The IBM DataStage Flow Designer features automatic schema propagation to speed up job generation, type-ahead search and backward compatibility, and lets you design once and execute anywhere. Create data integration flows and enforce data governance and quality rules with a cognitive design that recognizes and suggests usage patterns.
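
To illustrate the general idea behind schema propagation, the sketch below derives the schema after each stage of a flow from the source schema alone, so a downstream stage never has to be described by hand. This is not DataStage Flow Designer code and uses no IBM API; the `propagate_schema` function, the stage tuples and the column names are all hypothetical.

```python
from typing import Dict, List, Tuple

Schema = Dict[str, str]  # column name -> type name

# Hypothetical stage operations:
#   ("add", name, type), ("drop", name), ("rename", old, new)
Stage = Tuple[str, ...]


def propagate_schema(source: Schema, stages: List[Stage]) -> List[Schema]:
    """Return the schema after each stage, derived from the source schema."""
    schemas: List[Schema] = []
    current = dict(source)  # Copy so the caller's schema is not modified.
    for stage in stages:
        op = stage[0]
        if op == "add":
            _, name, type_name = stage
            current[name] = type_name
        elif op == "drop":
            _, name = stage
            current.pop(name)  # Raises KeyError if the column does not exist.
        elif op == "rename":
            _, old, new = stage
            current = {new if k == old else k: v for k, v in current.items()}
        else:
            raise ValueError(f"unknown stage operation: {op}")
        schemas.append(dict(current))
    return schemas


SOURCE: Schema = {"id": "int", "name": "string"}
FLOW: List[Stage] = [
    ("rename", "name", "full_name"),
    ("add", "loaded_at", "timestamp"),
    ("drop", "id"),
]

if __name__ == "__main__":
    for step, schema in enumerate(propagate_schema(SOURCE, FLOW), start=1):
        print(step, schema)
```

With the sample flow, the final schema contains only `full_name` and `loaded_at`. Changing the source schema changes every downstream schema without editing the stages.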

Classify unstructured data sources

Classify email messages, word processing documents, audio or video files, collaboration software, or instant messages by integrating IBM Watson® Knowledge Catalog with IBM StoredIQ®.

Supports a wide range of connectors

Supports a wide range of out-of-the-box native connectors such as Google Cloud Storage, Azure, Cassandra, HBase, Hive, Kafka, Amazon S3, Cloudera and more.

Integration with Watson Knowledge Catalog

Use Watson Knowledge Catalog to let business users exploit data assets in a governed and secure way.

Supports data integration across multicloud environments

Collect, transform and distribute large volumes of data with built-in transformation functions in DataStage that reduce development time, improve scalability and provide for flexible design. Deliver data to your business applications in real time or through batch-based data delivery styles.

Business glossary and lineage for data governance

Improve visibility and information governance by enabling complete, authoritative views of information with proof of lineage and quality. Views can be made widely available and reusable as shared services, while rules are maintained centrally.
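
As one way to picture proof of lineage, the following sketch models lineage as a directed graph and walks upstream from a report to every asset that feeds it. The asset names and the `upstream_assets` helper are hypothetical, and this is not how Information Server stores or exposes lineage.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: each asset maps to the assets it is derived from.
LINEAGE: Dict[str, List[str]] = {
    "sales_report": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["crm_export"],
}


def upstream_assets(asset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Return every asset that directly or indirectly feeds the given asset."""
    seen: Set[str] = set()
    queue = deque(lineage.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # Already visited; this also guards against cycles.
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


if __name__ == "__main__":
    print(sorted(upstream_assets("sales_report", LINEAGE)))
```

In this sample graph, `sales_report` traces back to five upstream assets, while a raw source such as `orders_raw` has none.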

Assess, analyze and monitor data quality

Load cleansed information into analytical views so you can monitor and maintain data quality with IBM QualityStage®. Reuse these views across the enterprise to establish data quality metrics that align with business objectives, allowing your organization to quickly uncover and fix data quality issues.
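
As a rough illustration of what a data quality metric can look like, the sketch below computes per-column completeness and validity for a handful of records. It is not QualityStage code and does not use any IBM API; the record layout, the `quality_metrics` function and the validation rules are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical record layout; None or "" marks a missing value.
Record = Dict[str, Optional[str]]


def quality_metrics(
    records: List[Record],
    rules: Dict[str, Callable[[str], bool]],
) -> Dict[str, Dict[str, float]]:
    """Return completeness and validity ratios for each column named in rules."""
    metrics: Dict[str, Dict[str, float]] = {}
    total = len(records)
    for column, is_valid in rules.items():
        present = [r[column] for r in records if r.get(column) not in (None, "")]
        valid = [v for v in present if is_valid(v)]
        metrics[column] = {
            # Share of records that have any value in this column.
            "completeness": len(present) / total if total else 0.0,
            # Share of present values that pass the column's rule.
            "validity": len(valid) / len(present) if present else 0.0,
        }
    return metrics


# Illustrative rules only, not an IBM rule set.
RULES: Dict[str, Callable[[str], bool]] = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "country": lambda v: len(v) == 2 and v.isalpha() and v.isupper(),
}

SAMPLE: List[Record] = [
    {"email": "ana@example.com", "country": "BR"},
    {"email": "not-an-email", "country": "us"},
    {"email": None, "country": "DE"},
    {"email": "li@example.org", "country": ""},
]

if __name__ == "__main__":
    for column, values in quality_metrics(SAMPLE, RULES).items():
        print(column, values)
```

For the sample data, both columns have a value in three of four records, and two of those three values pass their rule. Tracking such ratios over time is one simple way to tie a quality metric to a business threshold.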

Explore relationships between business assets

Find assets in your enterprise, explore their relationships and collaborate using enterprise search and Watson Knowledge Catalog.

Integrate with Hadoop

Run data integration, data cleansing, and data profiling and analysis workloads directly on the data nodes of a Hadoop cluster, where your big data is stored, to minimize data movement.