Delivers advanced enterprise ETL

IBM InfoSphere® DataStage® is a leading ETL platform that integrates data across multiple enterprise systems. This scalable platform provides robust features and capabilities:

  • A high-performance parallel framework, available on-premises or in the cloud
  • Extended metadata management and enterprise connectivity
  • Integration of heterogeneous data, including big data at rest (Hadoop-based) or big data in motion (stream-based), on both distributed and mainframe platforms
  • Support for IBM Db2® Z and Db2 for z/OS®
  • Application of workload and business rules
  • Real-time data integration on a platform designed for ease of use

Hands-on lab: Transforming your data with InfoSphere DataStage

Deploys on-premises or in the cloud

Rapidly provision new ETL environments on cloud or on-premises, as your project needs dictate.

Delivers new connectivity support

Leverage new data sources more efficiently with HBase and Hive connectors along with Amazon and MongoDB support.

Enforces workload and business rules

Optimize hardware utilization and prioritize mission-critical tasks.

Boosts enterprise ETL efficiency

Improve the speed, flexibility and effectiveness with which you build, deploy, update and manage your data integration infrastructure.

Provides a powerful ETL platform

Collect, integrate and transform large volumes of data, with data structures ranging from the simple to the complex.

Key features of InfoSphere DataStage

Enhance your enterprise ETL

End-to-end ETL capabilities enable you to understand, cleanse, monitor, transform and deliver your data. Bridge the gap between business and IT. Ensure that the data driving your business and strategic initiatives is trusted, consistent, shareable and governed, from big data and analytics to master data management and data warehousing.

Integrate cloud applications

Integrate data quickly and easily in cloud environments. InfoSphere DataStage supports direct integration with Amazon Simple Storage Service (S3) to load data to and from the cloud. Once data is in S3, it can be integrated with other cloud database technologies. The solution also includes a hierarchical stage that interacts with REST application APIs and supports XML and JavaScript Object Notation (JSON) messages.

Provide trusted ETL data anytime, anywhere

Integrate information quickly and efficiently. Apply sophisticated rules and use the open architecture to govern your ETL data. Empower users with trusted information available anytime and anywhere across the enterprise.

Use the power of Hadoop

Run connectivity, transformation and data delivery features natively in Hadoop. Gain simplified access to HDFS files in various formats and character sets, with support for security features such as Kerberos and secure gateways.

Solve complex big data challenges

Get scalability and high performance for fast access to trusted data. Use the massively parallel processing engine to run natively in Hadoop and access data where it resides. Enable a rich set of integration and governance features. Improve data connectivity and how your data is transformed, cleansed, enriched and delivered.

Other data integration products

InfoSphere DataStage on Cloud

Employ flexible data integration in hybrid cloud environments.

IBM InfoSphere Information Server for Data Integration

Extract and transform data in any style and load the data into any system.

IBM BigIntegrate

Integrate Hadoop big data more easily.
