IBM InfoSphere® DataStage® is an industry leader in ETL and provides a multi-cloud platform that integrates data across multiple enterprise systems. This scalable platform provides robust features and capabilities:
A high-performance parallel framework, available on premises or in the cloud
Easy and fast deployment of the integration runtime on your chosen cloud environment
Extended metadata management and enterprise connectivity
Tremendous gains in productivity over hand-coding, through transparent handling of endpoint individuality
Integration of heterogeneous data, including big data at rest (Hadoop-based) or big data in motion (stream-based), on both distributed and mainframe platforms
Support for IBM Db2® Z and Db2 for z/OS®
Application of workload and business rules
Provides a rapid development cycle, using design automation and prebuilt patterns
Real-time data integration and a platform that’s designed for easy use
Hands-on lab: Transforming your data with InfoSphere DataStage
Collect, integrate and transform large volumes of data, with data structures ranging from the simple to the complex.
Deploys on-premises or in the cloud
Fast and easy deployment of integration runtimes on-premises or across multiple clouds
Reduced cost and latencies due to data locality
Lowered costs by running workloads directly within the cloud, without moving data into and out of a platform.
Automated design tooling
Machine learning removes the need to focus on the infrastructure surrounding your data, so you can focus instead on designing the business logic.
Real-time data integration and synchronization
Populate a data lake in real time with fully built-in change data capture technology running on containers.
Productivity gains and increased resiliency
Transparent handling of endpoint individuality for data from any source yields tremendous productivity gains versus hand-coding.
Key features of InfoSphere DataStage
Multicloud support with integrated data quality and governance
Provides quick and easy data integration with IBM Cloud Pak for Data for on-premises or multicloud environments. Includes comprehensive data quality and governance capabilities, covering data discovery, profiling, classification, validation and curation. Data quality checks are applied as data is ingested.
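To make the idea of applying data quality checks at ingestion time more concrete, here is a minimal Python sketch. It is not DataStage or IBM Cloud Pak for Data code: the rule names, the record fields and the `ingest` helper are all hypothetical and only illustrate validating each record as it arrives instead of in a later batch pass.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical validation rules; a real deployment would define its own.
RULES: Dict[str, Rule] = {
    "id_present": lambda r: r.get("id") is not None,
    "email_has_at": lambda r: "@" in str(r.get("email", "")),
}


def ingest(records: Iterable[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Check every record against all rules as it is ingested.

    Returns the accepted records and, for each rejected record,
    the names of the rules it failed.
    """
    accepted: List[Record] = []
    rejected: List[Tuple[Record, List[str]]] = []
    for record in records:
        failed = [name for name, rule in RULES.items() if not rule(record)]
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected


if __name__ == "__main__":
    good, bad = ingest([
        {"id": 1, "email": "a@example.com"},
        {"id": None, "email": "no-at-sign"},
    ])
    print(len(good), "accepted;", len(bad), "rejected")
```

Keeping the failed rule names next to each rejected record is one possible design choice; it lets rejected data be routed for curation without re-running the checks.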
Broad range of integration styles and transformation capabilities
Provides traditional data delivery styles (data replication, batch processing) or complex data delivery styles (including data synchronization and stream data integration) using a rich set of prebuilt connectors. Also supports combinations of traditional and modern data integration styles, such as data replication, data virtualization and stream data integration for real-time analytics.
Place the integration logic and execution in close proximity to the location of your data sources, using microservices-based integration components with IBM DataStage for IBM Cloud Pak for Data — or push the logic directly into the data source.
User-friendly design and development capabilities
The DataStage Flow Designer UI, with infused machine learning capabilities, built-in search and automatic metadata propagation, lets you easily create, edit, load and run DataStage jobs.
Built-in data replication capabilities, using change data capture technology, allow for low-impact capture and fast delivery of data changes for key information management initiatives, such as dynamic data warehousing, master data management, application consolidations or migrations, and operational business intelligence (BI).
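The general idea behind change data capture can be sketched in a few lines of Python. This is a conceptual illustration only, not how InfoSphere DataStage implements it: the sketch compares two in-memory snapshots keyed by primary key, whereas production change data capture tools usually read the database transaction log instead. The function names `capture_changes` and `apply_changes` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
Change = Tuple[str, int, Optional[Row]]  # (operation, primary key, new row)


def capture_changes(before: Dict[int, Row], after: Dict[int, Row]) -> List[Change]:
    """Derive insert, update and delete events from two keyed snapshots."""
    changes: List[Change] = []
    for key, row in after.items():
        if key not in before:
            changes.append(("insert", key, row))
        elif before[key] != row:
            changes.append(("update", key, row))
    for key in before:
        if key not in after:
            changes.append(("delete", key, None))
    return changes


def apply_changes(target: Dict[int, Row], changes: List[Change]) -> Dict[int, Row]:
    """Replay captured changes on a target copy, touching only changed keys."""
    for op, key, row in changes:
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = row
    return target


if __name__ == "__main__":
    source_before = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
    source_after = {1: {"name": "Ada L."}, 3: {"name": "Cy"}}
    events = capture_changes(source_before, source_after)
    replica = apply_changes(dict(source_before), events)
    print(events)
    print(replica == source_after)
```

Only the changed rows travel to the target, which is what makes this style of replication low impact compared with reloading a full table.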
Provides scalability and high performance for fast access to trusted data. Use the massively parallel processing engine to run natively in Hadoop and access data where it resides. Gain simplified access to HDFS files in various formats and character sets, with support for security features such as Kerberos and secure gateways.
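As a rough illustration of the partition parallelism that a parallel processing engine relies on, the following Python sketch splits rows into partitions by key and transforms the partitions concurrently. It is a toy model under stated assumptions, not the DataStage engine: the helpers `partition`, `transform` and `run_parallel` are hypothetical, and a thread pool on one machine stands in for what a real engine would spread across many nodes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Row = Dict[str, object]


def partition(rows: List[Row], key: str, n: int) -> List[List[Row]]:
    """Assign each row to one of n partitions using a stable hash of its key."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % n
        parts[idx].append(row)
    return parts


def transform(part: List[Row]) -> List[Row]:
    """Example per-partition transformation: add the amount in cents."""
    return [{**row, "amount_cents": round(float(row["amount"]) * 100)} for row in part]


def run_parallel(rows: List[Row], key: str, n: int = 4) -> List[Row]:
    """Partition the rows, transform all partitions concurrently, then merge."""
    parts = partition(rows, key, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(transform, parts))
    return [row for part in results for row in part]


if __name__ == "__main__":
    data = [{"id": i, "amount": i * 1.5} for i in range(10)]
    out = run_parallel(data, key="id", n=3)
    print(len(out), "rows processed")
```

Because each partition is processed independently, the same transformation logic can scale by adding partitions and workers, with rows that share a key always landing in the same partition.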
IBM Cloud Pak™ for Data ready to support DataOps practices