This post is contributed by Mark Simmonds, Senior Product Marketing Manager, IBM System z software.
Here are 5 things to know about IBM InfoSphere System z Connector for Hadoop. For much more detail, read the IBM Redbooks Solution Guide: Simplifying Mainframe Data Access with IBM InfoSphere System z Connector for Hadoop
1. You've got to move it, move it: no programming
IBM InfoSphere System z Connector for Hadoop enables you to move z/OS data sources such as DB2 for z/OS, IMS, VSAM, QSAM, and SMF and RMF logs into IBM InfoSphere BigInsights for Linux on z Systems, or off-platform into many other Apache Hadoop implementations. No programming is required: point at the data source you want to copy, click on the Hadoop cluster you want it copied to and, bingo, it's done, with no data staging. Conversion from EBCDIC to ASCII is handled automatically, and the source data is reformatted into an HDFS format as it is copied.
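To give a feel for what "EBCDIC to ASCII conversion" involves, here is a minimal Python sketch. It is not the Connector's implementation; it simply assumes a hypothetical data set of 80-byte fixed-length records containing only character data in EBCDIC code page 037, and uses Python's built-in cp037 codec to turn each record into text that can then be written out as ASCII or UTF-8.

```python
"""Illustrative EBCDIC-to-text conversion of fixed-length records.

Assumptions (hypothetical, not taken from the Connector itself):
- the source uses EBCDIC code page 037 (US/Canada)
- records are fixed length and contain only character data
"""

EBCDIC_CODEPAGE = "cp037"   # assumption: code page 037
RECORD_LENGTH = 80          # assumption: 80-byte fixed-length records


def ebcdic_records_to_lines(data: bytes, record_length: int = RECORD_LENGTH) -> list[str]:
    """Split a fixed-length EBCDIC buffer into records and decode each one."""
    if len(data) % record_length != 0:
        raise ValueError("buffer is not a whole number of records")
    lines = []
    for offset in range(0, len(data), record_length):
        record = data[offset:offset + record_length]
        # Decode the EBCDIC bytes to text, then drop trailing blank padding.
        lines.append(record.decode(EBCDIC_CODEPAGE).rstrip(" "))
    return lines


if __name__ == "__main__":
    # Build two sample records by encoding ordinary text into EBCDIC.
    sample = b"".join(
        text.ljust(RECORD_LENGTH).encode(EBCDIC_CODEPAGE)
        for text in ("CUSTOMER 0001 SMITH", "CUSTOMER 0002 JONES")
    )
    for line in ebcdic_records_to_lines(sample):
        print(line)
```

Real mainframe records usually mix character fields with binary and packed-decimal fields, which cannot simply be run through a code page; that is part of why a tool that understands the source layout is useful.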
Did I mention that no programming is required?
2. Ahhh... but I only want that bit of data
When transferring data from the mainframe, you are often interested in only a subset of the rows or columns from a given data source. Rather than transfer the whole data set and filter it on the receiving Hadoop cluster, the System z Connector can filter data dynamically. With IBM InfoSphere System z Connector for Hadoop, data transfers can be configured to select individual columns to be transferred, or to filter rows based on certain criteria. This improves flexibility and ensures that you transfer only the data you need.
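Conceptually, this kind of filtering is a column projection plus a row predicate applied before anything crosses the wire. The Python sketch below illustrates just that idea; the field names (ACCT_ID, REGION, BALANCE, NOTES), the sample rows and the function name are all hypothetical, and in the Connector itself the selection is configured rather than coded.

```python
"""Illustrative column projection and row filtering before transfer.

The record layout, field names and predicate are hypothetical; they only
show the idea of sending a subset instead of the whole data set.
"""
from typing import Callable, Iterable, Iterator

Record = dict[str, object]


def filter_and_project(
    records: Iterable[Record],
    columns: list[str],
    predicate: Callable[[Record], bool],
) -> Iterator[Record]:
    """Yield only rows matching `predicate`, keeping only `columns`."""
    for record in records:
        if predicate(record):
            # Keep just the requested columns, in the requested order.
            yield {name: record[name] for name in columns}


if __name__ == "__main__":
    source = [
        {"ACCT_ID": 1, "REGION": "EMEA", "BALANCE": 120.0, "NOTES": "free text"},
        {"ACCT_ID": 2, "REGION": "APAC", "BALANCE": 75.5, "NOTES": "free text"},
        {"ACCT_ID": 3, "REGION": "EMEA", "BALANCE": 9.99, "NOTES": "free text"},
    ]
    subset = filter_and_project(
        source,
        columns=["ACCT_ID", "BALANCE"],
        predicate=lambda r: r["REGION"] == "EMEA",
    )
    for row in subset:
        print(row)
```

Using a generator means rows are filtered as they stream past, so nothing has to be staged in full before the unwanted rows and columns are dropped.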
3. You get to choose it, choose it
IBM InfoSphere System z Connector for Hadoop runs on Linux hosts (on or off the z Systems environment). Supported Hadoop environments that can be targets for the System z Connector for Hadoop include the following:
IBM InfoSphere BigInsights for Linux on z Systems (version 2.1.2)
IBM InfoSphere BigInsights on Intel Distributed Clusters
IBM InfoSphere BigInsights on Power Systems
IBM BigInsights on Cloud
On-premises Cloudera CDH clusters
On-premises Hortonworks HDP clusters
Apache Hadoop for Linux on z Systems
Veristorm zDoop (open source Hadoop offering for z Systems Linux)
Veristorm Data Hub for Power Systems (Hadoop distribution for Power Systems)
4. Data when you want it, where you want it.
IBM InfoSphere System z Connector for Hadoop allows analytics teams and data scientists to move data on an ad hoc basis, or to schedule data sources to be copied into the Hadoop clusters on certain days, either as one-off transfers or as calendar repeats. Data transfers can therefore be run interactively or scheduled to run automatically. For example, mainframe logs or customer transaction data could be moved daily to the Hadoop cluster for downstream processing. Automating transfers reduces workload, as well as the opportunity for the human error that often occurs with manual processes.
Furthermore, the GUI can be used to define and test a particular type of data transfer. Once defined, transfers can be invoked outside of the GUI: they can be scripted, or run under the control of a mainframe job scheduling system. This is an important capability for sites with complex requirements that might need to transfer thousands of files daily. The ability to configure data sources and targets, and to specify how the data itself is configured, makes it easy to transfer data from various mainframe sources to multiple destinations.
5. Super fast, super secure, no MIPS.
IBM InfoSphere System z Connector for Hadoop enables teams to move large quantities of data very efficiently using the IBM HiperSockets™ interface (in-memory data transfer) or high-speed networks. It integrates with IBM Resource Access Control Facility (RACF®) to ensure that the necessary data access controls and encryption of in-flight data are in place. And finally, no MIPS were hurt in the moving of this data: the System z Connector reads directly from the binary DB2 data source and streams the binary data from the mainframe to a Linux node, where the binary data stream is converted in memory. This approach consumes minimal z/OS MIPS and no mainframe DASD.
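The point of converting on the Linux side is that the mainframe only has to ship raw bytes; the CPU work of decoding happens elsewhere. The Python sketch below shows what that decoding can look like for a stream of fixed-length binary records. The record layout (a 4-byte big-endian integer, a 5-byte packed-decimal amount with two decimal places, and a 10-byte EBCDIC name) and all function names are hypothetical, chosen only to show typical mainframe field types; this is not DB2's internal format or the Connector's code.

```python
"""Illustrative in-memory decoding of mainframe-style binary records.

Hypothetical 19-byte layout (not DB2's actual internal format):
  bytes 0-3   signed 32-bit big-endian integer (account id)
  bytes 4-8   packed decimal, 9 digits + sign, 2 decimal places (balance)
  bytes 9-18  EBCDIC code page 037 text, blank padded (name)
"""
import io
import struct
from decimal import Decimal
from typing import BinaryIO, Iterator

RECORD_LENGTH = 19


def unpack_packed_decimal(field: bytes, scale: int = 0) -> Decimal:
    """Decode packed decimal: two digits per byte, sign in the last nibble."""
    digits = []
    for byte in field[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(field[-1] >> 4)
    sign_nibble = field[-1] & 0x0F
    if any(d > 9 for d in digits):
        raise ValueError("invalid packed decimal digit")
    if sign_nibble in (0x0B, 0x0D):          # negative sign codes
        sign = 1
    elif sign_nibble in (0x0A, 0x0C, 0x0E, 0x0F):  # positive/unsigned codes
        sign = 0
    else:
        raise ValueError("invalid packed decimal sign")
    return Decimal((sign, tuple(digits), -scale))


def decode_record(raw: bytes) -> dict:
    """Convert one raw binary record into Python values."""
    (account_id,) = struct.unpack_from(">i", raw, 0)
    balance = unpack_packed_decimal(raw[4:9], scale=2)
    name = raw[9:19].decode("cp037").rstrip(" ")
    return {"account_id": account_id, "balance": balance, "name": name}


def decode_stream(stream: BinaryIO) -> Iterator[dict]:
    """Decode records one at a time as they arrive, without staging to disk.

    Assumes each read returns a full record (true for io.BytesIO); a real
    network stream would need buffering to handle short reads.
    """
    while True:
        raw = stream.read(RECORD_LENGTH)
        if not raw:
            break
        if len(raw) != RECORD_LENGTH:
            raise ValueError("truncated record at end of stream")
        yield decode_record(raw)


if __name__ == "__main__":
    record = (
        struct.pack(">i", 42)
        + bytes([0x00, 0x01, 0x23, 0x45, 0x6C])  # +1234.56
        + "SMITH".ljust(10).encode("cp037")
    )
    for row in decode_stream(io.BytesIO(record * 2)):
        print(row)
```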