Analytics is increasingly an integral part of day-to-day operations at today's leading businesses, and much of this analytics makes extensive use of data sources on IBM z Systems. If most of the data that will be used for Apache Spark analytics, or the most sensitive or quickly changing data, originates on z/OS, then a z/OS-based Apache Spark environment will be the optimal choice for performance, security, and governance.
Here are five things to know about Apache Spark for the enterprise:
1. Apache Spark is an analytics framework and platform that IBM supports in a big way
Apache Spark is an open source, in-memory analytics computing framework that provides libraries for commonly used analytic methodologies for data access, manipulation and application of various algorithms. In June 2015, IBM announced a major commitment to Apache Spark, including plans to put more than 3,500 IBM developers and researchers to work on Spark related projects worldwide.
2. Putting your analytics platform on the same platform as the corporate data that feeds it makes sense
An estimated 80% of corporate data is stored or originates on the mainframe. Putting the Apache Spark environment on IBM z/OS and Linux on IBM z Systems allows this analytics framework to run on the same enterprise platform as the originating sources of data and transactions that feed it. Additionally, by running Apache Spark on z Systems, organizations can apply the same qualities of service to their business-critical analytics as they do to their transactional systems.
3. Apache Spark is at the core of the enterprise analytics platform
Apache Spark constitutes an integral component of the IBM Open Platform with Apache Hadoop and IBM InfoSphere BigInsights for Apache Hadoop for Linux on z. Across the IBM Analytics Platform product and tools portfolio, almost every major software offering has integrations with Hadoop and Spark.
4. Benefit from federated analytics across your enterprise
One of the main advantages of Apache Spark lies in its ability to perform federated analytics over a heterogeneous source data landscape. For example, the IMS DataFrame facilitates access to IMS data through a standard JDBC connection. Transactions running in CICS Transaction Server for z/OS can request analytic results from Apache Spark while processing requests. Other sources including DB2, the DB2 Analytics Accelerator, and non-mainframe data sources can also be accessed and analyzed.
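As a rough illustration of the federated pattern described above, the following Python sketch reads two JDBC sources through Spark's generic JDBC data source and joins the results. It assumes an existing SparkSession is passed in as `spark` and that the relevant JDBC driver JARs are on the Spark classpath. The host names, ports, table names, join key, and credentials are hypothetical placeholders, and the driver class names and URL formats should be checked against your driver documentation.

```python
def jdbc_options(url, table, driver, user, password):
    """Build the option map for Spark's generic JDBC data source."""
    return {
        "url": url,
        "dbtable": table,
        "driver": driver,
        "user": user,
        "password": password,
    }


def read_jdbc(spark, options):
    """Load one JDBC table as a DataFrame using an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


def federated_join(spark, left_options, right_options, key):
    """Read two independent sources and join them on a shared column."""
    left = read_jdbc(spark, left_options)
    right = read_jdbc(spark, right_options)
    return left.join(right, on=key, how="inner")


# Hypothetical connection details, for illustration only.
DB2_ACCOUNTS = jdbc_options(
    url="jdbc:db2://zos-host.example.com:446/DB2LOC",
    table="BANK.ACCOUNTS",
    driver="com.ibm.db2.jcc.DB2Driver",
    user="SPARKUSR",
    password="change-me",
)
IMS_CUSTOMERS = jdbc_options(
    url="jdbc:ims://zos-host.example.com:5555/CUSTPSB",
    table="CUSTPCB.CUSTOMER",
    driver="com.ibm.ims.jdbc.IMSDriver",
    user="SPARKUSR",
    password="change-me",
)
```

With a live session, a call such as `federated_join(spark, DB2_ACCOUNTS, IMS_CUSTOMERS, "CUSTOMER_ID")` would return a single DataFrame that combines rows from both sources, with the join performed inside Spark rather than in either database.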
5. Apache Spark brings native analytics to IBM z Systems to produce real-time business insight
Apache Spark running on IBM z Systems can use both structured and unstructured data in place as part of Spark analytics processing. In one demonstration developed by IBM, Spark runs natively on z/OS and accesses various data sources without moving any of the data. For DB2 for z/OS and IMS, Spark accesses the data through JDBC drivers. For VSAM, the demo uses the optimized Rocket Mainframe Data Service for Apache Spark z/OS. For Cloudant, Spark accesses the data via REST APIs.
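The REST path mentioned for Cloudant can be sketched in a few lines of Python. This is not the code used in the IBM demonstration; it is a minimal illustration that assumes the standard `_all_docs` endpoint with `include_docs=true`, and it omits authentication entirely. The function names and the account URL in the usage note are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def all_docs_url(base_url, database):
    """Build the _all_docs URL that returns every document with its body."""
    db = quote(database, safe="")
    return f"{base_url.rstrip('/')}/{db}/_all_docs?include_docs=true"


def docs_from_response(payload):
    """Extract the document bodies from a parsed _all_docs response."""
    return [row["doc"] for row in payload.get("rows", []) if "doc" in row]


def fetch_docs(base_url, database, opener=urlopen):
    """Fetch all documents over REST; `opener` can be swapped out in tests."""
    request = Request(
        all_docs_url(base_url, database),
        headers={"Accept": "application/json"},
    )
    with opener(request) as response:
        return docs_from_response(json.load(response))
```

The list returned by `fetch_docs("https://account.example.com", "orders")` could then be handed to `spark.createDataFrame` so that the documents can be analyzed alongside the JDBC-backed sources. For large databases a paged or connector-based approach would likely be more appropriate than loading every document in one request.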
Find out more in the IBM Redbooks publication Apache Spark for the Enterprise: Setting the Business Free.