To set the context, let’s examine the evolution and challenges of big data platforms.
Over the past few years, interest in and excitement around Big Data have been rising. Businesses have already begun their Big Data journey with pilots and full-blown implementations. However, Big Data opportunities remain underutilized, and most businesses lack the skills, tools and capabilities to access, process and analyze the data available to them in a timely manner.
Mining petabytes of data for value is essential for businesses today, and that calls for scalable storage and a distributed processing approach. Most big data systems available today (e.g. Hadoop) query data in batch mode, which is time-consuming and requires a significantly different skill set (Scala, R, Python, etc.). Hadoop vendors find it challenging to provide analysis tools that appeal to every persona in the business user community.
Customers storing cold or unstructured data in Hadoop may ask, “How can I integrate cold data with my current hot data to mine for insights without incurring significant development costs?”
Other questions on customers’ minds include:
“Can I use my SQL developers to work on Hadoop data?”
“Can my data scientists access Hadoop data without worrying about data integration issues?”
“Can I run queries on Hadoop data without waiting hours for the results?”
Clearly, there is a need for a platform that can eliminate data latency and quickly analyze many types and vast volumes of data in a single, user-friendly environment. There is also a need for a unifying layer for working with big data, one that lets users seamlessly integrate data of different temperatures and gives all business personas access to big data without a significant skill-set upgrade.
The next step in this evolution? SAP HANA Vora – Agile Platform for Big Data Analytics
SAP HANA Vora is an in-memory query engine that extends the Apache Spark execution framework to provide OLAP capabilities on top of Hadoop data.
Put simply, Vora is a plugin for Apache Spark. Much as HANA loads tables into memory, Vora accelerates results by processing Hadoop data in memory. Vora is positioned as an OLAP tool for distributed data frameworks.
Technical Features of HANA Vora:
- Enables OLAP Modeling on Hadoop & boosts SQL performance – Compared with the third-party tools that exist today for building queries on Hadoop data, Vora provides a simple graphical interface for modeling data and building star schemas, so businesses can use their existing developers to build models in Vora.
- Builds Hierarchies & Enables Drill-down on Hadoop Data – Vora can build hierarchies and drill down on Hadoop data, which is difficult to achieve with other tools currently on the market.
- Use Case 1: Fraud detection using social media and email data in Hadoop – Using Vora, financial institutions can build fraud detection models that integrate social media and email data stored in Hadoop with transaction data stored in SAP ERP systems. Traditionally, this type of integration takes months and incurs significant costs; with Vora, the integration is quick and seamless, and data scientists can build directly on the models created in Vora.
- Use Case 2: Predictive maintenance for automobiles using sensor data in Hadoop – Companies can track automotive performance through continuous monitoring of sensor data. With Vora, companies can integrate real-time streaming data from devices with customer master and transaction data stored in HANA/ERP to help improve vehicle safety. Combining enterprise data with up-to-the-moment data from external sensors allows businesses to make context-aware decisions and improve processes.
- Use Case 3: Track lost airline baggage using RFID – With RFID tracking enabled for all airline baggage, the data stored in Hadoop can be queried within minutes using Vora’s boosted SQL performance. This helps airlines improve their lost-baggage metrics and reduce costs. Because Vora developers can quickly build queries on Hadoop tables, businesses can keep pace with changing operational conditions.
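To make the star-schema and drill-down ideas above concrete, here is a minimal plain-Python sketch of what such a model computes. It does not use Vora’s API or interface; the FACT_SALES fact table, the DIM_STORE dimension with its region > country > city hierarchy, and the drill_down helper are all hypothetical, chosen only to illustrate the technique.

```python
from collections import defaultdict

# Hypothetical fact table: one row per sale, keyed to a store dimension.
FACT_SALES = [
    {"store_id": 1, "amount": 120.0},
    {"store_id": 2, "amount": 80.0},
    {"store_id": 3, "amount": 200.0},
    {"store_id": 1, "amount": 30.0},
]

# Hypothetical store dimension carrying a region > country > city hierarchy.
DIM_STORE = {
    1: {"region": "EMEA", "country": "DE", "city": "Berlin"},
    2: {"region": "EMEA", "country": "FR", "city": "Paris"},
    3: {"region": "AMER", "country": "US", "city": "Boston"},
}

HIERARCHY = ["region", "country", "city"]


def drill_down(level, **filters):
    """Sum sales at one hierarchy level, optionally restricted to parent members.

    level   -- one of HIERARCHY, e.g. "country"
    filters -- fixed values for higher levels, e.g. region="EMEA"
    """
    if level not in HIERARCHY:
        raise ValueError(f"unknown hierarchy level: {level}")
    totals = defaultdict(float)
    for row in FACT_SALES:
        store = DIM_STORE[row["store_id"]]  # star-schema join: fact -> dimension
        if all(store[k] == v for k, v in filters.items()):
            totals[store[level]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(drill_down("region"))                  # {'EMEA': 230.0, 'AMER': 200.0}
    print(drill_down("country", region="EMEA"))  # {'DE': 150.0, 'FR': 80.0}
```

Each drill-down step simply fixes a value at a higher level of the hierarchy and regroups the same fact rows one level lower; an OLAP engine does the same thing over far larger, distributed data.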
How can customers benefit from Vora?
According to recent trends, most businesses are moving towards a tiered (hot + cold) data architecture. Hot data is stored in high-performance analytic databases such as SAP HANA (in-memory) and IBM Netezza, while cold data is stored on Hadoop (Apache Hadoop, Cloudera, Hortonworks, etc.). Vora addresses the critical problem of combining hot and cold data in both directions, in a meaningful and coherent way, without a significant investment of money, time or infrastructure.
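The tiered idea can be illustrated with a small plain-Python sketch. This is not how Vora is implemented; it assumes a hypothetical HOT_ORDERS list standing in for the hot tier, a hypothetical COLD_ORDERS list standing in for the cold tier, and a hypothetical HOT_SINCE cutoff date that separates them. One query reads both tiers as a single logical table and skips a tier whenever the requested date range cannot touch it.

```python
from datetime import date

# Hypothetical cutoff: rows on or after this date live in the hot tier.
HOT_SINCE = date(2016, 1, 1)

# Hypothetical hot tier: recent orders (stand-in for an in-memory store).
HOT_ORDERS = [
    {"order_id": 103, "customer": "C1", "day": date(2016, 6, 1), "amount": 50.0},
    {"order_id": 104, "customer": "C2", "day": date(2016, 6, 2), "amount": 75.0},
]

# Hypothetical cold tier: archived orders (stand-in for Hadoop storage).
COLD_ORDERS = [
    {"order_id": 101, "customer": "C1", "day": date(2015, 3, 9), "amount": 20.0},
    {"order_id": 102, "customer": "C2", "day": date(2015, 7, 4), "amount": 10.0},
]


def orders_between(start, end):
    """Yield orders with start <= day <= end, reading only the tiers that can match."""
    tiers = []
    if end >= HOT_SINCE:
        tiers.append(HOT_ORDERS)
    if start < HOT_SINCE:
        tiers.append(COLD_ORDERS)
    for tier in tiers:
        for row in tier:
            if start <= row["day"] <= end:
                yield row


def spend_per_customer(start, end):
    """Total spend per customer across hot and cold data, without copying either tier."""
    totals = {}
    for row in orders_between(start, end):
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    print(spend_per_customer(date(2015, 1, 1), date(2016, 12, 31)))  # both tiers
    print(spend_per_customer(date(2016, 1, 1), date(2016, 12, 31)))  # hot tier only
```

The pruning step relies on the assumption that every row respects the HOT_SINCE cutoff; a real tiered architecture would need its own rules for moving data between tiers.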
Scenario 1: Consuming Hadoop data in SAP HANA
Vora’s OLAP tools help developers build virtual data models on Hadoop, which can be exposed to HANA through Smart Data Access (SDA) using the Vora-Spark controller. Instead of exposing Hadoop tables as-is, Vora lets you model the data on Hadoop and then consume those models in HANA. It also connects HANA to Hadoop clusters without data replication. A Vora ODBC connection, due to be released shortly, is expected to significantly reduce SDA’s performance shortfalls.
Scenario 2: Consuming HANA data in Hadoop
With Vora, it is now possible to consume HANA data in the Vora OLAP framework and combine it with Hadoop data. This is ideal for data scientists and business analysts who mine data for patterns and associations, because it avoids creating duplicate copies of the data. Using Vora’s in-memory processing capacity also reduces the need to expand the HANA footprint.
Scenario 3: Consuming Hadoop data via SAP Predictive Analytics to build machine learning models
With Vora connectivity enabled in SAP Predictive Analytics 3.1, data scientists and business analysts can work directly with Hadoop data. This gives the business a complete, scalable environment for predictive and machine learning algorithms. SAP’s Predictive Factory can also be used to maintain and automatically retrain predictive models, managing each model’s life cycle. Vora also improves performance by delegating predictive algorithms to the Spark in-memory framework.
Scenario 4: Consume SAP ERP and SAP Business Warehouse (BW) data virtually in Hadoop
Vora can virtually consume data in SAP ERP and SAP BW tables through the Spark Data Source API (application programming interface). One of many use cases is building Vora models that combine virtual data from ERP/BW tables with Hadoop tables to mine for insights. This eliminates tedious integration work and data duplication, and enables analysts to build models quickly.
Recommendations for Customers:
- Customers already embarking on their big data journey with Hadoop: Vora is an ideal OLAP tool, as it provides a user interface similar to Hadoop’s and offers SQL capabilities to slice and dice through big data.
- Customers already on SAP HANA and Hadoop: Vora provides seamless integration between enterprise (hot) data in HANA and social, IoT, streaming and clickstream (cold) data in Hadoop.
- Customers planning tiered, temperature-based data storage: Vora offers accelerated querying on Big Data to uncover deep insights.
- Customers using SAP Predictive Analytics: Vora delivers seamless connectivity to Hadoop data and brings the power of SAP’s predictive libraries and capabilities to that data.
Do you have any thoughts about SAP HANA Vora? Share them in the comments section or email us.
Learn more about the IBM and SAP Alliance.
Collaborators: Satish Hiranandani, Sohil Shah, Prakash Nagarajan, Ramesh Pal, Pardha Mohandas, Sainath Kumar, Karl Johnson, Dan Spaulding