May 22, 2018 | Written by: Preetam Kumar
Categorized: Data Analytics | Data Science
The world doesn’t stop, which means data never stops pouring in. If you’re in the analytics game, basing your efforts on a snapshot of historical data always involves a degree of compromise. Did you choose the right data set, one that accurately represents ongoing operations and won’t skew your analysis? How soon will your insights be outdated? How can you store the data you’re analyzing cost-effectively?
Questions like these led to the rise of streaming analytics, in which large volumes of data are captured, ingested, processed and analyzed at high velocity. By analyzing data in motion, companies can seize that all-important competitive advantage instead of spending time and resources on analyses of stale data that may mislead them. As a result, there’s a lot of buzz around stream processing right now, and a sense that it is time for the technology to go truly mainstream.
Streaming data calls for non-traditional analytics
Similar to the way data has changed our daily lives, streaming data has changed analytics in several different ways. For one thing, streaming has introduced a lot of new sources to the enterprise, mostly external, including data flowing from sensors, RFID tags, smart meters, live social media, mobile devices and other internet-connected objects. Another important change, quite different from traditional analytics, is that streaming data often has to be analyzed before it’s stored – and in some cases, it’s never stored.
In streaming analytics, it’s more often the data models and analytical algorithms that are stored, while the streaming data is queried continuously as it passes through. This is necessary because a key challenge of working with streaming data is acting on its potential insights quickly, before the generated or transmitted real-time data loses its value. Streaming analytics attempts to determine the data’s meaning and value, pinpoint its relevance and generate instant alerts when action is urgent. This analysis also helps enterprises decide which streaming data should be stored, and therefore subjected to additional management and governance. The insights gleaned from streaming analytics may also prove valuable as a complement or supplement to other enterprise applications.
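To make the “store the model, query the data” idea concrete, here is a minimal sketch in plain Python. It is not IBM Streaming Analytics code: `RollingMeanDetector` and `continuous_query` are hypothetical names, and the “model” is simply a rolling average over the last few readings. Each reading is scored as it arrives, an alert is emitted immediately when it spikes, and only the readings that triggered an alert are archived; everything else is analyzed and then discarded.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple


class RollingMeanDetector:
    """Hypothetical stored model: flags readings far above the recent average."""

    def __init__(self, window_size: int = 5, factor: float = 1.5) -> None:
        self.factor = factor
        # Only the most recent readings are kept; older ones fall out automatically.
        self.window: deque = deque(maxlen=window_size)

    def is_anomaly(self, value: float) -> bool:
        full = len(self.window) == self.window.maxlen
        anomalous = full and value > self.factor * (sum(self.window) / len(self.window))
        self.window.append(value)
        return anomalous


def continuous_query(readings: Iterable[float], model: RollingMeanDetector,
                     archive: List[float]) -> Iterator[Tuple[int, float]]:
    """Score each reading as it passes through; alert and archive only what matters."""
    for i, value in enumerate(readings):
        if model.is_anomaly(value):
            archive.append(value)  # selectively store the reading that raised an alert
            yield i, value         # emit the alert right away
        # All other readings are analyzed in flight and never stored.
```

For example, feeding `[10, 11, 9, 10, 10, 30, 10, 11]` through `continuous_query` with a five-reading window yields a single alert for the `30` at position 5, and the archive holds just that one value.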
Simpler stream processing
As demand for such solutions has increased, so has supply: there are more than a dozen open source projects that offer some kind of stream processing service, as well as numerous commercial solutions. In most cases, however, setting these solutions up and running them on-premises is a complex task, and experts with the skills to maintain them are in short supply.
IBM Streaming Analytics aims to solve these challenges by providing an easy-to-use, cloud-based stream processing service. The service can simply be placed between your streaming data source and your target systems, and configured to perform whatever analytics or data processing operations you need while the data is in flight.
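Conceptually, “connected between the source and your target systems” means a chain of operators that each transform the stream before a sink delivers the results. The sketch below illustrates that shape with Python generators. It is only an illustration of the pattern: `filter_op`, `map_op` and `run_pipeline` are hypothetical helpers, not the IBM Streams or Streaming Analytics API, which has its own application languages and libraries.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, float]
Operator = Callable[[Iterator[Record]], Iterator[Record]]


def filter_op(predicate: Callable[[Record], bool]) -> Operator:
    """Hypothetical operator that drops records failing the predicate."""
    def run(stream: Iterator[Record]) -> Iterator[Record]:
        return (r for r in stream if predicate(r))
    return run


def map_op(fn: Callable[[Record], Record]) -> Operator:
    """Hypothetical operator that transforms every record."""
    def run(stream: Iterator[Record]) -> Iterator[Record]:
        return (fn(r) for r in stream)
    return run


def run_pipeline(source: Iterable[Record], operators: List[Operator],
                 sink: Callable[[Record], None]) -> None:
    stream: Iterator[Record] = iter(source)
    for op in operators:
        stream = op(stream)  # chain lazily; nothing runs until the sink pulls records
    for record in stream:
        sink(record)         # deliver each processed record to the target system
```

A usage example that drops impossible sensor readings and enriches the rest before delivery:

```python
delivered: List[Record] = []
run_pipeline(
    [{"temp_c": 20.0}, {"temp_c": -300.0}, {"temp_c": 25.0}],
    [filter_op(lambda r: r["temp_c"] > -273.15),
     map_op(lambda r: {**r, "temp_f": r["temp_c"] * 9 / 5 + 32})],
    delivered.append,
)
```

Because the operators are lazy generators, each record flows through the whole chain individually, which mirrors the record-at-a-time processing that distinguishes stream processing from batch jobs.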
A more flexible architecture
The latest version of the Streaming Analytics service on IBM Cloud has been rebuilt to run in Docker containers instead of bare virtual machines, and it uses Kubernetes for container orchestration.
The result is greater availability, because Kubernetes can adjust the environment in real time to maintain the desired service levels. For example, it can spin up new containers to keep your streams online even when your current instance of the Streaming Analytics service needs to be taken offline for patching or upgrades.
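Kubernetes maintains service levels by continuously comparing the desired state (for example, “three healthy containers”) with the actual state and acting to close the gap. The toy reconciliation loop below sketches that general pattern. It is a hypothetical model written for illustration; `Cluster`, `take_offline` and `reconcile` are invented names and do not correspond to the Kubernetes API or to how the Streaming Analytics service is actually operated.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict


@dataclass
class Cluster:
    """Hypothetical toy cluster: maps container id -> healthy flag."""
    desired_replicas: int
    containers: Dict[int, bool] = field(default_factory=dict)
    _ids: count = field(default_factory=count)  # source of fresh container ids

    def take_offline(self, container_id: int) -> None:
        # e.g. the container's node is drained for patching or an upgrade
        self.containers[container_id] = False

    def reconcile(self) -> None:
        # Drop containers that are no longer healthy...
        self.containers = {cid: ok for cid, ok in self.containers.items() if ok}
        # ...then start replacements until the desired count is met again.
        while len(self.containers) < self.desired_replicas:
            self.containers[next(self._ids)] = True
```

Starting from `Cluster(desired_replicas=3)`, one `reconcile()` call creates containers 0, 1 and 2; taking container 1 offline and reconciling again replaces it with container 3, so three healthy containers keep running throughout.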
The new architecture also enables a more dynamic approach to resource allocation: you simply specify the maximum number of nodes that you want your environment to use, and the service automatically scales up and down within that limit. This helps ensure that you use (and pay for) only the resources you actually need at any point in time, while still helping to maintain performance during sudden peaks in load.
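Scaling “within that threshold” amounts to computing how many nodes the current load needs and clamping the result to the configured ceiling. The function below is a minimal sketch of such a rule under stated assumptions: the load is measured as tuples per second, each node handles a fixed `capacity_per_node`, and `target_nodes` is a hypothetical name. The service’s real scaling policy is not described in this post.

```python
import math


def target_nodes(tuples_per_sec: float, capacity_per_node: float,
                 max_nodes: int, min_nodes: int = 1) -> int:
    """Hypothetical rule: enough nodes for the current load, capped at max_nodes."""
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    needed = math.ceil(tuples_per_sec / capacity_per_node)  # round up: no partial nodes
    # Never drop below min_nodes, and never exceed the user-specified maximum.
    return max(min_nodes, min(needed, max_nodes))
```

With a ceiling of 8 nodes and 1,000 tuples per second per node, an idle stream keeps 1 node, a load of 2,500 tuples per second gets 3 nodes, and a sudden spike to 50,000 tuples per second is capped at 8, which is what keeps costs bounded during peaks.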
In a stream processing application, as in any complex distributed system, occasional failures are inevitable. Errors must be handled gracefully, and each component of the streaming application (known as an “operator”) must be able to recover if it gets into an inconsistent state.
Checkpointing, another feature in the new beta of Streaming Analytics, helps improve fault tolerance when such errors occur. It works by periodically saving the state of each operator as a checkpoint. Each checkpoint contains a delta of all the state changes that have occurred since the previous checkpoint. If an error occurs, the operator can quickly be restored to the most recent checkpoint, reducing the amount of data it needs to reprocess when it restarts.
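The delta-based scheme described above can be sketched with a simple stateful operator that counts events per key. This is a hypothetical illustration, not the IBM Streams checkpointing implementation: `CheckpointedCounter` is an invented name, an in-memory list stands in for durable storage, and key deletions are not handled. Each checkpoint persists only the keys that changed since the previous one, and restoring replays the deltas in order to rebuild the state as of the most recent checkpoint.

```python
from typing import Dict, List


class CheckpointedCounter:
    """Hypothetical stateful operator: counts events per key, checkpoints deltas."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.dirty: Dict[str, int] = {}              # changes since the last checkpoint
        self.checkpoints: List[Dict[str, int]] = []  # stand-in for durable storage

    def process(self, key: str) -> None:
        self.state[key] = self.state.get(key, 0) + 1
        self.dirty[key] = self.state[key]            # track the changed entry only

    def checkpoint(self) -> None:
        self.checkpoints.append(dict(self.dirty))    # persist just the delta
        self.dirty.clear()

    def restore(self) -> None:
        """Rebuild state by replaying deltas; changes after the last checkpoint are lost."""
        self.state = {}
        for delta in self.checkpoints:
            self.state.update(delta)
        self.dirty.clear()
```

If the operator processes `a`, `b`, `a`, checkpoints, processes `b`, checkpoints again, and then processes `a` just before failing, the second checkpoint stores only `{"b": 2}`, and `restore()` rebuilds `{"a": 2, "b": 2}`. The final `a` event was never checkpointed, so it is the only data that must be reprocessed after the restart.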
The checkpointing feature has been tried and tested in several on-premises versions of IBM Streams and is now ready for deployment in the cloud, bringing robust “at least once” and “exactly once” processing capabilities to Streaming Analytics.
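The difference between the two guarantees shows up when an operator restarts from a checkpoint and replays the events it processed after that checkpoint. The simulation below is a generic, hypothetical model (`run_with_crash` is an invented name, and this is not how IBM Streams implements either guarantee): the source position is saved at each checkpoint, a single crash rewinds the source to that position, and replayed events reach the sink again. Without deduplication, every event is delivered at least once, with some duplicates. With a sink that remembers which offsets it has already applied, each event takes effect exactly once. The model assumes the sink’s record of applied offsets survives the failure; a real system must make that record durable.

```python
from typing import List, Optional, Set


def run_with_crash(events: List[str], checkpoint_every: int,
                   crash_at: Optional[int], dedupe: bool) -> List[str]:
    """Hypothetical replay model: resume from the last checkpointed source offset."""
    delivered: List[str] = []
    seen_offsets: Set[int] = set()  # sink-side record of applied offsets (assumed durable)
    committed = 0                   # source offset saved by the latest checkpoint
    offset = 0
    crashed = False
    while offset < len(events):
        if offset == crash_at and not crashed:
            crashed = True
            offset = committed      # restart: rewind the source to the checkpoint
            continue
        if not (dedupe and offset in seen_offsets):
            delivered.append(events[offset])
            seen_offsets.add(offset)
        offset += 1
        if offset % checkpoint_every == 0:
            committed = offset      # checkpoint: persist the position reached so far
    return delivered
```

With five events, a checkpoint every two events and a crash at offset 3, the run without deduplication delivers `e2` twice (at least once), while the deduplicating run delivers each event once (exactly once).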