
From dreams to streams: turning the vision of streaming analytics into practical business reality with IBM Streams Designer


Today’s web is a much more open place than ever before—most social networks and other web platforms offer public APIs that allow anyone to request and use data on a scale that would have been unthinkable just a few years ago.

These sites don’t just generate large amounts of data—they generate it at an incredibly rapid rate. For example, on average, Twitter logs around 6,000 new tweets per second, or 500 million per day. To get to grips with that kind of data in real time, you need an infrastructure that is built to deal with web-scale volume and velocity.

The desire to master this new world of data is the first big reason why streaming analytics is moving from niche to mainstream on many companies’ data science roadmaps.

The second reason is another huge topic: the Internet of Things (IoT). Instrumentation is becoming more affordable and pervasive in the consumer market, with smart lightbulbs, smart thermostats, smart locks, and other home automation solutions leading the way. And in the corporate world, we’re seeing ever-increasing investment in sensor-equipped production lines, telematics for delivery vehicles, and even wearable technology for worker safety.

Just like modern web platforms, these IoT devices generate huge volumes of data at high speed, and streaming analytics technologies are vital to turn this data into real-time, actionable insight.

Yet while many companies are getting excited about the potential of streaming analytics, there’s often a lack of concrete ideas about what to do with the data, and how to make it happen. To spark some creative thinking around potential use cases, let’s take a quick look at a couple of examples of simple streaming applications that you can build with IBM Streams Designer in IBM Watson Data Platform.

Keeping the critics at bay: real-time sentiment analysis

The web can be a dangerous place for companies’ reputations. If customers have a bad experience, they won’t just write you a strongly worded letter of complaint—they’ll leave bad reviews on a host of third-party review sites. These reviews can have a significant influence on other customers, leading to lost sales and reduced market share.

To minimize the damage, you need to be able to react quickly whenever a bad review appears on an influential site—and that means monitoring those sites 24/7. This can quickly become an overwhelming task for your customer service team, so it’s a prime candidate for automation with streaming analytics.

With Streams Designer, you can quickly build a streaming analytics data flow, using a drag-and-drop interface to assemble all the different components (known as “operators”) that you need to transform data from a review site’s API into actionable real-time alerts whenever a bad review is posted.

For example, your data flow could start with a source operator such as IBM MessageHub (IBM’s managed Apache Kafka service), which continuously captures the stream of new reviews from the website’s API.

From there, it passes the reviews into a filter operator, which applies some business logic to reduce the total stream to just the reviews that you are interested in—for example, reviews of your business, or of one of your products.
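To make the filter step concrete, here is a minimal Python sketch of the kind of business logic a filter operator might apply. This is not Streams Designer code; it only illustrates the logic. It assumes each review arrives as a dictionary with hypothetical `business_id` and `text` fields, and the watched IDs and product keywords are invented placeholders.

```python
from typing import Iterable, Iterator

# Hypothetical identifiers for the things you care about; in practice
# these would come from your own configuration.
WATCHED_BUSINESS_IDS = {"acme-hotel-downtown", "acme-hotel-airport"}
WATCHED_PRODUCT_KEYWORDS = {"acme", "acmeplus"}


def is_relevant(review: dict) -> bool:
    """Keep a review if it targets a watched business or mentions a watched product."""
    if review.get("business_id") in WATCHED_BUSINESS_IDS:
        return True
    # Normalize words by stripping simple punctuation and lowercasing.
    words = {w.strip(".,!?").lower() for w in review.get("text", "").split()}
    return not words.isdisjoint(WATCHED_PRODUCT_KEYWORDS)


def filter_reviews(reviews: Iterable[dict]) -> Iterator[dict]:
    """Lazily pass through only the relevant reviews, as a filter step would."""
    return (r for r in reviews if is_relevant(r))
```

Because `filter_reviews` is a generator, it processes reviews one at a time as they arrive rather than waiting for a complete batch, which mirrors how a streaming filter behaves.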

Next, the filtered reviews pass to a custom code operator, which uses algorithms written by your data scientists to analyze the data. In this case, the operator might perform sentiment analysis on the text of each review and assess whether it is positive or negative.
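The sentiment step can be as sophisticated as your data scientists like. As a stand-in, the sketch below uses a deliberately naive word-counting approach (a swapped-in toy technique, not a recommendation): it scores each review against two small, hypothetical lists of positive and negative words. A production operator would more likely call a trained model or an established sentiment library, and the function names here are illustrative, not the Streams Designer code operator interface.

```python
import re

# Deliberately tiny, hypothetical word lists for illustration only.
POSITIVE = {"great", "excellent", "friendly", "clean", "love", "loved", "helpful"}
NEGATIVE = {"rude", "dirty", "terrible", "awful", "broken", "slow", "refund"}


def sentiment_score(text: str) -> int:
    """Count positive words minus negative words; below zero means negative."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def classify(review: dict) -> dict:
    """Return a copy of the review tagged with its score and a sentiment label."""
    score = sentiment_score(review.get("text", ""))
    label = "negative" if score < 0 else "positive" if score > 0 else "neutral"
    return {**review, "sentiment_score": score, "sentiment": label}
```

Simple word counting ignores negation ("not clean" still scores as positive), which is exactly the kind of weakness a proper model is there to address.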

Finally, when negative reviews are detected, Streams Designer can send an alert to MessageHub, or to any other system that has a REST API. For example, you could use it to raise a notification in your CRM system, highlighting the content of the customer’s complaint and making it easy for your customer service team to respond with an appropriate offer or apology.
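For the alerting step, any system with a REST API will do. The following sketch assumes a hypothetical CRM endpoint (`https://crm.example.com/api/alerts`) that accepts a JSON body; the payload field names are also invented for illustration. It uses only Python’s standard library and separates building the request from sending it, so the payload can be inspected without network access.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute your CRM's real alert or ticket API.
CRM_ALERT_URL = "https://crm.example.com/api/alerts"


def build_alert_request(review: dict, url: str = CRM_ALERT_URL) -> urllib.request.Request:
    """Package a negative review as a JSON POST request for a REST endpoint."""
    payload = {
        "type": "negative_review",
        "source": review.get("source", "unknown"),
        "business_id": review.get("business_id"),
        "excerpt": review.get("text", "")[:280],  # keep the alert short
        "sentiment_score": review.get("sentiment_score"),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(review: dict) -> int:
    """Send the alert and return the HTTP status code (requires network access)."""
    with urllib.request.urlopen(build_alert_request(review), timeout=10) as resp:
        return resp.status
```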

In Streams Designer, you can build this data flow in a matter of minutes. With the exception of the custom code operator, all the other elements are available as standard, and can be configured with a few clicks. Even the custom code editor is designed to make life as simple as possible for the user, enabling them to leverage hundreds of Python data science libraries and sophisticated algorithms with just a few lines of code.

More importantly, once you have set up the data flow, Streams Designer will execute it for you, abstracting away all the complexity of building a scalable streaming analytics architecture and integrating the components of the data pipeline. You can focus on the business logic, without worrying about the underlying infrastructure.

Talking to the machine: real-time location tracking

Now let’s look at an IoT use case—and to keep things simple, let’s focus on a type of IoT device that is already pervasive in everyday life: the smartphone.

Imagine you are running an airline, and you want to know whether your customers are going to make it to the airport in time to catch their flight. If they have your app on their smartphone (and have given you permission), you could use the location services on their phones to monitor how close they are to the airport in the hours before the flight.

Streams Designer makes it easy to set up this kind of service. Simply configure a MessageHub source operator to feed the location data into your data flow, and use a geofencing operator to check whether customers are within a reasonable distance of the correct airport.

(By the way, the geofencing capability is part of the standard library of operators in Streams Designer, so you can drag and drop it into your data flow and configure it in a few clicks. There’s no need to write your own geofencing code.)
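Since the geofencing operator is built in, you don’t need to write this yourself. Purely to show what such a check involves, here is a minimal Python sketch that assumes a circular zone: it computes the great-circle (haversine) distance between the phone and the airport and compares it with a radius. The built-in operator may define zones differently (for example, as polygons), and the function names are hypothetical.

```python
import math

# Mean Earth radius in kilometres (a spherical approximation).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to guard against rounding pushing the value just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_geofence(lat: float, lon: float,
                    center_lat: float, center_lon: float,
                    radius_km: float) -> bool:
    """True if (lat, lon) lies inside a circle of radius_km around the center."""
    return haversine_km(lat, lon, center_lat, center_lon) <= radius_km
```

For example, `within_geofence(phone_lat, phone_lon, 51.47, -0.4543, radius_km=5.0)` would check whether a phone is within 5 km of approximately London Heathrow; both the coordinates and the radius are illustrative choices.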

If the location of the phone is outside a defined zone when the plane is boarding, then you could use MessageHub to prompt a customer service rep to call your customer to ask if they’re having problems. On the other hand, if you can see that they are already in the airport, your team might be able to find them and fast-track them through security to avoid missed connections or flight delays.
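The decision described above can be written as a small rule. This sketch assumes two inputs that are not specified here: whether the geofence check placed the customer inside the airport zone, and whether they have already boarded. The names `Action` and `choose_action` are hypothetical.

```python
from datetime import datetime
from enum import Enum


class Action(Enum):
    """Possible responses to a passenger's status at boarding time."""
    NONE = "none"
    CALL_CUSTOMER = "call_customer"
    FAST_TRACK = "fast_track"


def choose_action(now: datetime, boarding_starts: datetime,
                  inside_airport_zone: bool, boarded: bool) -> Action:
    """Pick an action once boarding has started for a passenger who hasn't boarded."""
    if boarded or now < boarding_starts:
        # Either boarding hasn't started yet or the passenger is already on the plane.
        return Action.NONE
    if inside_airport_zone:
        # At the airport but not on board: help them through security.
        return Action.FAST_TRACK
    # Outside the zone while boarding is under way: have a rep call them.
    return Action.CALL_CUSTOMER
```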

Of course, it’s not just airlines that can benefit from this type of streaming analytics. Retailers could use a similar setup to send special offers to customers who are driving near one of their stores. Or you could even use Wi-Fi beacons within the store itself to monitor which department a customer is currently walking through, and send them the latest offers for products in the categories they are browsing.
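Here is a minimal sketch of the in-store idea, assuming each beacon sighting arrives in the stream as an event carrying a beacon ID. The beacon IDs, department names, and offers are hypothetical, and the function name is illustrative.

```python
from typing import Dict, List, Set

# Hypothetical mapping from in-store beacon IDs to departments.
BEACON_DEPARTMENT: Dict[str, str] = {
    "beacon-01": "electronics",
    "beacon-02": "garden",
    "beacon-03": "kitchen",
}

# Hypothetical current offers per department (garden has none right now).
OFFERS: Dict[str, List[str]] = {
    "electronics": ["10% off headphones"],
    "kitchen": ["Buy one mug, get one free"],
}


def offers_for_sighting(beacon_id: str, departments_seen: Set[str]) -> List[str]:
    """Return offers for the beacon's department, at most once per visit.

    departments_seen holds per-customer visit state and is updated in place.
    """
    department = BEACON_DEPARTMENT.get(beacon_id)
    if department is None or department in departments_seen:
        return []
    departments_seen.add(department)
    # Return a copy so callers can't modify the shared offer list.
    return list(OFFERS.get(department, []))
```

Tracking which departments a customer has already passed through means that lingering near one beacon doesn’t trigger the same offer over and over.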

Again, setting up these types of data flow is very straightforward with Streams Designer’s graphical interface.

Take the next steps

Hopefully these examples will have captured your interest and provoked some new trains of thought about how your business could benefit from streaming analytics.

The key takeaway is that Streams Designer makes it easy to build anything from a simple straight-through pipeline up to a complex, multi-branching data pipeline—which means that you can start small and add more sophistication as you get more comfortable with the technology.

The other important message is that streaming analytics doesn’t need to be daunting. If you have an idea for a streaming use case, it’s not going to require a six-month project, heavy up-front investment and specialist engineering skills to test it out. Just sign up for IBM Watson Data Platform and you can start building your first pipeline in minutes. Even if it doesn’t work out, the cost of failure is minimal—and the potential benefits could be transformative for your business.

If you’d like to take a deeper dive, you can find a detailed walk-through of how to build a geofencing data flow here. Or if you’re ready to take your first steps with IBM Streams Designer, getting started is easy and free. If you don’t have a Watson Data Platform account, sign up here. After you finish registering, just select Streams Designer from the Tools menu, or add it to an existing project.

