When we talk about processing data in real time, it is easy to just write a program and be done with it.
The problems start piling up when we add analytics and volume.
A program is easy to write when it can process records sequentially. Once you reach the limit of this sequential processing, you start adding complexity that may represent the bulk of your work: You start by using multi-threading and eventually you need to also go to multi-processing to take advantage of multiple machines. It is much easier to use a framework to reduce those issues.
Still, a framework may give you the ability to distribute your processing, but how easy is it to do? Now you want proper tools to assemble the many operations that you want to link together. Then, you also need the tools to easily identify bottlenecks so you can parallelize your operations. What about all the standard operations you would expect to be able to do?
This is where a platform comes in. It gives you the foundation for distributed processing but also gives you pre-built capabilities to interact with the outside world (files, message queues, databases, and so on) and also analytics so you don't have to reinvent the wheel.
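To make the first step of that progression concrete, here is a minimal sketch in Python of the same per-record work done sequentially and then with a thread pool. The `enrich` function is a hypothetical stand-in for real per-record processing, and nothing here reflects how Streams itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def enrich(record: int) -> int:
    """Hypothetical per-record work: here it simply squares the value."""
    return record * record


def process_sequentially(records):
    # One record at a time, in order: simple to write and to reason about.
    return [enrich(r) for r in records]


def process_with_threads(records, workers=4):
    # Same work spread over a thread pool; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(enrich, records))


records = list(range(10))
sequential = process_sequentially(records)
threaded = process_with_threads(records)
```

Even in this toy case, the threaded version already needs a pool size and a decision about result ordering. Spreading the work across several machines adds serialization, failure handling, and deployment on top of that, which is the complexity a framework or platform is meant to absorb.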
For a more complete discussion on the subject, take a look at my two articles on the IBM Datamag site: part 1 and part 2.
InfoSphere Streams is starting to engage the open-source community to provide additional capabilities to its real-time analytics platform.
This is still very early in the process and we can assume we'll see it evolve quickly. That may also be a way to consolidate
the offering of the most popular open-source toolkits currently available on the Streams Exchange.
One of the projects is under the name resourceManagers.
The resource manager currently available to support Streams is YARN!
Learn more about what is available for Streams on GitHub by looking at the newest page from the InfoSphere Streams playbook:
Streams on GitHub.
Anyone remember this cartoon? I think the first time I saw it was in the '80s. Still, it keeps coming back.
This used to apply to IT requests. It can also be applied to all sorts of things, including how quickly you want to go from data to actionable information.
In today's world, it seems that we need to get insights now. This is one reason for the rising interest in "data in motion".
Real-time analytics applies to many industries, including medical, telecommunications, and security. You can find additional examples in the
following article: Big Data in Motion Where? Everywhere.
There is a special need in processing machine data. The data can be generated at such a rate that we need machines to analyze all that data.
You can find more information on machine data examples in the ebook: The Rise of Machine Data: Are You Prepared.
Data in motion processing is here to stay. It is a great approach to solve many business problems. Of course, this approach does not work in a vacuum.
It is a great complement to new and established systems based on data at rest. Here, I mean systems that use data repositories such as operational
data stores, data warehouses, Hadoop (BigInsights) and other NoSQL repositories.
The IBM solution for data in motion is InfoSphere Streams. You can download a free copy of the software to learn about it.
It is called the InfoSphere Streams Quickstart Edition. Visit the streamsdev site to download a copy of it and access an introductory lab (under Docs).
Do you know about IBM Data Magazine? It is the regular newsletter based on ibmdatamag.com that many people receive in their inbox
every few weeks (or is it weekly?).
This online magazine contains articles related to: Big Data and Warehousing, Databases, Information Strategy, Integration and governance.
There are multiple regular columnists and I am now one of them. I am covering Data in motion in a monthly column.
My first article got published on January 31st and is titled: "Getting the big data ball rolling".
You can find it at: http://ibmdatamag.com/2014/01/getting-the-big-data-ball-rolling/
I have put together a plan for a series of articles. When it gets more in depth, I will complement the articles with
my blog entries. I will also continue to cover other subjects and likely more technical subjects in this blog.
Hopefully this will get me to write a blog entry a bit more regularly than I've done lately.
Until next time...
I have to say, these are busy times!
With a TimeSeries PoC and multiple activities around Streams, time flies by quickly.
It's been a while since I updated the InfoSphere Streams Playbook. This was overdue. There are new videos, training material and capabilities that were not reflected in the playbook. Here's what I updated:
In this section, I updated the databases supported and support for MQTT
There is now a link that should provide the complete lists of available videos dynamically. Also, I cleaned up the tutorials and added a brand new series of tutorials.
Video use cases
Some new YouTube videos that show interesting uses of Streams
With the end of the year so close, we can expect everyone to prepare for the new year. Looks like 2014 will be another fun year!
The other day I ran across an article on InfoWorld.com: “Cloudera pitches Hadoop for everything. Really?”
Of course, the article starts by mentioning the expression about hammers and nails. This is an old story and it appears that it is getting ready to repeat itself. As has been said: “those who forget the past are doomed to repeat it”.
Hadoop has been the biggest star of the big data story. I have to say that it is revolutionizing data processing and for good reasons. Many seem to point to the use of cheap clusters based on commodity hardware. I personally prefer to attribute it to the large amount of data that has different requirements from traditional data processing.
The traditional data processing needs are still there and still growing. Getting rid of “silos” of data has proven extremely difficult. It also relies on getting rid of years of investments and re-writing many proven applications.
Instead of trying to fit everything into Hadoop, it is much better to have an overall strategy that takes into account the different needs of different data sets and makes sure the overall architecture accommodates the exchange of information between all of them.
Cloudera wants to become the “enterprise data hub” powered by Hadoop. As the article mentions, “Hadoop is still seen on all sides as a bucket of parts...” Maybe it is a bit early to talk about an enterprise data hub based on Hadoop.
Of course, if all you have is a hammer, everything looks like a nail.
There is now a new resource for Streams: https://www.ibmdw.net/streamsdev/
The Streamsdev site includes articles, blog entries, videos, and intro labs. You can also download the latest Quickstart Edition of Streams from there. This way, you can download either the product or a VMware image with it and do the lab at your leisure.
This site is put together by developers for developers. Still, if you are new to InfoSphere Streams, you can find something there for you too. Just go to the getting started section under "Docs".
Since the IBM Information on Demand (IOD) conference starts this weekend, you can also find information on the Streams activities (labs, presentations) during the conference. You can see the next few activities on the main page or a more complete calendar under events.
This site is evolving. You should go look at it at least once a week to see what's new.
Hopefully many of you are going to the IOD conference next week. Enjoy the conference and learn a lot!
Last week, on October 22, IBM announced a new version of InfoSphere Streams: version 3.2.
This follows version 3.1 that was announced on May 21.
The new version includes some nice improvements such as remote development, a REST API for data access, and improved toolkits.
Over the next few blog entries, I'll go into more detail on these features. In the meantime, you can find information on
InfoSphere Streams 3.2 at:
If you are interested in trying Streams, IBM provides the Quick Start Edition that you can download as a native product or
as a VMware image. You can download it at:
Of course, you may need more information on how to use Streams. You can start by browsing through the InfoSphere Streams Playbook at:
If you have questions, don't hesitate to drop me a note or comment on my blog entries.
Until next time!
If you've been following my blog over the last few years, you may have noticed a few things lately:
- I have not blogged in a few months
- My blog's name has changed
The significant part is really the name change. It went from "Informix and Computing" to "Big data in motion".
Let me first address the Informix part. Yes, I am still involved with Informix activities. In fact, I am currently working on a proof-of-concept for Informix TimeSeries that involves technologies such as Java, Kafka, ZooKeeper, fastjson, MessagePack, and more. So, Informix continues to be involved in "Big Data" and its use with other current technologies.
Will I continue to talk about Informix? Probably. It all depends on whether I believe I have something interesting to say on the subject. As long as I have activities with Informix, I have opportunities to find interesting information.
Now, what about "Big data in motion"?
A while back I decided to go back to my old team: Worldwide Technical Sales and Enablement.
My main focus is now on InfoSphere Streams. This has already been an interesting ride. I've worked on multiple projects that include putting together an extensive training session, working on PoCs, writing developerWorks articles, and more. I've even put together a developerWorks wiki that centralizes all sorts of resources related to InfoSphere Streams. I called it the InfoSphere Streams Playbook.
InfoSphere Streams is part of an overall "Big Data" architecture. There are many ties between Streams, the BigInsights platform, and any other technologies that help get big data under control. Yes, that includes Informix. It also includes many other technologies.
My focus may be mainly on "in-motion" data but the entire "Big Data" solution stack eventually interacts with it. That explains the new blog title.
As usual, I want to continue "casting a large net" so I can be free to talk about anything I find interesting.
So, drop me a line, post comments. Let's continue a dialog that will help everyone (including me) learn new things and continue to have fun with our technological challenges.
A few years ago, IBM started talking about a smarter planet: Instrumented, interconnected, intelligent.
We are seeing more and more uses of sensors, from your smartphone and its many sensors (GPS, proximity, temperature, barometer, etc.) to the electric meter at your house. Add to that all the other sensors used in many industrial plants and even sensors on rails!
How can we convert this deluge of data into information?
This leads to issues related to two ways to handle data: in-motion and at-rest.
It happens that IBM has a mix of products that can handle these two "states" of the data:
For data in motion, we can use InfoSphere Streams for real-time analytics driven by models that come from more in-depth analysis of historical data (analytics models).
For data at rest, there are problems of how fast we can store it and how fast we can retrieve the information, especially when many users are making requests. This would be an operational data store environment. Then, of course, there is the issue of "in-depth" analysis that requires fast access to large amounts of data.
Informix has the combined solution with its TimeSeries capabilities and the Informix Warehouse Accelerator.
Learn more about the use of Informix to solve this big data problem in the following webcast:
Solving the Big Data Challenge of Sensor Data
Date: June 26, 2013
Time: 1:00 PM EDT / 10:00 AM PDT
Register at: https://event.on24.com/eventRegistration/EventLobbyServlet?target=registration.jsp&eventid=641115&sessionid=1&key=AA3293E3AC9715CF3D602D0DEAE4D52B&sourcepage=register
The new Informix, version 12.10, was announced last week. It is time to start talking about the new features in TimeSeries.
The Informix team has added a public version of a fast loading mechanism. It allows you to load data into existing TimeSeries that are defined as part of a container.
This loader API was previously undocumented. It was only available as part of the Tooling. A lot of work has gone into it since its internal implementation. You should not try to use the older internal version since it disappears in 12.10 in favor of this new one.
You can find a description of its use on the "Informix Smart Meter Central" wiki, on the page Loading fastest with the loader API.
You should also refer to the Informix documentation for more details.
Since the Loader API is an SQL API, it can be used by any client, including InfoSphere Streams.
For more information on how to use Streams with the Loader API, please see the Informix Smart Meter Central wiki: Streams and the TimeSeries Loader API.
More to come. Don't forget, the IIUG conference is just around the corner. This is the perfect place to learn about all the new features in Informix 12.10: Simply powerful.
We are seeing more and more interest in using both InfoSphere Streams and Informix together.
This is in the context of "Big Data".
InfoSphere Streams is a platform that allows you to add operators as you see fit.
In our case, there are already a few operators that can be used to read from or write to Informix from InfoSphere Streams.
There is a new developerWorks article that describes how this can be done. With these basic examples, you should be
able to integrate Informix in a Streams environment (or vice versa) in no time.
I'm always looking for interesting information to stimulate my thinking.
My morning routine usually starts at around 5:30am and I use my tablet to look at news, blogs, tweets, and some web sites.
The tweets I get include some from a site called TED. I've talked about TED before. Take a look at my blog entry for January 2011: Happy new year!
In this blog entry, I recommended no less than four TED presentations.
For people who don't know TED, it is an organization that holds conferences on all sorts of subjects. The presentations used to have to be 17 minutes long.
Now, you can find presentations that are much shorter. TED's tagline is: "Ideas worth spreading".
So, in the morning, I often check what's new on TED to see if there is something interesting to watch during breakfast (of course, when I have breakfast alone...).
I recently came across one that I thought was interesting considering everything we've been hearing over the last 4-5 years about the global economy.
Of course, the fact that it talks about complexity and emergence is just a bonus.
Here is the link to this presentation: Who controls the world?
Happy new year everyone!
The Informix team is always hard at work improving the Informix products.
It turns out that, while working on V.next, a feature escaped and made it into version 11.70.xC5 and above (xC6 being the current release as of October 2012).
It concerns loading data into TimeSeries using a relational view of a TimeSeries (also known as the virtual table interface, or VTI). To take advantage of this new feature, you simply
use the TS_VTI_ELEM_INSERT (128) flag when you create the relational view with the TSCreateVirtualTab() procedure.
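As a rough illustration, the following Python sketch adds the flag to a mode value and builds the text of the procedure call. The flag value comes from the paragraph above. The argument order (view name, base table, new-TimeSeries specification, mode, TimeSeries column) is an assumption here and should be checked against the Informix documentation, and the table, column, and calendar names are hypothetical.

```python
# Flag value taken from the text above.
TS_VTI_ELEM_INSERT = 128


def with_elem_insert(mode: int = 0) -> int:
    """Turn on the element-insert flag without disturbing other mode bits."""
    return mode | TS_VTI_ELEM_INSERT


def create_view_statement(view, base_table, new_ts_spec, ts_column, mode=0):
    """Build the SQL text that creates the relational view.

    Assumption: arguments are passed as view name, base table,
    new-TimeSeries specification, mode flags, and TimeSeries column name.
    """
    return (
        "EXECUTE PROCEDURE TSCreateVirtualTab("
        f"'{view}', '{base_table}', '{new_ts_spec}', "
        f"{with_elem_insert(mode)}, '{ts_column}')"
    )


# Hypothetical names and specification, for illustration only.
statement = create_view_statement(
    "meter_data_v",
    "meter_data",
    "calendar(ts_15min),origin(2012-01-01 00:00:00.00000)",
    "readings",
)
print(statement)
```

Using a bitwise OR rather than passing 128 directly keeps any other mode bits you already rely on intact.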
A simple test showed that this feature loads data 3.6 times faster than previously. Of course, your "mileage" will vary depending on your environment. To learn more about how you can
use this new feature, consult the following link from the Informix Smart Meter Central wiki:
Another year is coming to an end.
All in all, not a bad year. Informix released 11.70.xC5 and 11.70.xC6 while continuing to work on the next major version of the product. You can find the latest Informix release notes at: Informix 11.70 Information center
We continue to see more acceptance of features like IWA and TimeSeries. The Informix group also delivered many presentations and demos at the IIUG and IOD conferences. We can add to that support for regional Informix user groups, a new Redbook, and so on.
Well... stay tuned. 2013 is lining up to be another good one for Informix. But what about ourselves? Are we improving over time like good wine or...
Here are some of my new year resolutions:
- Get back in shape.
In 2012, I neglected this a bit but I am already getting back to it by running regularly and continuing to train in Brazilian Jiujitsu.
- Learn new things.
Informix does not operate in a vacuum. It needs an ecosystem. For me, I need to look into what it takes to integrate Informix more with the IBM BigData products. I already started. You can find some information on using Informix with InfoSphere Streams in the SmartMeterCentral wiki.
I've also slowly started using Twitter (@jroy58). I retweeted two tweets and I will post my first tweet as soon as I'm done with this blog entry.
What about your new year resolutions?
For one, are you using the best Informix you could use? Resolve to upgrade to Informix 11.70.xC6 as soon as possible.