Big data in motion
JacquesRoy 120000A2MS 1,703 Views
This has been in the works for quite a while but now it’s out!
This new version adds multiple interesting new features including:
- Streaming data to Excel
- Easy setup for high-availability
- Resilient processing with the consistent region annotation
- Toolkits enhancements
Streaming data to Microsoft Excel makes it easy to create user interfaces that give real-time feedback on what’s happening, while keeping all the capabilities of Excel for additional processing of the data received.
A lot has been done on the high-availability front. It is much easier to set up redundant administrative services and have them fail over automatically when needed. In addition, there is no need for a DB2 database. Instead, Streams now relies on Zookeeper to preserve all the state information. Also, to continue to improve high availability, Streams no longer requires a shared file system.
There is a new feature that guarantees at-least-once processing of tuples within a region, that is, a set of operators. It is easy to use. We simply have to add annotations that define the region and set a few parameters.
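To give a feel for what "at least once" means in practice, here is a minimal Python sketch of the general checkpoint-and-replay idea. It is not Streams code and does not use the Streams annotations: `ReplayableSource`, `run_region`, `checkpoint_every`, and `fail_at` are names made up for this illustration, and the failure is simulated.

```python
# Illustrative sketch only: these names are hypothetical, not part of Streams.

class ReplayableSource:
    """A source that can rewind to the offset saved at the last checkpoint."""

    def __init__(self, tuples):
        self.tuples = list(tuples)
        self.offset = 0

    def next(self):
        if self.offset >= len(self.tuples):
            return None
        value = self.tuples[self.offset]
        self.offset += 1
        return value


def run_region(source, checkpoint_every, fail_at=None):
    """Process every tuple; on one simulated failure, rewind and replay."""
    processed = []       # everything the downstream step saw, duplicates included
    checkpoint = 0       # source offset saved at the last checkpoint
    failed_once = False
    while True:
        item = source.next()
        if item is None:
            break
        if fail_at is not None and item == fail_at and not failed_once:
            failed_once = True
            source.offset = checkpoint   # go back to the last checkpoint
            continue
        processed.append(item)
        if source.offset % checkpoint_every == 0:
            checkpoint = source.offset   # record a new checkpoint
    return processed


result = run_region(ReplayableSource(range(1, 7)), checkpoint_every=3, fail_at=5)
print(result)
```

In this run, the tuple that sits between the last checkpoint and the failure is seen twice after the replay, but no tuple is lost. That is the trade-off behind an at-least-once guarantee: downstream logic has to tolerate replayed tuples.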
There have been enhancements to existing toolkits and new ones have been added, such as support for Kafka in the messaging toolkit and the new HBase toolkit.
There is more to the new release of Streams. You can find the online documentation in the knowledge center at:
To get an idea of what’s new in this release, take a look at:
JacquesRoy 120000A2MS 1,093 Views
The general session started with an example of context computing and an interview with Captain Phillips.
All that was pretty exciting but what stole the show is the announcement of the partnership
Then I went on my way to attend Streams sessions talking about use cases.
The first one I attended was about a partner, Voci, that has an appliance that converts audio to text.
The next session was a panel of experts on geospatial analytics.
In the afternoon, I attended a session on the features of the new Streams beta that was announced last Friday.
I followed with a session on context computing used to counter fraud. I finished my day
The conference is winding down with the last day tomorrow.
JacquesRoy 120000A2MS 1,231 Views
Another full day.
It started at 7:00 with a breakfast meeting and was followed by a conference call.
"The Power of Now: Real-Time Analytics and IBM InfoSphere Streams"
My afternoon was taken by a Streams and text analytics lab.
I went back to the conference floor and had interesting conversations with many technical people
I'll be able to catch up on some Streams sessions tomorrow. I can't wait to hear about some customer/partner stories
Also, I heard through the grapevine that there may be a big announcement at the general session.
JacquesRoy 120000A2MS 845 Views
After walking by 3 different Starbucks, I arrived at the conference breakfast hall.
Then it was time to attend the general session that started at 8:15.
Multiple speakers expanded on these themes.
I particularly liked the line: "Geospatial data will become analytics superfood".
There were many interesting sessions to choose from but because of multiple engagements, I only attended
There was so much that, if you are not at the conference, you may want to look for InsightGo to attend some general sessions remotely.
Now it's time to move on to Tuesday!
JacquesRoy 120000A2MS 1,171 Views
The event went as planned at the Mandalay Bay convention center with presentations on:
Many people attended and were engaged in the presentations. Overall a success.
The Insight conference officially started with the opening reception.
JacquesRoy 120000A2MS 1,105 Views
We're up and going.
The conference is still being set up but there are events happening this Saturday.
All sorts of other sessions are taking place in other areas of the Mandalay Bay convention center.
If you are already in Las Vegas for the Insight conference, this would be a good use of your time.
Finally, Sunday evening, the Insight conference officially starts with the Solution EXPO Grand Opening Reception
I'll post comments on the conference daily, so stay tuned!
JacquesRoy 120000A2MS 988 Views
We are barely more than two weeks away from the Insight conference.
As you know, Streams is excellent at providing real-time analytics. It can be used with other
It happens that I'll be participating in an IoT deep dive on Sunday October 26.
I'll be joining the main speakers:
The technical section is divided into three parts:
You can register for the event at: http://insight-deep-dive.eventbrite.com
Don't forget to come see me at Insight in my sessions and labs as well as a book signing
The book is: "The Power of Now: Real-Time Analytics and IBM InfoSphere Streams"
See you in Vegas!
JacquesRoy 120000A2MS 1,690 Views
Ok, this is probably not news to you but there is information you should know.
The Insight conference, formerly known as Information on Demand (IOD), is going on Oct 26-30.
For the week, I am particularly interested in the Streams sessions such as:
Just to name a few. I am involved in a few sessions:
The other exciting part for me is that I am coming out with a new book:
I am doing a book signing on Tuesday between 9:30 and 10:30.
The Insight conference provides many excellent learning opportunities on many subjects including cloud, mobile/social, security, analytics, and more.
It is also a great opportunity to network with experts from IBM, partners, and other customers.
A while back, I started reading a book called "Thinking, Fast and Slow" from Daniel Kahneman.
Daniel Kahneman is a professor of psychology who won a Nobel Prize in economics.
I have to admit, I am not done reading it. I need more "plane" time
Today, I just want to relate some parts of chapter 14 where he put together a test to see how people would classify individuals
"Tom W is of high intelligence, although lacking in true creativity.
After reading the description, the subject was asked to figure out which field of study Tom was most likely in.
The description was actually designed so people should rank computer science among the best fitting
I laughed out loud when I read that part. I immediately thought of one of my co-workers, Robert U., that
For those who read this blog, if you make corny jokes/puns and graduated in computer science, rejoice.
The book is full of interesting information including the fact that even statisticians can misuse/misinterpret statistics.
"you dispose of a limited budget of attention that you can allocate to activities. . .
My conclusion: if someone tells you they are multitasking, they are doing trivial work.
JacquesRoy 120000A2MS 1,149 Views
When we talk about processing data in real time, it is easy to just write a program and be done with it.
A program is easy to write when it can process records sequentially. Once you reach the limit of this sequential processing, you start adding complexity that may represent the bulk of your work: You start by using multi-threading and eventually you need to also go to multi-processing to take advantage of multiple machines. It is much easier to use a framework to reduce those issues.
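As a small illustration of that first step from sequential code to multi-threading, here is a hedged Python sketch. The `enrich` function is a hypothetical stand-in for whatever per-record work your program does; nothing here is specific to Streams.

```python
from concurrent.futures import ThreadPoolExecutor


def enrich(record):
    """Hypothetical per-record operation standing in for real analytics."""
    return {"id": record["id"], "value": record["value"] * 2}


def process_sequential(records):
    # The easy version: one record after another.
    return [enrich(r) for r in records]


def process_threaded(records, workers=4):
    # Same work spread over a thread pool; map() keeps results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(enrich, records))


records = [{"id": i, "value": i} for i in range(10)]
print(process_sequential(records) == process_threaded(records))
```

Even in this toy case you now have to think about pool sizes and result ordering. In CPython, threads mostly help with I/O-bound work, so CPU-heavy processing pushes you toward multiple processes and then multiple machines, which is exactly where the complexity starts to pile up.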
Still, a framework may give you the ability to distribute your processing but how easy is it to do? Now you want proper tools to assemble the many operations that you want to link together. Then, you also need to have the tools to easily identify bottlenecks so you can parallelize your operations. What about all the standard operations you would expect to be able to do?
This is where a platform comes in. It gives you the foundation for distributed processing but also gives you pre-built capabilities to interact with the outside world (files, message queues, databases, and so on) and also analytics so you don't have to reinvent the wheel.
JacquesRoy 120000A2MS 1,766 Views
InfoSphere Streams is starting to engage the open-source community to provide additional capabilities to its real-time analytics platform.
This is still very early in the process and we can assume we'll see it evolve quickly. That may also be a way to consolidate
One of the projects is under the name resourceManagers.
Learn more about what is available for Streams on GitHub by looking at the newest page from the InfoSphere Streams playbook:
JacquesRoy 120000A2MS 1,577 Views
Anyone remember this cartoon? I think the first time I saw it was in the '80s. Still, it keeps coming back.
This used to apply to IT requests. It can also be applied to all sorts of things, including how quickly you want to go from data to actionable information.
Real-time analytics apply in many industries including medical, telecommunication, and security. You can find additional examples in the
There is a special need in processing machine data. The data can be generated at such a rate that we need machines to analyze all that data.
Data in motion processing is here to stay. It is a great approach to solve many business problems. Of course, this approach does not work in a vacuum.
The IBM solution for data in motion is InfoSphere Streams. You can download a free copy of the software to learn about it.
JacquesRoy 120000A2MS 1,392 Views
Do you know about IBM Data Magazine? It is the regular newsletter based on ibmdatamag.com that many people receive in their inbox
This online magazine contains articles related to: Big Data and Warehousing, Databases, Information Strategy, Integration and governance.
My first article got published on January 31st and is titled: "Getting the big data ball rolling".
I have put together a plan for a series of articles. When it gets more in depth, I will complement the articles with
Until next time...
JacquesRoy 120000A2MS 1,378 Views
I have to say, these are busy times!
With TimeSeries PoC and multiple activities around Streams, time flies by quickly.
It's been a while since I updated the InfoSphere Streams Playbook. This was overdue. There are new videos, training material and capabilities that were not reflected in the playbook. Here's what I updated:
With the end of the year so close, we can expect everyone to prepare for the new year. Looks like 2014 will be another fun year!
JacquesRoy 120000A2MS 1,726 Views
The other day I ran across an article on Infoworld.com: “Cloudera pitches Hadoop for everything. Really?”
Of course, the article starts by mentioning the expression about hammers and nails. This is an old story and it appears that it is getting ready to repeat itself. As it’s been said: “those who forget the past are doomed to repeat it”.
Hadoop has been the biggest star of the big data story. I have to say that it is revolutionizing data processing and for good reasons. Many seem to point to the use of cheap clusters based on commodity hardware. I personally prefer to attribute it to the large amount of data that has different requirements from traditional data processing.
JacquesRoy 120000A2MS 1,339 Views
There is now a new resource for Streams: https://www.ibmdw.net/streamsdev/
The Streamsdev site includes articles, blog entries, videos, and intro labs. You can also get to the download of the latest quickstart edition of Streams from there. This way, you can download either the product or a VMware image with it and do the lab at your leisure.
This site is put together by developers for developers. Still, if you are new to InfoSphere Streams, you can find something there for you too. Just go to the getting started section under "Docs".
Since the IBM Information on Demand (IOD) conference starts this weekend, you can also find information on the Streams activities (labs, presentations) during the conference. You can see the next few activities on the main page or a more complete calendar under events.
This site is evolving. You should go look at it at least once a week to see what's new.
Hopefully many of you are going to the IOD conference next week. Enjoy the conference and learn a lot!
JacquesRoy 120000A2MS 1,266 Views
Last week, on October 22, IBM announced a new version of InfoSphere Streams: version 3.2.
The new version includes some nice improvements such as remote development, a REST API for data access, and improved toolkits.
If you are interested in trying Streams, IBM provides the quick start edition that you can download as native product or
Of course, you may need more information on how to use Streams. You can start by browsing through the InfoSphere Streams Playbook at:
If you have questions, don't hesitate to drop me a note or comment on my blog entries.
Until next time!
JacquesRoy 120000A2MS 1,292 Views
If you've been following my blog over the last few years, you may have noticed a few things lately:
The significant part is really the name change. It went from "Informix and Computing" to "Big data in motion".
Let me first address the Informix part. Yes, I am still involved with Informix activities. In fact, I am currently working on a proof-of-concept for Informix TimeSeries that involves technologies such as Java, Kafka, ZooKeeper, fastjson, MessagePack, and more. So, Informix continues to be involved in "Big Data" and its use with other current technologies.
Will I continue to talk about Informix? Probably. It all depends if I believe I have something interesting to say on the subject. As long as I have activities with Informix I have opportunities to find interesting information.
Now. What about "Big data in motion"?
A while back I decided to go back to my old team: Worldwide Technical Sales and Enablement.
My main focus is now on InfoSphere Streams. This has already been an interesting ride. I've worked on multiple projects that include putting together an extensive training session, working on PoCs, writing DeveloperWorks articles, and more. I've even put together a DeveloperWorks wiki that centralizes all sorts of resources related to InfoSphere Streams. I called it the InfoSphere Streams Playbook.
InfoSphere Streams is part of an overall "Big Data" architecture. There are many ties between Streams and the BigInsights platform and any other technologies that help get big data under control. Yes, that includes Informix. It also includes many other technologies.
My focus may be mainly on "in-motion" data but the entire "Big Data" solution stack eventually interacts with it. That explains the new blog title.
As usual, I want to continue "casting a large net" so I can be free to talk about anything I find interesting.
So, drop me a line, post comments. Let's continue a dialog that will help everyone (including me) learn new things and continue to have fun with our technological challenges.
JacquesRoy 120000A2MS 2,278 Views
A few years ago, IBM started talking about a smarter planet: Instrumented, interconnected, intelligent.
We are seeing more and more uses of sensors, starting from your smart phone and its many sensors (GPS, proximity, temperature, barometer, etc.) to electric meters at your house. Add to that all the other sensors used in many industrial plants and even sensors on rails!
How can we convert this deluge of data into information?
This leads to issues related to two ways to handle data: in-motion and at-rest.
It happens that IBM has a mix of products that can handle these two "states" of the data:
For data in motion, we can use InfoSphere Streams for real-time analytics based on models that come from more in-depth analysis of historical data (analytics models).
For the data at-rest, there are problems of how fast we can store it and how fast we can retrieve the information, especially when it concerns many users making requests. This would be an operational data store environment. Then, of course, there is the issue of "in-depth" analysis that requires fast access to large amounts of data.
Informix has the combined solution with its TimeSeries capabilities and the Informix Warehouse Accelerator.
Learn more about the use of Informix to solve this big data problem in the following webcast:
The new Informix, version 12.10, was announced last week. It is time to start talking about the new features in TimeSeries.
The Informix team has added a public version of a fast loading mechanism. It allows loading into existing TimeSeries that are defined as part of a container.
This loader API was previously undocumented. It was only available to use as part of the Tooling. A lot of work went into it since its internal implementation. You should not try to use the older internal version since it disappears in 12.10 in favor of this new one.
You can find a description of its use in the "Informix Smart Meter Central" wiki, on the page "Loading fastest with the loader API".
You should also refer to the Informix documentation for more details.
Since the Loader API is an SQL API, it can be used by any client, including InfoSphere Streams.
For more information on how to use Streams with the Loader API, please see the Informix Smart Meter Central wiki: Streams and the TimeSeries Loader API
More to come. Don't forget, the IIUG conference is just around the corner. This is the perfect place to learn about all the new features in Informix 12.10: Simply powerful.