Big data in motion
There is so much going on!
As you surely know, we've been doing a closed beta of the next version of Informix. We have received a lot of great feedback and we keep on working on this release.
We still can't talk about it. It is just a matter of time before we can do so stay tuned.
On other fronts, I am working on a follow up to my Application development short book. I've received a lot of positive feedback on this book and I am excited about continuing on the subject. When will it be ready? I'm hoping sometime this year.
Finally, do you realize that we are barely more than a month away from the Information on Demand (IOD) conference? I hope to see you there.
I ran into a simple problem the other day: I got an error while creating an index because the key was too big to fit in my index. As you may remember, the maximum size of an index key on a standard Unix/Linux system is 387 bytes.
Why do we have this limit?
This is a function of the page size and the way a B-tree index works. With the limit of 387 bytes on a 2K page, we can have at least 5 keys per page. This way, we divide the data in at least 5 parts at each level. the end result is eliminating comparisons to get to our our result faster. If we had only one key per page, it would be the equivalent of doing a sequential scan so the index would be useless.
In IDS version 10.0 (2005), Informix introduced the configurable page size. from that point on, it is possible to create DBspaces with page sizes of up to 16KB in size. the page sizes available has to be a multiple of the basic page size: 2KB or 4KB.
These larger pages can provide better performance when you have a wide table where the row size could be, let say 12KB. This way, you can fit an entire row in a page instead of using page chaining to support these larger rows. The savings in I/O could make a noticeable difference in performance in many situations.
Coming back to my indexing problem, I can fix it by using a larger page size. According to the documentation, the maximum index key size is as follow for each page sizes:
If your key fits in a 2KB page (shorter than 387 bytes), you could still use a larger page size for your index. The difference is that more keys would fit in one page so the index will not be as deep so it could provide additional performance.
Why not simply use the 16KB page size everywhere?
The short answer is that you could waste space on the page used for a table. A page can include a maximum of 255 rows. If your page size is 16KB and your row contains only two integers (2 x 4 bytes), you could, in theory, have over 2000 rows in that page. Since we are limited to 255 rows, we are wasting over 14,000 bytes.
Why not use four or five different page sizes?
Each page size requires its own buffer pool. We have to decide how much memory to allocate for each of these pools. Our decision may not result in the optimal memory allocation. The result is that some pools will have too much memory and others would benefit from more. Bottom line, this would make system administration more complex.
I would suggest to limit ourselves to two page sizes. The default page size and another one. The second page size depends on the environment requirements. I would also look at the size of the I/O on the particular machine and how many requests do multiple I/O on sequential data.
If you haven't looked at the configurable page size in IDS, maybe it is a good time to do so now.
There was a big change for me this year: I left the Informix CTE group to lead a new group. I am now a manager... and architect.
My new group is called Application Development Services. This mean that my group looks at IDS from a programmer point of view. Let me give you an example of what that means. Let's look at the major features included in IDS 11.50.xC6:
I care about these features but I my attention goes to a feature of the new Client SDK that deserved a one line mention in its release notice:
"When you install Client SDK or IConnect, you have the option to install IBM Data Server Driver version 9.7. For more information, see the Client Products Installation Guide."
As you may remember, the long term direction for client applications is to use the DRDA interface to IDS. With this one line statement, I can now write programs using CLI (ODBC) without having to have to figure out where to get the driver. Since IBM has multiple packages available, I could have easily made the mistake of thinking that I need to download the entire DB2 client (about 600MB) to get this functionality.
In addition, this is all I need to build PDO_IBM for PHP applications or IBM_DB gem file for Ruby and Rails development.
As far as what my group will do, we can start by figuring out and prioritizing what features will make Informix more attractive to developers/programmers. It's not just features in the server. It has to consider everything. Even documentation.
I'm sure I'll have more to say about this later this year. Hopefully I'll have interesting results to report by the time I see some of you at the IIUG conference in April.
I'm currently in Paris in the second week of a business trip. For a two-week trip it is pretty common to have some clothes laundered otherwise this makes for a lot of stuff to lug around.
I took a look at what was offered at my hotel: To launder one shirt (men), they charge 8.50 euros (around 12.37 US dollars). As I was leaving the hotel, I saw a hotel employee with a laundry bag in her hands. Looking at the size of the bag, I could just imagine the small fortune spent by the guest.
As I was walking to the IBM office, I passed a dry cleaner that advertized the cleaning and pressing of men shirts for 2.20 euro per shirt for 5 shirts. The price at the hotel was over 3.8 times that price. With a little knowledge a a 5 minute walk, the hotel guest could save a significant amount of money: for 5 shirts the price goes from 42.50 euros to 11 euros. For a company with a lot of employees that use that type of service, this can add up to significant savings.
Of course, that made me think of Informix. It is well known that IDS provides a high level of performance and scalability and require minimal resources for its administration. In some cases, one database administrator can manage thousands of instances. Of course it is much easier to go with a safe choice, use as much hardware as needed, and hire as many employees and consultants as the situation requires for the management of the environment and business application development. This is simply the cost of doing business...
It seems to me that with a little knowledge and a little effort, that cost of doing business could be greatly optimized.
I think Terri is pulling my leg. She is apparently receiving concerned emails about what happened in Brussels. It was a humorous situations that I wanted to relate in a fun way. I guess I have a future in fiction writing :-).
Really, nothing happened. She took a picture, the police courteously told us that the American embassy did not want people to take picture. Terri deleted the picture from her camera while having a pleasant time with the officers. We then left and laughed about it.
So, don't worry, Terri is doing fine and we all had a good time in Brussels. I strongly encourage people to come and visit.
I'd like to come back to the book "The Goal" I mentioned in my last blog entry.
This book focuses on manufacturing environments but the interview at the end of the book mentions that the concepts of the theory of constraints (TOC) can be applied to other fields. Looking back in teh book, I found that they ask three basic questions about the impact of changes:
We can easily see that this makes sense to a financial person in manufacturing. Let's see how we can look at it when our concern is running a database.
Did you sell more?
"The cheapest, fastest and most reliable components of a computer system are those that aren’t there"
Did you reduce the number of people on the payroll?
I've met many customers that have a mixed environments where we see a 10-1 ratio of Informix personnel compared to the personnel for the competitor's platform. Why not bring that up to the appropriate people. I'm sure your local IBM representative will be happy to help.
Did you reduce inventory?
I think these three questions are worth exploring no matter which environment you're in. That can be good for your company, for you, and for all the people that invest their efforts into the Informix products.[Read More]
The machines configurations caused problems in using Data Studio with WAS CE, I already mentioned that yesterday. This also meant that we could not do the web services lab. To work around this problem, I spent a few minutes showing the students what was involved in creating a web service using the vmware image on my laptop. Of course, it took a lot less time than would be required to do the lab since everything was already setup.
The rest of the class went well. It included a review of the enterprise features such as backup, SDS, HDR, RSS, CLR, ER, CDC (Change Data Capture), and MQ integration. I think we should add a lab on shared disk and HDR since the labs appear to be very well received. They are more fun than just sitting there listening to a speaker. The class ended with a prsentation on cloud computing.
I went through the evaluation and found that the class was a success. I know there are a few adjustments but it was a good start. All in all, it was a good few days.
I took the train to Paris. It takes around 2 hours 15 minutes to cover the 500 kilometers between Strasbourg and Paris. That's an average of over 220 km per hour. The ride was so smooth. It is interesting to note that a plane ride would have taken one hour but the train is actually faster since you can get there just a few minutes before departure and it drops you off in the middle of Paris instead of the "far away" Charles De Gaulle Airport. That's a reminder that we should always use the right tools for the right problem :-)[Read More]
You may not know but the Informix lab is extending a helping hand to universities around the world. One example of that was the hosting of university professors at the last Informix conference.
As part of this, I am on my way to the university of Strasbourg (France) to teach a 3-day seminar on subjects related to IDS. I had all the latitude I wanted (and more) to decide on the content. I will be delivering this seminar starting next Monday (June 8). We'll see how it is received. Watch for my blog entries after each day, network access permitting.[Read More]
I came back from the Informix conference Thursday night and woke up thinking about an analogy about why we use Informix Dynamic Server. More on that in a minute.
I've been using databases for a long time. I believe that the first formal database system I used was back in 1984. It was a hierarchical database. I developed an inventory system for the Canadian Coast Guard. Over the following years, I used and supported multiple databases systems some looking more like C-ISAM and others relational. I still remember the good old days where I had to debug Oracle installation scripts :-)
So, why Informix? Isn't a database a database?
I uses to use a car analogy: people buy cars and they are used to what happens to it: If they have to go to the shop to get it fixed or tunes every other month, that's just the way cars are. Who would believe that you could buy a car and only have to put gas in it for years after years without having to waste time in the shop? the car is used to get you from point A to point B day after day. It almost makes it invisible but not quite since you still have to drive it. It's not the same with a database system: it can really be invisible.
I woke up Friday with this thought: You can write just about any application in any computer language you want. Why don't we all use COBOL. Way back, I know a guy that could do EVERYTHING in COBOL. He was even doing system programming! An object oriented version of COBOL has been available for years buy why. Isn't the "vintage" version of COBOL good enough? If I'm not mistaken, the number of COBOL lines of code in production still surpass any other programming language. That should be enough of an argument to standardize on it.
It seems to me that many people apply this line of reasoning to database systems. The trend is to look at databases as commodity. Who cares that one barely requires any attention? Who cares that it provides easy continuous availability? Who cares that it has great storage optimization? The difference is only more overhead. that translates only into more costs. Those significant costs are easy to hide so why worry about them. Everybody does it so no need to be more efficient...
Well, me, I'm old school. I come from an era where memory was measured in kilobytes and disk drives in megabytes. Yes, memory is much bigger now and not that expensive. Disk drives are so much bigger and not very expensive. Computers are so fast now. It seems to me that we should stop the insanity and pay attention to efficiency. Isn't that what cloud computing, virtualization and being green is all about?
No matter how I try to slide it, to me, Informix is number 1.[Read More]
I mentioned the Informix warehouse in my previous entry. There is the chat with the lab coming up. Here's something more: a new tutorial on DeveloperWorks:
Then there are the informix Warehouse product pages: