I was talking earlier about encapsulation and the collection of objects that can be found in another object. Let's look at another possibility:
A corporation has multiple regions, a region has multiple branches, a branch has multiple customers. To summarize:
Let's say that the customers are loans taken by different types of companies. To find out the average amount of the loans given out by each branch, the strict approach would be that each branch has a method (function) that does the following:customer_count = 0
total_loans = 0
for each customer
customer_count = customer_count + 1
total_loans = total_loans + customer.getLoanAmount()
end // for each customer
return(total_loans / customer_count)
We protect the encapsulation of customers by providing a method that returns the loan amount (getLoanAmount). The first problem we have relates to performance: All the customer objects for a branch need to be instantiated (created). That may require quite a bit of memory. The second performance problem is that each customer object instantiation requires one database call.
What about if we want to do this average at the region level instead of the branch level? Then, to preserve the encapsulation, we need to created additional methods to return totals and counts. I'll let you imagine the processing needed. On the performance side, we see that the number of objects instantiated and the number of database calls increase with the number of branches and customer objects processed.
If you can convince the architects and programmers to relax their encapsulation requirements, you could add one method at the branch level, one at the region level, and even possibly one at the corporation level to return the desired average. Considering the average for a region, the method would implement the one SQL statement looking like:SELECT AVG(loan) FROM customers
WHERE region_id = :region_num
GROUP BY region_id;
In this case, I don't instantiate all the customer (and branch) objects, saving processing and memory. It is pretty obvious that the performance of these requests will be greatly improved compared to the "strict" OO approach.
Having a method that uses the database to do the processing is one thing. What about more complex processing like the average risk taken by a branch on their loans?
IDS provides the ability to implement user-defined aggregates. It would be easy to implement the average risk function. The number of lines of code would be less than implementing it in the application and the performance would be better even if it was only because of the significant reduction in the volume of data transferred.
I hope that in the last few blog entries I gave you some things to think about to improve the overall performance of your systems. The bottom line is: get involved in the analysis and design phases of new projects. You can add a lot of value there.
A while back, I started reading a book called "Thinking, Fast and Slow" from Daniel Kahneman.
Daniel Kahneman is a professor of psychology who won a Nobel prize in economic.
I have to admit, I am not done reading it. I need more "plane" time
What I read so far is fascinating. This is the type of book that can be read multiple times.
Today, I just want to relate some parts of chapter 14 where he put together a test to see how people would classify individuals
based on some personality descriptions. Here is the description:
"Tom W is a high intelligence, although lacking is true creativity.
He has a need for order and clarity, and for neat and tidy systems
in which every detail finds its appropriate place His writing is
rather dull and mechanical, occasionally enlivened by somewhat
corny puns and flashes of imagination of the sci-fi type. He has a
strong drive for competence. He seems to have little feel and little
sympathy for other people and does not enjoy interacting with
others. Self-centered, he nonetheless has a deep moral sense."
After reading the description, the subject was asked to figure out which field of study Tom was most likely in.
The description was actually designed so people should rank computer science among the best fitting
because of 'hints of nerdiness ("corny puns")'.
I laughed out loud when I read that part. I immediately though of one of my co-worker, Robert U., that
reminds me regularly that I make corny jokes during my presentations. And yes, I graduated in computer science.
For those who read this blog, if you make corny jokes/puns and graduated in computer science rejoice.
Embrace your nerdiness. You picked the right major
The book is full of interesting information including the fact that even statisticians can misuse/misinterpret statistics.
One I really like is:
"you dispose of a limited budget of attention that you can allocate to activities. . .
You can do several things at once, but only if they are easy and undemanding."
My conclusion: if someone tells you he/she's multitasking, they do trivial work.
First, let me put an end to the rumor that the IIUG conference was moved to San Diego to accommodate me.
It is true, I live in that area. It is also true that I am presenting my fair share of material but I can assure you that not even one passing thought on my location was part of the decision
This being said, the conference is approaching quickly. One more week in March and then a few weeks in April and we're there.
As usual the conference organizers are trying to outdo themselves year after year. This year is no exception. What happened since last year?
For one, Informix 11.70.xC3 was just out then. Since we've seen xC4 come out. Can we hope for xC5 soon?
On my side, I am giving four sessions on various subject:
- Dummies guide to TimeSeries
You want to get started with TimeSeries, come to this session.
- Informix applications uncovered on iOS
Yes, you can teach an old dog new tricks. At least this old dog is trying to prove it.
- PHP and Informix
Web applications have established themselves as mainstream. If you don't know PHP and the web, come see me.
- Update on Infomrix and open-source
Some progress there, come to this session and let's have a discussion.
I think these are interesting subjects. You should look for a lot more interesting sessions at the conference.
Take a look at the list of sessions and hands-on labs at: http://www.iiug.org/conf/2012/iiug/sessions.php
See you there!
I've been silent for quite a while. That does not mean I have not been busy!
A lot of efforts has been put on TimeSeries over 11.70.xC3 and 11.70.xC4 and we are still going full steam ahead. We continue to improve its performance, scalability, usability and functionality.
I wanted to put together a repository of information so people can find it all (or most of it in one place. For this purpose, I put together a wiki on developerWorks that is dedicated to The smart meter support. It is still a work in progress but I believe it is a good start. you can find it using the tinyurl: tinyurl.com/InformixSmartMeterCentral
Let me know what you think.
I listened to a presentation on this subject recently.
What I found interesting is that the research found that good ideas do not come from a Eureka moment. For example Darwin recounts his Eureka moment in his auto-biography. Further study of his personal journals show that Darwin had the full theory of natural selection many months before his stated Eureka moment.
According to research, most good ideas come from discussions:
- The rise of coffee houses is credited for the Enlightenment period in England
- In research labs most ideas come from the weekly lab meeting when people share their mistakes, issues, etc.
Another interesting point was that many good ideas come from the connection of people that share their thoughts to form a complete idea that is worth pursuing.
How can we generate good ideas we can act upon and make our environment better?
We need to interact with people in a situation that is conducive to generating these ideas. We have such an opportunity in just a few week: The Information on Demand (IOD) conference in Las Vegas, October 24-28.
Think about it: we will be with a bunch of people that have technical problems to solve around the use of technology in general and Informix in particular. We'll listen to presentation on new features, solutions in different industries, best practices, bird of a feather sessions, mingling in social settings such as the Informix celbration on Monday night.
Let's take advantage of this great opportunity! See you in Las Vegas!
I just had a need for a function that takes a datetime year to second and returns the number of seconds since January 1, 1970.
That would be easy to do by writing a "C" UDR but I did not want to deal with compiling and installing a shared library so I decided to approach it as an SPL routine.
Not that it is a great thing but I thought I'd share it with whoever needs it. Let me know if you find this useful:
CREATE FUNCTION epoch(dt datetime year to second)
DEFINE dt_varchar varchar(20);
DEFINE mm, dd, yy, days, hh, mi, ss integer;
LET mm = MONTH(dt);
LET dd = DAY(dt);
LET yy = YEAR(dt);
LET days = MDY(mm, dd, yy) - MDY(1, 1, 1970);
LET dt_varchar = dt;
LET hh = substr(dt_varchar, 12, 2);
LET mi = substr(dt_varchar, 15, 2);
LET ss = substr(dt_varchar, 18, 2);
return (days * 86400) + (hh * 3600) + (mi * 60) + ss;
You can use it either in an SQL statement or directly with EXECUTE FUNCTION. For example:
EXECUTE FUNCTION epoch("2008-11-25 08:32:45");
1 row(s) retrieved.
When I was in school I wanted to know why I had to learn something: Why learn about history? It’s about a bunch of dead people, often from far away. I would also ask: Why would I ever learn English. . .
I feel that the computer industry does not only forget about history but is quick to discard what has been done before. Just remember when object databases came out, the trade magazines where trumpeting the death of relational databases.
There is a disconnect between the object-oriented (OO) approach and the use of relational databases. This will be the subject of the next few entries. Lets start with an example:
An object person will look at the employees of a company and see managers, full-time employees, part-time employees and contractors. This will lead to the following model:
With the definition of the multiple types of employees, we can easily see that they will want multiple tables, one per defined object. Of course, for a database person, we see something like:CREATE TABLE employee (
Empno int PRIMARY KEY,
mgrNo int ,
. . .
As you can see, we can already see that a "data access expert" can start some discussions with the OO architects and programmers.
Don’t get me wrong. I like OO. I think it is a wonderful approach but just like anything it can be abused. See what you think of:http://csis.pace.edu/~bergin/patterns/ppoop.html
I'm always looking for interesting information to stimulate my thinking.
My morning routine usually starts at around 5:30am and I use my tablet to look at news, blogs, tweets, and some web sites.
As part of the tweets I get, it includes some from a site called TED. I've talked about TED before. Take a look at my blog entry for January 2011: Happy new year!
In this blog entry, I recommended no less than four TED presentations.
For people that don't know TED, it is an organization that organizes conferences on all sorts of subjects. The presentations used to be have to be 17 minutes.
Now, you can find presentations that can also be much shorted. TED's tagline is: "Ideas worth spreading".
So, in the morning, I often check what's new on TED to see if there is something interesting to watch during breakfast (of course, when I have breakfast alone...).
I recently came across one that I thought was interesting considering everything we've been hearing over the last 4-5 years about the global economy.
Of course, the fact that it talks about complexity and emergence is just a bonus.
Here is the link to this presentation: Who controls the world?
Someone asked me the following question:
"How do I keep passwords in the database so nobody can get them?"
It means that we cannot keep the the passwords in plain text in the database. Informix has a few functions that can be used for encryption: ENCRYPT_AES and ENCRYPT_TDS. It would be easy to create a table and encrypt the column that contains the passwords.
The next statement that came up was: "..but, if someone has the encryption password, he can get all the passwords. We need to protect the passwords from internal access".
This means that we need to use a different password to protect each password in the table. The solution I proposed was to use the password to encrypt itself. Let's look at an example:
CREATE TABLE passwd (
INSERT INTO TABLE passwd VALUES(1, ENCRYPT_AES("Jacques", "Jacques"));
INSERT INTO TABLE passwd VALUES(1, ENCRYPT_AES("Lance", "Lance0"));
INSERT INTO TABLE passwd VALUES(1, ENCRYPT_AES("Daniel", "Daniel"));
INSERT INTO TABLE passwd VALUES(1, ENCRYPT_AES("Umut", "Umut01"));
The values inserted look as follow:
SELECT * FROM passwd
I can now test f someone has the right password for user 1 by using the password value to decrypt itself:
SELECT col1, DECRYPT_CHAR(col2, "Jacques") FROM passwd WHERE col1 = 1;
If I use the improper password, I receive an error:
SELECT col1, DECRYPT_CHAR(col2, "Jacques") FROM passwd WHERE col1 = 3;
26008: The internal decryption function failed
One more thing. Note that the encryption password must be at least six-character long. This is why in the example I padded some encryption passwords. An easy way to work around it would be to always add padding to make sure we meet that minimum size. Keep in mind that the maximum size of an encryption key is 128 bytes.
With this approach, we can keep passwords in the database and keep them secret.
Yes, a new version of Informix is now available: Informix 11.70.
There are a lot of great features in this release. I could talk about the flexible grid that allows you to manage many machines like one and support rolling upgrades. I could talk about the new analytics features where we've seen speed up of warehouse-type queries of around 50%. I could talk about storage provisioning, improved installation and embeddability features. Yes, I could talk about all this but at this time, I want to talk about some features that should interest application developers.
I have to admit I am a little biased since my group is called application development services. However, the features I want to talk about were either requested by customers or have had a very positive reception in early mention under non-disclosure or during the beta period.
The first one will facilitate porting schemas from other databases to Informix. Let me first show an example:
CREATE TABLE tab (
col1 int NOT NULL default 0,
col2 int NULL,
col3 integer REFERENCES tab1(col1) CONSTRAINT tab1_c1
ON DELETE CASCADE
The first improvement is the ability to change the order of constraints and default values. Before Informix 11.70, the col1 definition would have returned an error since the default clause had to be located before the NOT NULL constraint.
The second improvement is the ability to explicitly say that a column can accept NULL values. Before, it was implied if the NOT NULL constraint was not there.
The last improvement shown in the example above shows that we can add "ON DELETE CASCADE" after the constraint name.
Another improvement in the DDL area is the ability to conditionally execute CREATE and DROP statements. Here are two examples:
CREATE TABLE IF NOT EXISTS tab ( . . .);
DROP PROCEDURE IF EXISTS my_proc();
If, for example, you want to make sure a table is re-created, you could always say:
DROP TABLE IF EXISTS tab;
If you want to make sure that you keep the table if it already exists, then don't do the "DROP IF EXISTS" and simply use "CREATE TABLE IF NOT EXISTS".
Finally, here's another DDL feature that was in great demand. It is not really an application development feature but it has been requested a lot: The ability to define the EXTENT size in a CREATE INDEX statement:
CREATE INDEX myidx tab(col1) FIRST EXTENT 8 NEXT EXTENT 8;
Don't forget to read the release notice since there are many other improvements on the INDEX capabilities.
On the DML side, we are now able to use expressions in the COUNT aggregate function. This can be useful if you want multiple aggregates in one statement:
SELECT COUNT(*) total, count(CASE WHEN sex = 'M' then 1 else NULL) males
COUNT(CASE WHEN sex = 'F' then 1 else NULL) female FROM tab;
Without this capability, you would have to solve this problem with three separate statements. For example:
SELECT * FROM
(SELECT COUNT(*) AS total FROM tab ),
(SELECT COUNT(*) AS male FROM tab WHERE sex ="M"),
(SELECT COUNT(*) AS female FROM tab WHERE sex="F");
These are just a small part of the new improvements in Informix 11.70. Make sure you read the release notice to learn more about Informix 11.70 at:
There is so much going on!
As you surely know, we've been doing a closed beta of the next version of Informix. We have received a lot of great feedback and we keep on working on this release.
We still can't talk about it. It is just a matter of time before we can do so stay tuned.
On other fronts, I am working on a follow up to my Application development short book. I've received a lot of positive feedback on this book and I am excited about continuing on the subject. When will it be ready? I'm hoping sometime this year.
Finally, do you realize that we are barely more than a month away from the Information on Demand (IOD) conference? I hope to see you there.
I ran into a simple problem the other day: I got an error while creating an index because the key was too big to fit in my index. As you may remember, the maximum size of an index key on a standard Unix/Linux system is 387 bytes.
Why do we have this limit?
This is a function of the page size and the way a B-tree index works. With the limit of 387 bytes on a 2K page, we can have at least 5 keys per page. This way, we divide the data in at least 5 parts at each level. the end result is eliminating comparisons to get to our our result faster. If we had only one key per page, it would be the equivalent of doing a sequential scan so the index would be useless.
In IDS version 10.0 (2005), Informix introduced the configurable page size. from that point on, it is possible to create DBspaces with page sizes of up to 16KB in size. the page sizes available has to be a multiple of the basic page size: 2KB or 4KB.
These larger pages can provide better performance when you have a wide table where the row size could be, let say 12KB. This way, you can fit an entire row in a page instead of using page chaining to support these larger rows. The savings in I/O could make a noticeable difference in performance in many situations.
Coming back to my indexing problem, I can fix it by using a larger page size. According to the documentation, the maximum index key size is as follow for each page sizes:
max key size
If your key fits in a 2KB page (shorter than 387 bytes), you could still use a larger page size for your index. The difference is that more keys would fit in one page so the index will not be as deep so it could provide additional performance.
Why not simply use the 16KB page size everywhere?
The short answer is that you could waste space on the page used for a table. A page can include a maximum of 255 rows. If your page size is 16KB and your row contains only two integers (2 x 4 bytes), you could, in theory, have over 2000 rows in that page. Since we are limited to 255 rows, we are wasting over 14,000 bytes.
Why not use four or five different page sizes?
Each page size requires its own buffer pool. We have to decide how much memory to allocate for each of these pools. Our decision may not result in the optimal memory allocation. The result is that some pools will have too much memory and others would benefit from more. Bottom line, this would make system administration more complex.
I would suggest to limit ourselves to two page sizes. The default page size and another one. The second page size depends on the environment requirements. I would also look at the size of the I/O on the particular machine and how many requests do multiple I/O on sequential data.
If you haven't looked at the configurable page size in IDS, maybe it is a good time to do so now.
There was a big change for me this year: I left the Informix CTE group to lead a new group. I am now a manager... and architect.
My new group is called Application Development Services. This mean that my group looks at IDS from a programmer point of view. Let me give you an example of what that means. Let's look at the major features included in IDS 11.50.xC6:
Backup from an RSS server
Dynamic listener threads
View event alarms
Basic Text Search enhancements
MERGE statement enhancements
I care about these features but I my attention goes to a feature of the new Client SDK that deserved a one line mention in its release notice:
"When you install Client SDK or IConnect, you have the option to install IBM Data Server Driver version 9.7. For more information, see the Client Products Installation Guide."
As you may remember, the long term direction for client applications is to use the DRDA interface to IDS. With this one line statement, I can now write programs using CLI (ODBC) without having to have to figure out where to get the driver. Since IBM has multiple packages available, I could have easily made the mistake of thinking that I need to download the entire DB2 client (about 600MB) to get this functionality.
In addition, this is all I need to build PDO_IBM for PHP applications or IBM_DB gem file for Ruby and Rails development.
As far as what my group will do, we can start by figuring out and prioritizing what features will make Informix more attractive to developers/programmers. It's not just features in the server. It has to consider everything. Even documentation.
I'm sure I'll have more to say about this later this year. Hopefully I'll have interesting results to report by the time I see some of you at the IIUG conference in April.
I'm currently in Paris in the second week of a business trip. For a two-week trip it is pretty common to have some clothes laundered otherwise this makes for a lot of stuff to lug around.
I took a look at what was offered at my hotel: To launder one shirt (men), they charge 8.50 euros (around 12.37 US dollars). As I was leaving the hotel, I saw a hotel employee with a laundry bag in her hands. Looking at the size of the bag, I could just imagine the small fortune spent by the guest.
As I was walking to the IBM office, I passed a dry cleaner that advertized the cleaning and pressing of men shirts for 2.20 euro per shirt for 5 shirts. The price at the hotel was over 3.8 times that price. With a little knowledge a a 5 minute walk, the hotel guest could save a significant amount of money: for 5 shirts the price goes from 42.50 euros to 11 euros. For a company with a lot of employees that use that type of service, this can add up to significant savings.
Of course, that made me think of Informix. It is well known that IDS provides a high level of performance and scalability and require minimal resources for its administration. In some cases, one database administrator can manage thousands of instances. Of course it is much easier to go with a safe choice, use as much hardware as needed, and hire as many employees and consultants as the situation requires for the management of the environment and business application development. This is simply the cost of doing business...
It seems to me that with a little knowledge and a little effort, that cost of doing business could be greatly optimized.
I think Terri is pulling my leg. She is apparently receiving concerned emails about what happened in Brussels. It was a humorous situations that I wanted to relate in a fun way. I guess I have a future in fiction writing :-).
Really, nothing happened. She took a picture, the police courteously told us that the American embassy did not want people to take picture. Terri deleted the picture from her camera while having a pleasant time with the officers. We then left and laughed about it.
So, don't worry, Terri is doing fine and we all had a good time in Brussels. I strongly encourage people to come and visit.