Big data in motion
The first entry went in almost without problems. I had to quickly add some formatting after I posted it to make it more readable. Over time, I'll figure out how to do this and make my formatting fancier.
I'm new to blogging. I'm also pretty new at reading blogs. With the new Web 2.0 technologies, there is no need to go to a blog every so often to find out if there is something new. So, in case you don't know, here's some information on dealing with blogs.
Blogs support a capability called syndication. This allows to create what's called a feed to warn people of changes. There are two main types of feeds: RSS and Atom. No need to know more about it for now. Just that you can use a feed URL to get the changes in a syndicated site.
I would suggest that you use a feed reader. Why? because there are many sites you may want to subscribe to. For example:
- Guy Bowerman's blog (http://www-128.ibm.com/developerworks/blogs/page/gbowerman)
- Madison Pruet's blog (http://www.ibm.com/developerworks/blogs/page/roundrep)
- Feeds form the IIUG site (http://www.iiug.org/rss/index.php)
- Informix Zone site (http://www.informix-zone.com/)
- and many more. . .
Take a look at the latest IIUG newsletter (insider #94) for more in the Informix Resource section.
Having to visit each site regularly to see if it has new stuff can be time consuming. Using a feed readeraggregates all those and lets you know what's new. That's the way to go!.
If you do a search on the web, you can find multiple feed readers. I did not want to spend too much time figuring it out so I downloaded a Windows-based open-source product that will do until I find or am told about something better. Check out: http://www.feedreader.com/
If you haven't done it already, set yourself up and stay informed on the latest entries.
That's it for the introduction to this blog. Now it's time to dive into Informix and Computing![Read More]
When I was in school I wanted to know why I had to learn something: Why learn about history? It’s about a bunch of dead people, often from far away. I would also ask: Why would I ever learn English. . .
I feel that the computer industry does not only forget about history but is quick to discard what has been done before. Just remember when object databases came out, the trade magazines where trumpeting the death of relational databases.
There is a disconnect between the object-oriented (OO) approach and the use of relational databases. This will be the subject of the next few entries. Lets start with an example:
An object person will look at the employees of a company and see managers, full-time employees, part-time employees and contractors. This will lead to the following model:
With the definition of the multiple types of employees, we can easily see that they will want multiple tables, one per defined object. Of course, for a database person, we see something like:
CREATE TABLE employee (
Empno int PRIMARY KEY,
mgrNo int ,
. . .
As you can see, we can already see that a "data access expert" can start some discussions with the OO architects and programmers.Don’t get me wrong. I like OO. I think it is a wonderful approach but just like anything it can be abused. See what you think of:http://csis.pace.edu/~bergin/patterns/ppoop.html[Read More]
JacquesRoy 120000A2MS 1,517 Views
I saw some interesting comments related to my blog entries. Hope you are reading them... the main subject is object to relational mapping (OMR). I'll get back to that soon. For now, I want to continue what I was talking about
Some people are really passionate about following a strict approach. This can cause problems with such things as encapsulation that insures that the implementation of the object is opaque. Look at it a little bit as being very strict about following the highest possible normal form. My point is that you have to be careful about not offending people in their approach. Learn about their methodology before jumping into a passionate presentation of your approach: Take them from where they are to where you want them to be slowly, watching for resistance where communication could break down.
Looking back at the employee definition presented in my previous blog entry, note the following: A manager can have multiple employees working for her. This leads to a representation where a manager object includes a collection of employee objects. This also leads to implementation performance problems where because all the objects were instantiated (created) it took a long time to create the object that included the collection of object. The concept of "lazy binding" was implemented to solve this. Basically, the object in a collection is not instantiated until it is accessed.
This is another area where database specialists can start a discussion to improve the overall performance. Now that I've set the premise, I'll cover it in more details next time.[Read More]
I was talking earlier about encapsulation and the collection of objects that can be found in another object. Let's look at another possibility:
A corporation has multiple regions, a region has multiple branches, a branch has multiple customers. To summarize:
Let's say that the customers are loans taken by different types of companies. To find out the average amount of the loans given out by each branch, the strict approach would be that each branch has a method (function) that does the following:customer_count = 0
total_loans = 0
for each customer
customer_count = customer_count + 1
total_loans = total_loans + customer.getLoanAmount()
end // for each customer
return(total_loans / customer_count)
We protect the encapsulation of customers by providing a method that returns the loan amount (getLoanAmount). The first problem we have relates to performance: All the customer objects for a branch need to be instantiated (created). That may require quite a bit of memory. The second performance problem is that each customer object instantiation requires one database call.
What about if we want to do this average at the region level instead of the branch level? Then, to preserve the encapsulation, we need to created additional methods to return totals and counts. I'll let you imagine the processing needed. On the performance side, we see that the number of objects instantiated and the number of database calls increase with the number of branches and customer objects processed.
If you can convince the architects and programmers to relax their encapsulation requirements, you could add one method at the branch level, one at the region level, and even possibly one at the corporation level to return the desired average. Considering the average for a region, the method would implement the one SQL statement looking like:SELECT AVG(loan) FROM customers
WHERE region_id = :region_num
GROUP BY region_id;
In this case, I don't instantiate all the customer (and branch) objects, saving processing and memory. It is pretty obvious that the performance of these requests will be greatly improved compared to the "strict" OO approach.
Having a method that uses the database to do the processing is one thing. What about more complex processing like the average risk taken by a branch on their loans?
IDS provides the ability to implement user-defined aggregates. It would be easy to implement the average risk function. The number of lines of code would be less than implementing it in the application and the performance would be better even if it was only because of the significant reduction in the volume of data transferred.
I hope that in the last few blog entries I gave you some things to think about to improve the overall performance of your systems. The bottom line is: get involved in the analysis and design phases of new projects. You can add a lot of value there.[Read More]
JacquesRoy 120000A2MS 1,103 Views
Since I mentioned user-defined aggregates (UDAs), in the previous entry, it as good a time as ever to cover this subject.
Everybody that knows a little bit about (object) relational databases knows about aggregate functions such as AVG, SUM, and so on. User-defined aggregates allow you to create your own functions to aggregate your data. The thing is it can do more.
I had a situation once where people had a complex stored procedure that included two foreach loop that were executing SELECT statements. The goal of the procedure was to take a bunch of geo shapes and merge them into a multi-polygon. This procedure had multiple issues:
1) It was issuing multiple SELECT statements making it less than optimal2) It solved one and only one problem due to the specificity of the SELECT statements. (with IDS 11.50, we could build the SQL statements dynamically in the stored procedure)3) It was complex (82 lines long)
I was able to convert the procedure into a 22 lines UDA (including 2 lines of comments and 6 blank lines). The UDA could then be inserted into an SQL statement that would decide how to group rows. The advantages of the UDA are:
1) The UDA is simpler (22 lines without loops or SQL statements vs. 82 lines)2) The UDA is more flexible since it gets the grouping from the SQL statement3) Better performance, more streamlined
UDAs include an initialization function and parameter. Since the initialization parameter can be of any type, we can pass a row type that can include multiple values providing the flexibility required for any initialization. Any business processing that takes a large number of values to get to an answer can be converted from a client application to a user-defined aggregate. The benefits include:
1) reduced network traffic, translating into increased performance2) Simpler code since all the code related to issuing SQL statements, cursor manipulations and error testing can be eliminated3) That specific business processing becomes available to all applications
Some people are really passionate about following a strict approach. This can cause problems with such things as encapsulation that insures that the implementation of the object is opaque. Look at it a little bit as being very strict about following the highest possible normal form. My point is that you have to be careful about not offending people in their approach. Learn about their methodology before jumping into a passionate presentation of your approach: Take them from where they are to where you want them to be slowly, watching for resistance where comminucation could break down.
Looking back at the employee definition presented in my previous blog entry, note the following: A manager can have multiple employees working for her. This lead to a representation where a manager object includes a collection of employee objects. This lead to implementation performance problems where because all the objects were instantiated (created) it took a long time to create the object that included the collection of object. The concept of "lazy binding" was implemented to solve this. Basically, the object in a collection is not instantiated until it is accessed.
This is another area where database specialists can start a discussion to improve the overall performance. Now that I've set the premise, I'll cover it in more details another time.[Read More]
JacquesRoy 120000A2MS 1,256 Views
I was reading the hibernate documentation the other day. I came across the following statements in chapter 11:
". . .you most likely should consider a Stored Procedure if you need mass data operations"
That makes the case for my previous blog entries. In our case, we should also consider user-defined functions and user-defined aggregates. Let's talk about stored procedure for a bit.
Over the years, I've seen sites where they had basically multiple copied of the same stored procedure. the only difference between the different copied was a variation on a specific SQL statement. With IDS 11.50, we now support dynamic SQL in stored procedures. This is a good time to take a look at these stored procedures and see if you could eliminate some redundancy. One benefit is that if you have to change the processing in the future, you'll have to modify only one procedure as opposed to multiple ones.
Of course, I would also look at the possibility of using user-defined functions or user-defined aggregates as illustrated in another blog entry. Let me know what you find out.[Read More]