Of late, I have found myself called in on a number of performance-related issues and thought I should teach a class on it. Since I'm too lazy to come up with my own benchmark, and since it would be more useful to dissect a widely used benchmark, I thought a good study would be the "Fastest DBA benchmark" used by Lester Knutsen and Advanced DataTools. For those of you unfamiliar with this, this is a periodic contest thrown by Advanced DataTools, where they come up with a slow-running application and challenge participants to tune it. I've included links to details about this contest in the Resources section.
I also realized that classes are not as useful as the written word, so what you are now reading is an exploration of how to tune a database based on a dissection of the Fastest DBA benchmark. We will execute the benchmark, note performance issues, address each issue individually, and attempt the benchmark in a final pass with all of the issues addressed. Note that this is the second benchmark; there will be a follow-on article that dissects the third (and current) benchmark.
This article is not about installing the benchmark — the README file is fairly comprehensive. I had two issues putting it on my MacBook, the first being the SHMBASE for the engine. I had to edit the source $ONCONFIG file and set that properly (as it is re-copied over with each setup.sh execution). The second issue I had was that my Informix user defaults to the 'admin' group, so I had to manually change ownership on the dbspaces to Informix:Informix.
The execution of the benchmark as it is delivered will not complete in a timely manner. The benchmark itself is simple enough: it loads a customer table, then generates bills for those customers. The final kicker is an update to the bill table which takes quite some time. The work is not that complicated, but it provides an opportunity to look at how performance can be improved with tuning.
Typically, in any performance tuning project the first issue is one of problem statement. How do you know when you have tuned the problem away? How can you measure performance and determine that it has improved, and by how much? For the purposes of this article, I will contend that the benchmark takes too long to run and that we need it to execute in under an hour. Furthermore, we will capture database metrics at each stage to determine whether we made it better.
The second issue of database tuning is determining what to tune. We will usually see the best results from correcting:
- The data model
- The application
- The disk
- And finally the engine.
Since I am running on a single disk on a MacBook Pro, there will not be much that I can do in terms of disk tuning. Things to keep an eye on are CPU, disk, and memory usage.
Once you have the benchmark installed, execute an onstat -z to clear the engine metrics so you can be
sure you are seeing only what the benchmark does to the engine.
You will note that the creators of the benchmark were kind enough to turn on the explain plan, so that is built in the current directory. There is a timer on the entire run.sh script, but it might be useful to have intermediate times for each step, so we will edit benchmark_run.sql and insert timestamps after each block of code with following statement:
select current hour to second from systables where tabid=1; |
Next, start the benchmark with run.sh. In a separate window,
start an onstat -pr.
This will print out engine metrics every 5 seconds. You can then sit next to the benchmark and monitor it or go home for the evening
or weekend.
Our first insert into the bills table (on my laptop) took 1m16s (the second took 1m17s) and the engine profile looks like this:
Listing 1. Engine Performance Profile
IBM Informix Dynamic Server Version 11.50.FC6DE -- On-Line -- Up 1 days 02:35:27 -- 118812 Kbytes Profile dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached 175586 178320 4720830 96.28 113398 237194 1002488 88.69 isamtot open start read write rewrite delete commit rollbk 4130803 157 495827 1321784 330055 24 65 55 0 gp_read gp_write gp_rewrt gp_del gp_alloc gp_free gp_curs 0 0 0 0 0 0 0 ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes 0 0 0 112.78 13.67 2 1 bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans 449 0 2480376 0 0 1 7 28 ixda-RA idx-RA da-RA RA-pgsused lchwaits 28 0 165256 165232 1520 |
This output is analogous to the oil dipstick in a car. In one brief page, it captures the state of the engine.
The top line (under Profile) covers disk and cache performance. Here, we are interested in the read and write cache hit rates. Read cache hits should be in excess of 99 percent, and we would like write cache hits to be above 95 percent. As you can no doubt see, we are not there. Since we haven't done much work yet and the cache numbers may not have settled, this may be OK. Although looking one line higher at the size of the engine in memory (118,812 KB), we aren't working with a lot of memory, Let's keep cache on the table.
The second line (isamtot) indicates how much actual work has been performed (isamtot is total isam operations for example).
The third line (gp_*) describes how the engine is managing with Generic Pages or non-standard pages. Since we are not using any in this benchmark, the readings are all 0. I will remove this line from future dipstick readings.
The fourth line starts with the number of times we have exceeded locks, user threads or buffers (all 0 and should generally be so). The next two are the amount of User CPU and System CPU, the Number of Checkpoints and the number of Flushes.
The fifth line is of great interest:
Table 1. Line 5 of engine profile
| Parameter | Description |
|---|---|
| bufwaits | The number of times a process had to wait for a buffer, whether because there were not enough buffers or because the buffer in question was in use by someone else. |
| lokwaits | The number of times there was a wait for a lock to disengage. |
| lockreqs | The number of lock requests (number of rows or pages locked). |
| deadlks | The number of times deadlocks were detected. |
| dltouts | The number of times a deadlock timed out. |
| ckpwaits | The number of waits on checkpoints. |
| compress | The number of compresses (cleaning up a data page after deletes). |
| seqscans | The number of sequential scans. |
Of interest here is a marginal number of buffer waits, and sequential scans, but a rather large number (at this point) of lock requests.
The sixth line tells us how read-ahead is working, and the number of waits for latches. Index and data read-aheads appear to all get consumed by the engine. Again, we see a rather large number (at this point) of latch requests.
Let's let the benchmark run a little longer and check again.
It's been two days. My laptop has gone to sleep a few times, so we haven't actually run for 48 hours (although the engine thinks it has been up all that time). Let's revisit our dipstick and see what it has to say.
Listing 2. Performance after two days
IBM Informix Dynamic Server Version 11.50.FC6DE -- On-Line -- Up 3 days 05:42:03 -- 127004 Kbytes Profile dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached 19966695480 19989379994 60891586765 67.21 660217 892639 3121469 78.85 isamtot open start read write rewrite delete commit rollbk 15495512 9115 1409626 6189708 759769 955295 1112 6097 0 ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes 0 0 0 221553.88 127876.85 1136 604 bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans 364286149 0 59940823034 0 0 51 1707 102128 ixda-RA idx-RA da-RA RA-pgsused lchwaits 224274 0 19966050547 19966274029 4146547 |
The things we wanted to keep an eye on have ballooned out of control. (Note: Typically, the checkpoint wait number might give pause, but in this case, the benchmark is running a single SQL process, so that process is waiting on itself, and checkpoints are a more efficient way to write to disk anyway, so let's not get excited about that.)
Read cache rate is at 67 percent. Write cache rate is at 78 percent. While the amount of logical ISAM instructions we have performed is at 6 million reads, the number of lock requests (which is a physical indicator of what was read) is at 59 billion. The number of sequential scans is out of control as are the buffer waits and latch waits. We are doing a huge amount of read-ahead, which is getting consumed, and that is great, but why are we doing so much?
It looks like we are doing a sequential scan (of an entire table) to satisfy the SQL command, whatever it is. Let's take a closer look at that.
In the sysmaster database are two tables of current interest, sysptprof (Partition Profile) and syssesprof (Session Profile). Let's see what they offer us:
select * from sysptprof order by abs(seqscans) desc; |
Note: If the number of sequential scans exceeds MAXINT (more than 2 billion), it will roll over to a negative number — hence, we ask for the absolute value. Here are the results from that select:
Listing 3. Table performance profile
dbsname benchmark tabname bills partnum 3145799 lockreqs 136990885 lockwts 0 deadlks 0 lktouts 0 isreads 2419074 iswrites 605280 isrewrites 853404 isdeletes 0 bufreads 1085720339 bufwrites 2071693 seqscans 99566 pagreads -1398716020 pagwrites 534809 dbsname benchmark tabname customer partnum 3145795 lockreqs 727188 lockwts 0 deadlks 0 lktouts 0 isreads 99566 iswrites 0 isrewrites 99562 isdeletes 0 bufreads 1898629 bufwrites 99562 seqscans 4 pagreads 727202 pagwrites 99562 |
Note how pagreads (page reads) has rolled over and gone negative. There is no way to tell how many times this has occurred during our run. Likewise, the lockreqs (lock requests) may have rolled over multiple times.
Out of 102,000 sequential scans (see onstat output above), 99,000 of those have been against this table. This table has ~600,000 rows in it. We have issued at least 136 million lock requests against it. (In fact, 20 minutes later, that number is at 934 million. This is a problem.) If we take 99,000*600,000, we get 59.4 billion, which, not coincidentally, makes up a large portion of our lock requests. In other words, we have sequentially scanned this table, locking every row, a lot of times. Since there are 100,000 customers, I must assume that we are close to the end of the current step. In fact, it just finished:
Real 3113m27.925s User 0m0.006s Sys 23m45.717s |
Unfortunately, we did not get a chance to look at the Session Profile table. This would have showed us similar information, but at a session level, so we could have seen that our session had done an inordinate number of sequential scans and lock requests.
I will address this first problem separately because I saw it three times last week from colleagues. Our first major problem is that we are scanning the full bills table every time we want a row out of it and locking every row we read.
This sequential scan/lock nonsense has totally obscured real performance metrics. We cannot pay attention to read cache hit rate since we have been sequentially scanning a huge table over and over. Accordingly, let's address this before looking at what will be more realistic performance metrics.
The actual statement that took two days to run was:
update customer
set balance_due = balance_due + ( select sum ( total_bill )
from bills where bills.customer_number = customer.customer_number )
where customer_number in ( select customer_number from bills );
|
If we look at the explain plan (located in sqexplain.out), we begin to see the problem (my notes in bold):
Listing 4. Explain plan
update customer
set balance_due = balance_due + ( select sum ( total_bill )
from bills where bills.customer_number = customer.customer_number )
where customer_number in ( select customer_number from bills )
Estimated Cost: 3 -- The optimizer thought this would be cheap
Estimated # of Rows Returned: 10 With only 10 rows affected
1) informix.customer: INDEX PATH Excellent, an index was used to read
this table
(1) Index Name: informix. 101_2
Index Keys: customer_number (Serial, fragments: ALL)
Lower Index Filter: informix.customer.customer_number = ANY subquery
Subquery:
Estimated Cost: 2
Estimated # of Rows Returned: 1
1) informix.bills: SEQUENTIAL SCAN Here is our problem
Filters: informix.bills.customer_number = informix.customer.customer_number
Subquery:
Estimated Cost: 2 The optimizer thinks this will be cheap. Why?
Estimated # of Rows Returned: 1
1) informix.bills: SEQUENTIAL SCAN
[deletia]
|
What is wrong with the bills table that the optimizer thinks it should sequentially scan the entire table? Let's have a look at the table.
Select * from benchmark:systables where tabname = 'bills'; tabname bills rowsize 1266 nrows 0.00 created 08/23/2010 ustlowts |
I've excluded a lot of excess information here to show you two key pieces of information. According to the system catalogs, the bills table is empty. "ustlowts" is null, which indicates that statistics have never been run against this table. In fact, if we go look at the benchmark code, we will see that the table is created, populated, and used, but statistics are never updated. This is Database 101 stuff. The optimizer uses system catalogs indicating how many rows there are and how well the indexes are formed to figure out what query path to use. Since the table is "empty," it is obviously easiest to read the entire table rather than trying to use an index.
The second key piece of information is that we are trying to read the bills table using a customer_number. There is no index on customer_number in the bills table. Even if we do update statistics for the table, the optimizer will have no choice but to sequentially scan it.
In our first improvement to the benchmark, let us add an index on the bills table and update statistics for the table. Since an index makes it more expensive to write to a table, create the index immediately after the bills table gets populated. Let us then update statistics after the index has been created. (Note, in V11.x statistics are generated at index creation time as part of the index creation process. Let's test that out.)
We add the following:
create index bills_idx1 on bills(customer_number); |
Immediately, after the bills table is populated, we then clear the status counters
with: onstat -z and restart the
benchmark.
Our index has now been added. Let's check the table info:
tabname bills rowsize 1266 nrows 605280.0000000 ustlowts 2010-08-25 20:52:22.00000 |
Much better. Statistics have been updated, and not only is nrows populated but we also see when statistics were last updated. This run performed as follows:
Listing 5. After updating statistics
Real 36m5.033s User 0m0.005s Sys 0m1.479s dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached 1397183 1600342 17971316 92.23 656896 885752 3087149 78.72 isamtot open start read write rewrite delete commit rollbk 14888305 284 1354389 6438035 747981 956259 30 170 0 ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes 0 0 0 2980.00 119.77 22 20 bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans 244886 0 10281980 0 0 10 12 71 ixda-RA idx-RA da-RA RA-pgsused lchwaits 479759 64 901978 1381601 12531 |
We are now within reason on a number of these indicators and have removed the noise, but it is still too early to tune the engine.
The most effective thing we can do to improve performance is correct any data model issues. Here is our current data model.
Figure 1. Data model for the benchmark database
Here we see some problems. For starters, the bills table has all sorts of de-normalized junk in it. All of the Customer, State, and Product information has been dumped in there.
Figure 2. Highlighting superfluous data columns
All of the items in bold are extra junk to carry around that could be easily retrieved from the source tables. Sales_Tx is egregious because the one place where it could be used, it is instead read out of the state table. Product_Price does not belong in here, either, but if we look at how it is used, it is merely poorly named. This is really how much the customer was billed for the product on this bill. It should more properly be called Billed_Price.
If we look a little more deeply, we see static data mixed together with dynamic data. The customer name and address, for example, is unlikely to change. However, we certainly hope that their balance_due does. Furthermore, billing notes would not typically describe a customer. They should describe the relationship between a customer and his bill. Furthermore, the bill itself is broken. In this benchmark, we will generate six bills for each customer, instead of a single bill with six line items on it. Let's revisit this data model with these thoughts in mind.
Here is a revised model. It still has issues: Zipcode is enough to identify the city and state, so those attributes should migrate to a City table (instead of a state table), but that data is static, so we'll leave it alone.
Figure 3. Revised data model
If we were to reorganize the schema, that would take a little bit of work, and we would want to include that in the benchmark time. Note that executing the same application against the above data model took 16 seconds to execute (after the data had been reorganized).
Before we do that, we should size the tables. While Informix no longer has a performance issue with a large number of extents, there is still an issue in that each extent allocation takes time. Far better to size the table properly and not have to waste time extending it every time we insert 20 rows.
Sizing for these tables should be:
Table 2. Table sizes
| Table | Size |
|---|---|
| Bill_Items | 20,416 |
| Billing_Notes | 202,048 |
| Bills | 14,976 |
| Customer | 14,976 |
| Customer_Balance | 1,728 |
Product and state will not change, so we won't bother sizing them. We will, however, update statistics for them. (Note: This became unnecessary and ineffective with changes further down this article).
The next most effective thing we can do to improve performance is tune the application. This fits under the rubric of "Know thy data, Know thy queries." We should keep an eye on the data model when we do this as there may be some things we can assist the application with in the data model.
The first thing that the application does is generate three sets of bills for customers who have a product_code that starts with A. A different number of bills might be generated based on the customer start date (>= 1/1/2000, >=1/1/2005, >=1/1/1985). And variously for products 1 and 2, 4, and 7, and for 9 and 10. Looking at the data, all product codes start with A and the lowest start date in the customer table is 1/1/2007. In other words, we could combine these three insert statements into a single insert statement and only have to read the source tables once instead of three times.
Next, the application updates the Bills table, setting a product discount to 10
where the start_date is <= 1/1/2009 and their balance_due
is > 50000. The WHERE clause is a repeat of this condition
and adds a red herring of "product number in (1 through 10)." Since this
encompasses all products, there is no point to that filter. There are about 40,000
customers whose start_date is < 1/1/2009 and whose
balance_due is > 50000. This means that a sequential scan of the table will be faster
than an indexed read, and we cannot play with the
filter criteria.
Next, we update the bills again, setting the total bill to the price minus the discount plus the sales tax, which is read from the state table. Again, there is the red herring of product numbers in 1 through 10 (in other words, all of them). If only we had left the sales_tax in the Bills table we wouldn't have to reach out and grab it, especially since we now have to navigate through the customer table to get there. So let's put that back in.
Finally, we update the customer, incrementing their balance_due with the value of total_bill from the bills table. There is another red herring in that we only update customers who have bills. Since they all have bills this is moot. We have put Balance_due into its own table at present. If it were in the table with the Bills in it, we could avoid an extra update. Arguably, since there is now only one bill per customer, it might live there.
Then we read the totals to verify that the application ran correctly: Record count, sum(product_price), sum(product_discount), sum(total_bill). This information is now in the Bill_Items table. We could speed this read up if we made summary columns for these in the bill table. Of course, this would yield an incorrect record count, which we happen to know is the #Customers*6, but that is really departing from the straight and narrow).
And, we read Count(*) and sum(balance_due) from customers. Since we now have Balance_Due in the bills table, we should read directly from there now.
Our data model now looks like this:
Figure 4. Revised data model 2
The operations we are going to perform are:
- Scan customer table (6 seconds)
- Scan State table (0 seconds)
- Join Customer and State table (6 seconds)
- Insert into Bills (7 seconds)
- Create indexes (1 second)
- Load bill_items table (8 seconds)
- Create index on bill_items (3 seconds)
- Update bill_items (58 seconds); sequential scan of bill_items with a nested loop read of customer
- Update bills (5 seconds); sequential scan of bills with a nested loop read of bill_items
- Verification (4 seconds)
Listing 6. After revising the data model
Profile dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached 721155 734054 10036861 92.82 19724 60567 1450097 98.64 isamtot open start read write rewrite delete commit rollbk 10295768 447 807589 4558564 706236 706197 65 45 0 ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes 0 0 0 75.50 42.50 4 3 bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans 459393 0 7468727 0 0 2 13 22 ixda-RA idx-RA da-RA RA-pgsused lchwaits 605515 0 105695 711210 212 real 1m38.397s user 0m0.007s sys 0m1.510s |
If we look at the update of the bill_items, all we are doing is updating a discount based on information from the customer table, which we had when we first built the bill_items table. Let's take advantage of that and set the discount while we are creating the table in the first place. The following use of a CASE statement will take care of that for us:
Listing 7. CASE statement
insert into bill_items
(bill_number,
customer_number,
product_number,
billed_price,
product_discount)
select bill_number,
bills.customer_number,
product_number,
product_price,
case when (customer.start_date<="01/01/2009"
and customer.balance_due>50000) then 10
else 0
end case
from bills, product, customer
where product.product_number in (1,2,4,7,9,10)
and bills.customer_number=customer.customer_number;
|
And if we run again?
Listing 8. Rerun CASE statement
Profile dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached 209038 221593 8178955 97.44 25104 49481 844982 97.03 isamtot open start read write rewrite delete commit rollbk 7271368 376 908477 4242538 706230 100913 65 32 0 ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes 0 0 0 45.13 3.65 4 3 bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans 1617 0 6056020 0 0 2 13 10 ixda-RA idx-RA da-RA RA-pgsused lchwaits 1784 0 202678 204462 81 real 0m34.160s user 0m0.006s sys 0m0.050s loaded bills in 3 sec loaded bill_items in 19 sec updated bills in 6 sec verifications 4 sec |
Look at all those locks. Since we are not manipulating locks ourselves, the engine is locking each row (or page) as it needs it. Let's lock the tables in exclusive mode. Note that this requires using transactions to surround our locks. We need to keep an eye on the logical logs and ensure that we have enough so that they don't fill and put us into a long transaction.
Locking the tables exclusively before beginning each insert our update operation yields:
Listing 9. Exclusive locks
78036 Kbytes Profile dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached 9943 217306 7344046 99.88 5757 38980 836059 99.31 isamtot open start read write rewrite delete commit rollbk 7270986 496 908610 4241799 706213 100905 54 28 0 ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes 0 0 0 35.80 3.90 1 0 bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans 3 0 3252 0 0 0 11 11 ixda-RA idx-RA da-RA RA-pgsused lchwaits 6 0 0 6 152 real 0m31.829s user 0m0.006s sys 0m0.048s |
Something else I forgot at the application layer: Informix does very well when reading and writing large volumes of data. But with a logging database, we are going to log every operation we perform. We can improve large batch inserts by setting the table type to raw when we load it and then reset it to a standard table so we can build indexes on it. Changing the Bills and Bill_Items tables to raw and then loading yields:
Listing 10. Use raw tables to load
Profile dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached 849 208192 6732674 100.00 10031 13202 836054 98.80 isamtot open start read write rewrite delete commit rollbk 7474373 370 807441 4042766 706210 100903 49 20 0 ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes 0 0 0 31.98 2.32 6 6 bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans 0 0 2664 0 0 3 11 7 ixda-RA idx-RA da-RA RA-pgsused lchwaits 0 0 0 0 13 real 0m21.686s user 0m0.005s sys 0m0.019s |
We have exhausted what we can demonstrate with this particular example and will have to dig into the engine in a following article with an example more geared to engine exploration. Of interest in this example is that we executed the benchmark without touching the engine. We "corrected" the data model and application and updated statistics. The tools we used to monitor what the engine was doing are applicable across any tuning exercise with Informix.
Below is the original benchmark SQL script and the modifications we made in this article. Additions are marked in bold while deletions are marked in italics.
-- File: benchmark_run.sql
-- Date: 4/20/2009
-- Author: Lester Knutsen
-- Description:
-- There are 4 tables in this process to generate bills
-- customer table with 100100 records,
-- states table with sales tax info
-- product table with 10 products and the
-- bills table where the results get written to.
-- Log: Created 4/20/2009
set explain on;
update statistics high for table product(product_number);
update statistics high for table state(state);
update statistics high for table customer(customer_number);
drop table bill_items;
set isolation dirty read;
-- drop and re-create the bills table
drop table "informix".bills ;
-- Note: Type raw and not carrying useless columns around
-- Also got rid of Primary key since we are now a raw table.
create raw table "informix".bills
(
bill_number serial not null ,
customer_number integer,
last_name char(30),
first_name char(30),
address char(50),
city char(20),
state char(2),
state_name char(20),
zip integer,
start_date date,
bill_date date,
bill_notes char(1000),
product_code char(4),
product_number integer,
product_name char(50),
product_price decimal(16,2),
product_discount decimal(16,2),
sales_tx decimal(16,2),
total_bill decimal(16,2),
balance_due decimal(16,2),
primary key (bill_number )
);
-- Note: Bill_items was not part of original data model
create raw table bill_items
(
bill_number integer not null ,
customer_number integer,
product_number integer,
billed_price decimal(16,2),
product_discount decimal(16,2)
) ;
-- Note: Loading Bills in one single statement.
begin work;
lock table bills in exclusive mode;
insert into bills
(bill_number,
customer_number,
bill_date,
sales_tax,
total_bill,
balance_due)
select 0,
customer_number,
today, -- bill_date
state.sales_tax,
0, -- total bill
balance_due
from customer, state
where customer.state=state.state;
alter table bills type(standard);
-- Note: And putting the index back on, effectively updates statistics
create index bill_idx1 on bills(customer_number);
commit;
-- Note: Replaced the next 4 blocks of code with the single block above
-- Create bills for product numbers 1 and 2
insert into bills
(
customer_number,
last_name,
first_name,
address,
city,
state,
state_name,
zip,
start_date ,
bill_date ,
product_code ,
product_number,
product_name ,
product_price ,
product_discount ,
sales_tx,
total_bill
)
select
customer.customer_number,
customer.last_name,
customer.first_name,
customer.address,
customer.city,
customer.state,
state.state_name,
customer.zip,
customer.start_date ,
today, -- bill_date ,
customer.product_code ,
product.product_number,
product.product_name ,
product.product_price ,
0, -- product_discount ,
state.sales_tax,
0 -- total_bill
from customer, state, product
where customer.state = state.state
and customer.product_code[1] = product.product_code[1]
and customer.start_date >= "01/01/2000"
and product.product_number in ( 1, 2 );
Create bills for product number 4 and 7
insert into bills
(
customer_number,
last_name,
first_name,
address,
city,
state,
state_name,
zip,
start_date,
bill_date,
product_code,
product_number,
product_name,
product_price,
product_discount,
sales_tx,
total_bill
)
select
customer.customer_number,
customer.last_name,
customer.first_name,
customer.address,
customer.city,
customer.state,
state.state_name,
customer.zip,
customer.start_date,
today, -- bill_date,
customer.product_code,
product.product_number,
product.product_name,
product.product_price,
0, -- product_discount,
state.sales_tax,
0 -- total_bill
from customer, state, product
where customer.state = state.state
and customer.product_code[1] = product.product_code[1]
and customer.start_date >= "01/01/2005"
and product.product_number in ( 4, 7 );
-- Create bills for product number 9 and 10
insert into bills
(
customer_number,
last_name,
first_name,
address,
city,
state,
state_name,
zip,
start_date ,
bill_date ,
product_code ,
product_number,
product_name ,
product_price ,
product_discount ,
sales_tx,
total_bill
)
select
customer.customer_number,
customer.last_name,
customer.first_name,
customer.address,
customer.city,
customer.state,
state.state_name,
customer.zip,
customer.start_date ,
today, -- bill_date ,
customer.product_code ,
product.product_number,
product.product_name ,
product.product_price ,
0, -- product_discount ,
state.sales_tax,
0 -- total_bill
from customer, state, product
where customer.state = state.state
and customer.product_code[1] = product.product_code[1]
and customer.start_date >= "01/01/1985"
and product.product_number in ( 9, 10 );
Note: Loading Bill_Items with info that used to be in Bills
begin work;
lock table bill_items in exclusive mode;
insert into bill_items
(bill_number,
customer_number,
product_number,
billed_price,
product_discount)
select bill_number,
bills.customer_number,
product_number,
product_price,
case when (customer.start_date<="01/01/2009" and \
customer.balance_due>50000) then 10
else 0
end case
from bills, product, customer
where product.product_number in (1,2,4,7,9,10)
and bills.customer_number=customer.customer_number;
alter table bill_items type(standard);
create index bill_items_idx2 on bill_items(bill_number);
commit;
Note: We took care of this by loading the bill_items table.
-- Update bills - give the best customers a $10 discount
update bills
set product_discount = ( select 10 from customer
where customer.customer_number = bills.customer_number
and customer.start_date <= "01/01/2009"
and customer.balance_due > 50000 )
where bills.customer_number in ( select customer_number
from customer where customer.start_date <= "01/01/2009"
and customer.balance_due > 50000 )
and product_number in ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 );
-- Update bills - calculate the total with sales tax and minus the discount
-- Note: Similar, but taking advantage of our new data model.
begin work;
lock table bills in exclusive mode;
update bills
set total_bill = (( (select sum(bill_items.billed_price-product_discount) from bill_items where bills.bill_number=bill_items.bill_number)) * ( 1 + sales_tax )) ,
|-------10--------20--------30--------40--------50--------60--------70--------80--------9|
|-------- XML error: The previous line is longer than the max of 90 characters ---------|
balance_due=balance_due + (( (select sum(bill_items.billed_price-product_discount) from bill_items where bills.bill_number=bill_items.bill_number)) * ( 1 + sales_tax )) ;
|-------10--------20--------30--------40--------50--------60--------70--------80--------9|
|-------- XML error: The previous line is longer than the max of 90 characters ---------|
commit;
update bills
set total_bill = (( product_price - product_discount ) \
* ( 1 + ( select sales_tax from state where bills.state = state.state ))) \
where product_number in ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 );
Note: Not needed since we have balance in the bills table.
-- Update the customer balance for customers who just got a bill
update customer
set balance_due = balance_due + ( select sum ( total_bill ) \
from bills where bills.customer_number = customer.customer_number )\
where customer_number in ( select customer_number from bills );
-- Calculate the totals - This must match the expected results
-- Note: Modified to read values out of new structures.
select count(*) record_count,
sum ( billed_price ) sum_product_price,
sum ( product_discount) sum_product_discount
from bill_items;
select count(*) recourd_count,
sum ( product_price ) sum_product_price,
sum ( product_discount) sum_product_discount,
sum ( total_bill ) sum_total_bill
from bills;
Note: We had to move the total bill down to this section since
that data is now at the bill level.
Note also: Because there are 4 states in the original customer table
which do not exist in the State table, the number of bills will not
match the number of customers (off by 120) and the balance_due as
computed from bills will not include the 6044509.99 balance due from
these customers. This could of course be corrected, but that is
beyond the scope of this effort.
select count(*) customer_count,
sum(total_bill) sum_total_bill,
sum ( bills.balance_due )
from bills;
-- Perform a checkpoint to make sure all data is written to disk -
-- This is required ;)
execute function sysadmin:task ( 'onmode', 'c' );
-- End of SQL script
|
Learn
- Learn more about the
Fastest Informix DBA
Contest, sponsored by Advanced DataTools.
- Learn more about Information Management at the developerWorks Information Management
zone. Find technical documentation,
how-to articles, education, downloads, product information, and
more.
- Stay current with
developerWorks technical events and webcasts.
- Follow developerWorks on
Twitter.
Get products and technologies
- Build your next
development project with
IBM trial software,
available for download directly from developerWorks.
Discuss
- Participate in the discussion forum.
- Check out the
developerWorks
blogs and get involved in the
developerWorks community.





