In this case, a tortoise-brained hare is an animal that is capable of going fast but wonders why he can't.
Over the last six months or so I have seen a "trending" situation with those who use the Organize-On. This is some pretty cool functionality so it's important that we get it right. Generally speaking, folks who understand how zone-maps work will have a splendid time with Organize. Others, not so much. The machine is supposed to be fast, so why are all my queries so slow?
- All I have to do is add the Organize On keys and I'm good to go. Answer: No, you have to groom first, as this actually applies the Organize to the physical data. Response: Really? when we did this it took forever! Answer: Yes, the first groom may take a bit of time but every groom after that will be painless.
- We CTAS a table every night for maintenance and the groom is always running slow. Answer: if you are using Organize On, don't use a CTAS for regular maintenance. With an Organize, the table is physically broken apart and spread across multiple additional extent-pages to provide for easier groom maintenance. A CTAS re-compresses it. So another groom is required to blow it back out again.
- We like to use Materialized Views to optimize our tables but now the Organize doesn't let us. How can we get this back? Answer: You don't, because the Organize essentially replaces the Materialized View and does it so much better. Asking this question means you might not understand zone maps as well as you think you do! Just sayin'
- We have included the distribution key in the Organize, along with some other join keys. Answer: (Sigh) -Remove them. Join-only keys do not belong in the Organize. The Organize is for filter attributes. Uh, perhaps you don't understand zone maps? Just sayin'
- We have some lookup tables where we applied an Organize. But this didn't seem to matter. Answer- No it won't because if the table is not big enough the system won't use zone maps anyhow.
- We were told to turn off Page-Level zone maps but this caused a 5x reduction in performance. Answer: Page-levels radically reduce disk I/O which is our number-one enemy. Turn them back on and go home happy.
It is clear that our outlying cast of folks who have never been introduced to zone maps think that the Organize is either a clustered-index or some other kind of indexing scheme. Fair enough, is this because of the deceptive naming? Like the Materialized View? (okay, don't get me started). The Organize keys are not keys in the same sense as indexes and are certainly not used as join keys.
They are filter-attributes. By this we mean that we fully intend to use these columns with constant values, lists or selected-lists as constraints.
Zone Map Primer
Here I will use my favorite example because it is time-tested and describes the capability. If I pay a visit to Wal-Mart and I'm looking for batteries, I might visit the customer kiosk and the lady tells me that the batteries are "on the end-cap, Aisle Three". This is an indexing model because she told me exactly where to look. This does not scale because when we get to billions of rows we spend more time searching the index than retrieving records.
In a Netezza model however, she would say "It's on an end-cap but I'm not sure which one." So I go to Aisle 1, then 2, then 3 and jackpot, I buy the batteries and go home. The most important part of her answer however, was in what she "did not say". She did not say that the batteries were in Men's or Women's clothing, Automotive or Electronics. In constraining the search to a particular location, she also told me "where not to look". This aspect of "where not to look" is critical to understanding zone maps. It is also critical to stratospheric scalability. Clearly if this Wal-Mart were to quintuple in size, my battery-buying-duration experience would not change in the least.
Now for the geeky part:
Each Netezza disk has 120K physical extents. Each extent has 3MB of space, divided across 24 pages(blocks) of 128k each. If we are running prior to OS 7.x, we will be zone-mapping at the extent-level. If we are 7.x or higher, we will use the page-level. Run don't walk to 7.x and page-level zone maps! They will radically reduce I/O on the extent and this is critically important. A set of records found only on a single page can return in 1/24th of the time than if it has to scan the full extent (96 percent faster). In a real-world experiment, the zone maps on a Twinfin-18 were changed from page-level to extent-level and the same battery of queries executed against it. It performed 5x to 10x slower than with the page-level turned on. Do not underestimating the influence of disk I/O on query turnaround. Netezza is a very physical machine.
Here's another internal example: Compression in Netezza is a nominal 4x. I have seen it much higher. Let's say we have an uncompressed record that is 10,000 bytes in size. When we read it, we will read all 10,000 bytes. If the data is compressed however, we will read only 2500 bytes, a 75% reduction in disk I/O. Netezza is the only platform where compression boosts the power in both reading and writing data because it reduces disk I/O tremendously.
Zone maps are a table that Netezza holds in memory for supporting a table. Each table has its own set, describing the contents of the extents/pages allocated to the table on a given dataslice. The contents of the zone maps are built automatically when we load up our table. It will collect the information from the records stored on the given extent/page for all integers, dates and timestamps. It will store the high and low value of each, representing the table's record "ranges" for that extent.
Thus when we execute a query using one of these columns as a filter-attribute, it will go to the zone maps first and cherry-pick only those extents containing the data we want, excluding all others. this means that the data on those other zone maps won't even see the light of day for the query-in-play. If we use an Organize, we can use a wider range of data types and be more deliberate with zone-map management.
Let's say we want to search-on the transaction-date on our fact table. If we have not physically organized the data, the same transaction-date value may appear across many hundreds or thousands of extents, affecting the high/low range of many zone maps. If however, we were to physically co-locate the records with common dates into tighter physical groups, the records will physically appear inside fewer zone maps. These are the extents/pages that the machine will cherry-pick for scanning and will completely scan each one. We want it to scan as few of these as possible.
In times past, we had three primary ways of doing this.
- A brute-force sort, which is pretty egregious when the table gets very large.
- A software program that separates the key's distinct values and executes a block-select from the original table to the new, selecting only once for each distinct value. This physically co-locates like-keys. (the data does not have to be sorted, only like-keys co-located)
- A materialized view, which would manufacture a virtual zone map (which is why it's not allowed with the Organize)
Let's say that our table's data takes up 400 extents on each data slice. If our transaction-date appears in 300 of these (it is poorly organized) then when a query runs like so:
Select count(*) from mytable where transaction_date = '2014-01-01';
All 300 extents will be searched for this information. However if we Organize on this transaction-date and then groom - the data will be physically shuffled around to co-locate the records with the same transaction-date on as few extents as possible. Let's further say that once this happens, the given date appears in only one extent. What we have just done is optimized the table 300x. We have eliminated 299 other locations to look for data. This is important because scanning a 3MB extent is a lot of work. If we are scanning 299 additional extents for each dataslice, we're really doing a lot of extra work for nothing. If we translate this to a page-level problem, we may have 24x300 pages originally containing the keys. If with Organize we reduce this to a single page, we have further reduced our scanning load by another 96 percent of the extent containing the page.
The important factor in the example is the "out-in-the-open" value of "2014-01-01". It is a "filter attribute". It is not being joined to a table with this value, Doing it this way means that the FPGA/CPU will discover which zone maps have a high/low boundary that contains this value and will retrieve a candidate list of them, If there is only one as opposed to 300, we have radically reduced out workload. Netezza will literally exclude those extents from being examined at all. We have told the machine where-not-to-look. If we join this value however, such as using a time dimension, applying the date value to time dimensions and joining the time dimension to our fact table, we will require the system to fetch the record in order to examine it, at which time it will determine whether to keep it or toss it. this sort of thing can initiate a full table scan, nullifying the zone map entirely.
We don't want this to happen. We want the data to stay on disk and never see the light of day if it's not participating in the query.
To show how dramatic this can be, we were at one site hosting over 100 billion rows in the primary fact table. A full scan of this table took 8 minutes (which is not too shabby in itself, just sayin' ). The reporting users knew that if a query ever exceeded a few minutes in duration, it was probably ignoring the zone maps. This is because once-Organized, this table would return the average query in sub-second response. Think about that,100 billion rows in sub-second response.
This is why paying attention to zone maps is such a big deal. Optimizing distribution can get us boost in the single-digits (2x, 3x etc) on a given query. Optimizing zone maps can get us 1000x boost and higher.
The Organize-On accepts one or more keys which will be applied to physically co-locate records of like-valued keys, then it will update the zone maps. Here is a test to see if we understand the application:
Take one of your largest tables. CTAS the table to another database, order by the distribution key, or by the hidden "rowid" column to make sure that the given filter key is not ordered. This could take a bit of time, of course. Then perform a query using one of your date parameters as in the example above, and time the query. Now perform an
Alter Table tabname organize on (date column name here).
Then perform groom. Once completed, execute the same query and get a timing. We can see that a several-minute query can go sub-second very easily.
What's more the additional keys in the Organize are (more or less) independently organized. They will all enjoy a much faster turnaround than not using the Organize. If records arrive on the table out-of-order, no worries. Run the groom again. Subsequent runs of groom will always be shorter in duration than the first one.
Clearly however, if the zone-map is intended to apply filter-attributes to guarantee the exclusion of extents/pages completely, we cannot use a join-key. Or at least, not a join-only key. This also means that the distribution key is out (unless we plan to call-up an individual record based on the distribution key). Also, generally do not mix high cardinality keys with low cardinality keys. Netezza finds its strength somewhere in the middle. We will find that it disfavors the low-cardinality keys when high-cardinality ones are in-the-mix.
A good way to tell which of the filter attributes for a table are "high-traffic" is to turn on query history and then examine the table/view associated with "column access statistics - $vhist_column_access_stats. This will provide the number of times the column participated in a query and with which table(s) the base table interacted with. Perform a descending sort on the NUM_WHERE column and this will reveal all. In this short list we will see filter-attributes that are most useful. Don't use any of the join-only columns or the distribution key. These may adversely affect the multi-key algorithm's output and might not optimize any zone maps for this table.
At one site, we noted several inappropriate keys in the Organize, and simply by removing them and "grooming" again, the table experienced a 100x boost. The inappropriate keys were washing out the effectiveness of the other keys.
Many of us have seen this skyline, with buildings of various heights stabbing toward the sky. Compare this to the distribution graph that is part of the Netezza Administration GUI application. Normally this should be very flat (but a jagged-edge is usually okay). Clearly a Manhattan skyline is forbidden.
Or is it? if we have a table that is very skewed (like a Manhattan skyline) but the data can be easily "horizontally" sliced with zone maps, our round-trip time for a "tall" data-slice is no different than a "short" one. All we need to look out for is process-skew (too many horizontal slices on one dataslice)
Measuring the madness
Okay, David this is all very interesting but how can I know which extents or pages or whatever is being used by the keys? Well, there are a couple of handy hidden columns on each row that can help tell-the-tale. One is the _PAGEID and one is the _EXTENTID.
select count(*) , datasliceid dsid, _pageid pid from fact_customer group by datasliceid, _pageid
will tell us how many distinct pages are being used for each data slice.
select count(*), datasliceid, _pageid from fact_customer where transaction_date = '2013-01-01' group by datasliceid, _pageid order by datasliceid, _pageid;
In the above, the "count" should be reasonably even for each dataslice.
If the total count is radically more than "1", then let's organize on transaction_id and then groom. Now try the metrics again to see if it did not reduce the total pages.
I am sure with these two columns in-hand you can think of a variety of creative ways to use them. The conclusion of it all is to get the records with common key values packed as closely as we can so they take up as few extents / pages as possible.
It's a wrap
So now that the Organize seems a little better, you know, organized, maybe this will provide a bit of guidance on how to set up your own Organize and zone maps.
And don't forget to groom when changing the Organize keys.
We won't need to groom every time we do an operation. I would suggest a groom on both a schedule and a threshold. Pick a threshold ( a lot of folks like five percent). When the total deleted rows gets above five percent of the total non-deleted rows, or the total pages per unique data point gets above an unacceptable threshold, it's time to groom. But grooming on every operation is expensive, has marginal value and actually may throw away records we wanted to keep (in case of emergency rollback).