AIX 7, VIOS and SSP - SSD/Flash Cache Best Practice
I have been asked for this a couple of times and the answers that spring to mind are
- Ha ha ha ha ha ha! Oh, you are serious!
- Consultant standard answer number 1: "It depends"
- Hmmm! Tricky!
I guess the problem is: the questioner really wants a prediction of what will happen and whether it will be cost-effective, and those questions have no answer.
If you look the topic up on Google you will get hits on the official announcement and a few cut'n'pastes of it, which amount to only a paragraph. So below is my collection of information that might help you.
If you come across other information and recommendations PLEASE COMMENT BELOW so we all benefit.
- Don't bother to try this on a small Virtual Machine
If you have, say, 1 or 2 CPUs, 4 GB of memory and 20 GB of disks, you should get instant performance gains just by adding memory and caching that way. Very simple to try. Very low risk. No need for SSD/Flash. Relatively inexpensive too.
- Have larger warm disk data than you can have RAM
If you have more data in regular I/O (not just sitting cold on the disk as an archive) than you could possibly afford in memory, then the SSD/Flash cache can boost data access, i.e. multiple TB of warm data.
- Don't Cache a Flash
If you are already using SAN-based Flash for your data, the SSD/Flash cache is unlikely to help further - IMHO.
- Read the excellent article from performance guru Manoj Kumar for technical background and details
- Downloadable PDF file https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/61ad9cf2-c6a3-4d2c-b779-61ff0266d32a/page/a23337d9-f1ab-4f03-876c-c38dc29a586a/attachment/b7e402cf-fbf8-4e3f-8547-bf4c77d868e0/media/AIX72%20_flash_cache_blog.pdf
- Only Web version https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power%20Systems/page/Integrated%20Server%20Based%20Caching%20of%20SAN%20Based%20Data
- Add 4 GB extra RAM.
The cache management algorithm needs to keep a history of read accesses, and that needs memory - don't starve your kernel.
The documents I have read say a 4 GB minimum virtual machine size = that would be bonkers!
That 4 GB minimum actually means the developers did not test this feature with stupidly small micro-partitions. My laptop here has 16 GB of RAM!
- No TIPping = No Testing In Production
Test it before production use with a reproducible workload (benchmark) so you know for sure you are getting a performance gain and how much.
This also will justify the costs and time invested.
- Have your SSD set up for high performance.
Don't add the SSD (or Flash) on an adapter that is already swamped with I/O - if the cache is not super fast, there is not much point.
- There are somewhat unexpected limitations on the numbers of disks and caches - I hope these will be improved in later releases.
Only one cache pool.
Only one cache partition.
The command syntax suggests more could be possible, but in practice it will give you an error if you try.
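As a sketch of the single pool/partition setup, assuming a spare SSD appears as hdisk2 and the SAN disk to accelerate is hdisk4 (both names and sizes are placeholders) - check the cache_mgt manual page for your AIX level before running this:

```shell
# List devices that are eligible to become cache devices
cache_mgt device list -l

# Create the (single) cache pool on the SSD, then one cache partition within it
cache_mgt pool create -d hdisk2 -p cmpool0
cache_mgt partition create -s 600G -p cmpool0 -P cm1part1

# Assign the partition to the SAN disk we want to accelerate and start caching
cache_mgt partition assign -t hdisk4 -P cm1part1
cache_mgt cache start -t hdisk4
```

Trying a second `pool create` or `partition create` is where you hit the one-pool, one-partition limit mentioned above.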
- Don't ask for a prediction - if we could do that I would play the Lotto, become a billionaire and then President of the USA!
- Cache Size recommendation - ha ha ha ha ha ha ha ha!
This creates a read cache.
You can imagine how that operates as well as I can.
It is extremely unlikely that you can measure how often your workload re-reads disk blocks, or the working-set size of those re-reads.
If you can, then you know how much disk I/O speeds will improve.
- Example from Jaqui Lynch (friendly performance guru who sparked off the need for this Best Practice)
She is thinking of using four SSDs (700 GB each) with 20 TB of data on the file systems (unknown re-read rate or working-set size).
Cache-to-data ratio = (4 x 700) / 20,000 x 100 = 14%
My guess is that is pretty healthy and she has a good chance of a large performance gain.
If the ratio is just a handful of percent it still might be OK, provided the working set is a lot less than the full 20 TB.
For example, just one SSD at 700 GB gives 3.5%, but if the hot part of the data used in any one day is 5 TB then we are back to 14%.
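The arithmetic above is trivial, but worth scripting if you are comparing several SSD options. A small sketch using the numbers from this example (the sizes are just the example's figures - substitute your own):

```shell
CACHE_GB=700     # one 700 GB SSD
DATA_GB=20000    # 20 TB of file system data
HOT_GB=5000      # estimated daily working set, if you can measure it

# Cache as a percentage of all data
awk -v c="$CACHE_GB" -v d="$DATA_GB" 'BEGIN { printf "cache/data = %.1f%%\n", c / d * 100 }'

# Cache as a percentage of the hot data - the ratio that really matters
awk -v c="$CACHE_GB" -v h="$HOT_GB" 'BEGIN { printf "cache/hot  = %.1f%%\n", c / h * 100 }'
```

With these figures it prints 3.5% against all data but 14% against the daily working set.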
- The Knowledge Centre (non-American spelling) manual pages are here
and the cache_mgt AIX/VIOS command can be found here
- Oldest AIX versions supported
AIX 7.1 TL4 SP2,
AIX 7.2 TL0 SP0
But please run the most up-to-date AIX possible, plus the latest firmware, VIOS and HMC = good practice for best performance.
- SAN Offload
If your SAN is already busy you get a double win-win: the cache gives a performance boost AND reduces pressure on the SAN, which means the non-cached disk I/O goes faster too.
- RAS: Cache Failure
Some are concerned the SSD/Flash is not redundant i.e. AIX does not mirror the cache to two devices to handle the case of a device failure.
This is true. You should think of the cache as a turbocharger, and size the server to survive the workload without the cache.
You may be able to provide hardware based mirroring or RAID5 in the SAS RAID adapter (assuming that is in the configuration) - I have not checked this option. Feedback welcome.
The cache is read-only, which means it can be switched off instantly, either on your command or because of a problem. The next disk read I/O simply goes back to the real disks, which always hold the master copy. The cache is never used to stage a write I/O, so there are zero cache-flushing issues. On a write, the stale copy in the cache is invalidated and the I/O goes to the disks. This makes an instant cache stop 100% safe.
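Because the cache is read-only, stopping it is always safe. A minimal sketch (syntax from the cache_mgt documentation, so verify against your AIX level):

```shell
# Stop caching instantly on all targets; reads simply go back to the SAN disks
cache_mgt cache stop -t all

# List the targets and their current cache state
cache_mgt cache list

# Restart later - note the cache has to re-warm from empty
cache_mgt cache start -t all
```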
- RAS Cache Offline
You should design your server so that it will provide a satisfactory service without the cache. Think of a car with a turbocharger. When the turbo is working I can cruise along at 150 MPH (ignoring any legal issues). If the turbo stops then I am reduced to 70 MPH, but I can still get home at the end of the day. The turbo is a "great to have" feature but I can survive without it.
- RAS Cache Redundancy
I have had system designers say that the cache must be available 100% of the time or they will not meet their Service Level Agreements, so they demand it is fully redundant and that IBM must address this immediately. All I can suggest is that they call their IBM representative and file a Request for Enhancement - we need client demand to make this a high priority.
- RAS: Machine Evacuation - LPM is IMHO Mandatory
You can use Live Partition Mobility only if you access the SSD/Flash via virtual I/O through the VIOS.
If you have direct SSD/Flash access via a physical adapter, LPM is not available, obviously. But in an emergency you could switch off the cache, remove the cache devices from AIX, remove the adapter from the LPAR and then LPM. But then you have no cache, unless you have the same hardware on the LPM target machine.
- VIOS based Cache
Good news: this then allows LPM (by cache removal). The bad news: it will be slower due to the virtual I/O layer between the VM and the VIOS (although that layer is in normal daily use on most servers). I have no indicative numbers, so don't ask.
- RAS VIOS based Cache
The cache will only be on one VIOS. There is no dual path to an SSD on one VIOS. If you are using the SSD cache via a VIOS and you have to shut down that VIOS, the cache is switched off.
- As the cache is mostly read, it will not benefit much from SSDs attached via SAS adapters with write cache.
- On Scale-out POWER8 machines you can use the SSD slots in the System Unit, or SSDs placed in an EXP24S drawer.
- As the AIX SSD/Flash cache algorithm decides what to cache based on history, it takes time to select the blocks. In my simple tests, it took 5 or more minutes for the performance to accelerate. With larger, more complex disk I/O patterns that change over the day (for example, RDBMS access), you should allow a suitable warm-up time. For larger caches this could be hours, or even over 24 hours, if you have different workloads throughout the day like morning analysis, afternoon order processing and various batch runs overnight.
At the Power Technical University in Rome, Nicolas Sapin (Oracle & AIX IT Specialist, IBM) presented his results of AIX caching with an Oracle RDBMS: he achieved well over 3 times the transaction rate, with SQL statements taking a quarter of the time. He pointed out that at around 30%+ cache-to-data ratio, the AIX cache was approaching the performance of the whole database on a Flash disk unit - which of course costs more. Note: this suggests that the hot data was roughly 30% of the database. Sessions and shared experiences like this are typical of the Technical Universities - don't miss them.
My thanks to the various AIX Performance developers and designers for taking questions, delivering presentations and testing support:
- You know who you are. They are typically reluctant to be named publicly (or the phone would never stop ringing and they could not get on with the next exciting feature for me to play with).
Flash Cache Statistics documentation
Talking indirectly to the developers, they point out that the Flash Cache algorithm is similar to those used in IBM Easy Tier.
Here is a draft version, pending it being added to the IBM Knowledge Centre
cache_mgt monitor get -hs
ETS Device I/O Statistics -- hdisk1
Start time of Statistics -- Mon Mar 27 07:10:41 2017
Read Count: 152125803
Write Count: 79353626
Read Hit Count: 871
Partial Read Hit Count: 63
Read Bytes Xfer: 10963365477376
Write Bytes Xfer: 4506245999616
Read Hit Bytes Xfer: 48398336
Partial Read Hit Bytes Xfer: 5768192
Promote Read Count: 2033078104
Promote Read Bytes Xfer: 532959226494976
Explanation of fields with "cache_mgt monitor get":
Read Count
The total number of read operations that were issued to the target device. It is the count for all applications that issued read commands that were sent to the SAN device or to cache. This number has no relation to the size of the requests. It is the count of separate read requests.
Read Hit Count
The total number of read operations that were issued to the target device that are full cache read hits. The read hit count is the total number of instances in which a read request is satisfied entirely by the cache. This number has no relation to the size of the requests. It is the actual count of separate read hit requests, and the value is a portion of the "Read Count."
Partial Read Hit Count
The total number of read operations that were issued, which are partial cache read hits. The partial read hit count is the total number of instances in which a read request had part, but not all, of the data requested in the cache. The remainder of the data not available in cache must be acquired from the SAN device. This number has no relation to the size of the requests. It is the actual count of separate read requests, and the value is a portion of the "Read Count."
Promote Read Count
This is the total number of read commands that were issued to the SAN as part of the promote into cache. This number is not tied to the number of promotes because a 1 MB read promote might be divided into multiple read requests, if the maximum transfer size to the SAN disk is less than the 1 MB fragment size.
Read Bytes Xfer
The total number of bytes that were read for the target device. It is the total bytes transferred for read commands that were issued from the applications into the driver, and represents the total byte count of read hits, partial read hits, and read misses.
Read Hit Bytes Xfer
The total number of bytes that were read through the target device that were full cache read hits.
Partial Read Hit Bytes Xfer
The total number of bytes that were read through the target device that were partial cache read hits.
Promote Read Bytes Xfer
This is the total number of bytes that were read from the SAN for promotes.
Write Count
The total number of write operations that were issued to the target device. This number has no relation to the size of the requests. It is the actual count of separate write requests.
Write Bytes Xfer
The total number of bytes that were written to the target device. It is the total number of bytes transferred from all write commands that were issued from the applications into the device.
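To turn those counters into hit rates, you can pipe the monitor output through awk. This sketch feeds in the sample numbers from the output above; on a live system you would pipe `cache_mgt monitor get -hs` straight into the same awk program:

```shell
# Compute full and partial read hit rates from "cache_mgt monitor get -hs" output
cat <<'EOF' | awk -F': *' '
    /^Read Count/             { rc = $2 }
    /^Read Hit Count/         { rh = $2 }
    /^Partial Read Hit Count/ { ph = $2 }
    END { printf "full hits: %.5f%%  partial hits: %.5f%%\n",
                 rh / rc * 100, ph / rc * 100 }'
Read Count: 152125803
Read Hit Count: 871
Partial Read Hit Count: 63
EOF
```

With the sample counters the hit rates are still tiny (well under 0.001%), as you would expect from statistics taken shortly after a cache starts warming up.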
This diagram might be helpful:
- If you are giving AIX 7 SSD/Flash Cache a try, please let me know how it went.
- Also include the basic configuration, a summary of the workload type and the results of your tests.
Want more information?
2) Excellent article (Manoj Kumar)
3) Jaqui Lynch articles in IBM Magazine