Implementing server-side caching of storage data in an IBM PowerHA cluster environment
This article explains how to enable server-side caching of storage data on the IBM® AIX® operating system when an IBM PowerHA® cluster is configured to provide high availability (HA) to applications using customized application scripts. AIX offers server-side caching of storage data from AIX 7.1 TL4 SP2 onwards. The cache devices might be server-attached flash, such as built-in solid-state drives (SSDs) in the server, flash devices that are directly attached using serial-attached SCSI (SAS) controllers, or flash resources in the storage area network (SAN). The caching functionality can be enabled dynamically while the workload is running; that is, starting to cache does not require the workload to be brought down to a quiescent state. Caching is also completely transparent to the workload. After a target device is cached, all read requests are routed to the caching software; if the requested block is found in the flash cache, the I/O request is served from the cache device. If the requested block is not found in the cache, or if the request is a write, it falls through to the original storage.
What is server-side caching of storage data?
IBM introduced server-side caching of storage data in AIX 7.1 TL4 SP2. Storage data caching reduces the average latency of transactional workloads and increases throughput. At the same time, by offloading a significant percentage of read requests from the SAN, it allows the SAN to deliver better write throughput and effectively serve a larger number of clients and hosts.
You can learn more about server-side caching of storage data at the following IBM developerWorks wiki pages:
- Integrated Server Based I/O Caching of SAN Based Data
- AIX 7 SSD/Flash Cache Best Practice Guide
- Caching Storage Data
When a PowerHA cluster environment is configured, storage data caching can be enabled for the application disks. Because a PowerHA resource group can be moved manually across nodes, it is important to ensure that caching is started and stopped for the application disks as part of the PowerHA application controller scripts. Before we dive into the details, let us review the key terms used in this article.
- Cache device: Any SSD or flash storage used for caching
- Cache pool: A group of cache devices that is only utilized for storage caching
- Cache partition: A logical cache device that is carved out from the cache pool
- Target device: A storage device that is being cached. A single cache partition can be used to cache one or more target devices
- Primary node: A PowerHA cluster node (r7r3m206) that will start the application on cluster start
- Secondary node: A PowerHA cluster node (r7r3m207) that can restart the application on failure
Figure 1 shows the hardware setup, with the PowerHA cluster formed between the two nodes, r7r3m206 and r7r3m207. The AIX partitions are configured with the same shared storage, and storage mirroring is enabled for the configured disk across the storage units. An SSD is also attached locally to each of the AIX partitions. In this setup, the application disk is hdisk8, on which a shared volume group for the application is created, and hdisk1 is configured as the cache disk.
Figure 1. Hardware setup
In this article, we demonstrate server-side caching with a PowerHA cluster by moving the resource group manually and by simulating a node failure.
The output of the cltopinfo command is given below.
(0) root @ r7r3m206: / # cltopinfo
Cluster Name: r7r3m206_cluster
Cluster Type: Linked
Heartbeat Type: Unicast
Repository Disks:
        Site 1 (site1@r7r3m206): hdisk5
        Site 2 (site2@r7r3m207): hdisk5
Cluster Nodes:
        Site 1 (site1): r7r3m206
        Site 2 (site2): r7r3m207

There are 2 node(s) and 1 network(s) defined

NODE r7r3m206:
        Network net_ether_01
                r7r3m206        10.40.0.48

NODE r7r3m207:
        Network net_ether_01
                r7r3m207        10.40.0.49

Resource Group RG
        Startup Policy      Online On Home Node Only
        Fallover Policy     Fallover To Next Priority Node In The List
        Fallback Policy     Fallback To Higher Priority Node In The List
        Participating Nodes r7r3m206 r7r3m207
Demonstration of server-side caching for storage data in PowerHA cluster during manual resource group movement
This section demonstrates how server-side caching works during the manual movement of a resource group in an IBM PowerHA cluster environment. The resource group is moved manually by stopping cluster services with the move resource group option in the PowerHA environment.
Step 1. Configure cache on primary and secondary nodes.
Create a cache pool and a partition of the required size. Then assign the created partition to the target disk that is to be cached. In this article, hdisk1 is the cache disk and hdisk8 (the application disk) is the target disk. The commands to create the pool and partition, assign the target, and list the cache are shown in Figure 2 for the primary node, r7r3m206, and in Figure 3 for the secondary node, r7r3m207.
Figure 2. Output of creating a pool, partition, assigning target to partition and listing cache for primary node
Figure 3. Output of creating a pool, partition, assigning target to partition and listing cache for secondary node
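The configuration in Figures 2 and 3 can be sketched with the cache_mgt command as follows. This is a minimal sketch to be run as root on each node; the pool and partition names (cmpool0, cm1part1) and the partition size (80G) are illustrative assumptions, not values taken from the figures.

```shell
# Sketch of step 1 for one node. Pool/partition names and the
# partition size below are illustrative assumptions.
setup_cache() {
    cache_disk=$1     # local SSD, e.g. hdisk1
    target_disk=$2    # shared application disk, e.g. hdisk8

    # Create a cache pool on the SSD and carve a partition from it.
    cache_mgt pool create -d "$cache_disk" -p cmpool0 || return 1
    cache_mgt partition create -p cmpool0 -s 80G -P cm1part1 || return 1

    # Assign the partition to the target disk that is to be cached.
    cache_mgt partition assign -t "$target_disk" -P cm1part1 || return 1

    # List the cache assignment; caching itself is started later by
    # the PowerHA application start script.
    cache_mgt cache list
}
```

Run the same configuration on both nodes, because each node caches on its own locally attached SSD.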
Step 2. Start the PowerHA cluster services and check the application status.
As mentioned earlier, the PowerHA cluster is already configured between the nodes r7r3m206 and r7r3m207. While configuring the cluster application controller, the start, stop, and monitor scripts must be configured for the application. These scripts are customized and are attached to this article.
PowerHA cluster services can be started from the command line or through SMIT. The smit clstart command leads you to the screen shown in Figure 4, where you can select all the nodes of the cluster to be started.
Figure 4. Starting PowerHA cluster through smit
After the cluster is stable, you can verify the resource group status using the clRGinfo command (as shown in Figure 5). Currently, the resource group (RG) is online on the primary node, r7r3m206.
Figure 5. clRGinfo output after cluster services are stable
Because the resource group is active on r7r3m206, the application is started on the primary node, r7r3m206. The start and stop application scripts added to PowerHA, as mentioned in this article, check the cache list for the active state. To ensure there is no stale data, the start script checks the cache state each time before starting the application: if the state is inactive, it simply activates the cache, but if the state is already active, it restarts the cache to bring it up clean. Figure 6 shows the active and inactive cache states on the primary and secondary nodes, and Figure 7 shows the application I/O operation on hdisk8, the application disk (the target disk of the cache partition).
Note: Refer to the Downloads section for the application start, stop, and monitor scripts. You can modify the scripts to suit your requirements.
Figure 6. State of cache on primary and secondary nodes
Figure 7. I/O operation on application disk hdisk8
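The stale-data check that the start script performs before launching the application can be sketched as follows. This is a minimal illustration, assuming the cache_mgt cache list output format shown later in this article (hdisk8,p1,active); the actual attached scripts may differ.

```shell
# Sketch of the cache handling in the application start script: if the
# cache is already active, stop and restart it so it comes up clean
# (avoiding stale data); if it is inactive, simply start it.
ensure_cache_active() {
    target=$1    # application disk, e.g. hdisk8

    # cache list prints one line per target: <target>,<partition>,<state>
    state=$(cache_mgt cache list | awk -F, -v d="$target" '$1 == d {print $3}')

    if [ "$state" = "active" ]; then
        # Possible stale data from a previous run: stop caching first.
        cache_mgt cache stop -t "$target"
    fi
    cache_mgt cache start -t "$target"
}
```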
Step 3. Verify caching on the primary node.
Figure 8 shows the output of server-side caching on the primary node, where the cache is active and I/O operations are running on hdisk8. Figure 8 also shows the command used to display caching operation statistics.
Figure 8. Caching operation statistics
- Read Count: The total number of read operations issued to the device, across all applications, whether served from the SAN device or from the cache disk.
- Write Count: The total number of write operations issued to the device. This count is unrelated to the size of the requests; it is the number of separate write requests.
- Read Hit Count: The total number of read operations served from the cache. A full read hit is counted when all the data requested by a read operation is found in the cache.
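These counters can be displayed with the cache_mgt monitor subcommand. A minimal sketch, assuming statistics collection must be enabled before the counters accumulate:

```shell
# Display caching statistics for the cached target devices.
show_cache_stats() {
    # Enable statistics collection if it is not already running.
    cache_mgt monitor start
    # Print the collected statistics; -h adds a header line and
    # -s selects the per-target statistics.
    cache_mgt monitor get -h -s
}
```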
Step 4. Move the resource group from the primary node to the secondary node and verify the cache.
When the resource group is manually moved from one node to the other, as shown in Figure 9, the application stop script first stops the application, and the cache on the primary node (r7r3m206) becomes inactive, as shown in Figure 10. When the resource group comes online on r7r3m207, the application start script first stops the cache, discarding any stale data, and then restarts it to bring the cache status on r7r3m207 to active.
Figure 9. Manually moving resource group from primary node to secondary node
Figure 10. Cache inactive on primary node
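Correspondingly, the cache-handling step in the stop script can be sketched as follows, under the same assumption about the cache_mgt cache list output format as the earlier start-script sketch.

```shell
# Sketch of the cache handling in the application stop script:
# deactivate caching on the node the resource group is leaving, so the
# other node never reads stale cached blocks.
release_cache() {
    target=$1    # application disk, e.g. hdisk8

    # cache list prints one line per target: <target>,<partition>,<state>
    state=$(cache_mgt cache list | awk -F, -v d="$target" '$1 == d {print $3}')

    if [ "$state" = "active" ]; then
        cache_mgt cache stop -t "$target"
    fi
}
```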
Because the RG has moved from the primary node (r7r3m206) to the secondary node (r7r3m207), I/O operations stop on the application disk hdisk8 on r7r3m206, and the cache hit count is initially 0 on r7r3m207, where the cache is now active. After application I/O starts on hdisk8 on r7r3m207, caching starts on hdisk1 (the cache disk). Figure 11 shows that the cache is active on the secondary node after the resource group comes online there, with I/O operations running alongside cache management.
(0) root @ r7r3m207: / # cache_mgt cache list
hdisk8,p1,active
Figure 11. Cache statistics on r7r3m207 node after application move
Step 5. Move the resource group back from r7r3m207 to r7r3m206.
A similar operation is performed when the resource group is manually moved back from node r7r3m207 to r7r3m206. Figure 12 shows that the resource group is active on r7r3m207 and a movement of the resource group to the r7r3m206 node is manually initiated.
Figure 12. RG online node and RG movement to r7r3m206
After the resource group is moved to r7r3m206, the application and the cache become inactive on r7r3m207. Figure 13 shows the cache inactive on r7r3m207, with no I/O operations on hdisk8, and the resource group online on r7r3m206. This is handled by the application server stop script configured in the PowerHA cluster: along with stopping the application, the script first stops cache management, that is, it brings the cache to the inactive state.
Figure 13. Cache inactive on r7r3m207, no I/O operations
When the RG comes back online on r7r3m206, the application start script configured in the PowerHA cluster activates the cache on that node before starting the application. Application read/write I/O therefore starts only after the cache is active, and the read hit count can be seen in the cache management list. Figure 14 shows the cache active on r7r3m206, the resource group online, and application I/O started on hdisk8, with cache management reporting the read hit count.
Figure 14. Cache active on r7r3m206 and verifying cache management
Demonstration of server-side caching for storage data in PowerHA cluster during node failure
This section demonstrates how server-side caching behaves during a node failure, simulated using a reboot. Initially, the resource group is online on r7r3m206 and offline on r7r3m207. Cache management is active on r7r3m206, and I/O operations are running on the same node.
Step 1. Reboot the r7r3m206 node.
This step continues from the previous demonstration of manual RG movement. The resource group is currently online on r7r3m206, so you can simulate a node failure by rebooting r7r3m206. Figure 15 shows the node being rebooted.
Figure 15. Rebooting primary node, r7r3m206
Step 2. Verify cache management and resource group state after rebooting.
After the primary node (r7r3m206) is rebooted, the resource group moves to the secondary node (r7r3m207) because of the node failure. During the movement, the application server start script activates cache management on r7r3m207 and starts I/O operations. Figure 16 shows cache management active on r7r3m207 and I/O operations running on that node.
Figure 16. Cache management and I/O operation on secondary node, r7r3m207
Step 3. Verify cache management after rebooting the primary node.
Because the reboot was performed on the primary node (r7r3m206), the resource group moved to r7r3m207 and the application started on that node. After the reboot, however, cache management starts automatically on r7r3m206 and the cache list shows it as active. Figure 17 shows the cache list state; notice that no I/O operations are running on r7r3m206 because the resource group is not active on that node. Consequently, the read hit count in the cache management statistics is 0, because no I/O is happening on r7r3m206. The cache management statistics for r7r3m206 with a read hit count of 0 are shown in Figure 18.
Figure 17. Cache list active and I/O operations inactive on r7r3m206 after reboot
Figure 18. Cache management statistics on r7r3m206
Step 4. Start cluster service on the primary node, r7r3m206.
Because r7r3m206 was rebooted, the resource group moved to r7r3m207 and application I/O was active on r7r3m207. After the r7r3m206 primary node is up and running again, the node remains inactive with respect to PowerHA. On starting the cluster services again on r7r3m206, the resource group falls back to that node from r7r3m207, as per the priority defined in the cluster configuration. You can start the cluster services using the smit clstart command. When the cluster services are started on r7r3m206, the resource group (RG) falls back to r7r3m206 and goes offline on r7r3m207.
As per the configuration of the application start script, it first restarts the cache on r7r3m206 before starting the application. After the application is started, the read hit count can be seen for the cache target disk. Figure 19 shows the application moved back to r7r3m206, along with the cache management statistics.
Figure 19. Resource back to primary node and cache management on same node
Step 5. Stop application and cache management on secondary node.
Because cluster services started again on r7r3m206 after the node reboot, the resource group (RG) falls back from r7r3m207 to r7r3m206, as shown in Figure 20. During this operation, the configured PowerHA application stop script brings the cache to the inactive state on r7r3m207, and the I/O operations stop on that node as the resource group goes offline there. Figure 20 shows that cache management has stopped on r7r3m207 and that the I/O operations on that node have stopped, handled by the PowerHA stop script configured in the application server.
Figure 20. Application I/O operation stopped along with cache management on secondary node
This article demonstrated the implementation of server-side caching of storage data in an IBM PowerHA cluster environment, both during manual resource group movement and during node failure. Server-side caching helps reduce the average latency of workloads and achieves better write throughput by offloading read requests from the storage area network.