Introduction to Active Cloud Engine
Active Cloud Engine is the new file management functionality provided with IBM SONAS.
Data volumes keep growing, and organizations are spread across multiple geographies and sites, which creates a strong need for data sharing. SONAS can store huge amounts of data, and Active Cloud Engine helps manage that data effectively.
Active Cloud Engine is integrated with IBM General Parallel File System (IBM GPFS™) and provides local data management capabilities through storage pools, file sets, and the policy engine. Active Cloud Engine also adds remote data management capabilities to SONAS: it moves data seamlessly between SONAS clusters and sites, on demand, periodically, or continuously, which makes it very flexible.
It provides a single view of files across all sites, regardless of their physical location. Using global namespace consolidation, users can access remote files as if they were present locally. Organizations benefit from data collaboration because critical information is shared across sites and is easy to find, and data is served to users quickly, no matter which site they are located at.
Active Cloud Engine is scalable and masks wide area network (WAN) latencies and outages by using GPFS to cache massive data sets, allowing data access and modifications even when the remote storage site is unavailable. In addition, Active Cloud Engine sends updates to the remote cluster asynchronously, which allows applications to continue operating while not being constrained by limited outgoing network bandwidth.
Active Cloud Engine makes cache data simple to manage. It can also remove stale data from the cache automatically, based on policies. The Active Cloud Engine implementation uses the inherent scalability of GPFS to provide a multinode, consistent cache of data located at a home cluster.
Active Cloud Engine terminology
Home cluster: This is the site where a primary copy of data is stored.
Cache cluster: This is the remote site that caches the data.
Gateway node: On the cache site, a few nodes in the cluster are given the special responsibility of acting as gateway nodes. Gateway nodes send data to and receive data from the home cluster. The interface nodes in a SONAS cluster serve files to clients. An administrator can configure a node as both an interface node and a gateway node.
Application node: An application node is any node in the cache cluster that gets I/O requests from applications. A SONAS interface node is typically an application node.
File sets: They provide a method for partitioning a file system and allow administrative operations at a finer granularity than the entire file system.
Home file set: A home file set is a file set on the home site. Home is configured as a Network File System (NFS) V3 server and exports file sets using the mkwcachesource command.
Caching file set: A cache site creates a file set and associates it with the home exported data using the mkwcache command. A cache can operate in one of the following supported modes: single-writer, read-only, and local-update. There can be multiple cache sites for a file set exported from the home site. An Active Cloud Engine cache can also support sparse files and access control lists (ACLs) that are compatible with GPFS.
Modes of operation
Read-only: When a cache is configured in this mode, the cached data is available only for reading. If data is modified at home, the change is pulled into the cache after the revalidation interval. The cache behaves like a read-only file system: creating and modifying files is not allowed.
Single-writer: When a cache is configured in this mode, the cache site is the only writer of the data. Data written at the cache is pushed to the home site asynchronously, hiding WAN latencies. Because write-back caching is used, applications also see better performance. Currently, Active Cloud Engine supports only one single-writer cache for a given file set at home, to avoid conflicts caused by multiple writers updating the same file.
Local-update: When a cache is configured in this mode, the cached data is available for reading and writing, but data modified at the cache site is not sent to the home site, so this mode serves as a scratch cache. After a data object is modified at the cache, new updates made at home for that object are no longer pulled into the cache.
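The write and revalidation rules of the three modes can be summarized in a small decision table. The following Python sketch is a toy model only, not SONAS or GPFS code; the function names handle_write and pulls_home_update are hypothetical, and home-side changes are modeled only for the read-only and local-update modes, because those are the modes for which the text describes them.

```python
# Toy model of the three Active Cloud Engine cache modes described above.
# handle_write and pulls_home_update are hypothetical names, not a SONAS API.

VALID_MODES = ("read-only", "single-writer", "local-update")


def handle_write(mode: str) -> str:
    """Describe what happens to a client write at the cache in a given mode."""
    if mode == "read-only":
        raise PermissionError("read-only cache: creating and modifying files is not allowed")
    if mode == "single-writer":
        return "written locally, queued for asynchronous push to home"
    if mode == "local-update":
        return "written locally only, never sent to home"
    raise ValueError(f"unknown cache mode: {mode!r}")


def pulls_home_update(mode: str, modified_at_cache: bool) -> bool:
    """Whether a change made at home is pulled into the cache for one object.

    Only read-only and local-update are modeled, because the text describes
    home-side changes only for those two modes.
    """
    if mode == "read-only":
        return True  # picked up after the revalidation interval
    if mode == "local-update":
        # Once the object has been modified at the cache, home updates are ignored.
        return not modified_at_cache
    raise ValueError(f"home-side updates are not modeled for mode: {mode!r}")
```

For example, handle_write("local-update") reports that the data stays local, and pulls_home_update("local-update", True) returns False, matching the scratch-cache behavior described above.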
Commands for setting up an Active Cloud Engine relationship
The following commands provide a quick overview of how to configure and set up an Active Cloud Engine relationship between two sites.
Configuration of home SONAS
1. Create a file set and link it to the file system. This step is optional: if a file set already exists, you can use it in the following steps.
Listing 1. Command for creating a file set
# mkfset gpfs0 home_data --junction /ibm/gpfs0/home_data
(1/2) Creating file set
EFSSG0070I File set home_data created successfully.
(2/2) Linking file set
EFSSG0078I File set home_data successfully linked.
EFSSG1000I The command completed successfully.
2. Create a cache source from an existing directory tree on SONAS. When creating a cache source, you need to provide the IP address of the client (the cache cluster) and the mode in which that client will access the data. The following example exports the home_data directory under /ibm/gpfs0 to the cache at 10.0.100.30 with read-write access.
Listing 2. Command for exporting the file set using mkwcachesource
# mkwcachesource home_data /ibm/gpfs0/home_data --client "10.0.100.30(rw)"
EFSSG1000I The command completed successfully.
3. After exporting the file set for caching, verify that it is visible in the output of the lswcachesource command.
Listing 3. Command for listing the exported file sets from home
# lswcachesource
WCache-Source Name  WCache-Source Path    ClientClusterId       ClientClusterName   WCache-Source Access Mode  Is Cached  Remote system name
home_data           /ibm/gpfs0/home_data  12402849612037252644  st003.virtual1.com  rw                         no         st003.virtual1.com
EFSSG1000I The command completed successfully.
Configuration of cache SONAS
1. Before any cache file set is created, some of the interface nodes must be designated as gateway nodes using the mkwcachenode command. In the following example, the int001st003, int002st003, and int003st003 nodes are given the gateway role.
Listing 4: Adding the gateway node at cache
# mkwcachenode --nodelist int001st003,int002st003,int003st003
EFSSG1000I The command completed successfully.
2. The mkwcache command creates a cache file set and associates it with the home Active Cloud Engine export. The mkwcache command takes the following parameters:
--cachemode specifies the mode of operation for the cache. Valid values are single-writer, read-only, and local-update. If no mode is specified, the default mode is read-only.
--remotepath must be set to the home IP address and the home export path, in the form <home IP address>:<export path>, similar to an NFS mount.
--homeip must be set to the IP address of the home management node.
Listing 5: Command for creating an Active Cloud Engine file set
# mkwcache gpfs0 cache_ro /ibm/gpfs0/cache_ro --cachemode read-only --remotepath 10.0.100.142:/ibm/gpfs0/home_data --homeip 10.0.100.20
EFSSG1000I The command completed successfully.
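When cache file sets are created from scripts, it can help to check the parameters before calling the CLI. The following Python sketch is a hypothetical helper (build_mkwcache is not part of SONAS) that only composes the command string shown in Listing 5 from the parameters described above; it checks the --cachemode value against the three valid modes and checks that --remotepath has the <home IP address>:<absolute path> shape. It does not run the command, and it always passes --cachemode explicitly, using the documented default of read-only.

```python
import shlex

VALID_CACHE_MODES = ("single-writer", "read-only", "local-update")


def build_mkwcache(filesystem, fileset, junction, remotepath, homeip, cachemode="read-only"):
    """Compose an mkwcache command line; hypothetical helper, it does not call SONAS."""
    if cachemode not in VALID_CACHE_MODES:
        raise ValueError(f"invalid --cachemode {cachemode!r}")
    # --remotepath looks like an NFS mount source: <home IP>:<export path>
    host, sep, path = remotepath.partition(":")
    if not sep or not host or not path.startswith("/"):
        raise ValueError(f"--remotepath must be <home IP>:<absolute path>, got {remotepath!r}")
    args = [
        "mkwcache", filesystem, fileset, junction,
        "--cachemode", cachemode,
        "--remotepath", remotepath,
        "--homeip", homeip,
    ]
    # shlex.join quotes any argument that the shell would otherwise split.
    return shlex.join(args)
```

Called with the values from Listing 5, the helper reproduces the command line shown there, and a --remotepath without the export path (for example, only 10.0.100.142) is rejected with a ValueError.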
3. Verify that the newly created cache file set is visible using the lswcache command.
Listing 6: Listing Active Cloud Engine file sets
# lswcache gpfs0
ID  Name      Status  Path                 CreationTime     Comment  RemoteFilesetPath                  CacheState  CacheMode  Remote system name
1   cache_ro  Linked  /ibm/gpfs0/cache_ro  8/6/12 11:15 AM           10.0.100.142:/ibm/gpfs0/home_data  enabled     read-only  st002.virtual1.com
EFSSG1000I The command completed successfully.
4. Verify the state of the cache file set using the lswcachestate command.
Listing 7: Listing the state of the Active Cloud Engine file sets
# lswcachestate gpfs0 --connectionstatus -r
EFSSG0015I Refreshing data.
FilesetName  FilesystemName  Remote Fileset Path                Remote Fileset Status  Cache-Gateway Assigned  Cache-Gateway Status  QueueLength  QueueNumExec  Remote system name
cache_ro     gpfs0           10.0.100.142:/ibm/gpfs0/home_data  Active                 int003st003             Active                0            8             st002.virtual1.com
EFSSG1000I The command completed successfully
The Active Cloud Engine relationship is now set up and the file sets are ready to be used.
How Active Cloud Engine caching works
After home and cache are configured, the SONAS clients can start accessing data from interface nodes on the cache cluster.
Figure 1: SONAS Active Cloud Engine architecture
Reads: When a read I/O request is received at the interface node and data is already cached, it is served from the interface node directly. If data is not available in cache, then the interface node sends a fetch request to the gateway. The gateway node reads the data from the home server and stores it locally. After that, it sends a reply to the requesting interface node. On receiving this reply, the interface node reads the data from the local storage and honors the client request. As soon as the requested portion of data is made available, the application is served immediately. It does not wait for the entire file to be fetched from home. But in the background, the rest of the file is also fetched and cached.
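The read path can be illustrated with a toy model. In the following Python sketch, the classes Gateway and InterfaceNode are hypothetical stand-ins, and a file is assumed to be a list of blocks. On a miss, the interface node asks the gateway for the requested block, which is fetched at once, while the remaining blocks are queued for background fetching. The background queue is drained synchronously here for simplicity; this is an illustration of the described behavior, not the GPFS implementation.

```python
from collections import deque


class Gateway:
    """Stand-in for a gateway node that reads from home into shared cache storage."""

    def __init__(self, home, storage):
        self.home = home            # path -> list of data blocks at the home site
        self.storage = storage      # path -> {block index: data}, the cache's local storage
        self.background = deque()   # (path, block index) pairs still to be fetched
        self.fetched = 0            # number of blocks read from home so far

    def fetch_block(self, path, idx):
        blocks = self.storage.setdefault(path, {})
        if idx not in blocks:
            blocks[idx] = self.home[path][idx]
            self.fetched += 1

    def fetch_for_read(self, path, idx):
        # Fetch the requested block right away, then queue the rest of the file.
        self.fetch_block(path, idx)
        for other in range(len(self.home[path])):
            if other not in self.storage[path]:
                self.background.append((path, other))

    def run_background(self):
        # Drains the queue synchronously; a real gateway works concurrently.
        while self.background:
            self.fetch_block(*self.background.popleft())


class InterfaceNode:
    """Serves client reads from local storage, asking the gateway on a miss."""

    def __init__(self, gateway, storage):
        self.gateway = gateway
        self.storage = storage

    def read(self, path, idx):
        if idx not in self.storage.get(path, {}):
            self.gateway.fetch_for_read(path, idx)  # cache miss: request from gateway
        return self.storage[path][idx]              # then read from local storage
```

Reading block 1 of a three-block file fetches only that block from home and queues the other two; after the background work runs, later reads of the file are served locally without contacting home.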
Writes: Active Cloud Engine handles write I/O asynchronously. When a client sends a write request to an interface node, the interface node writes the data to local storage and sends a Remote Procedure Call (RPC) to the gateway node. It then completes the write for the client without waiting for the write to reach the home site.
The gateway node sends this data to the home site asynchronously. Asynchronous writes hide the network latency between the home and cache sites. The gateway also merges small writes into larger ones and removes duplicate writes to improve network efficiency. Active Cloud Engine also supports a disconnected mode of operation: when the connection to the home site fails because of a network or hardware issue, clients can still access the cached data. Any data written at the cache is tracked, and after the connection to home is restored, the changed data is sent to home.
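The following Python sketch models the gateway's write-back queue under simple assumptions: the class WriteBackQueue is hypothetical, a write is an (offset, data) pair, "merging" means that all queued writes for a file are sent as one update, and a "duplicate" is an earlier write that a later one completely overwrites. While connected is False (a modeled outage), flush sends nothing and the writes stay queued until the connection returns. This illustrates the idea only; it is not how GPFS encodes its queue.

```python
from dataclasses import dataclass, field


@dataclass
class WriteBackQueue:
    """Toy gateway queue that batches cache writes and pushes them to home later."""

    home: dict                      # path -> bytes at the home site
    connected: bool = True          # False models a network or home outage
    pending: dict = field(default_factory=dict)  # path -> [(offset, data), ...] in arrival order

    def enqueue(self, path, offset, data):
        writes = self.pending.setdefault(path, [])
        end = offset + len(data)
        # Drop earlier writes that this one completely overwrites (duplicates).
        writes[:] = [(o, d) for (o, d) in writes if not (offset <= o and o + len(d) <= end)]
        writes.append((offset, data))

    def flush(self):
        """Send one merged update per file; keep everything queued while disconnected."""
        if not self.connected:
            return 0
        for path, writes in self.pending.items():
            buf = bytearray(self.home.get(path, b""))
            for offset, data in writes:  # apply in arrival order
                end = offset + len(data)
                if len(buf) < end:
                    buf.extend(b"\0" * (end - len(buf)))
                buf[offset:end] = data
            self.home[path] = bytes(buf)
        sent = len(self.pending)
        self.pending.clear()
        return sent
```

Writes are applied in arrival order rather than sorted by offset, so an overlapping later write always wins, which is the property a write-back cache has to preserve.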
Capabilities provided by Active Cloud Engine
Active Cloud Engine pulls data from home on demand, so if the requested data is not already cached, applications must wait for it to be pulled from home over the network. To avoid this wait, Active Cloud Engine provides a prepopulation feature that pulls the required data to the cache site before applications start accessing it. To prepopulate the cache, the user generates a list of the files of interest, using a policy scan or any other method, and then starts prepopulation with that file list.
Prepopulation also helps make better use of network bandwidth. Because prefetching runs in the background, it can be set up as a cron job: for example, you can run prefetch every night to populate the cache with the latest data, so that when users start accessing data in the morning, it is already available at the cache site.
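As an example of "any other method" for building the file list, the following Python sketch walks a directory tree and writes the paths of recently modified files, one per line. The function name build_prefetch_list, the one-path-per-line format, and the choice of which tree to scan are assumptions for illustration; check the SONAS documentation for the list format that the prefetch operation actually expects.

```python
import os
import time


def build_prefetch_list(root, max_age_seconds, list_path):
    """Write the paths of files under root modified within max_age_seconds, one per line.

    Hypothetical helper: the list format expected by the prefetch step is an assumption.
    """
    cutoff = time.time() - max_age_seconds
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.getmtime(full) >= cutoff:
                selected.append(full)
    selected.sort()
    with open(list_path, "w", encoding="utf-8") as out:
        out.writelines(path + "\n" for path in selected)
    return selected
```

Run nightly from cron with a 24-hour window, a script like this would select the files changed during the day, which matches the nightly prefetch pattern described above.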
Users can configure cache sites with less storage than the file set size on home, by setting quotas on the cached file set. Active Cloud Engine will monitor the usage of the file set and automatically trigger eviction if the data size at cache exceeds the allocated quota. Cache eviction will typically find the least recently used files and then empty the data blocks for them to free up space for new data. The unwanted files and temporary files can also be removed from cache using eviction.
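Quota-driven eviction of least recently used files can be sketched with an ordered dictionary. In the following Python sketch, the class QuotaCache is hypothetical and tracks only file sizes; "evicting" a file means dropping its cached data, while the file itself remains available at home and can be fetched again. GPFS's actual eviction policy and accounting are not modeled.

```python
from collections import OrderedDict


class QuotaCache:
    """Toy LRU cache that evicts least recently used files once usage exceeds the quota."""

    def __init__(self, quota_bytes):
        self.quota = quota_bytes
        self.files = OrderedDict()  # path -> cached size in bytes, least recently used first
        self.used = 0

    def access(self, path, size):
        """Record an access; return the paths whose cached data was evicted."""
        if path in self.files:
            self.files.move_to_end(path)  # now the most recently used
        else:
            self.files[path] = size
            self.used += size
        return self._evict()

    def _evict(self):
        evicted = []
        while self.used > self.quota and self.files:
            path, size = self.files.popitem(last=False)  # least recently used
            self.used -= size
            evicted.append(path)
        return evicted
```

With a 100-byte quota, caching files a and b (40 bytes each), touching a again, and then caching c evicts b, because b is the least recently used file.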
Disconnected mode and expiration
An Active Cloud Engine file set goes into the disconnected state when it loses its connection with the home site. For read-only file sets, if the disconnection persists for a long time, the data at the cache is made unavailable so that clients do not access stale data. This expiration is disabled by default and can be enabled by setting an expiration timeout with the chwcache command. The cache file set then expires when the expiration timeout has elapsed after the disconnection.
When the connection with home is restored, data is made available to clients again. Before data is served, it is revalidated with home to check whether it has changed there. A cache file set can also be expired and unexpired manually using the ctlwcache command.
Listing 8: Setting expiration timeout on a read-only file set
# chwcache gpfs0 cache_ro --expirationtimeout 60
EFSSG0071I File set cache_ro changed successfully.
Listing 9: After home or network failure, file set becomes disconnected
# lswcachestate gpfs0 cache_ro --connectionstatus -r
EFSSG0015I Refreshing data.
FilesetName  FilesystemName  Remote Fileset Path               Remote Fileset Status  Cache-Gateway Assigned  Cache-Gateway Status  QueueLength  QueueNumExec  Remote system name
cache_ro     gpfs0           10.0.100.38:/ibm/gpfs0/home_data  Disconnected           int001st002             Active                0            5             10.0.100.38 (non-sonas)
EFSSG1000I The command completed successfully.
Listing 10: After expiration timeout, file set is expired
# sleep 60 ; lswcachestate gpfs0 cache_ro --connectionstatus -r
EFSSG0015I Refreshing data.
FilesetName  FilesystemName  Remote Fileset Path               Remote Fileset Status  Cache-Gateway Assigned  Cache-Gateway Status  QueueLength  QueueNumExec  Remote system name
cache_ro     gpfs0           10.0.100.38:/ibm/gpfs0/home_data  Expired                int001st002             Active                0            5             10.0.100.38 (non-sonas)
EFSSG1000I The command completed successfully.
# ls /ibm/gpfs0/cache_ro
ls: cannot access /ibm/gpfs0/cache_ro: No such file or directory
Listing 11: Manually unexpire and access the cached data
# ctlwcache gpfs0 cache_ro --unexpire
EFSSG0015I Refreshing data.
EFSSG1000I The command completed successfully.
# lswcachestate gpfs0 cache_ro --connectionstatus -r
EFSSG0015I Refreshing data.
FilesetName  FilesystemName  Remote Fileset Path               Remote Fileset Status  Cache-Gateway Assigned  Cache-Gateway Status  QueueLength  QueueNumExec  Remote system name
cache_ro     gpfs0           10.0.100.38:/ibm/gpfs0/home_data  Disconnected           int001st002             Active                0            5             10.0.100.38 (non-sonas)
EFSSG1000I The command completed successfully.
# ls /ibm/gpfs0/cache_ro
bash tar
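The status transitions shown in Listings 9 to 11 can be summarized as a small state machine. The following Python sketch is a toy model with a hypothetical class name, CacheFilesetState. It treats the expiration timeout as a number of seconds (consistent with the sleep 60 in Listing 10), and it assumes that a manual unexpire restarts the timeout, which the text does not specify. Revalidation with home on reconnection is only noted in a comment.

```python
class CacheFilesetState:
    """Toy model of a read-only cache file set's connection status."""

    def __init__(self, expiration_timeout=None):
        self.expiration_timeout = expiration_timeout  # None: expiration disabled (the default)
        self.status = "Active"
        self.disconnected_at = None

    def disconnect(self, now):
        self.status = "Disconnected"
        self.disconnected_at = now

    def tick(self, now):
        """Expire the file set once the timeout has elapsed since disconnection."""
        if (self.status == "Disconnected"
                and self.expiration_timeout is not None
                and now - self.disconnected_at >= self.expiration_timeout):
            self.status = "Expired"
        return self.status

    def unexpire(self, now):
        """Manual unexpire: cached data is readable again while home is still down."""
        if self.status == "Expired":
            self.status = "Disconnected"
            self.disconnected_at = now  # assumption: the timeout restarts here

    def reconnect(self):
        # A real cache revalidates with home before serving data again.
        self.status = "Active"
        self.disconnected_at = None

    def readable(self):
        return self.status != "Expired"
```

With a 60-second timeout, a file set disconnected at time 0 is still readable at time 30 and expired at time 60; after a manual unexpire it is Disconnected and readable again, as in Listing 11.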
Because Active Cloud Engine pushes changes to home asynchronously, changes made at the cache site can occasionally fail to reach the home site. In single-writer mode, this can happen if the gateway node fails before the changes are sent to home. A gateway node can also be affected for other reasons, for example, when it runs out of memory or when the cluster configuration changes (such as when new gateway nodes are added or existing gateway nodes are deleted).
Active Cloud Engine retains sufficient on-disk information to handle such situations. To recover from these errors or changes, recovery is triggered: the recovery process finds all the changed data at the cache and sends it to home. This delta of changes is found using snapshots, so that the file system is not suspended for a long interval and performance is better.
The synchronization of data to home is done before sending any new updates to maintain consistency of data. While recovery is going on, clients on cache can still access the data.
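The snapshot-based delta and the "recovery first, new updates second" ordering can be sketched as follows. In this Python model, a snapshot is a dictionary that maps paths to content versions, previous_snapshot is assumed to reflect the last state known to be in sync with home, and deleted files are treated as part of the delta. The function names snapshot_delta and recover are hypothetical, and none of this reflects GPFS internals.

```python
def snapshot_delta(previous, current):
    """Compare two snapshots (path -> content version) and return the changes."""
    changed = {path: version for path, version in current.items() if previous.get(path) != version}
    deleted = sorted(path for path in previous if path not in current)
    return changed, deleted


def recover(home, previous_snapshot, current_snapshot, new_updates):
    """Replay the snapshot delta to home, then apply updates queued during recovery."""
    changed, deleted = snapshot_delta(previous_snapshot, current_snapshot)
    # 1. Bring home up to date with the cache as of the current snapshot.
    home.update(changed)
    for path in deleted:
        home.pop(path, None)
    # 2. Only then send new updates, in the order they arrived.
    for path, version in new_updates:
        home[path] = version
    return changed, deleted
```

Applying the delta before the new updates matters: if a newer update to a file were sent first, replaying the older delta afterward would overwrite it at home.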
When a home site or its associated storage is lost, the home can be restored using the failover functionality of Active Cloud Engine. A failover finds all valid data at cache and sends it to the new home server. This new home server can then perform all the functions of the original home site, for example, serve as a backup site.
If data at the home site is accidentally corrupted and an administrator notices it, the administrator can manually run an operation called resync to synchronize the data between the cache and home sites. Resync finds the valid data at the cache and compares it with the data at the home site; any mismatched data at the home site is updated to match the data at the cache site.
IBM Active Cloud Engine provides a persistent, scalable, Portable Operating System Interface (POSIX)-compliant cache across a set of remote clusters connected to a home cluster. This adds new data management capabilities to IBM SONAS. Active Cloud Engine moves data across sites transparently, without requiring manual administrator intervention, which helps increase productivity. Active Cloud Engine is also known as Active File Management and is available as part of version 3 release 5 of IBM GPFS.
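A resync pass can be sketched as a comparison of the cache contents against home. In the following Python sketch, resync is a hypothetical function, and comparing SHA-256 digests stands in for comparing data across the WAN without shipping whole files. Files that exist only at home are left untouched, because the text does not say how resync treats them.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def resync(cache, home):
    """Make home match the valid cache data; return the paths that were rewritten."""
    fixed = []
    for path, data in sorted(cache.items()):
        if path not in home or digest(home[path]) != digest(data):
            home[path] = data  # mismatched or missing at home: push the cache copy
            fixed.append(path)
    return fixed
```

Running resync a second time returns an empty list, since home then matches the cache for every cached file.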
This article gives an overview of Active Cloud Engine and its capabilities.