Vice partition virtualization for IBM AFS

IBM® AFS™ storage area network (SAN) storage virtualization provides a mechanism to migrate existing vice partitions from one file server to another without restarting the file server or terminating client processes. This article explains the problem with the existing IBM AFS file server setup when migrating vice partitions between file servers, and the steps to resolve it using IBM AFS SAN storage virtualization.


Ms. Indira Khopde (indira.khopde@in.ibm.com), Senior Software Engineer, IBM India Software Lab

Indira Khopde works as a Senior Software Engineer on the Andrew File System (AFS) team at IBM India Software Labs in Pune. She has been working on the AFS team for the last three years and is involved in support and feature development of the product. Indira has an M.Sc. in Computer Science from the University of Pune. She can be reached at indira.khopde@in.ibm.com.



10 September 2013


Overview of IBM AFS

AFS is a distributed file system that enables cooperating hosts (clients and servers) to efficiently share file system resources across both local area networks (LANs) and wide area networks (WANs).

Its major features include a client/server architecture, administration capability, scalability, location transparency, security, reliability, and portability.


Major components of IBM AFS

File server: The file server is an AFS entity that is responsible for providing a central disk repository for a particular set of files within volumes, and for making these files accessible to properly authorized users running on client machines.

Volume server: The volume server allows administrative tasks and probes to be performed on the set of AFS volumes residing on the machine on which it is running. These operations include volume creation and deletion, renaming volumes, dumping and restoring volumes, altering the list of replication sites for a read-only volume, creating and propagating a new read-only volume image, creating and updating backup volumes, listing all volumes on a partition, and examining volume status.

Basic OverSeer (BOS) Server: The BOS Server is an administrative tool that runs on each file server machine in a cell. This server is responsible for monitoring the health of the AFS server processes, starting them in the proper order after a system restart, answering requests about their status, and restarting them when they fail. It also accepts commands to start, suspend, or resume these processes, and to install new server binaries.

Salvager: The salvager differs from other AFS servers as it runs only at selected times. The BOS Server invokes the salvager when the file server, volume server, or both fail. The salvager attempts to repair disk corruption that can result from a failure. As a system administrator, you can also invoke the salvager as necessary, even if the file server or volume server has not failed.

Volume location server: The volume location server maintains and exports the volume location database. This database tracks the server or set of servers on which volume instances reside. The operations it supports are queries returning volume location and status information, volume ID management, and creation, deletion, and modification of VLDB entries.

Client component (Venus): The AFS cache management is handled by an AFS client (afsd), with the help of a set of daemons. It supports both local disk caching and caching in memory.

Vice partition: Vice partitions are IBM AIX® partitions created using physical devices. By convention, each partition is named /vicepx, where x is one or two lowercase letters of the English alphabet. AFS partitions can be named /vicepa or /vicepaa, /vicepb or /vicepbb, and so on up to /vicepz or /vicepzz.
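
For illustration, a vice partition on AIX might be created roughly as follows. This is a minimal sketch; the volume group name afsvg, the logical volume name lv01, and the size are hypothetical and depend on your environment.

    # Create a logical volume for the vice partition (names and size are examples)
    mklv -y lv01 -t jfs2 afsvg 64
    # Create a JFS2 file system on that logical volume, mounted at /vicepa
    crfs -v jfs2 -d lv01 -m /vicepa -A yes
    # Mount the new vice partition so that the file server can use it
    mount /vicepa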

Volumes: Disk partitions used for AFS storage do not directly host individual user files and directories. Rather, connected subtrees of the system's directory structure are placed into containers called volumes.


Existing file server setup

The existing file server setup is as shown in Figure 1, where multiple clients can access volumes residing on Fileserver1.

In the existing design, each file server has its own local storage on which the vice partitions are created and the volumes and data are stored on the local storage itself.

Figure 1. Existing file server setup

Problem with the current file server setup

The main issue with the existing setup is that whenever the AFS administrators need to perform a software upgrade on any of the file servers, they first need to move all the volumes from that server to some other server. This includes copying the entire data from the volumes. After the software upgrade is done, the volumes are moved back to the same file server. The entire procedure is very time consuming and can take a few days to complete.
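
For example, with the traditional approach every volume has to be moved individually using vos move, which copies the volume data over the network. The sketch below is only illustrative; the volume name is a placeholder, and the server and cell names are the ones reused in the steps later in this article.

    # Move one volume, together with all of its data, from fileserver1 to fileserver2
    vos move -id user.jdoe -fromserver fileserver1 -frompartition /vicepa \
             -toserver fileserver2 -topartition /vicepa -cell punetest
    # Repeat this for every volume on the server; for a large file server this can take days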


Solution using SAN storage virtualization

The new AFS servers are much larger machines, with virtualized SAN storage housing the AFS vice partition data. We now have two or more AFS file servers sharing the entire data storage of a single SAN.

The new configuration shares the SAN storage between multiple file servers. This helps to reduce the time taken for the migration of vice partitions because the data does not need to be copied.

The existing AFS implementation can achieve this with some AFS-specific configuration followed by redeployment of the file server. However, that approach incurs some downtime (although limited) while the vice partitions are moved, and it requires restarting the file servers, which might affect client jobs that are active and running.

Figure 2. New SAN storage-based file server setup

Moving all vice partitions from one file server to another is achieved by adding two new commands to vos.

Usage:

vos detachpart -fromserver <machine name of server from where to detach>
               -partition <partition names to be moved>
               [-cell <cell name>] [-noauth] [-localauth] [-verbose]
               [-timeout <timeout in seconds>] [-help]

Usage:

vos attachpart -fromserver <machine name of server from where to detach>
               -toserver <machine name of server where to attach>
               -partition <partition names to be moved>
               [-cell <cell name>] [-noauth] [-localauth] [-verbose]
               [-timeout <timeout in seconds>] [-help]

The vos detachpart command detaches the partitions from the file server specified as fromserver, and the vos attachpart command attaches the partitions from the file server specified as fromserver to the file server specified as toserver.
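
For example, moving all the partitions from fileserver1 to fileserver2 in a cell named punetest would look like the following (the same server and cell names are used in the steps later in this article):

    # On the source side: detach all partitions from fileserver1
    vos detachpart -fromserver fileserver1 -partition all -cell punetest
    # After the SAN storage has been remounted on fileserver2: attach the partitions there
    vos attachpart -fromserver fileserver1 -toserver fileserver2 -partition all -cell punetest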

This makes the migration of vice partitions very quick, without restarting the file servers and without terminating the client jobs; the clients just get a busy message until the migration is completed.

We are working on adding support for moving a single vice partition or a subset of the vice partitions from one file server to another.


Steps for moving all vice partitions from one server to another

Let's call the fromserver fileserver1 (where the vice partitions are currently mounted) and the toserver fileserver2 (where we want to move the vice partitions). We also assume that there is no vos activity during the vice partition movement process.

  1. Mark fileserver1 as VBUSY manually by sending a signal SIGUSR1 to the file server.

    By doing this, the active client jobs will start getting a busy message; they will resume after the vice partition movement is complete.

  2. Run the following command to detach the partitions from fileserver1.
    vos detachpart -fromserver fileserver1 -partition all -cell punetest

    This command marks all the volumes as offline and closes all the open file handles, so that we can unmount the vice partitions.

  3. Unmount all the SAN vice partitions from fileserver1. (A sketch of the AIX commands behind steps 3 and 4 is shown after this list.)
    • Unmount all the vice partitions so that the partitions are never mounted on more than one file server at a time, as that might cause data inconsistency.
    • Deactivate all the volume groups. For this, issue the varyoffvg command, specifying the volume group name.
    • Remove the volume group definition from the system for all volume groups. For this, issue the exportvg command, specifying the volume group name.
    • Run the lsvg command to verify that the volume groups are deactivated and the volume group definitions have been removed.
  4. Mount all the SAN vice partitions on fileserver2.
    • Make the previously exported volume groups known to the system. For this, issue the importvg -y command, specifying the volume group name.
    • Activate all the volume groups. For this, issue the varyonvg command, specifying the volume group name.
    • Run the lsvg command to verify that the volume group definitions are known to the system and the volume groups are activated.
    • Mount all the vice partitions so that we can attach them to fileserver2.
  5. Restart the fs instance on fileserver2.

    We need to run the following command for the file server to discover the newly mounted vice partitions.

    bos restart fileserver2 -instance fs -cell punetest
  6. Run the following command to attach the partitions to fileserver2.
    vos attachpart -fromserver fileserver1 -toserver fileserver2 -partition all -cell punetest

    This command breaks the callbacks from the old file server (fileserver1), changes the locations of the volumes so that they now point to the new file server (fileserver2), and marks all the volumes as online.

  7. Unmark fileserver1 from VBUSY manually by sending a signal SIGUSR2 to the file server.

    By doing this, the client jobs will stop getting busy messages and will resume their activity.

  8. Shut down the fs instance on fileserver1.

    We need to run the following command to make sure that the clients go to the new file server (fileserver2) to access the volumes.

    bos shutdown -server fileserver1 -instance fs -cell punetest
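
The following is a minimal sketch of the AIX commands behind steps 3 and 4, shown for a single vice partition /vicepa. The volume group name vicevga and the disk name hdisk4 are hypothetical; repeat the sequence for each vice partition and its volume group.

    # Step 3, on fileserver1: unmount the vice partition and release its volume group
    umount /vicepa
    varyoffvg vicevga              # deactivate the volume group
    exportvg vicevga               # remove the volume group definition from the system
    lsvg                           # verify that vicevga is no longer listed

    # Step 4, on fileserver2: import the volume group and mount the vice partition
    importvg -y vicevga hdisk4     # make the exported volume group known to the system
    varyonvg vicevga               # activate the volume group
    lsvg -o                        # verify that vicevga is active
    mount /vicepa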

Sample steps to move partitions from test server pghafs03.rchland.ibm.com to pghafs02.rchland.ibm.com

We have six vice partitions (/vicepa through /vicepf) that are currently mounted on pghafs03.

    [root@pghafs03] > mount
      node       mounted            mounted over            vfs    date         options
    --------   ---------------    ---------------------     ------ ------------ ---------------
               /dev/hd4           /                         jfs2   Mar 11 08:15 rw,log=/dev/hd8
               /dev/hd2           /usr                      jfs2   Mar 11 08:15 rw,log=/dev/hd8
               /dev/hd9var        /var                      jfs2   Mar 11 08:15 rw,log=/dev/hd8
               /dev/hd3           /tmp                      jfs2   Mar 11 08:15 rw,log=/dev/hd8
               /dev/hd1           /home                     jfs2   Mar 11 08:15 rw,log=/dev/hd8
               /dev/hd11admin     /admin                    jfs2   Mar 11 08:15 rw,log=/dev/hd8
               /proc              /proc                     procfs Mar 11 08:15 rw
               /dev/hd10opt       /opt                      jfs2   Mar 11 08:15 rw,log=/dev/hd8
               /dev/livedump      /var/adm/ras/livedump     jfs2   Mar 11 08:15 rw,log=/dev/hd8
               /dev/lvvarlog      /var/log                  jfs2   Mar 11 08:15 rw,log=/dev/hd8
               /dev/lvafslogs     /usr/afs/logs             jfs2   Mar 11 08:15 rw,log=/dev/hd8
               /dev/lvscm         /opt/IBM/SCM              jfs2   Mar 11 08:15 rw,log=/dev/hd8
               /dev/ramdisk0      /usr/vice/cache           jfs    Mar 11 08:15 rw,nointegrity
               AFS                /afs                      afs    Mar 11 08:17 rw
               /dev/fslv00        /testvg                   jfs    Mar 19 08:30 rw,log=/dev/loglv06
               /dev/lv01          /vicepa                   jfs    May 24 00:47 rw,log=/dev/loglv01
               /dev/lv02          /vicepb                   jfs    May 24 00:47 rw,log=/dev/loglv02
               /dev/lv03          /vicepc                   jfs    May 24 00:47 rw,log=/dev/loglv03
               /dev/lv04          /vicepd                   jfs    May 24 00:47 rw,log=/dev/loglv04
               /dev/lv05          /vicepe                   jfs    May 24 00:47 rw,log=/dev/loglv05
               /dev/lv00          /vicepf                   jfs    May 24 00:47 rw,log=/dev/loglv00
    [root@pghafs03] >
  1. Mark pghafs03 as VBUSY manually by sending a signal SIGUSR1 to the file server.
     [root@pghafs03] > ps -ef | grep server
         root  7274520  8781978   0   May 24      -  0:01 /usr/afs/bin/volserver -log -p 16
         root  8781978        1   0   Mar 12      -  0:06 /usr/afs/bin/bosserver
         root 13041896  8781978   0   May 24      -  0:04 /usr/afs/bin/fileserver -hr 24 -L -m 1 -cb 128000 -rxpck 2500 -p 256 -udpsize 1048576 -implicit rl -nojumbo -seclog -vc 1200 -s 1800 -l 1200 -b 180
     [root@pghafs03] > kill -SIGUSR1 13041896
     [root@pghafs03] >

     FileLog:
     Mon May 27 00:56:23 2013 Marked fileserver as VBUSY for partition movement

     Clients start getting the VBUSY message:
     afs: Waiting for busy volume 536894431 (indira.e.1) in cell test36.transarc.com
     afs: Waiting for busy volume 536894431 (indira.e.1) in cell test36.transarc.com
  2. Run the vos detachpart -fromserver pghafs03 -partition all -c test36 command to detach the partitions from pghafs03.
     [root@pghafs03] > vos detachpart -fromserver pghafs03 -partition all -c test36
     vos detachpart: Parsing all partitions...
     vos detachpart: Detaching all partitions from server pghafs03...please check logs for successful completion
     [root@pghafs03] >

     All volumes will be marked as offline:

     [root@pghafs03] > vos exam 536892892 -c test36
     atd.virt.f.691                  536892892 RW          2 K  Off-line
         pghafs03.rchland.ibm.com /vicepf
         RWrite  536892892 ROnly          0 Backup          0
         MaxQuota     100000 K
         Creation    Tue Mar 19 10:43:01 2013
         Last Update Tue Mar 19 10:48:38 2013
         Last Access Tue Apr  9 06:53:54 2013
         0 accesses in the past day (i.e., vnode references)

         RWrite: 536892892
         number of sites -> 1
            server pghafs03.rchland.ibm.com partition /vicepf RW Site
     [root@pghafs03] >

     FileLog:
     Mon May 27 00:56:36 2013 vos detachpart : Making all volumes offline...
     Mon May 27 00:56:36 2013 VShutdown: shutting down on-line volumes...
     Mon May 27 00:56:36 2013 VShutdown: complete.
     Mon May 27 00:56:36 2013 vos detachpart : Made all volumes offline...
  3. Unmount all the SAN vice partitions from pghafs03.
    • Run the script for unmounting the vice partitions from pghafs03. (A hypothetical version of such a script is sketched after this list.)
    • Make sure that all the vice partitions are unmounted from pghafs03.
  4. Mount all the SAN vice partitions on pghafs02.
    • Run the script for mounting the vice partitions on pghafs02.
    • Make sure that all the vice partitions are mounted on pghafs02.
  5. Restart the fs instance on pghafs02.
     [root@pghafs03] > bos restart -server pghafs02 -instance fs -c test36
     [root@pghafs03] >
  6. Run the vos attachpart -fromserver pghafs03 -toserver pghafs02 -partition all -c test36 command to attach the partitions to pghafs02.
     [root@pghafs03] > vos attachpart -fromserver pghafs03 -toserver pghafs02 -partition all -c test36
     vos attachpart: Parsing all partitions...
     vos attachpart: Attached all partitions to server pghafs02 from server pghafs03... please check logs for successful completion
     [root@pghafs03] >

     FileLog:
     Mon May 27 01:00:45 2013 vos attachpart : breaking all callbacks for each and every volumes...
     Mon May 27 01:00:45 2013 VBreakAllCallBack_r: breaking callback on all volumes...
     Mon May 27 01:00:45 2013 VBreakAllCallBack_r: complete.
     Mon May 27 01:00:45 2013 vos attachpart : after breaking all callbacks for each and every volumes...
  7. Unmark pghafs03 from VBUSY manually by sending a signal SIGUSR2 to the file server.
     [root@pghafs03] > ps -ef | grep server
         root  7274520  8781978   0   May 24      -  0:02 /usr/afs/bin/volserver -log -p 16
         root  8781978        1   0   Mar 12      -  0:06 /usr/afs/bin/bosserver
         root 13041896  8781978   0   May 24      -  0:04 /usr/afs/bin/fileserver -hr 24 -L -m 1 -cb 128000 -rxpck 2500 -p 256 -udpsize 1048576 -implicit rl -nojumbo -seclog -vc 1200 -s 1800 -l 1200 -b 180
     [root@pghafs03] > kill -SIGUSR2 13041896
     [root@pghafs03] >

     FileLog:
     Mon May 27 01:01:32 2013 Unmarked fileserver from VBUSY as partition movement is done
  8. Shut down the fs instance on pghafs03.
     [root@pghafs03] > bos shutdown -server pghafs03 -instance fs -c test36
     [root@pghafs03] > ps -ef | grep server
         root  8781978        1   0   Mar 12      -  0:06 /usr/afs/bin/bosserver
     [root@pghafs03] >
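
The unmount and mount scripts referred to in steps 3 and 4 are site-specific and are not shipped with AFS. A hypothetical unmount helper for pghafs03 might look like the ksh sketch below; the volume group names vicevga through vicevgf are assumptions. The corresponding mount script on pghafs02 would run importvg -y, varyonvg, and mount for each volume group, mirroring step 4.

    #!/bin/ksh
    # Hypothetical helper: unmount the six vice partitions and export their volume groups
    for p in a b c d e f
    do
        umount /vicep$p            # unmount the vice partition
        varyoffvg vicevg$p         # deactivate its volume group
        exportvg vicevg$p          # remove the volume group definition
    done
    lsvg                           # confirm that the volume groups are no longer defined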

Error recovery

Even though there is a prerequisite that no vos activity takes place during the vice partition movement, an error can still occur at any point. The following workarounds describe how to recover from such situations.

There are various points from which we might want to recover. Let us consider some of them in this section.

  • Case 1: Suppose we want to recover after running step 2

    This is where fileserver1 is in busy mode and all the volumes are marked as offline. To recover from this, we can perform either of the following steps:

    • Run the bos restart fileserver1 -instance fs -cell punetest command.
    • Run the vos attachpart -fromserver fileserver1 -toserver fileserver1 -partition all -cell punetest command (that is, specifying the same file server as both the source and the destination), and then unmark fileserver1 from busy mode by sending signal SIGUSR2 to fileserver1.

    You can perform either of these steps to mark the volumes online again. (A sketch of the second option is shown after this list.)

  • Case 2: Suppose we want to recover after running step 3

    This is where fileserver1 is in busy mode, all volumes are offline, and we have unmounted the partitions. To recover from this, we can remount the partitions and perform either of the two steps mentioned in Case 1.

  • Case 3: Suppose we want to recover after running step 4

    This is where fileserver1 is in busy mode, all volumes are offline, and we have unmounted the partitions from fileserver1 and mounted them on fileserver2. To recover from this, we can do either of the following:

    • Unmount the partitions from fileserver2, mount them back on fileserver1, and perform either of the steps mentioned in Case 1.
    • Or, perform the remaining steps (step 5 through step 8).
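
For reference, the second recovery option from Case 1 might look like this on the command line; the process ID shown is the fileserver PID from the sample run above.

    # Re-attach the partitions to the same server, then clear the VBUSY state
    vos attachpart -fromserver fileserver1 -toserver fileserver1 -partition all -cell punetest
    kill -SIGUSR2 13041896         # PID of the fileserver process on fileserver1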

Summary

SAN storage virtualization for IBM AFS enables the administrator to respond to a file server that has hardware issues or is under heavy load by migrating its vice partitions to another file server very easily, so that the client jobs do not see a failure and the file server does not need to be restarted.

