AIX 5L™ is an award-winning operating system that delivers superior scalability, reliability, and manageability. It is the default operating system that powers some of the most powerful IBM UNIX® servers in the market.
Typically, a file system can be defined as a piece of software that helps in storing, organizing, and retrieving data from a physical storage medium, be it a hard disk drive, CD-ROM, or any other storage device. The code for such data organization, by its very nature, should be portable. In the real world, though, every operating system provides its own interfaces by which it requests a particular file system operation, and it is expected that the underlying piece of software provides results in the format that the operating system expects. The interfaces vary with different flavors of operating systems, and need to be exported by the file system to be supported on the particular operating system.
In this article, you'll learn about the AIX® operating system file system framework. You'll also get an overview of the IO layer and an explanation of some important concepts. Brief explanations are also included of the interfaces and methods when developing a new file system, or when porting an existing file system to the AIX operating system.
AIX, like many UNIX flavors, hosts the file system as a kernel extension. It is assumed that you have basic knowledge of UNIX programming and file system concepts. It would also be helpful to know how to write kernel extensions for AIX.
Understanding the logical file system and the virtual file system
The logical file system layer is the level of abstraction at which users can request the various file operations, such as read, write, stat, and so on. The logical file system interface supports UNIX-type file access semantics. The logical file system layer acts as a superset of the virtual file system, which encapsulates disparate file systems and provides the kernel with a consistent view of the underlying directory tree. The logical file system is also responsible for managing the kernel's open file table and the per-process file descriptor information.
The virtual file system is an abstraction of the underlying physical file system. The virtual file system provides a standard set of interfaces that you should support in order for your file system to be hosted over the AIX operating system. The virtual file system bridges the underlying disparate physical file system to the logical file system, providing a consistent directory tree hierarchy to the rest of the operating system.
Each unique mount instance of a file system object is represented by a virtual file system structure. A virtual file system can be a physical file system, a network file system, or a logical file system (one that does not have a physical backing store, such as ramfs). Figure 1 shows the AIX file system hierarchy.
Figure 1. The AIX file system hierarchy

As shown in Listing 1, the virtual file system is maintained
as a linked list of struct vfs, as denoted by the
member vfs_next.
Listing 1. Virtual file system structure
<sys/vfs.h>
struct vfs {
struct vfs *vfs_next;
struct gfs *vfs_gfs;
struct vnode *vfs_mntd;
struct vnode *vfs_mntdover;
struct vnode *vfs_vnodes;
int vfs_count;
caddr_t vfs_data;
unsigned int vfs_number;
int vfs_bsize;
#ifdef _SUN
short vfs_exflags;
unsigned short vfs_exroot;
#else
short vfs_rsvd1;
unsigned short vfs_rsvd2;
#endif /* _SUN */
struct vmount *vfs_mdata;
Simple_lock vfs_lock;
};
Each entry in the list represents a mounted file system object.
- vfs_mntd: represents the mount point vnode over which this file system has been mounted. For the '/' root file system, this will be the root vnode.
- vfs_vnodes: a linked list of all the vnodes for this mounted instance.
- vfs_lock: used to serialize the access to the vnode.
- vfs_gfs: points to the struct gfs structure for the corresponding file system.
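The linked-list layout described above can be sketched in user space as follows. This is a minimal model, not kernel code: struct demo_vfs, count_mounts(), and find_mount() are hypothetical names, and the structure keeps only the vfs_next and vfs_number members from Listing 1.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct vfs (user-space model). */
struct demo_vfs {
    struct demo_vfs *vfs_next;   /* next mounted instance in the list */
    unsigned int     vfs_number; /* identifier of this mount instance */
};

/* Walk the list through vfs_next and count the mounted instances. */
static int count_mounts(const struct demo_vfs *head)
{
    int n = 0;
    for (const struct demo_vfs *v = head; v != NULL; v = v->vfs_next)
        n++;
    return n;
}

/* Return the entry whose vfs_number matches, or NULL if none does. */
static const struct demo_vfs *find_mount(const struct demo_vfs *head,
                                         unsigned int number)
{
    for (const struct demo_vfs *v = head; v != NULL; v = v->vfs_next)
        if (v->vfs_number == number)
            return v;
    return NULL;
}
```

A real kernel extension would take the appropriate locks before walking the list; the model leaves locking out entirely.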
struct gfs contains information pertaining to a file system that is independent of the
mounted instances. It contains all the common traits of a file system layout, as
shown in Listing 2 below. Each file system registered with
the operating system has exactly one struct gfs, while
there is one or more than one struct vfs, each for one
mounted instance. The important members of the
struct gfs are gfs_ops and
gn_ops, which represent the virtual file system operations and the
vnode operations for the file system. You should supply the virtual file system operations and
vnode operations for the file system to be supported on the AIX operating system.
Listing 2. gfs structure
<sys/vfs.h>
struct gfs {
struct vfsops *gfs_ops;
struct vnodeops *gn_ops;
int gfs_type;
char gfs_name[16];
int (*gfs_init)(struct gfs *);
int gfs_flags;
caddr_t gfs_data;
int (*gfs_rinit)(void);
int gfs_hold;
};
The vnode represents an open object in the file
system. Every attempt to create or open a file results in a
vnode object being created for the file. If a
vnode already exists, the reference count is
incremented. The vnode remains alive in the virtual
file system until the last holder goes away, thereby bringing the reference count
to 0. The main job of the vnode is to translate the
pathname to a physical object in the underlying file system. An object can have
more than one pathname if:
- The file system is mounted more than once over different mount points.
- There is a soft link or hard link to the object.
However, attempts to open synonymous paths do not result in multiple
vnodes being created. A gnode can be referred to by multiple vnodes only when there
is more than one mounted instance of the file system. At any given instance, there
is at least one vnode referring to a
gnode. Listing 3 shows the
vnode structure.
Listing 3. vnode structure
<sys/vnode.h>
struct vnode {
ushort v_flag;
ushort v_flag2;
ulong32int64_t v_count;
int v_vfsgen;
Simple_lock v_lock;
struct vfs *v_vfsp;
struct vfs *v_mvfsp;
struct gnode *v_gnode;
struct vnode *v_next;
struct vnode *v_vfsnext;
struct vnode *v_vfsprev;
union v_data {
void * _v_socket;
struct vnode * _v_pfsvnode;
} _v_data;
char *v_audit;
};
- v_vfsp: represents the containing vfs object. If this vnode is a mount point for a different file system, v_mvfsp holds the struct vfs of the containing file system.
- v_gnode: points back to the gnode of the object. Each physical object in a file system is represented by a unique gnode. Unlike the vnode, there is only one gnode per object, regardless of the number of mounted instances of a file system. There is a one-to-one correspondence between a gnode and a file on the disk.
The open object in each mounted instance is represented by a unique
vnode. Thus, the vnode can
be said to contain the open, instance-specific information, and the
gnode encapsulates the object itself.
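The relationship between the per-mount vnode, its reference count, and the shared gnode can be illustrated with a small user-space model. The types and helpers below (struct demo_vnode, struct demo_gnode, demo_hold(), demo_rele(), same_object()) are hypothetical simplifications that keep only the v_count and v_gnode members; they are not the AIX kernel services.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space models; only the members needed here are kept. */
struct demo_gnode {
    int gn_id;                   /* stands in for the on-disk object */
};

struct demo_vnode {
    unsigned long      v_count;  /* reference count, as in Listing 3 */
    struct demo_gnode *v_gnode;  /* the single gnode for the object */
};

/* Take one more reference on an existing vnode. */
static void demo_hold(struct demo_vnode *vp)
{
    assert(vp != NULL);
    vp->v_count++;
}

/* Drop a reference; returns 1 when the last holder has gone away. */
static int demo_rele(struct demo_vnode *vp)
{
    assert(vp != NULL && vp->v_count > 0);
    vp->v_count--;
    return vp->v_count == 0;
}

/* Two vnodes from different mount instances name the same object
 * when they point at the same gnode. */
static int same_object(const struct demo_vnode *a, const struct demo_vnode *b)
{
    return a->v_gnode == b->v_gnode;
}
```

In this model, two vnodes created for two mounts of the same file system would share one demo_gnode while keeping independent counts.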
Listing 4. gnode structure
<sys/vnode.h>
struct gnode {
enum vtype gn_type;
short gn_flags;
vmid_t gn_seg;
long32int64_t gn_mwrcnt;
long32int64_t gn_mrdcnt;
long32int64_t gn_rdcnt;
long32int64_t gn_wrcnt;
long32int64_t gn_excnt;
long32int64_t gn_rshcnt;
struct vnodeops *gn_ops;
struct vnode *gn_vnode;
dev_t gn_rdev;
chan_t gn_chan;
Simple_lock gn_reclk_lock;
int gn_reclk_event;
struct filock *gn_filocks;
caddr_t gn_data;
};
Some important members of the struct gnode are:
- gn_vnode: points to the list of vnodes referring to this gnode.
- gn_ops: the standard vnode operations that you should register for your file system.
- gn_data: points to the file system-specific information for the object, generally the inode.
The inode object is a file system data structure that
represents the information associated with the file and makes sense only to that
file system. Every gnode has an associated
inode in the file system. A
gnode stores the generic characteristics of the file, and
the inode is designed to store information specific to
the file system. The inode is encapsulated within the
gnode in the gn_data member
variable.
AIX does not dictate any interfaces for designing an
inode. It is up to you to design
your own inode and interpret it per the design.
Typically, an inode provides information such as the
ownership (group and user), access permissions, creation or modification time, and
so on.
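As a rough illustration of how a file system might hang its own inode off the gn_data member, consider the following user-space sketch. The struct demo_inode layout and the helper names are assumptions made for this example; AIX leaves the inode design to the file system author.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inode layout; each file system designs its own. */
struct demo_inode {
    unsigned int i_uid;    /* owning user */
    unsigned int i_gid;    /* owning group */
    unsigned int i_mode;   /* permission bits */
    long         i_mtime;  /* modification time */
};

/* Trimmed-down gnode model: only the opaque gn_data pointer is kept. */
struct demo_gnode {
    void *gn_data;         /* file system-specific data, here the inode */
};

/* Recover the file system's inode from the generic gnode. */
static struct demo_inode *gnode_to_inode(struct demo_gnode *gp)
{
    assert(gp != NULL);
    return (struct demo_inode *)gp->gn_data;
}

/* Example of interpreting the inode: may the owner write the file? */
static int owner_can_write(struct demo_gnode *gp)
{
    return (gnode_to_inode(gp)->i_mode & 0200) != 0;
}
```

Only the file system that created the inode knows how to interpret what gn_data points to; the layers above treat it as opaque.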
The virtual file system operations
The AIX vnode and virtual file system framework provides the well-defined
interfaces known as the virtual file system and vnode operations. The virtual file
system operations are used to support file system management operations, such as
mounting and unmounting of a file system, syncing the file system with its
backing store during an unmount operation, and querying the various ancillary
provisions available with the file system. Listing 5 shows
mount and unmount operations.
All the virtual file system operations return 0 on successful completion. In event of a failure, an error number is returned from /usr/include/sys/errno.h.
Listing 5. Mount and unmount operations
int (*vfs_mount)(struct vfs *, struct ucred *);
int (*vfs_unmount)(struct vfs *, int, struct ucred *);
- vfs_mount: This interface is called by the logical file system layer to invoke the mounting of the requested file system. It is passed an initialized struct vfs, which the routine should fill and return. The vfs_mount interface returns 0 on a successful mount and an error on failure.
- vfs_unmount: This routine is called for unmounting a file system, either through an explicit call to the umount command or during operating system shutdown.
The vfs_sync routine, shown in
Listing 6 below, is called to sync the in-memory file system
data with its backing store.
Listing 6. Sync operations
#if defined(__64BIT_KERNEL) || defined(__FULL_PROTO)
int (*vfs_sync)(struct gfs *);
#else
int (*vfs_sync)();
#endif
int (*vfs_syncvfs)(struct gfs *, struct vfs *, int, struct ucred *);
- vfs_sync: This interface is passed the struct gfs that describes the file system type and not the struct vfs, as in other virtual file system operations. The sync routine is called once per file system type and not for each vfs instance.
- vfs_syncvfs: used to sync a specific mount instance, unlike vfs_sync, which is called once for all mounted instances of that particular virtual file system type. vfs_syncvfs can be called with several options that control the sync operation.
Listing 7. Quota and Access Control Lists management
int (*vfs_quotactl)(struct vfs *, int, uid_t, caddr_t, struct ucred *);
int (*vfs_aclxcntl)(struct vfs *, struct vnode *, int, struct uio *, size_t *,
struct ucred *);
- vfs_quotactl: The logical file system layer invokes the vfs_quotactl interface to perform quota-related control operations on the file system.
- vfs_aclxcntl: invoked to perform various Access Control List (ACL)-specific control operations on the file system. If a file system has quota and ACL support, it needs to adhere to the framework defined for the AIX operating system.
The routines return 0 on a successful operation. If the operations are not supported by the underlying file system, the routines should return EINVAL.
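The return convention above (0 on success, EINVAL for an unsupported operation) can be sketched with a small operations table. The struct demo_vfsops type and its simplified signatures are assumptions for illustration; the real struct vfsops entry points take the arguments shown in the listings.

```c
#include <errno.h>
#include <stddef.h>

struct demo_vfs;  /* opaque in this model */

/* Hypothetical, simplified operations table (not the real struct vfsops). */
struct demo_vfsops {
    int (*vfs_quotactl)(struct demo_vfs *vfsp, int cmd);
    int (*vfs_sync)(void);
};

/* This model file system has no quota support, so it reports EINVAL. */
static int demo_quotactl_unsupported(struct demo_vfs *vfsp, int cmd)
{
    (void)vfsp;
    (void)cmd;
    return EINVAL;
}

/* A supported operation returns 0 on success. */
static int demo_sync(void)
{
    return 0;
}

static const struct demo_vfsops demo_ops = {
    demo_quotactl_unsupported,
    demo_sync,
};
```

Supplying an explicit stub for every slot, rather than leaving a pointer NULL, keeps callers from having to check each entry before invoking it; whether AIX tolerates NULL slots is not covered here.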
Listing 8. Other control operations
int (*vfs_root)(struct vfs *, struct vnode **, struct ucred *);
int (*vfs_statfs)(struct vfs *, struct statfs *, struct ucred *);
int (*vfs_vget)(struct vfs *, struct vnode **, struct fileid *, struct ucred *);
int (*vfs_cntl)(struct vfs *, int, caddr_t, size_t, struct ucred *);
- vfs_root: used to get the root vnode pointer of the file system. The routine returns the root vnode in the struct vnode ** parameter.
- vfs_statfs: used to get the file system characteristics. On successful completion, the struct statfs is filled with the appropriate file system characteristics. Table 1 describes the various file system characteristics.
Table 1. File system characteristics
| Field | Description |
|---|---|
| f_blocks | Specifies the number of blocks |
| f_files | Specifies the total number of file system objects |
| f_bsize | Specifies the file system block size |
| f_bfree | Specifies the number of free blocks |
| f_ffree | Specifies the number of free file system objects |
| f_fname | Specifies a 32-byte string indicating the file system name |
| f_fpack | Specifies a 32-byte string indicating a pack ID |
| f_name_max | Specifies the maximum length of an object name |
- vfs_vget: used to get the vnode of an object in the file system identified by the virtual file system and fileid. The fileid is created during a call to the vn_fid vnode operation. If a vnode exists in the virtual file system layer for the corresponding object, the reference count is incremented and the vnode is returned. Otherwise, a new vnode referencing the object is created through the vn_get kernel service and returned after its count is set to 1. The fileid parameter at any given instance is used to uniquely identify an object in the file system.
- vfs_cntl: used to implement miscellaneous user-specified control operations. The logical file system invokes the vfs_cntl routine with a control ID and the necessary arguments. vfs_cntl is invoked by the fscntl subroutine. There can be a maximum of 32768 control operations specified by the user.
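The find-or-create behavior described for vfs_vget can be modeled in user space as shown below. Everything here is a simplification under stated assumptions: struct demo_fileid identifies an object by a single hypothetical number, calloc() stands in for the vn_get kernel service, and the ENOMEM return is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_fileid { unsigned long fid_ino; };  /* hypothetical identifier */

struct demo_vnode {
    unsigned long      v_count;    /* reference count */
    unsigned long      ino;        /* object this vnode refers to */
    struct demo_vnode *v_vfsnext;  /* next vnode of this mount instance */
};

struct demo_vfs { struct demo_vnode *vfs_vnodes; };

/* Return the vnode for fidp: reuse an existing one (count + 1) or
 * create a new one with its count set to 1. */
static int demo_vget(struct demo_vfs *vfsp, struct demo_vnode **vpp,
                     const struct demo_fileid *fidp)
{
    assert(vfsp != NULL && vpp != NULL && fidp != NULL);
    for (struct demo_vnode *vp = vfsp->vfs_vnodes; vp != NULL; vp = vp->v_vfsnext) {
        if (vp->ino == fidp->fid_ino) {
            vp->v_count++;
            *vpp = vp;
            return 0;
        }
    }
    struct demo_vnode *vp = calloc(1, sizeof *vp);
    if (vp == NULL)
        return ENOMEM;
    vp->v_count = 1;
    vp->ino = fidp->fid_ino;
    vp->v_vfsnext = vfsp->vfs_vnodes;  /* link into the per-mount list */
    vfsp->vfs_vnodes = vp;
    *vpp = vp;
    return 0;
}
```

A real implementation would also verify that the identified object still exists in the file system before creating a vnode for it.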
Like the virtual file system operations, the AIX file system framework exports a
set of interfaces to do the various file system operations, such as read, write,
stat, and so on. These interfaces are the vnode operations. The AIX Version
5.3 kernel exports 56 interfaces in all for the various file system operations. It
is not necessary to support all of them; it depends on what purpose your file
system driver serves. Some of the extensions are provided purely for backward
compatibility with the older version of the AIX operating system, while others are
callout interfaces to aid the virtual memory manager in paging operations. The
rest of the routines provide the backbone support for the various UNIX file access
semantics.
Listing 9. Creating, naming, and deletion operations
int (*vn_link)(struct vnode *dvp, struct vnode *vp, char *name, struct ucred *cred);
int (*vn_mkdir)(struct vnode *dvp, char *name, int32long64_t mode, struct ucred *cred);
int (*vn_mknod)(struct vnode *dvp, caddr_t name, int32long64_t mode, dev_t dev,
struct ucred *cred);
int (*vn_remove)(struct vnode *vp, struct vnode *dvp, char *name, struct ucred *cred);
int (*vn_rename)(struct vnode *srcVp, struct vnode *srcDvp, caddr_t oldName,
struct vnode *destVp, struct vnode *destDvp, caddr_t newName,
struct ucred *cred);
int (*vn_rmdir)(struct vnode *vp, struct vnode *dvp, char *name, struct ucred *cred);
The routines are used to create, name, and delete objects in the underlying file
system. All the routines are passed at least the parent directory
vnode in which the operation for the child object is to
be performed. The logical file system layer ensures that the
parent directory vnode does not lie in a read-only file
system. The entry points are:
- vn_link: invoked to create a new hard link to an existing object as part of the link subroutine's job. The routine creates a hard link in the dvp directory with the name name for the vnode vp. The logical file system ensures that the objects referred to by the dvp and vp parameters reside in the same virtual file system.
- vn_mkdir: used to create a new named directory under the directory specified by the vnode dvp. The directory mode permissions are passed as an int32long64_t variable and the name of the new directory is passed in the name parameter.
- vn_mknod: used to create a new file with the name name in the dvp directory. The mode parameter carries the type of the file (regular or special) and the file access permissions as a combination of bit masks. In the case of special files (such as device files), the dev parameter carries the device number.
- vn_remove: invoked by the logical file system layer on behalf of the unlink subroutine. This interface is used to remove a directory entry or a link specified by the vnode vp lying under the dvp directory. The caller is required to call the vn_rele function to release its reference to the vnode. If this is the last reference to the object, the physical disk resources occupied by the file are released.
- vn_rename: called by the logical file system to rename a file or a directory. The source object (oldName) lying under the srcDvp directory is referred to by the vnode srcVp. The new name is specified by the newName parameter and the new destination directory is specified by destDvp. If an object of the same name already exists in the destination path, that object's vnode is passed in destVp.
- vn_rmdir: used to remove a directory entry specified by the vnode vp lying in the directory dvp. The logical file system ensures that the vnode to be removed is a directory and not the current directory or the root directory. To be removed, the directory must be empty, that is, it must not have any child objects.
Listing 10. Lookup and filehandle operations
int (*vn_lookup)(struct vnode *dvp, struct vnode **vpp, char *name,
int32long64_t vflag, struct vattr *attr, struct ucred *cred);
int (*vn_fid)(struct vnode *vp, struct fileid *fidp, struct ucred *cred);
- vn_lookup: The logical file system uses this entry point to look up the file with the name name under the dvp directory. If found, the vnode of the file looked up is returned in vpp. Every successful lookup increments the reference count, so you should ensure that a successful lookup operation is followed by a close that will decrement the reference count. If the attr field is not null, the routine should also return the file attributes in the attr parameter.
- vn_fid: invoked to build a file identifier for the vnode vp. This file identifier is used by the vfs_vget routine to get the vnode of the same object for which the file identifier was constructed in the first place. Hence, the file identifier should have enough information to successfully identify the right object.
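The lookup contract above (return the vnode in vpp, bump the reference count, and fill attr only when it is non-null) can be sketched in user space. The types, the flat array standing in for a directory, and the ENOENT return for a missing name are assumptions of this model rather than the AIX definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down attribute and vnode models. */
struct demo_attr  { long va_size; };
struct demo_vnode { const char *name; unsigned long v_count; long size; };

/* Look up name among the entries of a directory (modeled as an array). */
static int demo_lookup(struct demo_vnode *entries, size_t nentries,
                       struct demo_vnode **vpp, const char *name,
                       struct demo_attr *attr)
{
    assert(entries != NULL && vpp != NULL && name != NULL);
    for (size_t i = 0; i < nentries; i++) {
        if (strcmp(entries[i].name, name) == 0) {
            entries[i].v_count++;          /* each successful lookup holds the vnode */
            *vpp = &entries[i];
            if (attr != NULL)              /* attributes are optional */
                attr->va_size = entries[i].size;
            return 0;
        }
    }
    return ENOENT;                         /* assumed error for "not found" */
}
```

Because every successful call takes a reference, a caller in this model is expected to release it once it has finished with the returned vnode.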
Listing 11. File access operations
int (*vn_open)(struct vnode *vp, int32long64_t flag, ext_t dev, caddr_t * vinfo,
struct ucred *cred);
int (*vn_create)(struct vnode *dvp, struct vnode **vpp, int32long64_t flags,
caddr_t name, int32long64_t mode, caddr_t *vinfo,
struct ucred *cred);
int (*vn_hold)(struct vnode *vp);
int (*vn_rele)(struct vnode *vp);
int (*vn_close)(struct vnode *vp, int32long64_t flag, caddr_t vinfo,
struct ucred *cred);
int (*vn_map)(struct vnode *vp, caddr_t addr, uint32long64_t length,
uint32long64_t offset, uint32long64_t flags, struct ucred *cred);
int (*vn_unmap)(struct vnode *vp, int32long64_t flag, struct ucred *cred);
- vn_open: An entry point used to open a file with the file open parameters specified in the flag parameter. The vnode of the file to be opened is passed in the vp parameter. Typically, the vnode is created during the file lookup operation or the file create operation (if open is called with the O_CREAT flag). It is up to you to decide what needs to be done during a vn_open call. A successful open should result in the reference count being incremented. When a device file is opened, the dev parameter has the device-specific information.
- vn_create: This routine is called to create a regular file with the name name in the dvp directory. The new file's access mode is passed in the mode parameter. The flags parameter carries the open flag options for the succeeding open call. On a successful create, a new vnode entry is created in the virtual file system and the reference count is set to 1. The newly created vnode is returned in the vpp parameter.
- vn_hold / vn_rele: The vn_hold routine is used to increase the reference count of the vnode. This ensures that the vnode is not deleted underneath the caller without the caller's knowledge. All calls to vn_hold should promptly be followed by a vn_rele after completing the job to ensure that the reference count is decremented.
- vn_close: an entry point called by the close subroutine to close the vnode vp. This routine is called only when the last reference on the vnode goes away. No further operations can be performed on the vnode once vn_close has been called for it.
- vn_map: a routine used to validate file mapping requests resulting from an mmap call or shmat call for the object referred to by the vnode vp. The addr parameter specifies the address in the requesting process's address space where the object is to be mapped. The length, offset, and flags parameters define the length of the mapping, the offset within the file from which the file is to be mapped, and a bit mask of flags that defines the type of mapping, respectively. It is expected that the underlying file system store the gn_seg field of the gnode for the file that is mapped. The logical file system layer creates the virtual memory object if one doesn't already exist and increments the count.
- vn_unmap: The vn_unmap entry is called to unmap a file that had already been mapped to the process's address space. The vnode vp specifies the target file that needs to be unmapped. The flag parameter specifies the bit mask of flags that defines the type of mapping.
The file system implementation is required to perform only file system-specific operations on both the vn_map and vn_unmap calls. The logical file system layer takes the responsibility of handling the virtual memory operations.
Listing 12. Attribute manipulation operations
int (*vn_access)(struct vnode *vp, int32long64_t mode, int32long64_t who,
struct ucred *cred);
int (*vn_getattr)(struct vnode *vp, struct vattr *attr, struct ucred *cred);
int (*vn_setattr)(struct vnode *vp, int32long64_t cmd, int32long64_t arg1,
int32long64_t arg2, int32long64_t arg3, struct ucred *cred);
- vn_access: an entry point used by the logical file system to validate access to the vnode vp. This entry point is used to implement the access subroutine and to check the permissions, such as read, write, execute, and so on. The who parameter specifies the user for whom the check needs to be done. The mode parameter specifies the type of check that needs to be done.
- vn_getattr: an entry point used to get the various file attributes (specified in the vattr structure) for the given vnode. This entry point supports the stat, fstat, and lstat subroutines.
- vn_setattr: an entry point used to set the various file attributes (specified in the vattr structure) for the vnode vp. The values of the arg1, arg2, and arg3 arguments depend on the cmd parameter. Table 2 specifies the interpretation of these arguments based on the command value. This entry point supports the chmod, chownx, and utime subroutines.
Table 2. Possible command values for vn_setattr
| Command | V_OWN | V_UTIME | V_MODE |
|---|---|---|---|
| arg1 | int flag; | int flag; | int mode; |
| arg2 | int uid; | timestruct_t *atime; | Unused |
| arg3 | int gid; | timestruct_t *mtime; | Unused |
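The way the meaning of arg1, arg2, and arg3 changes with the command in Table 2 can be sketched as a dispatch routine. This is a user-space model under assumptions: the DEMO_V_* values are placeholders (the real V_OWN, V_UTIME, and V_MODE constants come from the AIX headers), times are passed as plain numbers rather than timestruct_t pointers, and the EINVAL return for an unknown command is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder command values; not the real AIX constants. */
enum demo_cmd { DEMO_V_OWN, DEMO_V_UTIME, DEMO_V_MODE };

/* Hypothetical attribute record updated by the model. */
struct demo_attrs { int uid, gid, mode; long atime, mtime; };

static int demo_setattr(struct demo_attrs *ap, enum demo_cmd cmd,
                        long arg1, long arg2, long arg3)
{
    assert(ap != NULL);
    switch (cmd) {
    case DEMO_V_OWN:    /* arg1 = flag (ignored here), arg2 = uid, arg3 = gid */
        ap->uid = (int)arg2;
        ap->gid = (int)arg3;
        return 0;
    case DEMO_V_UTIME:  /* arg1 = flag (ignored here), arg2 = atime, arg3 = mtime */
        ap->atime = arg2;
        ap->mtime = arg3;
        return 0;
    case DEMO_V_MODE:   /* arg1 = mode, arg2 and arg3 unused */
        ap->mode = (int)arg1;
        return 0;
    }
    return EINVAL;      /* unknown command */
}
```

For example, demo_setattr(&a, DEMO_V_MODE, 0644, 0, 0) only touches the mode field and ignores the last two arguments, mirroring the "Unused" cells in the table.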
Listing 13. Data update operations
int (*vn_fclear)(struct vnode *vp, int32long64_t flags, offset_t offset, offset_t len,
caddr_t vinfo, struct ucred *cred);
int (*vn_fsync)(struct vnode *vp, int32long64_t flags, int32long64_t fd,
struct ucred *cred);
int (*vn_ftrunc)(struct vnode *vp, int32long64_t flags, offset_t length, caddr_t vinfo,
struct ucred *cred);
int (*vn_rdwr)(struct vnode *vp, enum uio_rw op, int32long64_t flag,
struct uio *uiop, caddr_t dev, struct vattr *attr,
struct ucred *cred);
int (*vn_lockctl)(struct vnode *vp, offset_t offset, struct eflock *lckdata,
int32long64_t cmd, int (*retry_fcn)(), ulong *retry_id,
struct ucred *cred);
- vn_fclear: an entry point used to clear a portion of the file and release whole free blocks back to the underlying file system. The offset parameter defines the offset at which the clearing should start. The len parameter specifies the number of bytes that need to be cleared. The flag with which the file was opened is passed in the flags parameter. The logical file system updates the file size to reflect the number of bytes cleared.
- vn_fsync: a routine called to request the file system to flush all modified data back to the backing store for the given vnode vp. This call must be completed synchronously so that the caller can be assured that all I/O has completed successfully. The flags parameter specifies the various sync options. The different sync options are available in the fcntl.h header file.
- vn_ftrunc: an entry point used to truncate the file specified by the vnode vp. The length parameter indicates the size of the file after the truncation operation is complete. If the new length is less than the previous length, the data between the two is removed. If the new length is greater than the existing length, zeros are added to extend the file size. When truncation is complete, whole blocks are returned to the file system and the file size is updated. The operation fails if the range to be truncated is locked.
- vn_rdwr: one of the most commonly supported vnode operations. vn_rdwr performs the file IO operations for the file specified by the vnode vp. The op parameter indicates whether the request is for a read operation (UIO_READ) or a write operation (UIO_WRITE). The uiop parameter carries the user IO data structure that describes a memory buffer to be used in a data transfer. The dev parameter carries the device-specific information if the IO is directed to special files. If the attr parameter is not NULL, then the attributes of the file should be passed back in the attr parameter.
- vn_lockctl: an entry point used to control record-based locking for the file targeted by the vnode vp. The struct eflock (lckdata) defines the locking information to do the necessary record locking. The cmd parameter defines the type of lock operation that needs to be performed. An optional retry function can be passed in the retry_fcn parameter that can be used to retry locks if the lock is not granted immediately. The retry_id parameter stores the value that correlates a retry operation with a specific set of locks. The locking implementation calls the operating system-provided common_reclock() routine, which hides the complexities of locking operations from the end user.
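The shrink-or-zero-extend behavior described for vn_ftrunc can be illustrated with an in-memory file model. This is a sketch only: struct demo_file and demo_ftrunc() are hypothetical, the heap buffer stands in for disk blocks, and lock checking is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory file: a heap buffer and its current size. */
struct demo_file { unsigned char *data; size_t size; };

/* Set the file size to length: drop trailing data when shrinking,
 * append zeros when growing. */
static int demo_ftrunc(struct demo_file *fp, size_t length)
{
    assert(fp != NULL);
    if (length == fp->size)
        return 0;
    if (length == 0) {                   /* release all storage */
        free(fp->data);
        fp->data = NULL;
        fp->size = 0;
        return 0;
    }
    unsigned char *p = realloc(fp->data, length);
    if (p == NULL)
        return ENOMEM;                   /* original buffer is left untouched */
    if (length > fp->size)               /* zero-fill the extension */
        memset(p + fp->size, 0, length - fp->size);
    fp->data = p;
    fp->size = length;
    return 0;
}
```

A real file system would additionally return whole freed blocks to its allocator and fail the request when the affected range is locked, as noted above.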
Listing 14. Extension operations
int (*vn_ioctl)(struct vnode *vp, int32long64_t cmd, caddr_t arg, size_t flags,
ext_t dev, struct ucred *cred);
int (*vn_readlink)(struct vnode *vp, struct uio *uio, struct ucred *cred);
int (*vn_select)(struct vnode *vp, int32long64_t corel_id, ushort req_event,
ushort *ret_event, void (*notify)(), caddr_t vinfo,
struct ucred *cred);
int (*vn_symlink)(struct vnode *dvp, char *link, char *target, struct ucred *cred);
int (*vn_readdir)(struct vnode *vp, struct uio *uio, struct ucred *cred);
- vn_ioctl: an entry point used by the logical file system to perform miscellaneous user-specified IO control operations on special files. If the file system has support for special files, the information is passed on to the intended device driver referred to by the vnode vp. The cmd parameter identifies which IOCTL (IO Control) this routine has been called for. The arg parameter carries the necessary arguments.
- vn_readlink: an entry point used to read the contents of a symbolic link for the vnode vp. The logical file system takes the responsibility of locating the vnode corresponding to the link. This routine simply reads the data blocks for the link.
- vn_select: an entry point invoked by the logical file system to poll the vnode vp to determine if it is immediately ready for I/O. It is used to implement the select and poll subroutines. A file system implementation can support constructs, such as devices or pipes, that support the select semantics. The fp_select kernel service provides more information about select and poll requests. The req_event parameter specifies the events requested for polling, and notify is the callback routine that is called when the event occurs.
- vn_symlink: used to create a symbolic link for an object whose absolute path is specified in the target parameter. The new link is created with the name link in the dvp directory.
- vn_readdir: an interface used to read the directory entries of the directory referred to by the vnode vp. The entries are returned as struct dirent in the uio structure. The read starts at the first directory entry from the address specified in the uio_offset member of struct uio. When the uio buffer is full, uio_offset is updated with the starting address of the directory entry that was not pushed in the present buffer. The uiop->uio_resid field is updated with the number of bytes that have been read into the uio structure. The end of the read operation is signaled by an empty read into the uio structure.
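The resume-from-offset protocol described for vn_readdir can be sketched with a much simpler model. The assumptions are significant: directory entries are plain strings instead of struct dirent records, the offset is an entry index rather than a byte address, and demo_readdir() is a hypothetical helper, not the AIX entry point.

```c
#include <assert.h>
#include <stddef.h>

/* Copy as many entries as fit into buf, starting at *offset, and
 * advance *offset to the first entry that did not fit.
 * Returns the number of entries copied; 0 signals the end. */
static size_t demo_readdir(const char *const *entries, size_t nentries,
                           size_t *offset, const char **buf, size_t bufcap)
{
    assert(offset != NULL && buf != NULL);
    size_t n = 0;
    while (n < bufcap && *offset < nentries)
        buf[n++] = entries[(*offset)++];
    return n;
}
```

A caller in this model keeps calling demo_readdir() with the same offset variable until it returns 0, which corresponds to the empty read that marks the end of the directory.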
Listing 15. Buffer operations
int (*vn_strategy)(struct vnode *vp, struct buf *buf, struct ucred *cred);
- vn_strategy: the routine responsible for reading data from the block device. This entry point is intended to provide a block-oriented interface for servers for efficiency in paging. The vnode vp identifies the file for which the block read needs to be done. The buf parameter contains the struct buf that describes the buffer.
Listing 16. Security-related operations
int (*vn_revoke)(struct vnode *vp, int32long64_t cmd, int32long64_t flags,
struct vattr *attr, struct ucred *cred);
int (*vn_getacl)(struct vnode *vp, struct uio *uio, struct ucred *cred);
int (*vn_setacl)(struct vnode *vp, struct uio *uio, struct ucred *cred);
int (*vn_getpcl)(struct vnode *vp, struct uio *uio, struct ucred *cred);
int (*vn_setpcl)(struct vnode *vp, struct uio *uio, struct ucred *cred);
int (*vn_seek)(struct vnode *vp, offset_t *offset, struct ucred *cred);
- vn_revoke: an entry point used to revoke all access to an object specified by the vnode vp. The cmd parameter, which defines whether the calling process has the file open, can have the following values:
  - 0: The process did not have the file open.
  - 1: The process had the file open.
  - 2: The process had the file open and the reference count in the file structure was greater than one.
- vn_getacl / vn_setacl: vn_getacl is an entry point used by the logical file system to retrieve the ACL for a file to implement the getacl subroutine. vn_setacl in turn is used to set the ACL for the file. These routines provide the backbone support for the chacl, chown, chmod, and statacl subroutines.
- vn_getpcl / vn_setpcl: vn_getpcl is an entry point used by the logical file system to retrieve the privilege control list (PCL) on a file to implement the getpcl subroutine. vn_setpcl is used to set the privilege control list for a file and supports the setpcl subroutine.
- vn_seek: an entry point used to validate the offset of the seek operation. The vnode for which the seek operation is to be validated is passed in vp and the offset is passed in offset. Typically, if the offset is greater than 0 and less than the maximum length of the file, the routine returns EOK; otherwise, it returns EINVAL.
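The kind of range check that vn_seek performs can be sketched as follows. demo_seek() is a hypothetical user-space helper; it returns 0 for success, and it also accepts the boundary values 0 and max_len, which is an assumption of this sketch rather than something the text above specifies.

```c
#include <errno.h>

/* Validate a seek offset against the largest size the file system
 * supports for a file. Returns 0 when acceptable, EINVAL otherwise. */
static int demo_seek(long long offset, long long max_len)
{
    if (offset < 0 || offset > max_len)
        return EINVAL;
    return 0;
}
```

With a limit of 100, demo_seek(50, 100) succeeds, while negative offsets and offsets past the limit are rejected.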
Listing 17. External pager callout operations
int (*pagerBackRange)(struct gnode *gnp, offset_t offset, caddr_t dest,
size_t *nBytesOfRange, size_t *nBytesBacked, uint *flags);
int64_t (*pagerGetFileSize)(struct gnode *gnp);
void (*pagerReadAhead)(struct gnode *gnp, vpn_t pFault, vpn_t * pFirst,
vpn_t *nPage, vpn_t *pTripWire, boolean_t tripWire);
void (*pagerReadWriteBehind)(struct gnode *gnp, int64_t offset, int64_t length,
uint flags);
void (*pagerEndCopy)(struct gnode *gnp, offset_t offset, size_t nBytesMoved,
size_t nBytesBacked, uint flags);
AIX has a provision for external pager callout routines that the virtual memory manager consults when paging in/out files from that particular file system.
- pagerBackRange: A callback used to request the file system to back the in-core pages with storage in the backing physical store. A call to pagerBackRange is followed by a call to the pagerEndCopy callback.
- pagerEndCopy: does the post-processing after the copying.
- pagerReadAhead, pagerReadWriteBehind: callbacks that let the virtual memory manager consult the file system for the read-ahead and write-behind policies.
All of these callout operations are optional. It is up to you to provide those that are needed.
In addition to these, vnodeops_t provides a number of interfaces called 421 extensions, which are provided for backward compatibility with AIX Version 4.2.1.
File system helpers and mount helpers
To enable support for multiple file systems, many of the file system commands do not process a request themselves. Instead, they collect the arguments passed to the command and pass them on to file system-specific back-end programs that do the real processing. These back-end programs are known as file system helpers and mount helpers; a file system developer must provide them.
The helper programs are in /sbin/helpers/<vfs_type>, where the vfs_type matches the file system type for which the command is being invoked. The program name must match the name of the command being executed.
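The naming rule above means a front-end command can locate its helper from just two strings: the file system type and its own command name. A minimal sketch of that path construction follows; demo_helper_path is a hypothetical function written for illustration and is not part of AIX.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sbin/helpers/<vfs_type>/<command>" into buf.
 * Returns 0 on success, -1 if an argument is empty or buf is too small.
 * demo_helper_path is a hypothetical name used only for this sketch.
 */
int demo_helper_path(char *buf, size_t len,
                     const char *vfs_type, const char *command)
{
    int n;

    assert(buf != NULL && vfs_type != NULL && command != NULL);

    if (strlen(vfs_type) == 0 || strlen(command) == 0)
        return -1;

    n = snprintf(buf, len, "/sbin/helpers/%s/%s", vfs_type, command);
    if (n < 0 || (size_t)n >= len)
        return -1;                /* output would have been truncated */

    return 0;
}
```

For a jfs2 file system and the mount command, this yields /sbin/helpers/jfs2/mount.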
The mount command is the front-end program for the mount helper provided by each file system. The back-end support programs provided for the mount and unmount commands are the mount helpers. The front-end mount program collects the parameters passed to it, and then looks in the /etc/filesystems file to determine the virtual file system type of the target file system. It then calls /sbin/helpers/<vfs_type>/mount with the collected parameters to process the command. A typical entry for a file system in the /etc/filesystems configuration file looks like Listing 18.
Listing 18. Sample entry in /etc/filesystems
/data:
        dev = /dev/fslv00
        vfs = jfs2
        log = /dev/hd8
        mount = true
        options = rw
The vfs attribute determines the virtual file system
type for the file system (<vfs_type>). The
mount attribute determines the default mount behavior
for this file system. It can have the following values.
| Value | Description |
|---|---|
| Automatic | The file system is mounted automatically when the system is started. |
| False | The file system is not mounted by default. |
| Readonly | The file system is mounted as read only. |
| True | The file system is mounted by the mount all command and unmounted by the unmount all command. |
| Nodename | Used by the mount command to determine which node contains the remote file system. If this attribute is not present, the mount is a local mount. |
The options attribute specifies any additional options to be passed to the back-end mount processor. Both the mount and unmount commands take six parameters; the first four are common to both, and the last two are specific to each command.
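To make the stanza lookup concrete, the sketch below searches an /etc/filesystems-style stanza held in memory for a named attribute, such as vfs or mount, and copies out its value. It is a simplified illustration, assuming one "name = value" pair per line; demo_stanza_attr is a hypothetical helper and is not the parser used by the AIX mount command.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Find "name = value" in a stanza and copy the value to out.
 * Returns 0 if found, -1 otherwise. Hypothetical helper, not an AIX API.
 */
int demo_stanza_attr(const char *stanza, const char *name,
                     char *out, size_t outlen)
{
    const char *line = stanza;
    size_t nlen;

    assert(stanza != NULL && name != NULL && out != NULL && outlen > 0);
    nlen = strlen(name);

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        const char *p = line;
        size_t vlen;

        if (end == NULL)
            end = line + strlen(line);

        while (p < end && isspace((unsigned char)*p))
            p++;                                  /* skip indentation */

        if ((size_t)(end - p) > nlen && strncmp(p, name, nlen) == 0) {
            p += nlen;
            while (p < end && isspace((unsigned char)*p))
                p++;
            if (p < end && *p == '=') {
                p++;
                while (p < end && isspace((unsigned char)*p))
                    p++;
                vlen = (size_t)(end - p);
                while (vlen > 0 && isspace((unsigned char)p[vlen - 1]))
                    vlen--;                       /* trim trailing blanks */
                if (vlen >= outlen)
                    return -1;                    /* value does not fit */
                memcpy(out, p, vlen);
                out[vlen] = '\0';
                return 0;
            }
        }
        line = (*end == '\n') ? end + 1 : end;    /* advance to next line */
    }
    return -1;
}
```

Applied to the /data stanza in Listing 18, a lookup of vfs gives jfs2, and a lookup of nodename fails, which corresponds to a local mount.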
Building and configuring the file system
The file system component is built as a kernel extension in AIX. The kernel maintains a list of active file system types registered with it. For the AIX operating system to honor your file system, it must be registered with AIX. AIX provides two kernel services, gfsadd and gfsdel, for adding and deleting the file system.
Every file system should provide a configuration routine that can be called to
configure the kernel extension as a file system. This routine should fill in the
file system information in struct gfs and call
gfsadd. The gfsadd kernel
service uses the information in the struct gfs to
register it in the global file system table and calls the initialization routine
specified in gfs_init. The initialization routine does
the rest of the file system initialization job.
To load the file system from a user perspective:
- Have a user program or script call the sysconfig subroutine to load the kernel extension.
- Call the sysconfig subroutine again to configure the kernel extension as a virtual file system by specifying the configuration routine.
- The configuration routine calls the gfsadd kernel service to register the kernel extension as a recognized file system for AIX.
Once this is done, the file system becomes operational.
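The registration flow can be modeled in user space to show the order of events: the configuration routine fills in a descriptor and registers it, and registration in turn runs the initialization routine. Everything in this sketch is a hypothetical stand-in: struct demo_gfs is a reduced placeholder for struct gfs, and demo_gfsadd and demo_gfsdel only imitate the roles of the gfsadd and gfsdel kernel services. Consult the AIX Technical Reference for the real structure layout, signatures, and return codes.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_FS 8

/* Hypothetical, reduced stand-in for struct gfs; fields are assumptions. */
struct demo_gfs {
    int         type;                       /* file system type number */
    const char *name;
    int       (*init)(struct demo_gfs *);   /* plays the role of gfs_init */
};

static struct demo_gfs *demo_table[DEMO_MAX_FS];
static int demo_init_calls;

/* Models gfsadd: record the entry, then run its initialization routine. */
int demo_gfsadd(int type, struct demo_gfs *gfs)
{
    assert(gfs != NULL);

    if (type < 0 || type >= DEMO_MAX_FS || demo_table[type] != NULL)
        return -1;                          /* bad type or already present */

    gfs->type = type;
    demo_table[type] = gfs;

    if (gfs->init != NULL && gfs->init(gfs) != 0) {
        demo_table[type] = NULL;            /* undo on failed init */
        return -1;
    }
    return 0;
}

/* Models gfsdel: remove the entry registered for this type. */
int demo_gfsdel(int type)
{
    if (type < 0 || type >= DEMO_MAX_FS || demo_table[type] == NULL)
        return -1;

    demo_table[type] = NULL;
    return 0;
}

/* Initialization routine supplied by the example file system. */
int demo_myfs_init(struct demo_gfs *gfs)
{
    (void)gfs;
    demo_init_calls++;
    return 0;
}

/* Configuration routine: fill in the descriptor and register it. */
int demo_myfs_config(struct demo_gfs *gfs, int type)
{
    gfs->name = "myfs";
    gfs->init = demo_myfs_init;
    return demo_gfsadd(type, gfs);
}
```

In the real kernel extension, the configuration routine is the entry point that the second sysconfig call invokes; in this model it is simply called directly.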
File system development is one of the most challenging aspects of kernel development. By its nature, code for the data management in a file system should be portable and platform-independent. Each operating system has its own IO framework and interfaces that bridge your file system with the kernel. It is this framework that you should thoroughly understand in order to make your file system work for the particular operating system.
You should also have a basic understanding of the kernel design and the interfaces it exports for various support operations, such as memory allocation, data copy routines, and so on. There is no single library or standard that defines these support routines. As a general rule, designing generic interfaces for such common support routines and having platform-specific files defining these operations for each platform should make life easier. A well-designed file system should be easily portable, at least across the various UNIX flavors.
Learn
- "Writing AIX kernel extensions" (developerWorks, Aug 2006): This article explains how to write kernel extensions for AIX.
- OpenAFS: OpenAFS has an AIX port for the AFS file system.
- Read:
  - Kernel Extensions and Device Support Programming Concepts, SC23-4900-03, to learn about kernel programming and the kernel environment for AIX.
  - Technical Reference: Kernel and Subsystems, Volume 1, SC23-4917-03, for detailed information about kernel services, device driver operations, and file system operations for AIX.
  - Technical Reference: Kernel and Subsystems, Volume 2, SC23-4918-03, for detailed information about the configuration subsystem, communications subsystem, LFT subsystem, printer subsystems, SCSI subsystem, Integrated Device Electronics, SSA subsystem, and the serial DASD subsystem for AIX.

Srikanth Srinivasan is a Staff Software Engineer for the IBM India Systems and Technology Labs. His main focus is on parallel file systems and the Linux kernel. He is part of the IBM General Parallel File systems (GPFS) development team, involved in developing various features for GPFS. Srikanth holds a bachelor's degree in Electronics and Communications Engineering from the University of Madras, India. You can reach him at ssrikanth@in.ibm.com.





