Discover the Boost Filesystem Library

Creating platform-independent code

The absence of a well-defined library that deals with file system manipulation is a long-running issue for the C++ language. In the past, programmers have had to use native APIs to work around the problem. Discover a library that provides a safe, portable, and easy-to-use C++ interface to facilitate file system operations: the Boost Filesystem Library.

Arpan Sen (arpansen@gmail.com), Independent author

Arpan Sen is a lead engineer working on the development of software in the electronic design automation industry. He has worked on several flavors of UNIX, including Solaris, SunOS, HP-UX, and IRIX as well as Linux and Microsoft Windows for several years. He takes a keen interest in software performance-optimization techniques, graph theory, and parallel computing. Arpan holds a post-graduate degree in software systems. You can reach him at arpansen@gmail.com.


08 April 2008

One of the most common complaints about the C++ language, and indeed the C++ standard, is the lack of a well-defined library for file system queries and manipulation. This absence leads programmers to use the native application programming interfaces (APIs) that the operating system provides, which makes for code that isn't portable across platforms. Consider a simple situation: You need to figure out whether a file is of type Directory. On the Microsoft® Windows® platform, you could do this by calling the GetFileAttributes library function, made available through the windows.h header file:

DWORD GetFileAttributes (LPCTSTR lpFileName);

For directories, the returned attribute mask has the FILE_ATTRIBUTE_DIRECTORY bit set, and your code must test for that bit. On UNIX® and Linux® platforms, the same functionality could be achieved by using the stat or fstat routines and the S_ISDIR macro defined in sys/stat.h. You must also understand the stat structure. Here's the corresponding code:

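#include <sys/stat.h>
#include <stdio.h>
int main(int argc, char* argv[])
  {
  // The path to test comes from the first command-line argument
  const char* pathname = (argc > 1) ? argv[1] : ".";
  struct stat s1;
  if (stat(pathname, &s1) != 0)
    {
    perror("stat"); // s1 is not valid if stat fails
    return 1;
    }
  printf("Path is a directory : %d\n", S_ISDIR(s1.st_mode) ? 1 : 0);
  return 0;
  }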
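
To make the portability cost concrete, the two native checks can be folded into one helper behind a preprocessor switch. The sketch below is illustrative only: the name is_directory_native is hypothetical, and it assumes the ANSI variant GetFileAttributesA on Windows and stat everywhere else.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/stat.h>
#endif

// Hypothetical helper: returns true if `pathname` names an existing directory.
bool is_directory_native(const char* pathname) {
    assert(pathname != nullptr);
#ifdef _WIN32
    // GetFileAttributesA returns a bit mask, or INVALID_FILE_ATTRIBUTES on failure.
    DWORD attrs = GetFileAttributesA(pathname);
    return attrs != INVALID_FILE_ATTRIBUTES &&
           (attrs & FILE_ATTRIBUTE_DIRECTORY) != 0;
#else
    struct stat info;
    // stat returns 0 on success; on failure `info` is not filled in.
    return stat(pathname, &info) == 0 && S_ISDIR(info.st_mode);
#endif
}
```

Calling is_directory_native(".") should report true on either platform, but every further query of this kind needs its own pair of branches, which is the maintenance burden a portable library removes.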

For programs with heavy-duty I/O, such inconsistencies imply a significant engineering effort to port the code across platforms. It is in this light that this article introduces the Boost Filesystem Library. This widely used library provides a safe, portable, and easy-to-use C++ interface for performing file system operations. It's available as a free download from the Boost site.

Your first program using boost::filesystem

Before delving into the finer nuances of the Boost Filesystem Library, Listing 1 shows the code to figure out whether a file is of type Directory using Boost APIs.

Listing 1. Code to determine whether a file is of type Directory
#include <stdio.h>
#include "boost/filesystem.hpp"
int main()
  {
  boost::filesystem::path path("/usr/local/include"); // random pathname
  bool result = boost::filesystem::is_directory(path);
  printf("Path is a directory : %d\n", result);
  return 0;
  }

The code is self-explanatory, and you don't need to know any system-specific routines. The code is verified to compile on both gcc-3.4.4 and cl-13.10.3077 without modification.


Understanding the Boost path object

The key to understanding the Boost Filesystem Library is the path object, because several routines defined in the library are meant to work only with a proper path object. File system paths are often operating system dependent. For example, UNIX and Linux systems use the forward slash (/) character as a directory separator, while Windows uses the backslash (\) for the same purpose. The boost::filesystem::path class is meant to abstract precisely this difference. A path object can be initialized in several ways, the most common being initialization with a char* or std::string, as shown in Listing 2.

Listing 2. Ways to create a Boost path object
path(); // empty path 
path(const char* pathname); 
path(const std::string& pathname);
path(const char* pathname, boost::filesystem::path::name_check checker); 
path(const std::string& pathname, boost::filesystem::path::name_check checker);

While initializing the path object, the PATHNAME variable can be provided in either the native format or in the portable format defined by the Portable Operating System Interface (POSIX) committee. Both approaches have their trade-offs in practice. Consider a situation in which you want to manipulate a directory that your software created, located at /tmp/mywork on UNIX and Linux systems and at C:\tmp\mywork on Windows. There are several approaches to this problem. Listing 3 shows the native format-oriented approach.

Listing 3. Initialize the path using the native format
#ifdef UNIX
boost::filesystem::path path("/tmp/mywork");
#else
boost::filesystem::path path("C:\\tmp\\mywork");
#endif

A single #ifdef is needed to initialize the path on a per-operating system basis. However, if you prefer using the portable format, take a look at Listing 4.

Listing 4. Initialize the path using a portable format
boost::filesystem::path path("/tmp/mywork");
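
As a rough illustration of the portable format, the following sketch builds the path from forward-slash text and renders it both ways. It is written against C++17 std::filesystem, the standardized descendant of Boost.Filesystem, so that it builds without Boost installed; the assumption is that recent Boost releases accept the same calls when the namespace alias points at boost::filesystem. The helper names are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem; // assumption: boost::filesystem could be aliased here instead

// Hypothetical helper: builds the work directory path from portable text.
fs::path work_directory() {
    fs::path p("/tmp/mywork");
    assert(!p.empty());
    return p;
}

// Returns the path text with '/' separators on every platform.
std::string portable_text(const fs::path& p) {
    return p.generic_string();
}

// Returns the path text using the separator the current platform prefers.
std::string native_text(fs::path p) {
    return p.make_preferred().string();
}
```

On UNIX and Linux the two renderings are identical; on Windows, native_text is expected to use backslashes while portable_text keeps the forward slashes.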

Note that path::name_check refers to a name-checking function prototype. The name-check function returns true if its PATHNAME argument is valid for a particular operating system or file system. Several name-checking functions come with the Boost Filesystem Library, and you're welcome to provide your own variants as well. Some of the more commonly used name-check functions are the Boost-provided portable_posix_name and windows_name.
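
Writing your own name-check variant amounts to supplying a function that takes a name and returns a bool. The following sketch is a hypothetical example, not the Boost implementation: my_portable_name accepts only names drawn from the POSIX portable filename character set (letters, digits, period, underscore, and hyphen).

```cpp
#include <cassert>
#include <string>

// Hypothetical name-check function: accepts a single path element only if
// every character belongs to the POSIX portable filename character set.
bool my_portable_name(const std::string& name) {
    static const std::string allowed =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789._-";
    // An empty name is never valid; otherwise no character may fall outside the set.
    return !name.empty() &&
           name.find_first_not_of(allowed) == std::string::npos;
}
```

A check like this is applied to one path element at a time, so a separator such as / is rejected along with wildcard characters such as * and ?.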


Overview of the path member functions

The path object comes with several member methods. These member routines don't modify the file system but do provide useful information based on the path name. This section provides an overview of several of these routines:

  • const std::string& string( ): This routine returns a copy of the string with which the path was initialized, with formatting per the path grammar rules.
  • std::string root_directory( ): Given a path, this API would return the root directory; otherwise, it would return an empty string. For example, if the path consists of /tmp/var1, then this routine would return /, which is the root of the UNIX file system. However, if the path is a relative path such as ../mywork/bin, this routine would return an empty string.
  • std::string root_name( ): Given a path, this routine returns the root name if the path has one; otherwise, it returns an empty string. For example, a Windows path such as C:\tmp\mywork has the root name C:, whereas a UNIX path such as /tmp/var1 has no root name.
  • std::string leaf( ): Given an absolute path name (for example, /home/user1/file2), this routine provides only the string corresponding to the file name (that is, file2).
  • std::string branch_path( ): This is the complementary routine to leaf. Given a path, it returns all the elements used to construct it except the last. For example, for a path initialized with /a/b/c, path.branch_path( ) returns /a/b. For paths with a single element such as c, this routine returns an empty string.
  • bool empty( ): This routine returns true if the path object contains an empty string (for example, path path1("")) and false otherwise.
  • boost::filesystem::path::iterator: This iterator type is used to traverse the individual elements of the path. Consider Listing 5.
    Listing 5. Using the path::iterator (begin and end interface)
    #include <iostream>
    #include "boost/filesystem.hpp"
    int main()
      {
      boost::filesystem::path path1("/usr/local/include"); // random pathname
      boost::filesystem::path::iterator pathI = path1.begin();
      while (pathI != path1.end())
        {
        std::cout << *pathI << std::endl;
        ++pathI;
        }
      return 0;
      }

    The output of the above program is /, usr, local, include, in that order, signifying the directory hierarchy.

  • path operator / (char* lhs, const path& rhs): This routine is a non-member function of path. It returns a concatenation of the path formed using lhs and rhs. It automatically inserts / as the path separator, as Listing 6 shows.
    Listing 6. Concatenation of path strings
    #include <iostream>
    #include "boost/filesystem.hpp"
    int main()
      {
      boost::filesystem::path path1("local/include"); 
      boost::filesystem::path path2 = operator/("/usr", path1); 
      std::cout << "Path: " << path2 << std::endl;
      return 0;
      } 
    // result: /usr/local/include
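
The decomposition routines above can be exercised together. The sketch below is an approximation written against C++17 std::filesystem rather than the Boost release this article describes: there, leaf and branch_path are spelled filename and parent_path, and recent Boost releases are assumed to offer the same spellings. PathParts and take_apart are hypothetical names.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem; // assumption: stands in for boost::filesystem

// Hypothetical record holding the pieces of a path.
struct PathParts {
    std::string root_directory;        // "/" for rooted paths, empty for relative ones
    std::string leaf;                  // last element, from filename()
    std::string branch;                // everything but the last element, from parent_path()
    std::vector<std::string> elements; // what iterating over the path yields
};

PathParts take_apart(const fs::path& p) {
    PathParts parts;
    parts.root_directory = p.root_directory().generic_string();
    parts.leaf = p.filename().generic_string();
    parts.branch = p.parent_path().generic_string();
    for (const fs::path& element : p) {
        parts.elements.push_back(element.generic_string());
    }
    // Only an empty path iterates over zero elements.
    assert(p.empty() == parts.elements.empty());
    return parts;
}
```

For /usr/local/include this yields the root directory /, the leaf include, the branch /usr/local, and four elements; for the relative path ../mywork/bin the root directory comes back empty.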

Error handling

File system operations often encounter unexpected problems, and the Boost Filesystem Library reports runtime errors using C++ exceptions. The boost::filesystem::filesystem_error class is derived from the std::runtime_error class. Functions in the library throw filesystem_error exceptions to report operational errors. Corresponding to the different possible error types, there are error codes defined in the Boost headers. User code typically places file system calls within try...catch blocks and uses the caught filesystem_error to report a relevant error message. Listing 7 provides a small example in which a file is renamed, except that the file in the from path doesn't exist.

Listing 7. Error handling in Boost
#include <iostream>
#include "boost/filesystem.hpp"
int main()
  {
  try {
  boost::filesystem::path path("C:\\src\\hdbase\\j1");
  boost::filesystem::path path2("C:\\src\\hdbase\\j2");
  boost::filesystem::rename(path, path2);
  }
  catch(const boost::filesystem::filesystem_error& e) {
  // report the failure; what() describes the error
  std::cerr << e.what() << std::endl;
  }
  return 0;
  }
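
One way to keep this error handling out of the calling code is to wrap the operation. The sketch below uses C++17 std::filesystem, whose filesystem_error mirrors the Boost class; the wrapper names are hypothetical, and the non-throwing std::error_code overload is assumed to have a counterpart in recent Boost releases that takes boost::system::error_code.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem; // assumption: stands in for boost::filesystem

// Hypothetical wrapper: attempts the rename and reports failure to the
// caller instead of letting the exception propagate.
bool try_rename(const fs::path& from, const fs::path& to, std::string& message) {
    try {
        fs::rename(from, to);
        message.clear();
        return true;
    } catch (const fs::filesystem_error& e) { // const reference avoids a copy
        message = e.what();
        return false;
    }
}

// Non-throwing variant: the error is delivered through an error_code.
bool try_rename_quietly(const fs::path& from, const fs::path& to) {
    std::error_code ec;
    fs::rename(from, to, ec);
    return !ec;
}
```

Renaming a file that does not exist makes both wrappers return false, and try_rename additionally hands back the text produced by what().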

Categories of function in the Boost Filesystem Library

The boost::filesystem namespace provides several categories of functions: Some, like is_directory, are meant to query the file system, while others, like create_directory, actively modify it. Based on their functionality, the functions are broadly classified in the following categories:

  • Attribute functions: Provide for miscellaneous information such as file size and disk usage.
  • File system-manipulation functions: Meant to create regular files, directories, and symbolic links; copy and rename files; and provide for deletion.
  • Utility/predicate functions: Test for the existence of a file, and so on.
  • Miscellaneous convenience functions: Change file name extensions programmatically, and so on.

Attribute functions

The Boost Filesystem Library consists of the following attribute functions:

  • uintmax_t file_size(const path&): Returns the size of a regular file in bytes
  • boost::filesystem::space_info space(const path&): Takes in a path and returns a space_info structure defined as:
    struct space_info { 
      uintmax_t capacity;
      uintmax_t free;
      uintmax_t available;
    };

    Based on the disk partition to which the file system belongs, this routine returns the same disk usage statistics in bytes for all directories in that partition. For example, both C:\src\dir1 and C:\src\dir2 would return the same disk use data.

  • std::time_t last_write_time(const path&): Returns the last modification time of a file.
  • void last_write_time(const path&, std::time_t new_time): Modifies the last modification time of a file.
  • path current_path( ): Returns the full path of the program's current working directory. (Note that this may not be the directory from which the program was originally run, because a program can change its working directory.)
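
A short sketch of the size and space queries follows. It assumes C++17 std::filesystem, where file_size and space have the shapes listed above; last_write_time is left out because the standard version uses a different time type than the std::time_t shown here. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem; // assumption: stands in for boost::filesystem

// Hypothetical helper: writes `text` to `file`, then asks the file system
// how large the file is.
std::uintmax_t write_and_measure(const fs::path& file, const std::string& text) {
    {
        std::ofstream out(file, std::ios::binary | std::ios::trunc);
        assert(out.is_open());
        out << text;
    } // the stream is closed here, so the size on disk is final
    return fs::file_size(file);
}

// Hypothetical helper: free bytes on the partition that holds `dir`.
std::uintmax_t free_bytes(const fs::path& dir) {
    return fs::space(dir).free;
}
```

Writing five characters in binary mode should give a file_size of 5, and free_bytes returns the same figure for any two directories on the same partition.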

File system-manipulation functions

This set of functions is responsible for creating new files and directories, removing files, and so on:

  • bool create_directory(const path&): This function creates a directory with the given path name. (Note that if the PATHNAME itself contains invalid characters, the result is platform defined. For example, Windows does not allow the asterisk (*), the question mark (?), and certain other characters in a directory name, whereas UNIX file systems generally accept them even though shells treat them as wildcards.)
  • bool create_directories(const path&): You can create a directory tree as opposed to a single directory using this API. For example, consider the directory tree /a/b/c, which must be created inside the /tmp folder. Calling this API accomplishes the task, while create_directory, with the same argument, will throw an exception.
  • void create_hard_link(const path& frompath, const path& topath): This function creates a hard link, topath, to the existing file frompath.
  • void create_symlink(const path& frompath, const path& topath): This function creates a symbolic (soft) link, topath, that refers to frompath.
  • void copy_file(const path& frompath, const path& topath): The contents and attributes of the file referred to by frompath are copied to the file referred to by topath. This routine expects the destination file to be absent; if the destination file is present, it throws an exception. It is therefore not equivalent to the cp command in UNIX. It is also expected that frompath refers to a proper regular file. Consider this example: frompath refers to a symbolic link /tmp/file1, which in turn refers to a file /tmp/file2; topath is, say, /tmp/file3. In this situation, copy_file will fail. This is yet another difference between this API and the cp command.
  • void rename(const path& frompath, const path& topath): This function is the API for renaming a file. It is also possible to simultaneously rename and change the location of the file by specifying the full path name in the topath argument, as Listing 8 shows.
    Listing 8. The rename functionality in Boost
    #include <stdio.h>
    #include "boost/filesystem.hpp"
    int main()
      {
      boost::filesystem::path path("/home/user1/abc"); 
      boost::filesystem::rename(path, "/tmp/def");  
      return 0;
      }
     
    // abc is renamed def and moved to /tmp folder
  • bool remove(const path& p): This routine attempts to remove the file or directory referred to by the path p. In the case of a directory, if the directory is not empty, this routine throws an exception. A word of caution: This routine does not care what it's deleting, even if the same file is being simultaneously accessed by other programs!
  • unsigned long remove_all(const path& p): This API attempts to remove the file or directory referred to by path p. Unlike remove, it also deletes directories that are not empty, along with their contents. This function is the Boost equivalent of the UNIX rm -rf command.
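
The difference between the single-directory and whole-tree routines is easiest to see side by side. The following sketch assumes C++17 std::filesystem and uses its std::error_code overloads to observe the failures that would otherwise surface as exceptions; TreeReport and build_and_tear_down are hypothetical names.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem; // assumption: stands in for boost::filesystem

// Hypothetical record of what happened at each step.
struct TreeReport {
    bool single_call_failed;    // create_directory could not build a/b/c in one step
    bool tree_created;          // create_directories built the whole chain
    bool plain_remove_failed;   // remove refused the non-empty directory
    bool gone_after_remove_all; // remove_all deleted the directory and its contents
};

TreeReport build_and_tear_down(const fs::path& base) {
    TreeReport report{};
    std::error_code ec;
    fs::remove_all(base, ec);              // start from a clean slate
    const fs::path deep = base / "a" / "b" / "c";

    ec.clear();
    fs::create_directory(deep, ec);        // the parent directories are missing
    report.single_call_failed = static_cast<bool>(ec);

    report.tree_created = fs::create_directories(deep);
    assert(fs::is_directory(deep));

    ec.clear();
    fs::remove(base, ec);                  // base still contains "a"
    report.plain_remove_failed = static_cast<bool>(ec);

    fs::remove_all(base);
    report.gone_after_remove_all = !fs::exists(base);
    return report;
}
```

Run against a scratch location such as a subdirectory of the system temporary directory, all four flags are expected to come back true. Because remove_all deletes without asking, the base argument should never point at data you want to keep.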

Utility and predicate functions

The Boost Filesystem Library contains the following utility and predicate functions:

  • bool exists(const path&): This function checks for the existence of a file. The file could be anything: a regular file, a directory, a symbolic link, and so on.
  • bool is_directory(const path&): This function checks whether a path corresponds to a directory.
  • bool is_regular(const path&): This function checks for a normal file (that is, not a directory, a symbolic link, a socket, or a device file).
  • bool is_other(const path&): Typically, this function checks for device files such as /dev/tty0 or socket files.
  • bool is_empty(const path&): If the path corresponds to a folder, this function returns true if the folder is empty. If the path corresponds to a file, this function checks whether the file size equals 0. In the case of hard or symbolic links to a file, this API checks whether the original file is empty.
  • bool equivalent(const path& p1, const path& p2): This extremely useful API tests whether two path names, relative or absolute in style, refer to the same file system object. Consider Listing 9:
    Listing 9. Testing for the equivalence of two paths
    #include <stdio.h>
    #include "boost/filesystem.hpp"
    int main()
      {
      boost::filesystem::path path1("/usr/local/include"); // random pathname
      boost::filesystem::path path2("/tmp/../usr/local/include");
      bool result = boost::filesystem::equivalent(path1, path2);
      printf("Paths are equivalent : %d\n", result);
      return 0;
      }
     
    // result: 1
  • path system_complete(const path&): This function is another API along the same lines as equivalent. Given an arbitrary file path relative to the current working directory, this API returns an absolute path name for the file. For example, if the user is in the directory /home/user1 and queries for a file ../user2/file2, this function returns a path that refers to /home/user2/file2, which is the complete PATHNAME of the file, file2. (The returned path may still contain the .. element, as in /home/user1/../user2/file2, because the path is completed rather than normalized.)
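
The predicates combine naturally into a small classifier. This sketch assumes C++17 std::filesystem, in which is_regular is spelled is_regular_file; describe and same_object are hypothetical helpers.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem; // assumption: stands in for boost::filesystem

// Hypothetical helper: names the kind of object a path refers to.
std::string describe(const fs::path& p) {
    if (!fs::exists(p)) {
        return "missing";
    }
    if (fs::is_directory(p)) {
        return fs::is_empty(p) ? "empty directory" : "directory";
    }
    if (fs::is_regular_file(p)) {
        return fs::is_empty(p) ? "empty file" : "regular file";
    }
    return "other"; // device files, sockets, and similar
}

// Hypothetical helper: equivalent reports an error when neither path
// exists, so existence is checked first.
bool same_object(const fs::path& a, const fs::path& b) {
    return fs::exists(a) && fs::exists(b) && fs::equivalent(a, b);
}
```

The order of the tests matters: exists is checked first because is_empty reports an error for a path that does not exist.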

Miscellaneous functions

The Boost Filesystem Library consists of the following miscellaneous functions:

  • std::string extension(const path&): This function returns the extension of a given file name prefixed with a period (.). For example, for a file with the name test.cpp, extension returns .cpp. If a file does not have an extension, the function returns an empty string. In the case of hidden files (that is, files whose names begin with a . in UNIX systems), the function calculates the extension in the same way or returns an empty string (so, for .test.profile, this routine returns .profile).
  • std::string basename(const path&): This is the complementary routine to extension: It returns the string before the last period (.) in the file name. Note that even in cases where an absolute file name is provided, this API still returns only the string that directly forms part of the file name, as shown in Listing 10.
    Listing 10. Using boost::basename
    #include <stdio.h>
    #include <string>
    #include "boost/filesystem.hpp"
    using namespace std;
    int main()
      {
      boost::filesystem::path path1("/tmp/dir1/test1.c");
      boost::filesystem::path path2("/tmp/dir1/.test1.profile");
      string result1 = boost::filesystem::basename (path1);
      string result2 = boost::filesystem::basename (path2);
      printf("Basename 1: %s  Basename2 : %s\n", result1.c_str(), result2.c_str());
      return 0;
      }
     // result: Basename 1: test1  Basename2 : .test1
  • path change_extension(const path& oldpath, const std::string& new_extension): This API returns a new path that reflects the changed name. Note that the file corresponding to oldpath remains unchanged; this is just a convenience function. Also note that you must explicitly specify the dot in the extension. For example, change_extension("test.c", "so") results in testso as opposed to test.so.
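
For comparison, here is how the same three operations look in C++17 std::filesystem, where they are path member functions: extension, stem (in place of basename), and replace_extension (in place of change_extension). One behavioral difference is worth noting: the standard replace_extension inserts the missing dot itself, unlike the change_extension behavior described above. The helper names are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem; // note: std::filesystem, not the Boost release described here

// Extension including the leading period, or an empty string.
std::string extension_of(const fs::path& file) {
    return file.extension().string();
}

// File name without directory and without the final extension.
std::string basename_of(const fs::path& file) {
    return file.stem().string();
}

// Hypothetical helper: returns a copy of `file` with its extension replaced.
// Only the path value changes; nothing on disk is renamed.
fs::path with_extension(fs::path file, const std::string& new_extension) {
    file.replace_extension(new_extension);
    assert(new_extension.empty() || file.has_extension());
    return file;
}
```

With these helpers, both with_extension("test.c", ".so") and with_extension("test.c", "so") produce test.so.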

Conclusion

This article provided a brief overview of the Boost Filesystem Library. It's not to be treated as comprehensive documentation of the entire file system interface in Boost. The internals of the API set are not discussed, nor are the subtleties of these APIs on platforms other than UNIX or Windows, such as VMS. For more information about the file system library, see Resources.

Resources

Learn

Get products and technologies

  • IBM trial software: Build your next development project with software for download directly from developerWorks.
