Level: Intermediate
Arpan Sen (arpan@syncad.com), Lead Engineer, Synapti Computer Aided Design Pvt Ltd
08 Apr 2008
The absence of a well-defined library that deals with file system
manipulation is a long-running issue for the C++ language. In
the past, programmers have had to use native APIs to work around the problem. Discover
a library that provides a safe, portable, and easy-to-use C++
interface to facilitate file system operations: the Boost Filesystem Library.
One of the most common issues with the C++ language—and
indeed, the C++ standard—is the lack of a
well-defined library that helps deal with file system queries and manipulation. This
absence leads programmers to use the native operating system-provided application
program interfaces (APIs), which makes for code that isn't portable across platforms.
Consider a simple situation: You need to figure out whether a file is of type Directory.
On the Microsoft® Windows® platform, you could do this by calling the
GetFileAttributes library function, declared in the windows.h
header file:
DWORD GetFileAttributes (LPCTSTR lpFileName);
The return value is a bit mask: for directories, the FILE_ATTRIBUTE_DIRECTORY bit is set, and your code must
test for it. On UNIX® and Linux® platforms, the same functionality could
be achieved by using the stat or fstat
routines and the S_ISDIR macro defined in sys/stat.h. You must also understand the
stat structure. Here's the corresponding code:
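#include <sys/stat.h>
#include <stdio.h>
int main()
{
    const char* pathname = "/usr/local/include"; // any path name
    struct stat s1;
    if (stat(pathname, &s1) != 0) {
        perror("stat"); // the path is missing or cannot be queried
        return 1;
    }
    printf("Path is a directory : %d\n", S_ISDIR(s1.st_mode) ? 1 : 0);
    return 0;
}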
For programs with heavy-duty I/O, such inconsistencies imply a significant engineering
effort for porting the code across platforms. This is the problem that
the Boost Filesystem Library addresses. This widely used library provides a safe, portable,
and easy-to-use C++ interface for performing file
system operations. It's available as a free download from the
Boost site.
Your first program using boost::filesystem
Before delving into the finer nuances of the Boost Filesystem Library,
Listing 1 shows the code to figure out whether a file is of
type Directory using Boost APIs.
Listing 1. Code to determine whether a file is of type Directory
#include <stdio.h>
#include "boost/filesystem.hpp"
int main()
{
    boost::filesystem::path path("/usr/local/include"); // random pathname
    bool result = boost::filesystem::is_directory(path);
    printf("Path is a directory : %d\n", result);
    return 0;
}
The code is self-explanatory, and you don't need to know any system-specific
routines. The code is verified to compile on both gcc-3.4.4 and cl-13.10.3077
without modification.
Understanding the Boost path object
The key to understanding the Boost Filesystem Library is the
path object, as several routines defined in the
Filesystem Library are only meant to work with a proper
path object. Often, file system paths are operating
system dependent. For example, it's well known that UNIX and Linux systems use
the forward slash (/) character as a directory separator,
while Windows uses a backslash (\) for the same
purpose. The boost::filesystem::path is meant to
precisely abstract this feature. A path object could
be initialized in several ways, the most common being an initialization with a
char* or std::string,
as shown in Listing 2.
Listing 2. Ways to create a Boost path object
path(); // empty path
path(const char* pathname);
path(const std::string& pathname);
path(const char* pathname, boost::filesystem::path::name_check checker);
path(const std::string& pathname, boost::filesystem::path::name_check checker);
While initializing the path object, the PATHNAME variable
could be provided in either the native format or in the portable format defined
by the Portable Operating System Interface (POSIX) committee. Both approaches
have their relative trade-offs in practice. Consider a situation in which you want
to manipulate a directory that your software created and locate the directory in
/tmp/mywork on UNIX and Linux systems and in C:\tmp\mywork on Windows.
There are several approaches to this problem. Listing 3 shows
the native format-oriented approach.
Listing 3. Initialize the path using the native format
#ifdef UNIX // assumes your build defines UNIX on UNIX and Linux systems
boost::filesystem::path path("/tmp/mywork");
#else
boost::filesystem::path path("C:\\tmp\\mywork");
#endif
A single #ifdef is needed to initialize the path on a
per-operating system basis. However, if you prefer using the portable format, take
a look at Listing 4.
Listing 4. Initialize the path using a portable format
boost::filesystem::path path("/tmp/mywork");
Note that path::name_check refers to a name-checking
function prototype. A name-check function returns true if its
PATHNAME argument is valid for a particular operating system or file system. Several
name-checking functions come with the Boost Filesystem Library, and you're welcome
to provide your own variants, as well. Some of the more commonly used name-check
functions are the Boost-provided portable_posix_name
and windows_name.
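Because you can supply your own name-check function, a custom checker can be quite small. The sketch below is a hypothetical stand-alone helper (the name is_portable_name is not part of Boost) that accepts only the POSIX portable filename character set: letters, digits, period, underscore, and hyphen. It assumes that a name-check function takes a single name as a string and returns a bool; the exact rules are chosen for illustration and may differ from those of the Boost-provided checkers.

```cpp
#include <string>

// Hypothetical name-check function: returns true only if `name` is non-empty
// and uses nothing but the POSIX portable filename characters.
bool is_portable_name(const std::string& name)
{
    static const std::string allowed =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789._-";
    return !name.empty() &&
           name.find_first_not_of(allowed) == std::string::npos;
}
```

With this checker, a name such as mywork passes, while a name that contains a space, an asterisk, or a directory separator is rejected.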
Overview of the path member functions
The path object comes with several member functions. These
member routines don't modify the file system but do provide useful information based
on the path name. This section provides an overview of several of these routines:
-
const std::string& string( ): This
routine returns a copy of the string with which the path was initialized,
with formatting per the path grammar rules.
-
std::string root_directory( ): Given a
path, this API would return the root directory; otherwise, it would
return an empty string. For example, if the path consists of
/tmp/var1, then this routine would return
/, which is the root of the UNIX file system.
However, if the path is a relative path such as ../mywork/bin,
this routine would return an empty string.
-
std::string root_name( ): Given a path
that begins with a root name, such as the drive specifier C: on Windows, this routine returns
that root name; otherwise, it returns an empty string.
-
std::string leaf( ): Given an absolute
path name (for example, /home/user1/file2), this routine provides only
the string corresponding to the file name (that is, file2).
-
std::string branch_path( ): This is the
complementary routine to leaf. Given a path,
it returns all the elements used to construct it except the last. For example,
for a path initialized with /a/b/c,
path.branch_path( )
returns /a/b. For paths with a single element
such as c, this routine returns an empty string.
-
bool empty( ): If the path object contains
an empty string (for example, path path1("")), then this routine
returns true; otherwise, it returns false.
-
boost::filesystem::path::iterator: This
iterator type is used to traverse the individual elements of the path. Consider
Listing 5.
Listing 5. Using the path::iterator (begin and end interface)
#include <iostream>
#include "boost/filesystem.hpp"
int main()
{
    boost::filesystem::path path1("/usr/local/include"); // random pathname
    boost::filesystem::path::iterator pathI = path1.begin();
    while (pathI != path1.end())
    {
        std::cout << *pathI << std::endl;
        ++pathI;
    }
    return 0;
}
The output of the above program is /,
usr, local,
include, in that order, signifying the
directory hierarchy.
-
path operator / (char* lhs, const path& rhs):
This routine is a non-member function of path.
It returns a concatenation of the path formed using
lhs and rhs. It
automatically inserts / as the path separator,
as Listing 6 shows.
Listing 6. Concatenation of path strings
#include <iostream>
#include "boost/filesystem.hpp"
int main()
{
boost::filesystem::path path1("local/include");
boost::filesystem::path path2 = operator/("/usr", path1);
std::cout << "Path: " << path2 << std::endl;
return 0;
}
// result: /usr/local/include
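To see these member functions side by side, here is a minimal sketch. It is written against std::filesystem, the C++17 standard-library facility modeled on Boost.Filesystem, rather than the Boost release this article describes, so that it builds with a C++17 compiler alone; that substitution is an assumption of the sketch. In the standard library, the last element is returned by filename() and the remainder by parent_path(), where this article uses leaf() and branch_path(). The helper names (elements, leaf_of, branch_of, join) are hypothetical.

```cpp
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Collects the elements of a path in iteration order.
std::vector<std::string> elements(const fs::path& p)
{
    std::vector<std::string> out;
    for (const fs::path& part : p)
        out.push_back(part.generic_string());
    return out;
}

// Last element of the path (leaf() in this article).
std::string leaf_of(const fs::path& p)
{
    return p.filename().generic_string();
}

// Everything except the last element (branch_path() in this article).
std::string branch_of(const fs::path& p)
{
    return p.parent_path().generic_string();
}

// Joins two paths; operator/ inserts the separator where needed.
std::string join(const fs::path& lhs, const fs::path& rhs)
{
    return (lhs / rhs).generic_string();
}
```

On a UNIX-style system, elements("/usr/local/include") would be expected to yield /, usr, local, and include, matching the output described for Listing 5.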
Error handling
File system operations often encounter unexpected issues, and the Boost Filesystem
Library reports runtime errors using C++ exceptions.
The boost::filesystem::filesystem_error class is derived from the
std::runtime_error class. Functions in the library use
filesystem_error exceptions to report operational
errors. Corresponding to the different possible error types, there are error codes
that are defined in the Boost headers. The user code typically resides within
try...catch blocks that use the
filesystem_error exceptions for reporting relevant error
messages. Listing 7 provides a small example in which a file is
renamed, except that the file in the from path doesn't
exist.
Listing 7. Error handling in Boost
#include <iostream>
#include "boost/filesystem.hpp"
int main()
{
    try {
        boost::filesystem::path path("C:\\src\\hdbase\\j1");
        boost::filesystem::path path2("C:\\src\\hdbase\\j2");
        boost::filesystem::rename(path, path2);
    }
    catch (const boost::filesystem::filesystem_error& e) {
        // report the failure; what() describes the error
        std::cerr << e.what() << std::endl;
    }
    return 0;
}
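The same try...catch pattern can be wrapped in a small helper. The following sketch uses std::filesystem (C++17), which kept the filesystem_error exception from Boost, instead of the Boost release described here; that choice is an assumption made so the code builds without Boost. The function names rename_or_explain and rename_quietly are hypothetical, and the std::error_code overload shown in the second function is a standard-library feature that this article does not cover.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Tries to rename `from` to `to`. Returns an empty string on success,
// or the exception's message if the library reported an error.
std::string rename_or_explain(const fs::path& from, const fs::path& to)
{
    try {
        fs::rename(from, to);
        return std::string();
    }
    catch (const fs::filesystem_error& e) {
        // what() describes the failure; e.code() would give the error code.
        return e.what();
    }
}

// Non-throwing variant: the error is reported through `ec` instead.
bool rename_quietly(const fs::path& from, const fs::path& to)
{
    std::error_code ec;
    fs::rename(from, to, ec);
    return !ec; // true only if no error was recorded
}
```

Catching the exception by const reference avoids copying it and preserves the most derived exception type.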
Categories of functions in the Boost Filesystem Library
The boost::filesystem namespace provides different categories
of functions: Some, like is_directory, are meant to query
the file system, while others, like create_directory,
actively modify it. Based on their functionality, the functions are broadly
classified in the following categories:
-
Attribute functions: Provide for miscellaneous information such as
file size and disk usage.
-
File system-manipulation functions: Meant to create regular files,
directories, and symbolic links; copy and rename files; and provide for
deletion.
-
Utility/predicate functions: Test for the existence of a file, and so on.
-
Miscellaneous convenience functions: Change file name extensions
programmatically, and so on.
Attribute functions
The Boost Filesystem Library consists of the following attribute functions:
-
uintmax_t file_size(const path&):
Returns the size of a regular file in bytes
-
boost::filesystem::space_info space(const path&):
Takes in a path and returns a space_info
structure defined as:
struct space_info {
uintmax_t capacity;
uintmax_t free;
uintmax_t available;
};
Based on the disk partition to which the file system belongs, this
routine returns the same disk usage statistics in bytes for all
directories in that partition. For example, both C:\src\dir1 and
C:\src\dir2 would return the same disk use data.
-
std::time_t last_write_time(const path&):
Returns the last modification time of a file.
-
void last_write_time(const path&, std::time_t new_time):
Modifies the last modification time of a file.
-
const path& current_path( ):
Returns the full path for the current working directory of the program.
(Note that this path may not be the same directory from which
the program was originally run, because it is programmatically possible
to change directories.)
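As a rough illustration of the attribute functions, the sketch below writes a scratch file and reads its size back. It relies on std::filesystem (C++17) as a stand-in for the Boost calls of the same names, which is an assumption of the example; the helper names and the scratch file name fs_attr_demo.txt are hypothetical.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Writes `text` to a scratch file in the system temp directory, reads the
// size back through the library, removes the file, and returns the size.
std::uintmax_t size_after_writing(const std::string& text)
{
    const fs::path file = fs::temp_directory_path() / "fs_attr_demo.txt"; // hypothetical name
    {
        std::ofstream out(file, std::ios::binary | std::ios::trunc);
        out << text;
    }   // stream closed here, so the size on disk is final
    const std::uintmax_t size = fs::file_size(file);
    fs::remove(file);
    return size;
}

// Returns the bytes available to an unprivileged process on the partition
// that holds `p`; directories on the same partition report the same figure.
std::uintmax_t available_bytes(const fs::path& p)
{
    return fs::space(p).available;
}
```

A call such as available_bytes(fs::current_path()) queries the partition that holds the current working directory.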
File system-manipulation functions
This set of functions is responsible for creating new files and directories,
removing files, and so on:
-
bool create_directory(const path&):
This function creates a directory with the given path name. (Note that
if the PATHNAME itself contains invalid characters, the result is
platform defined. For example, on Windows
systems, the asterisk (*), the question
mark (?), and other such characters are
considered invalid and cannot be in a directory name.)
-
bool create_directories(const path&):
You can create a directory tree as opposed to a single directory using
this API. For example, consider the directory tree /a/b/c, which must
be created inside the /tmp folder. Calling this API accomplishes the
task, while create_directory, with the same
argument, will throw an exception.
-
bool create_hard_link (const path& frompath, const path& topath):
This function creates a hard link between
frompath and
topath.
-
bool create_symlink(const path& frompath, const path& topath):
This function creates a symbolic (soft) link between
frompath and
topath.
-
void copy_file(const path& frompath, const path& topath):
The contents and attributes of the file referred to by
frompath are copied to the file referred to
by topath. This routine expects a
destination file to be absent; if the destination file is present,
it throws an exception. This, therefore, is not equivalent to the
system specified cp command in UNIX. It is
also expected that the frompath variable
would refer to a proper regular file. Consider this example:
frompath refers to a symbolic link /tmp/file1,
which in turn refers to a file /tmp/file2; topath
is, say, /tmp/file3. In this situation, copy_file
will fail. This is yet another difference that this API sports compared
to the cp command.
-
void rename(const path& frompath, const path& topath):
This function is the API for renaming a file. It is also possible to
simultaneously rename and change the location of the file by specifying
the full path name in the topath argument,
as Listing 8 shows.
Listing 8. The rename functionality in Boost
#include <stdio.h>
#include "boost/filesystem.hpp"
int main()
{
    boost::filesystem::path path("/home/user1/abc");
    boost::filesystem::rename(path, "/tmp/def");
    return 0;
}
// abc is renamed def and moved to /tmp folder
-
bool remove(const path& p): This
routine attempts to remove the file or directory being referred to by
the path p. In the case of a directory, if the contents of the
directory are not already empty, this routine throws an exception. A
word of caution: This routine does not care what it's deleting,
even if the same file is being simultaneously accessed by other
programs!
-
unsigned long remove_all(const path& p):
This API attempts to remove a file or directory referred to by path
p. Unlike remove, it makes no special
consideration for directories that are not empty. This function is the
Boost equivalent of the UNIX rm -rf command.
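The behaviors described in this list can be exercised together in a scratch directory. This is only a sketch: it uses std::filesystem (C++17) in place of the Boost functions of the same names, which is an assumption, and the function name manipulation_demo and the directory name fs_manip_demo are hypothetical. Symbolic and hard links are left out.

```cpp
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Walks through create / copy / rename / remove inside a scratch directory
// and returns true if every step behaved as described in the text.
bool manipulation_demo()
{
    const fs::path root = fs::temp_directory_path() / "fs_manip_demo"; // hypothetical name
    fs::remove_all(root);                       // start from a clean slate
    bool ok = true;

    // create_directory cannot build missing parents; create_directories can.
    try {
        fs::create_directory(root / "a" / "b" / "c");
        ok = false;                             // an exception was expected
    } catch (const fs::filesystem_error&) {}
    ok = ok && fs::create_directories(root / "a" / "b" / "c");

    // copy_file refuses to overwrite an existing destination by default.
    {
        std::ofstream out(root / "file1");
        out << "data";
    }
    fs::copy_file(root / "file1", root / "file2");
    try {
        fs::copy_file(root / "file1", root / "file2");
        ok = false;                             // an exception was expected
    } catch (const fs::filesystem_error&) {}

    // rename can move a file to another directory in the same call.
    fs::rename(root / "file2", root / "a" / "moved");
    ok = ok && fs::exists(root / "a" / "moved") && !fs::exists(root / "file2");

    // remove rejects a non-empty directory; remove_all deletes recursively.
    try {
        fs::remove(root / "a");
        ok = false;                             // an exception was expected
    } catch (const fs::filesystem_error&) {}
    ok = ok && fs::remove_all(root) > 0;
    return ok && !fs::exists(root);
}
```

Because remove_all deletes recursively without asking, the sketch confines itself to a directory that it created under the system temp directory.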
Utility and predicate functions
The Boost Filesystem Library contains the following utility and predicate functions:
-
bool exists(const path&): This
function checks for the existence of a file. The file could be anything:
a regular file, a directory, a symbolic link, and so on.
-
bool is_directory(const path&): This
function checks whether a path corresponds to a directory.
-
bool is_regular(const path&):
This function checks for a normal file (that is, not a directory, a
symbolic link, a socket, or a device file).
-
bool is_other(const path&): Typically,
this function checks for device files such as /dev/tty0 or socket files.
-
bool is_empty(const path&): If
the path corresponds to a folder, this function checks whether the
folder is empty and returns True accordingly. If the path corresponds
to a file, this function checks whether file size equals 0. In case of
hard or symbolic links to a file, this API checks whether the original
file is empty.
-
bool equivalent(const path& p1, const path& p2): This
extremely useful API tests whether two path names, relative or absolute, refer to the same file.
Consider Listing 9:
Listing 9. Testing for the equivalence of two paths
#include <stdio.h>
#include "boost/filesystem.hpp"
int main()
{
    boost::filesystem::path path1("/usr/local/include"); // random pathname
    boost::filesystem::path path2("/tmp/../usr/local/include");
    bool result = boost::filesystem::equivalent(path1, path2);
    printf("Paths are equivalent : %d\n", result);
    return 0;
}
// result: 1
-
path system_complete(const path&): This
function is another API along the same lines as
bool equivalent(const path& p1, const path& p2).
Given an arbitrary file path from the current working directory, this
API returns the absolute path name of the file. For example, if the
user is in the directory /home/user1 and queries for a file ../user2/file2,
this function returns /home/user2/file2,
which is the complete PATHNAME of the file, file2.
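The predicates are easiest to compare when run against a known directory tree. The sketch below builds one, queries it, and cleans up. It uses std::filesystem (C++17) rather than Boost, which is an assumption of the example; in the standard library the regular-file test is named is_regular_file. The function name predicate_demo and the directory name fs_pred_demo are hypothetical.

```cpp
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Builds a tiny scratch tree, runs the predicates against it, cleans up,
// and returns true if every answer matches the descriptions in the text.
bool predicate_demo()
{
    const fs::path root = fs::temp_directory_path() / "fs_pred_demo"; // hypothetical name
    fs::remove_all(root);
    fs::create_directories(root / "sub");
    {
        std::ofstream out(root / "empty.txt");  // creates a zero-byte regular file
    }

    bool ok = true;
    ok = ok && fs::exists(root) && !fs::exists(root / "missing");
    ok = ok && fs::is_directory(root / "sub");
    ok = ok && fs::is_regular_file(root / "empty.txt");
    ok = ok && !fs::is_other(root / "empty.txt");
    ok = ok && fs::is_empty(root / "sub");          // directory with no entries
    ok = ok && fs::is_empty(root / "empty.txt");    // file of size 0
    ok = ok && !fs::is_empty(root);                 // root holds two entries

    // Two spellings of the same directory name the same file system object.
    ok = ok && fs::equivalent(root / "sub", root / "sub" / ".." / "sub");

    fs::remove_all(root);
    return ok;
}
```

Note that equivalent inspects the file system, so the paths being compared need to exist; that is why the sketch creates them first.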
Miscellaneous functions
The Boost Filesystem Library consists of the following miscellaneous functions:
-
std::string extension(const path&): This
function returns the extension of a given file name prefixed with
a period (.). For example, for a file with
name test.cpp,
extension returns
.cpp. In case a file does not have an
extension, the function returns an empty string. In the case of hidden
files (that is, files whose names begin with a .
in UNIX systems), the function appropriately calculates the extension
type or returns empty string (so, for .test.profile, this routine
returns .profile).
-
std::string basename(const path&): This
is the complementary routine to extension: It
returns the string before . in the file
name. Note that even in cases where an absolute file name is provided,
this API still returns only that string that directly forms part of the
file name, as shown in Listing 10.
Listing 10. Using boost::basename
#include <stdio.h>
#include <string>
#include "boost/filesystem.hpp"
using namespace std;
int main()
{
    boost::filesystem::path path1("/tmp/dir1/test1.c");
    boost::filesystem::path path2("/tmp/dir1/.test1.profile");
    string result1 = boost::filesystem::basename(path1);
    string result2 = boost::filesystem::basename(path2);
    printf("Basename 1: %s Basename 2: %s\n", result1.c_str(), result2.c_str());
    return 0;
}
// result: Basename 1: test1 Basename 2: .test1
-
std::string change_extension(const path& oldpath, const std::string new_extension): This
API returns a new string that reflects the changed name. Note that the
file corresponding to oldpath remains
unchanged. This is just a convenience function. Also note that
you must explicitly specify the dot in the extension. For example,
change_extension("test.c", "so") results in
testso as opposed to
test.so.
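For comparison, here is how the same three operations might look with std::filesystem (C++17), used here as an assumption so that the sketch builds without Boost. The standard library exposes them as path member functions: extension(), stem() in place of basename, and replace_extension() in place of change_extension. One difference worth noting is that replace_extension adds the leading dot when the new extension lacks one, unlike the change_extension behavior described above. The helper names are hypothetical.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Extension including the leading dot, or an empty string if there is none.
std::string extension_of(const fs::path& p)
{
    return p.extension().string();
}

// File name without its final extension (basename in this article).
std::string stem_of(const fs::path& p)
{
    return p.stem().string();
}

// Returns a copy of `p` with its extension swapped; nothing on disk changes.
std::string with_extension(fs::path p, const std::string& ext)
{
    return p.replace_extension(ext).generic_string();
}
```

As with the Boost convenience functions, none of these calls touches the file system; they operate on the path name only.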
Conclusion
This article provided a brief overview of the Boost Filesystem Library. It's not to
be treated as a comprehensive document of the entire file system interface in
Boost. The internals of the API set are not discussed, nor are the subtleties of
these APIs on platforms other than UNIX or Windows, such as VMS. For more information
about the file system, see Resources.
Resources
Get products and technologies
-
IBM
trial software: Build your next development project with software for download
directly from developerWorks.
Discuss
-
Participate in the AIX and UNIX forums.
About the author
Arpan Sen is a lead engineer working on the development of software in the electronic
design automation industry. He has worked on several flavors of UNIX, including
Solaris, SunOS, HP-UX, and IRIX as well as Linux and Microsoft Windows for several
years. He takes a keen interest in software performance-optimization techniques, graph
theory, and parallel computing. Arpan holds a post-graduate degree in software systems.
You can reach him at arpan@syncad.com.