The software that you developed after burning the midnight oil for so long crashed at the customer site. You are at your wits' end because no test case is available to help you duplicate the crash and debug what has gone wrong. This is a situation many are familiar with, but the question that keeps arising is, what can be done about it? Just dumping the stack trace is far from optimal. What you need is insight into the data structures of the code to examine their values.
The solution is the Boost Serialization library. You can use it to dump the program contents into an archive (text or XML file) and restore data from the same archive to recreate an exact snapshot of the code just before it crashed. Sound good? Read on.
The Serialization sources come with the standard Boost installation (see
Resources). Unlike many other Boost libraries, Serialization
is not a header-only library, so you need to build it. To do so, follow the build
instructions that come with the installation (see Resources). If you prefer an off-the-shelf installation, look into boostpro
(again, see Resources). For this article, I used
Boost version 1.46.1 and compiled the code with gcc-4.3.4.
Hello World with Boost Serialization
Let's create a proof of concept before moving on to bigger things. In Listing 1 below, you see a string whose value is dumped into an archive. In Listing 2 below, you restore the contents of the same archive to verify whether the string's value matches that of the original.
Listing 1. Saving the contents of a string into a text archive
#include <boost/archive/text_oarchive.hpp>
#include <iostream>
#include <fstream>
void save()
{
std::ofstream file("archive.txt");
boost::archive::text_oarchive oa(file);
std::string s = "Hello World!\n";
oa << s;
}
int main()
{
save();
}
You now load the contents back.
Listing 2. Loading the contents of a string from a text archive
#include <boost/archive/text_iarchive.hpp>
#include <iostream>
#include <fstream>
void load()
{
std::ifstream file("archive.txt");
boost::archive::text_iarchive ia(file);
std::string s;
ia >> s;
std::cout << s << std::endl;
}
int main()
{
load();
}
Predictably, the output of Listing 2 is "Hello World!"
Now, let's take a closer look at the code. Boost
creates a text archive (a plain text file) of the contents you want
to dump. To dump the contents, you create a text_oarchive, declared in the header text_oarchive.hpp.
To restore the contents, you create a text_iarchive, declared in the header text_iarchive.hpp.
Dumping and restoring the contents is intuitive: it uses the <<
and >> operators and works exactly like stream
I/O, except that the contents are dumped into a file and restored from the
same file at a later point.
Instead of using these two different operators, however, you may want to use the same
& operator for both dump and restore.
Listing 3 below shows you how.
Listing 3. Using the & operator for dump-restore
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <iostream>
#include <fstream>
void save()
{
std::ofstream file("archive.txt");
boost::archive::text_oarchive oa(file);
std::string s = "Hello World!\n";
oa & s; // has same effect as oa << s;
}
void load()
{
std::ifstream file("archive.txt");
boost::archive::text_iarchive ia(file);
std::string s;
ia & s;
std::cout << s << std::endl;
}
Let's take a look into the dumped text file:
22 serialization::archive 9 13 Hello World!
Note that the content and format of the text file may change with future Boost revisions, so it's a bad idea to have any application code that relies on the internal archive contents.
If you want an XML archive instead of a text archive, you must include the
headers xml_iarchive.hpp and xml_oarchive.hpp from the Boost sources. These
headers declare or define the XML archive semantics. However, the dump-restore
is slightly different from what you do for the text archive: The data needs to be
wrapped in a macro named BOOST_SERIALIZATION_NVP.
Listing 4 below provides the code.
Listing 4. Dump-restore from an XML archive
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <iostream>
#include <fstream>
void save()
{
std::ofstream file("archive.xml");
boost::archive::xml_oarchive oa(file);
std::string s = "Hello World!\n";
oa & BOOST_SERIALIZATION_NVP(s);
}
void load()
{
std::ifstream file("archive.xml");
boost::archive::xml_iarchive ia(file);
std::string s;
ia & BOOST_SERIALIZATION_NVP(s);
std::cout << s << std::endl;
}
Listing 5 shows the contents of the XML archive. The
variable name serves as the tag (<s>Hello World!</s>).
Listing 5. Contents of the XML archive
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="9">
<s>Hello World!
</s>
</boost_serialization>
This is where the good stuff begins. You can serialize quite a few elements of the
C++ programming language without much extra coding.
Classes, pointers to classes, arrays, and Standard Template Library (STL)
collections can all be serialized. Listing 6 below provides an example
with arrays.
Listing 6. Dump-restore of an array of integers
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <iostream>
#include <fstream>
#include <algorithm>
#include <iterator>
void save()
{
std::ofstream file("archive.xml");
boost::archive::xml_oarchive oa(file);
int array1[] = {34, 78, 22, 1, 910};
oa & BOOST_SERIALIZATION_NVP(array1);
}
void load()
{
std::ifstream file("archive.xml");
boost::archive::xml_iarchive ia(file);
int restored[5]; // Need to specify expected array size
ia >> BOOST_SERIALIZATION_NVP(restored);
std::ostream_iterator<int> oi(std::cout, " ");
std::copy(restored, restored+5, oi); // print the restored array
}
int main()
{
save();
load();
}
That was simple. Dumping works exactly as it did for the string class; during restore, however, you need to specify the expected array size. If the destination array is smaller than the element count stored in the archive, the restore fails with an exception and, left uncaught, the program ends up crashing. Listing 7 provides the dumped XML archive for the code in Listing 6.
Listing 7. XML archive created by an array dump
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="9">
<array1>
    <count>5</count>
    <item>34</item>
    <item>78</item>
    <item>22</item>
    <item>1</item>
    <item>910</item>
</array1>
</boost_serialization>
Can you do this just by specifying a pointer int* restored
and have the array restored for you? The short answer is no. The size must be
specified at all times. The long answer, however, is that serializing pointers to
primitive types is non-trivial.
To serialize STL lists and vectors, you must understand that for every STL type, the application code must include a header file with a similar name from the Serialization sources. For lists, you include boost/serialization/list.hpp, and so on. Note that with lists and vectors, you don't need to provide any size or range during the loading back of the information—yet another reason to prefer STL containers over application containers with identical functionality. Listing 8 below shows the code for serializing STL collections.
Listing 8. Serializing STL collections
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/serialization/list.hpp>
#include <boost/serialization/vector.hpp>
#include <iostream>
#include <fstream>
#include <algorithm>
#include <iterator>
void save()
{
std::ofstream file("archive.xml");
boost::archive::xml_oarchive oa(file);
float array[] = {34.2, 78.1, 22.221, 1.0, -910.88};
std::list<float> L1(array, array+5);
std::vector<float> V1(array, array+5);
oa & BOOST_SERIALIZATION_NVP(L1);
oa & BOOST_SERIALIZATION_NVP(V1);
}
void load()
{
std::ifstream file("archive.xml");
boost::archive::xml_iarchive ia(file);
std::list<float> L2;
ia >> BOOST_SERIALIZATION_NVP(L2); // No size/range needed
std::vector<float> V2;
ia >> BOOST_SERIALIZATION_NVP(V2); // No size/range needed
std::ostream_iterator<float> oi(std::cout, " ");
std::copy(L2.begin(), L2.end(), oi);
std::copy(V2.begin(), V2.end(), oi);
}
Listing 9 shows what the XML archive looks like.
Listing 9. Dumped archive while using STL containers
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="9">
<L1>
    <count>5</count>
    <item_version>0</item_version>
    <item>34.200001</item>
    <item>78.099998</item>
    <item>22.221001</item>
    <item>1</item>
    <item>-910.88</item>
</L1>
<V1>
    <count>5</count>
    <item_version>0</item_version>
    <item>34.200001</item>
    <item>78.099998</item>
    <item>22.221001</item>
    <item>1</item>
    <item>-910.88</item>
</V1>
</boost_serialization>
Can you serialize your own types? Yes. Let's take a small example of a structure that represents a date:
typedef struct date {
unsigned int m_day;
unsigned int m_month;
unsigned int m_year;
} date;
To make a class serializable, you define a method called serialize
as part of the class definition. This method is called during the dump as well
as during the restoration of the class. Here's the declaration for the
serialize method:
template<class Archive>
void serialize(Archive& archive, const unsigned int version)
{
//… your custom code here
}
From this snippet, you can see that serialize is a
template function whose first argument is expected to be a reference to the
Boost archive. So, what should the code look like for an XML archive? Here's what
you put in:
template<class Archive>
void serialize(Archive& archive, const unsigned int version)
{
archive & BOOST_SERIALIZATION_NVP(m_day);
archive & BOOST_SERIALIZATION_NVP(m_month);
archive & BOOST_SERIALIZATION_NVP(m_year);
}
Listing 10 below provides the complete code.
Listing 10. Dump-restore of date type
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <iostream>
#include <fstream>
typedef struct date {
unsigned int m_day;
unsigned int m_month;
unsigned int m_year;
date( int d, int m, int y) : m_day(d), m_month(m), m_year(y)
{}
date() : m_day(1), m_month(1), m_year(2000)
{}
friend std::ostream& operator << (std::ostream& out, const date& d)
{
out << "day: " << d.m_day
<< " month: " << d.m_month
<< " year: " << d.m_year;
return out;
}
template<class Archive>
void serialize(Archive& archive, const unsigned int version)
{
archive & BOOST_SERIALIZATION_NVP(m_day);
archive & BOOST_SERIALIZATION_NVP(m_month);
archive & BOOST_SERIALIZATION_NVP(m_year);
}
} date;
void save()
{
std::ofstream file("archive.xml");
boost::archive::xml_oarchive oa(file);
date d(15, 8, 1947);
oa & BOOST_SERIALIZATION_NVP(d);
}
void load()
{
std::ifstream file("archive.xml");
boost::archive::xml_iarchive ia(file);
date dr;
ia >> BOOST_SERIALIZATION_NVP(dr);
std::cout << dr;
}
Note that except for defining the serialize method, you
haven't done anything special to handle user-defined types. The above code works
fine, but there is an obvious problem: You may need to serialize types that come
from third parties and whose class declarations may not be modified. For such
situations, you use the non-intrusive version of serialize,
which you define outside of the class scope. Listing 11 below shows the
non-intrusive serialize method for the date
class. Note that the code still works if the serialize method
is defined in the global scope; however, it's good coding practice to define the method
in the relevant namespace.
Listing 11. Non-intrusive version of the serialize method
namespace boost {
namespace serialization {
template<class Archive>
void serialize(Archive& archive, date& d, const unsigned int version)
{
archive & BOOST_SERIALIZATION_NVP(d.m_day);
archive & BOOST_SERIALIZATION_NVP(d.m_month);
archive & BOOST_SERIALIZATION_NVP(d.m_year);
}
} // namespace serialization
} // namespace boost
Listing 12 shows the XML archive for the date type.
Listing 12. XML archive for user-defined types
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="9">
<d class_id="0" tracking_level="0" version="0">
    <d.m_day>15</d.m_day>
    <d.m_month>8</d.m_month>
    <d.m_year>1947</d.m_year>
</d>
</boost_serialization>
Classes are often derived from other classes, and you need a way to handle the
serialization of base classes while serializing the derived class. For both the base and
the derived class, you must define the serialize method.
In addition, you need to tweak the derived class's serialize
definition, as shown in Listing 13.
Listing 13. Serializing the base class
template<class Archive>
void serialize(Archive& archive, const unsigned int version)
{
// serialize base class information
archive & boost::serialization::base_object<Base Class>(*this);
// serialize derived class members
archive & derived-class-member1;
archive & derived-class-member2;
// …
}
It's a very bad idea to call the base class's serialize
method directly inside the derived class's serialize method.
It may work, perhaps, but it won't be possible to track class versioning (described later)
or eliminate redundancies in the generated archive. A recommended coding style to
avoid such a mistake is to make the serialize method private
in all classes and use the declaration friend class boost::serialization::access
in all the classes that are to be serialized.
Dumping derived classes through base class pointers
Dumping derived classes through base class pointers is entirely possible; however, both the base
and derived classes must define their respective serialize
methods, and the base class must be polymorphic (that is, have at least one virtual function) so that the library can determine the object's actual type. Also, you need to call the method
<archive name>.register_type<derived-type name>( )
during the dump as well as during the restore process. Assume that your
date class is derived from some class called
base. Listing 14 shows what
you should code in the save and load
methods.
Listing 14. Using base class pointers for serialization
void save()
{
std::ofstream file("archive.xml");
boost::archive::xml_oarchive oa(file);
oa.register_type<date>( );
base* b = new date(15, 8, 1947);
oa & BOOST_SERIALIZATION_NVP(b);
}
void load()
{
std::ifstream file("archive.xml");
boost::archive::xml_iarchive ia(file);
ia.register_type<date>( );
base *dr;
ia >> BOOST_SERIALIZATION_NVP(dr);
date* dr2 = dynamic_cast<date*> (dr);
std::cout << *dr2; // print the restored date, not the pointer value
}
Here, the base pointer is used during both dump and
restore. However, it's actually the date object that
is being serialized, because you registered the date type with the archive
before both the dump and the restore.
Using pointers to objects during dump-restore
It is possible to dump and restore using pointers to objects. Doing so makes matters interesting. What do you expect the XML archive contents to be? Clearly, dumping pointer values won't make the cut. You need the actual object to be dumped and later restored. Also, what about multiple pointers to the same object? If the XML archive has multiple copies of the same object dumped, then it's clearly less than optimal. The great thing about Boost Serialization is that the syntax is much the same everywhere, including that for pointers. Listing 15 below is a modified version of Listing 10.
Listing 15. Dump-restore using pointers
void save()
{
std::ofstream file("archive.xml");
boost::archive::xml_oarchive oa(file);
date* d = new date(15, 8, 1947);
std::cout << d << std::endl;
oa & BOOST_SERIALIZATION_NVP(d);
// … other code follows
}
void load()
{
std::ifstream file("archive.xml");
boost::archive::xml_iarchive ia(file);
date* dr;
ia >> BOOST_SERIALIZATION_NVP(dr);
std::cout << dr << std::endl;
std::cout << *dr;
}
Note that in this listing, the pointer values of d and
dr (the addresses printed) are different, but the contents of the objects they point to are the same.
Listing 16 shows the XML archive for the code in
Listing 15.
Listing 16. XML archive with pointer usage
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="9">
<d class_id="0" tracking_level="1" version="0" object_id="_0">
    <d.m_day>15</d.m_day>
    <d.m_month>8</d.m_month>
    <d.m_year>1947</d.m_year>
</d>
</boost_serialization>
Now, consider a case where you dump two pointers to the same object, and observe what the resulting archive looks like. Listing 17 shows a slightly modified version of the code in Listing 15.
Listing 17. Dump-restore using two pointers to the same object
void save()
{
std::ofstream file("archive.xml");
boost::archive::xml_oarchive oa(file);
date* d = new date(15, 8, 1947);
std::cout << d << std::endl;
oa & BOOST_SERIALIZATION_NVP(d);
date* d2 = d;
oa & BOOST_SERIALIZATION_NVP(d2);
// … other code follows
}
void load()
{
std::ifstream file("archive.xml");
boost::archive::xml_iarchive ia(file);
date* dr;
ia >> BOOST_SERIALIZATION_NVP(dr);
std::cout << dr << std::endl;
std::cout << *dr;
date* dr2;
ia >> BOOST_SERIALIZATION_NVP(dr2);
std::cout << dr2 << std::endl;
std::cout << *dr2;
}
Listing 18 below provides the XML archive for the code in Listing 17. Observe how the second pointer is handled; also, only a single object has been dumped.
Listing 18. XML archive with d2 being a pointer
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="9">
<d class_id="0" tracking_level="1" version="0" object_id="_0">
    <d.m_day>15</d.m_day>
    <d.m_month>8</d.m_month>
    <d.m_year>1947</d.m_year>
</d>
<d2 class_id_reference="0" object_id_reference="_0"></d2>
</boost_serialization>
In the user application code, references are handled in exactly the same way.
However, note that during restoration, two unique objects are
created. For that reason, the archive also holds two objects with the
same values. Listing 19 below shows what the archive would look
like if d2 in Listing 17 were a reference instead of a pointer.
Listing 19. XML archive, with d2 being a reference to d
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="9">
<d class_id="0" tracking_level="0" version="0">
    <d.m_day>15</d.m_day>
    <d.m_month>8</d.m_month>
    <d.m_year>1947</d.m_year>
</d>
<d2>
    <d.m_day>15</d.m_day>
    <d.m_month>8</d.m_month>
    <d.m_year>1947</d.m_year>
</d2>
</boost_serialization>
Splitting serialize into save and load
There are times when you don't want to use the same serialize
method to dump and restore objects. Under such circumstances, you can split the
serialize method into two methods, aptly named
save and load, that
have similar signatures. Both methods are part of the same class that earlier defined
serialize. Also, you need to add the macro
BOOST_SERIALIZATION_SPLIT_MEMBER (declared in boost/serialization/split_member.hpp) as part of the class
definition. Listing 20 shows what the methods look like.
Listing 20. Splitting serialize into the save and load methods
template<class Archive>
void save(Archive& archive, const unsigned int version) const
{
//…
}
template<class Archive>
void load(Archive& archive, const unsigned int version)
{
//…
}
BOOST_SERIALIZATION_SPLIT_MEMBER( ) // must be part of class
Note the const after the save
method signature. Without the const qualifier, the
code will not compile. For your date class,
Listing 21 shows how the methods now look.
Listing 21. Save and load methods for the date class
template<class Archive>
void save(Archive& archive, const unsigned int version) const
{
archive << BOOST_SERIALIZATION_NVP(m_day);
archive << BOOST_SERIALIZATION_NVP(m_month);
archive << BOOST_SERIALIZATION_NVP(m_year);
}
template<class Archive>
void load(Archive& archive, const unsigned int version)
{
archive >> BOOST_SERIALIZATION_NVP(m_day);
archive >> BOOST_SERIALIZATION_NVP(m_month);
archive >> BOOST_SERIALIZATION_NVP(m_year);
}
BOOST_SERIALIZATION_SPLIT_MEMBER( ) // must be part of class
The method signatures for serialize, save,
and load take an unsigned integer version as the last
argument. What is this number for? Over time, classes might have their internal
variable names changed, new fields added, existing ones removed, and so on.
That's a natural part of the software development process, except that an existing
archive retains information about the old state of the data type. To work around
the problem, you use version numbers.
Let's take an example with the date class. Suppose you
introduce a new field called m_tag of type
string in the date class.
The previous version of the class was dumped as version 0 in the archive, as per
Listing 12. Listing 22 below shows the
load method for the class (you could have used
serialize but using load
makes for a cleaner implementation here).
Listing 22. Using versioning to handle newer class fields
template<class Archive>
void load(Archive& archive, const unsigned int version)
{
archive >> BOOST_SERIALIZATION_NVP(m_day);
archive >> BOOST_SERIALIZATION_NVP(m_month);
archive >> BOOST_SERIALIZATION_NVP(m_year);
if (version > 0)
archive >> BOOST_SERIALIZATION_NVP(m_tag);
}
Clearly, using versioning properly makes the code work with legacy archives used in earlier generations of the software.
Shared pointers are a widely used and powerful programming
technique. One of the major advantages of Boost Serialization is that, once again,
serializing shared pointers is a breeze, and the syntax is the same as what you have
learned thus far. The only caveat is that you must include the header
boost/serialization/shared_ptr.hpp in the application code. Start by modifying
Listing 15 to use boost::shared_ptr
instead of normal pointers. The code is shown in Listing 23 below.
Listing 23. Dump-restore using shared pointers
void save()
{
std::ofstream file("archive.xml");
boost::archive::xml_oarchive oa(file);
boost::shared_ptr<date> d (new date(15, 8, 1947));
oa & BOOST_SERIALIZATION_NVP(d);
// … other code follows
}
void load()
{
std::ifstream file("archive.xml");
boost::archive::xml_iarchive ia(file);
boost::shared_ptr<date> dr;
ia >> BOOST_SERIALIZATION_NVP(dr);
std::cout << *dr;
}
Is serialization the panacea for all evils? Not yet. There are limitations on what Serialization supports. For example, if a pointer to an object on the stack is dumped before the object itself, the dump fails at run time: Boost raises an archive exception that, if uncaught, ends the program. The object needs to be dumped first, and then the pointer to the object (note that a pointer can be dumped on its own, without the object ever being dumped directly). Take the example in Listing 24.
Listing 24. A pointer to an object on the stack needs to be dumped after the actual object
void save()
{
std::ofstream file("archive.xml");
boost::archive::xml_oarchive oa(file);
date d(15, 8, 1947);
std::cout << d << std::endl;
date* d2 = &d;
oa & BOOST_SERIALIZATION_NVP(d);
oa & BOOST_SERIALIZATION_NVP(d2);
}
void load()
{
std::ifstream file("archive.xml");
boost::archive::xml_iarchive ia(file);
date dr;
ia >> BOOST_SERIALIZATION_NVP(dr);
std::cout << dr << std::endl;
date* dr2;
ia >> BOOST_SERIALIZATION_NVP(dr2);
std::cout << dr2 << std::endl;
}
In this listing, you cannot dump d2 before d.
If you look into the XML archive, this becomes clearer: d2
is dumped as a reference to d (see
Listing 25).
Listing 25. XML archive in which both an object and its pointer are dumped
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="9">
<d class_id="0" tracking_level="1" version="0" object_id="_0">
    <d.m_day>15</d.m_day>
    <d.m_month>8</d.m_month>
    <d.m_year>1947</d.m_year>
</d>
<d2 class_id_reference="0" object_id_reference="_0"></d2>
</boost_serialization>
If there are multiple pointers to the same object, Serialization links each additional pointer
to the object already in the archive through object_id_reference, and to its class through class_id_reference
(every class has a unique class ID). Every pointer to the same
object carries the same object_id_reference (_0 here); further distinct objects
receive the IDs _1, _2, and so on.
That's it for this article. You have learned what Boost Serialization is; how to create and use text and XML archives; and how to dump and restore strings, arrays, STL collections, classes, pointers to classes, and shared pointers. The article also briefly touched upon handling class hierarchies and versioning with serialization. Serialization is a powerful tool to have at your disposal. Make good use of it in your code to simplify debugging.
Resources
Learn
- Get the information you need to install Boost.
Get products and technologies
- Learn more about the Boost C++ libraries and find information about Boost installation.
- Learn more about and download the Boost Serialization library.
- boostpro is the off-the-shelf Boost installer.
Arpan Sen is a lead engineer working on the development of software in the electronic design automation industry. He has worked on several flavors of UNIX, including Solaris, SunOS, HP-UX, and IRIX as well as Linux and Microsoft Windows for several years. He takes a keen interest in software performance-optimization techniques, graph theory, and parallel computing. Arpan holds a post-graduate degree in software systems. You can reach him at arpansen@gmail.com.




