A common requirement for a content management system is to store historic versions of content, either to undo a change or to see which content was active in the past.
By default, WCM stores versions of content items, components, and even pages. These versions typically reside on the authoring system.
Some customers have a legal requirement not only to see which content existed in the past but also to reconstruct the web site as it appeared at a specific point in time.
The following architecture outline shows a possible implementation for such a requirement.
A WCM/Portal-based system typically keeps its data in the Portal file system, the database, the LDAP directory, and possibly backend systems connected to Portal/WCM. The outlined architecture does not cover backend systems, but they could also be included in a solution if needed.
An additional environment is dedicated as the Archive environment. It has a copy of the production LDAP, its own database, and its own Portal installation. Ideally, all of these components are integrated into a single virtual machine (e.g. KVM or VMware). Content is syndicated live to the Archive environment, and whenever code is deployed to production, the same deployment is also done on the Archive environment. At a fixed point in time, e.g. each evening, syndication is deactivated (this can be scripted) and the whole virtual image is backed up. It is important to disable automatic time synchronization inside the virtual image, so that a restored image keeps the clock of the time it was archived.
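The nightly cycle described above can be sketched in Python. This is a minimal sketch under stated assumptions: the three hooks (`disable_syndication`, `enable_syndication`, `backup_image`) and the `wcm-archive` name prefix are hypothetical placeholders for site-specific scripts, not WCM or hypervisor APIs, and the sketch assumes syndication is switched back on once the backup has finished.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class ArchiveHooks:
    """Hypothetical hooks; in a real setup each would wrap a site-specific
    script (e.g. a Portal/WCM admin script or a hypervisor backup command)."""
    disable_syndication: Callable[[], None]
    enable_syndication: Callable[[], None]
    backup_image: Callable[[str], None]  # receives the archive name to use


def archive_name(taken_at: datetime, prefix: str = "wcm-archive") -> str:
    """Build a sortable, timestamped name for the backed-up image."""
    return f"{prefix}-{taken_at:%Y%m%dT%H%M%S}"


def run_nightly_archive(hooks: ArchiveHooks, now: datetime) -> str:
    """Pause syndication, back up the whole VM image, then resume syndication.

    Syndication is resumed in a ``finally`` block (an assumption, see above)
    so that a failed backup does not leave the Archive environment
    permanently out of sync with production.
    """
    name = archive_name(now)
    hooks.disable_syndication()
    try:
        hooks.backup_image(name)
    finally:
        hooks.enable_syndication()
    return name


if __name__ == "__main__":
    # Stand-in hooks that only print what a real script would do.
    demo = ArchiveHooks(
        disable_syndication=lambda: print("syndication off"),
        enable_syndication=lambda: print("syndication on"),
        backup_image=lambda name: print(f"backing up image as {name}"),
    )
    print(run_nightly_archive(demo, datetime(2024, 3, 1, 22, 0)))
    # prints wcm-archive-20240301T220000 as the last line
```

Resuming syndication in a `finally` block is a design choice: a failed backup then does not stop the Archive environment from tracking production, although that night's image is missing and the failure should still be reported.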
Whenever the web site needs to be "restored" to a certain point in time, the image taken at that time is restored. Because the image still has its original local time, scheduled actions such as expiring content will not run. Because all required components are contained within the image, the web site shows the content as it did at that time. As mentioned above, critical backend systems that also need to be included could either have a similar archive environment or be restored to the same point in time by other means.
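Picking "the image from that time" can be illustrated with a small Python sketch. It assumes, hypothetically, that each backed-up image is identified by the timestamp at which it was taken and that the list of timestamps is kept sorted; the function name `select_image` is illustrative only. The policy shown, choosing the newest image taken at or before the requested point in time, is one reasonable choice rather than a prescribed one.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional, Sequence


def select_image(taken_at: Sequence[datetime], point_in_time: datetime) -> Optional[datetime]:
    """Return the timestamp of the newest image taken at or before point_in_time.

    ``taken_at`` must be sorted in ascending order (hypothetical catalogue of
    archived images). Returns None when the requested time predates the
    first archived image.
    """
    # bisect_right counts how many images were taken at or before point_in_time.
    index = bisect_right(taken_at, point_in_time)
    if index == 0:
        return None
    return taken_at[index - 1]


if __name__ == "__main__":
    nightly = [
        datetime(2024, 3, 1, 22, 0),
        datetime(2024, 3, 2, 22, 0),
        datetime(2024, 3, 3, 22, 0),
    ]
    print(select_image(nightly, datetime(2024, 3, 3, 9, 30)))
    # prints 2024-03-02 22:00:00
```

With nightly backups this policy never shows content that was added after the requested time, but it also misses changes made between the last backup and that time; the backup interval therefore bounds how precisely the site can be reconstructed.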
Another approach some customers use is a tool that crawls the site or records the actual web site responses, e.g. IBM Tealeaf.