Incremental backups on systems with a large number of files

The client can use large amounts of memory to run incremental backup operations, especially on file systems that contain large numbers of files.

The term memory as used here is the addressable memory available to the client process. Addressable memory is a combination of physical RAM and virtual memory.

On average, the client uses approximately 700 bytes of memory per object (file or directory). Thus for a file system with one million files and directories, the client requires, on average, approximately 700 MB of memory. The exact amount of memory that is used per object varies with the length of the object path and name and with the nesting depth of directories. The amount of data in the files is not an important factor in determining the backup-archive client memory requirement.

The maximum number of files can be determined by dividing the maximum amount of memory available to a process by the average amount of memory that is needed per object.
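The division described above can be sketched in a few lines of Python. This is only an illustration: the 700-byte average comes from this topic, while the function name max_objects and the 2 GiB process limit are assumptions chosen for the example, not values reported by the client.

```python
# Rough estimate of how many objects (files plus directories) fit in the
# memory available to one client process.
# BYTES_PER_OBJECT is the average quoted in this topic; actual usage varies.
BYTES_PER_OBJECT = 700


def max_objects(available_bytes: int, bytes_per_object: int = BYTES_PER_OBJECT) -> int:
    """Return the approximate maximum number of objects for a memory limit."""
    if bytes_per_object <= 0:
        raise ValueError("bytes_per_object must be positive")
    return available_bytes // bytes_per_object


if __name__ == "__main__":
    # Hypothetical limit: 2 GiB of addressable memory for the process.
    limit = 2 * 1024**3
    print(max_objects(limit))  # about 3 million objects
```

Because the per-object figure is an average, treat the result as an upper bound to stay well below, not as an exact capacity.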

The total memory requirement can be reduced by any of the following methods:

  • Use the client option memoryefficientbackup diskcachemethod. This choice reduces the use of memory to a minimum at the expense of performance and a significant increase in disk space that is required for the backup. The file description data from the server is stored in a disk-resident temporary database, not in memory. As directories on the workstation are scanned, the database is consulted to determine whether to back up, update, or expire each object. At the completion of the backup, the database file is deleted.
  • Use the client option memoryefficientbackup yes. The average memory that is used by the client then becomes 700 bytes times the number of directories plus 700 bytes per file in the directory that is being processed. For file systems with large numbers (millions) of directories, the client still might not be able to allocate enough memory to perform incremental backup with memoryefficientbackup yes.
  • UNIX and Linux® clients might be able to use the virtualmountpoint client option to define multiple virtual mount points within a single file system, each of which can be backed up independently by the client.
  • If the client option resourceutilization is set to a value greater than 4, and multiple file systems are being backed up, then reducing resourceutilization to 4 or lower limits the process to incremental backup of a single file system at a time. This setting reduces the memory requirement. If the backup of multiple file systems in parallel is required for performance reasons, and the combined memory requirements exceed the process limits, then multiple instances of the backup client can be used to back up multiple file systems in parallel. For example, if you want to back up two file systems at the same time but their memory requirements exceed the limits of a single process, then start one instance of the client to back up one of the file systems, and start a second instance of the client to back up the other file system.
  • Use the -incrbydate client option to perform an "incremental-by-date" backup.
  • Use the exclude.dir client option to prevent the client from traversing and backing up directories that do not need to be backed up.
  • Except for Mac OS X, use the client image backup function to back up the entire volume. An image backup might use fewer system resources and run faster than an incremental backup of some file systems with a large number of small files.
  • Reduce the number of files per file system by spreading the data across multiple file systems.
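To compare the default behavior with memoryefficientbackup yes, the two estimates described in this topic can be written out as a short Python sketch. The function names and the sample file-system counts are hypothetical. The calculation assumes that the largest single directory determines peak usage under memoryefficientbackup yes, which is an interpretation of "the directory that is being processed" and not a documented guarantee.

```python
# Compare estimated client memory use with and without memoryefficientbackup yes.
# The 700-byte average is taken from this topic; all other numbers are
# illustrative assumptions.
BYTES_PER_OBJECT = 700


def estimate_default(n_files: int, n_dirs: int) -> int:
    """Default incremental backup: every file and directory is held in memory."""
    return (n_files + n_dirs) * BYTES_PER_OBJECT


def estimate_memory_efficient(n_dirs: int, max_files_in_one_dir: int) -> int:
    """memoryefficientbackup yes: all directories, plus the files of the
    directory currently being processed (the largest one sets the peak)."""
    return (n_dirs + max_files_in_one_dir) * BYTES_PER_OBJECT


if __name__ == "__main__":
    # Hypothetical file system: 950,000 files in 50,000 directories,
    # with at most 10,000 files in any one directory.
    print(estimate_default(950_000, 50_000))          # 700000000 bytes
    print(estimate_memory_efficient(50_000, 10_000))  # 42000000 bytes
```

The sketch also shows why the option helps less when a file system has millions of directories: the directory term remains in full.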