System Administration Toolkit: Migrating and moving UNIX directory trees

Occasionally, you need to copy around an entire UNIX® directory tree, either between areas on the same system or between different systems. There are many different methods of achieving this, but not all preserve the right amount of information or are compatible across different systems. This article discusses the various options available for UNIX and how best to make them work.

Martin Brown (mc@mcslp.com), Freelance Writer, Freelance Developer

Martin Brown has been a professional writer for more than seven years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms -- Perl, Python, Java™, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows®, Solaris, Linux, BeOS, Mac OS X and more -- as well as Web programming, systems management, and integration. He is a Subject Matter Expert (SME) for Microsoft® and regular contributor to ServerWatch.com, LinuxToday.com, and IBM developerWorks. He is also a regular blogger at Computerworld, The Apple Blog, and other sites. You can contact him through his Web site.



25 July 2006


About this series

The typical UNIX® administrator has a key range of utilities, tricks, and systems he or she uses regularly to aid in the process of administration. There are key utilities, command-line chains, and scripts that are used to simplify different processes. Some of these tools come with the operating system, but a majority of the tricks come through years of experience and a desire to ease the system administrator's life. The focus of this series is on getting the most from the available tools across a range of different UNIX environments, including methods of simplifying administration in a heterogeneous environment.


Using cp

The standard cp command can copy entire directory trees if you use the -r command-line option to recurse into subdirectories. However, what -r does with non-regular files, such as named pipes and device files, is unspecified, and some implementations try to read them as though they were ordinary files. Some UNIX variants and the GNU cp tool support the -R option, which recreates named pipes, links, and other special files instead of copying their contents.

At the simplest level, the cp command can copy one directory to a new directory with a different name (see Listing 1).

Listing 1. cp command -- copying one directory to a new directory with a different name
$ cp -r srcdir destdir

You should, however, be careful when you specify source files and target locations with the cp command, as the way they are handled can have a significant effect on the results. For example, let's assume that you want to copy the directory /home/mc to the directory /export/home/mc. If /export/home/mc does not exist, then Listing 2 copies the directory /home/mc to /export/home/mc.

Listing 2. Specifying source files and target locations with the cp command
$ cp -r /home/mc /export/home/mc

If, however, /export/home/mc already exists, then Listing 2 copies the directory /home/mc into the directory, creating the new directory /export/home/mc/mc.

To copy the contents of one directory into an existing directory, select the files in the source directory, as shown in Listing 3.

Listing 3. Copying the contents of one directory into an existing directory
$ cp -r /home/mc/* /export/home/mc

One very useful option with the cp tool is -p, which preserves the permissions and modification times of each file, along with its ownership where you have the privileges to set it (typically only when running as root).
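
The difference between copying a directory as the destination, into the destination, and copying only its contents is easy to get wrong, so the following sketch runs all three cases against throwaway directories. The directory names (src, dest1, dest2, dest3) are made up for the illustration, and the src/. form in the third case is an assumption that holds for the GNU and BSD versions of cp. Unlike the /home/mc/* wildcard in Listing 3, which the shell expands without hidden files such as .bashrc, src/. includes them.

```shell
# Build a small source tree in a temporary working area.
work=$(mktemp -d)
mkdir -p "$work/src/sub"
echo "visible" > "$work/src/sub/file.txt"
echo "hidden"  > "$work/src/.profile"

# Case 1: the destination does not exist, so src is copied AS dest1.
cp -R -p "$work/src" "$work/dest1"

# Case 2: the destination already exists, so src is copied INTO dest2,
# producing dest2/src.
mkdir "$work/dest2"
cp -R -p "$work/src" "$work/dest2"

# Case 3: copy only the contents, hidden files included, into an
# existing directory by naming "src/." rather than "src/*".
mkdir "$work/dest3"
cp -R -p "$work/src/." "$work/dest3"

# Show the top level of each result, including dot files.
ls -A "$work/dest1" "$work/dest2" "$work/dest3"
```

Checking the result with ls -A, rather than plain ls, matters here because the hidden files are exactly the ones that tend to go missing.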


Using tar

The tar command was originally developed for archiving files to tape (literally, tape archive). For example, you might copy the files in the current directory to a tape using the command in Listing 4.

Listing 4. Copying the files in the current directory to tape using tar
$ tar cf /dev/rmt0 .

Listing 4 can be dissected as follows:

  • The c option creates a new archive.
  • The f option uses the next argument on the command line as the name of the destination. In this case, that is the first raw tape device (/dev/rmt0). You could instead name an ordinary file to create a tar file containing the same information.
  • The . tells tar to add every file and directory (and all the files and directories below the current one) to the archive.

However, rather than copying files and a directory structure to a tape, you can also use tar to copy into a file. Even more usefully, you can write the archive to standard output and then, using a pipe, extract the files from standard input, copying them from one location to another. The tar command is also generally more reliable at copying and recreating non-standard file types on systems where the cp command does not support the -R command-line option.

For example, Listing 5 shows how to copy the files from the current directory to an existing directory.

Listing 5. Copying the files from the current directory to an existing directory
$ tar cf - . | (cd DIR; tar xf - )

Listing 5 can be dissected as follows:

  • tar cf - . creates a new archive, to standard output, of the files in the current directory.
  • cd DIR changes the directory. Note that this directory should exist before you start copying files into it.
  • tar xf - extracts the files from the standard input.
  • Placing the above two components in parentheses runs them together in a subshell, so they are effectively treated as one command, with the change of directory occurring before the archive is extracted.
  • The pipe between the two (|) feeds the standard output from the first tar into the standard input of the second, effectively copying the files into, and then out of, a non-existent archive file.

The tar command retains the full path of the files included in the archive, if you specify the path explicitly. Listing 6 copies files into the archive with their explicit path, which means that they cannot be extracted anywhere but back to their original location.

Listing 6. Specifying the path explicitly
$ tar cf myhome.tar /home/mc

Some tar variants include support to strip off the leading forward slash, enabling you to extract the files anywhere. To ensure you can always put the files where you want, you should add files from the current directory, using Listing 7.

Listing 7. Adding files from the current directory
$ cd /home/mc
$ tar cf myhome.tar .
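
Before relying on an archive, it can help to check which paths were actually stored in it. The sketch below builds a small archive from inside a directory, lists it with the t option (which reads the archive without extracting anything), and then extracts it into a different location. The temporary paths and the file name notes.txt are placeholders for the illustration.

```shell
# Create a stand-in for a home directory under a temporary area.
work=$(mktemp -d)
mkdir -p "$work/home/mc" "$work/restore"
echo "data" > "$work/home/mc/notes.txt"

# Archive from inside the directory so member names are relative (./...).
(cd "$work/home/mc" && tar cf "$work/myhome.tar" .)

# t lists the archive contents without extracting them.
tar tf "$work/myhome.tar"

# Because the stored names are relative, the files can be extracted
# into any directory, not just the original location.
(cd "$work/restore" && tar xf "$work/myhome.tar")
```

If the listing shows names that begin with a forward slash, the archive was created with an explicit absolute path, as in Listing 6.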

The tar command has an advantage over cp, in that you can monitor the transfer of files as they are copied between the source and destination by adding the v command-line option to switch on verbose mode. Generally, it is best to use this on the portion of the command that is extracting files instead of creating them, as it ensures that the files have been copied properly, rather than confirming that they have been read properly (see Listing 8).

Listing 8. Adding the v command-line option
$ tar cf - .|(cd /tmp/mc; tar xvf -)
./
./.bash_aliases
./.bash_history
./.bash_path
./.bash_profile
./.bash_vars
./.bashrc
./xmlsimple.pl
./rest.xml
...

Note that if the tar supplied with your system has problems with long pathnames, it probably does not support the newer tar format. The original format stores at most 100 characters of each pathname. GNU tar supports the newer formats and has no problems with long, or very deep, pathnames.

By default, most tar variants recreate files and directories with the same ownership and permission information. However, some variants behave differently depending on whether you are running as the root user, and can change the ownership when the files are extracted. You can ensure that permissions and ownership are preserved, as far as your privileges allow, by using the p option (see Listing 9).

Listing 9. Using the p option
$ tar cpf - .|(cd /tmp/mc; tar xvpf -)

Finally, you can also create a new directory for the files to be copied into, by extending the second half of the command (see Listing 10).

Listing 10. Creating a new directory for the files to be copied into
$ tar cpf - .|(mkdir /tmp/mc; cd /tmp/mc; tar xvpf -)
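
One weakness of chaining the steps with semicolons is that each step runs even if the previous one failed. If the cd fails, for example, tar extracts into whatever directory the subshell happens to be in. The following is a minimal sketch of a more defensive variant, wrapped in a helper function. The name copy_tree and the demonstration paths are invented for this example and are not part of tar.

```shell
# copy_tree SRC DEST: hypothetical helper that copies the contents of SRC
# into DEST using a tar pipeline.
copy_tree() {
    src=$1
    dest=$2
    # -p creates missing parent directories and does not fail if DEST
    # already exists.
    mkdir -p "$dest" || return 1
    # "&&" means each tar runs only if its cd succeeded. Both sides of the
    # pipe are subshells, so the caller's working directory is unchanged.
    (cd "$src" && tar cf - .) | (cd "$dest" && tar xpf -)
}

# Demonstration on throwaway directories.
work=$(mktemp -d)
mkdir -p "$work/src/docs"
echo "hello" > "$work/src/docs/readme.txt"
echo "alias ll='ls -l'" > "$work/src/.bash_aliases"

copy_tree "$work/src" "$work/copy/of/src"

# diff -r prints nothing and exits with status 0 when the trees match.
diff -r "$work/src" "$work/copy/of/src" && echo "trees match"
```

Comparing the two trees with diff -r afterward is a cheap way to confirm the copy, including hidden files, when the v output would be too long to read.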

On its own, tar is a very useful tool for copying files and directory structures. However, it really comes into its own when you use it to copy files over a network. Before you look at that trick, you'll use the same basic method with another archiving tool, cpio.


Using cpio

The cpio tool is similar to the tar tool, but rather than accepting a file or directory specification, you must supply it with a list of files. This can be more practical if you only want to copy specific files. For example, to create a cpio archive containing specific directories, you might use Listing 11.

Listing 11. Creating a cpio archive containing specific directories
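$ ls -d ./dira/* ./dirc/* |cpio -ov > diranc.cpio

The ls portion of this command outputs a list of the files (in this case, the contents of the two directories) to be copied. The -d option and the explicit wildcards make ls print each name with its directory prefix (for example, ./dira/file); a plain ls ./dira ./dirc prints bare file names under a heading for each directory, which cpio cannot locate. The latter half is the cpio command that copies the files into the archive. It uses two options: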

  • The o option copies files out to an archive.
  • The v option displays a list of files as they are copied, which is useful for verification.

The actual archive is created by redirecting the output of cpio into a new file.

The above command is limited, in that it will only copy in files that are explicitly listed. The best way to copy in an entire directory is to use the find command (see Listing 12).

Listing 12. Using the find command to copy in an entire directory
$ find . |cpio -ov >archive.cpio

To extract files from a cpio archive, use the i command-line option. You should also use the d option to ensure that any directories in the archive that do not exist in the destination structure are recreated. By using the two together, you can copy from one directory to another, as shown in Listing 13.

Listing 13. Using the i and d options together
$ find . |cpio -ov |(cd /tmp/mc; cpio -idv)
.
./.bash_aliases
./.bash_history
./.bash_path
./.bash_profile
./.bash_vars
./.bashrc
./xmlsimple.pl
./rest.xml
46 blocks
.
.bash_aliases
.bash_history
.bash_path
.bash_profile
.bash_vars
.bashrc
xmlsimple.pl
rest.xml
46 blocks

Because you use verbose mode in both portions of the command, you can confirm whether the size of the archive created and extracted is identical. In this case, both operations used 46 blocks.

Note that cpio will not overwrite files on the destination if they have the same, or newer, modification time. To replace such files anyway, add the u option when extracting.
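
For a purely local copy, cpio also has a pass-through mode (the p option) that reads a list of names and copies the files straight into a destination directory without creating an intermediate archive stream. The sketch below wraps that mode in a function. The name cpio_copy is invented for the example, and the function assumes that cpio is installed.

```shell
# cpio_copy SRC DEST: hypothetical wrapper around cpio's pass-through mode.
cpio_copy() {
    src=$1
    mkdir -p "$2" || return 1
    # Resolve DEST to an absolute path, because cpio runs from inside SRC.
    dest=$(cd "$2" && pwd) || return 1
    # find -depth lists a directory's contents before the directory itself.
    # cpio options: p = pass-through, d = create directories as needed,
    # m = keep the original modification times.
    (cd "$src" && find . -depth -print | cpio -pdm "$dest")
}
```

A call such as cpio_copy /home/mc /tmp/mc would then copy the tree, hidden files included, in one step.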


Copying over a network

An obvious way of transferring files over a network within UNIX is to use Network File System (NFS) to mount the remote directory and then copy between the local and mounted directories. That is a straightforward solution, but it is not always possible, or practical.

One of the simplest ways to copy files over a network is to use tar or cpio to create an archive file, which you can then transfer over a network. The method has some advantages, such as the flexibility of how and when you copy the files, but also has disadvantages, including the complexity of the copy process and the disk space requirement to store a complete duplicate of the files, both when you create the archive on the source and when you copy the archive to the destination.

As you've seen, it's straightforward to create an archive:

Listing 14. Creating an archive
$ tar cf mydir.tar .

You can then copy the file over using whatever method is appropriate. For example, copy the file with cp onto an NFS mount, or transfer it to a remote system with FTP or SFTP.

The archive file method, however, is not a particularly efficient method. You can improve the efficiency by using compression.


Using compression

If you are creating an archive with cpio or tar and are copying the file to a destination over a slow link (for example, a WAN or the Internet, rather than a LAN environment), then you can save time by compressing the archive file before transfer. The right compression format depends on the level of compression you want.

Compressing the archive is straightforward. You can do it after you create the archive, as shown in Listing 15.

Listing 15. Compressing after archive creation
$ tar cf mydir.tar .
$ bzip2 mydir.tar

Alternatively, you can use a pipe to generate a compressed version of the archive directly (see Listing 16).

Listing 16. Using a pipe to generate a compressed version of the archive
$ tar cf - .| bzip2 >mydir.tar.bz2

The method in Listing 16 has the benefit that it works with all versions of tar, cpio, or any other archive tool. It also works across a range of different platforms, where different variants of tar might or might not support inline compression. If you have a version of GNU tar installed, you can compress with gzip by adding the z command-line option to tar (see Listing 17).

Listing 17. Using the z command-line option to tar
$ tar zcf mydir.tar.gz .
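
The pipe approach works in the other direction too, which is useful when the tar on the destination system has no built-in compression support. The following sketch creates a compressed archive, checks it, and extracts it again through a pipe. It uses gzip because it is almost universally installed; bzip2 accepts the same -d, -c, and -t options. The temporary paths and file names are placeholders.

```shell
# Build a small tree to archive, plus an empty directory to restore into.
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/restore"
echo "payload" > "$work/src/sub/data.txt"

# Create: tar writes the archive to standard output, gzip compresses it.
(cd "$work/src" && tar cf - .) | gzip > "$work/mydir.tar.gz"

# Verify: -t tests the integrity of the compressed file without unpacking.
gzip -t "$work/mydir.tar.gz" && echo "archive is intact"

# Extract: -d decompresses and -c writes to standard output, which the
# second tar reads as its archive.
gzip -dc "$work/mydir.tar.gz" | (cd "$work/restore" && tar xf -)
```

Running the integrity test on the destination after the transfer, rather than on the source before it, also catches corruption introduced in transit.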

Another alternative for copying directories between systems is to use the pipe solution shown in Listing 16, but then use a remote shell tool as the destination.


Copying directly over a network

You can copy directly over a network by piping the output from a typical tar or cpio command through a remote shell tool, such as remote shell (rsh) or secure shell (ssh). Which one you use depends on what is available in your environment. The former, rsh, is a standard remote shell system that offers basic authentication but no encryption, while the latter, ssh, offers both authentication and encryption of the data.

Both methods use the same basic command-line structure (see Listing 18).

Listing 18. Copying directly over a network
$ tar cf - ./*|rsh remotehost tar xf - -C /remotedir

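This command is similar to the local tar pipeline, except that the destination tar command is executed on the remote system. The pipe carries the archive from the local tar, through the remote shell connection, to the standard input of the remote tar, and the -C option tells the remote tar to change to /remotedir before extracting.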

Remember that depending on your remote shell configuration, you might need to enter a password during the process to authenticate on the remote machine. The same process can also be used with ssh. Listing 19 specifies a user/host combination.

Listing 19. Specifying a user/host combination for authentication on the remote machine
$ tar cf - ./*|ssh user@remotehost tar xf - -C /remotedir

For better performance over slow links, you should use compression, as shown in Listing 20.

Listing 20. Using compression when copying directly over a network
$ tar czf - ./*|ssh user@remotehost tar xzf - -C /remotedir
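
The network pipeline can be wrapped in a small function so that the same steps are used every time. The sketch below is one possible shape for that, not a definitive implementation. The function name push_tree and the REMOTE_SHELL variable are invented for this example, and the z option assumes a tar with built-in gzip support on both systems, as in Listing 20. Archiving . instead of ./* means hidden files are included, because the shell never has to expand a wildcard.

```shell
# push_tree SRC HOST DEST: hypothetical helper that copies the contents of
# the local directory SRC into DEST on HOST.
# REMOTE_SHELL is an assumed variable naming the remote shell command;
# it defaults to ssh, and rsh takes the same "host command" arguments.
push_tree() {
    src=$1
    host=$2
    dest=$3
    # The remote side creates DEST if needed and extracts only if the cd
    # succeeded. DEST is wrapped in single quotes for the remote shell, so
    # it must not itself contain a single quote.
    (cd "$src" && tar czf - .) |
        "${REMOTE_SHELL:-ssh}" "$host" \
            "mkdir -p '$dest' && cd '$dest' && tar xzf -"
}
```

With this in place, a call such as push_tree /home/mc user@remotehost /remotedir performs the same copy as Listing 20, creating the remote directory first if necessary.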

Both rsh and ssh also have simpler command-line cousins that can make the process of copying to a remote system even more straightforward. For example, with rcp, the cousin to rsh, you would use Listing 21.

Listing 21. Copying to a remote system with rcp
$ rcp -r filename remotehost:/remotedir

You must use the -r command-line option to copy directories recursively.

The scp command, cousin to ssh, uses the same structure (see Listing 22).

Listing 22. Using scp
$ scp -r filename remotehost:/remotedir

Synchronizing over a network

All of the above solutions have been concerned with copying files, both locally and over a network. However, they all rely on copying an entire directory structure each time the copy is made when it might not always be necessary. Sometimes, you only need to copy the files that have changed since the last time you performed a copy, essentially synchronization rather than a complete re-copy.

If you are using tar or cpio, then you can achieve a time-based synchronization by explicitly specifying the files that you want to include in the archive. For example, if you are running a synchronization job through cron, then you can create an archive that contains only the files changed within the last day (see Listing 23).

Listing 23. Creating an archive that only contains the files changed within the last day
$ tar cf archive.tar `find . -mtime -1 -type f`

The find command selects files whose modification time falls within the last day. I select only regular files (-type f) because, if you include a directory, tar adds every file within it and the archive ends up with more than you want.
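
Passing the file list through backticks splits it on white space, so a file name containing a space is handed to tar as two separate names. A more robust sketch, assuming GNU tar (or another tar that supports the --null and -T options) and a find that supports -print0, passes the names as a NUL-separated list instead. The file names below are made up for the demonstration, and the archive is written outside the tree being scanned so that find cannot pick it up.

```shell
# Build a tree with one fresh file and one backdated file.
work=$(mktemp -d)
mkdir -p "$work/tree/sub"
echo "fresh" > "$work/tree/sub/new file.txt"   # name contains a space
echo "stale" > "$work/tree/old.txt"
touch -t 200001010000 "$work/tree/old.txt"     # backdate to 1 January 2000

# -print0 and --null separate names with NUL bytes, so spaces and
# newlines in a file name cannot split it into several names.
# -T - tells tar to read the list of names from standard input.
(cd "$work/tree" &&
    find . -type f -mtime -1 -print0 |
    tar --null -T - -cf "$work/archive.tar")

# List what was stored: only the recently modified file should appear.
tar tf "$work/archive.tar"
```

This form also avoids the command-line length limit that a very long backtick expansion can run into.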

For a more robust synchronization, you can use the rsync tool, a free software utility that can efficiently exchange files over the network. The rsync tool can be an effective way of copying and synchronizing files, especially over slower links.
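
As a rough illustration of what an rsync-based synchronization might look like, the sketch below wraps a typical invocation in a function. The name sync_tree is invented for this example, and the choice of options is one reasonable combination rather than the only one. In particular, --delete removes files from the destination, so it is worth trying the command with rsync's -n (dry run) option first.

```shell
# sync_tree SRC DEST: hypothetical wrapper around rsync. DEST may be a
# local path or a remote one of the form user@remotehost:/remotedir.
sync_tree() {
    # -a        archive mode: recurse and keep permissions, times, and
    #           links (and ownership where privileges allow)
    # -z        compress data in transit, which helps on slow links
    # --delete  remove files from DEST that no longer exist in SRC
    # The trailing slash on SRC means "the contents of SRC", which avoids
    # creating DEST/SRC when DEST already exists.
    rsync -az --delete "$1/" "$2/"
}
```

A second run after a small change transfers only what differs, which is the main advantage over re-copying the whole tree with tar or cpio.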


Summary

A wide range of tools and choices is available when you copy files and directory trees in UNIX, whether on the same system or between systems over any kind of network. Which tool you use depends on your exact situation and environment. I tend to use tar, because it is the most compatible tool across the range of different UNIX systems that I use. For users in Linux® environments, the scp tool, which is a standard component on most Linux distributions, might be more appropriate.
