System Administration Toolkit
Migrating and moving UNIX directory trees
About this series
The typical UNIX® administrator has a key range of utilities, tricks, and systems he or she uses regularly to aid in the process of administration. There are key utilities, command-line chains, and scripts that are used to simplify different processes. Some of these tools come with the operating system, but a majority of the tricks come through years of experience and a desire to ease the system administrator's life. The focus of this series is on getting the most from the available tools across a range of different UNIX environments, including methods of simplifying administration in a heterogeneous environment.
The cp command can copy entire directory trees if you use the -r command-line option to recurse into subdirectories. However, the behavior of -r on non-regular files, such as named pipes and links, is unspecified. Some UNIX variants and the GNU cp tool support the -R option, which correctly copies named pipes, links, and other special files.
At the simplest level, the cp command can copy one directory to a new directory with a different name (see Listing 1).
Listing 1. cp command -- copying one directory to a new directory with a different name
$ cp -r srcdir destdir
You should, however, be careful when you specify source files and target locations with the cp command, as the way they are handled can have a significant effect on the results. For example, let's assume that you want to copy the directory /home/mc to the directory /export/home/mc. If /export/home/mc does not exist, then Listing 2 copies the directory /home/mc to /export/home/mc.
Listing 2. Specifying source files and target locations with the cp command
$ cp -r /home/mc /export/home/mc
If, however, /export/home/mc already exists, then Listing 2 copies the directory /home/mc into the directory, creating the new directory /export/home/mc/mc.
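To copy the contents of one directory into an existing directory, select the files in the source directory, as shown in Listing 3. Note that the shell wildcard * does not match names that begin with a dot, so hidden files are skipped.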
Listing 3. Copying the contents of one directory into an existing directory
$ cp -r /home/mc/* /export/home/mc
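The three cases above can be tried safely in a scratch area. The following is a minimal sketch, assuming a POSIX shell and a cp that accepts -r; the directory names under $work are hypothetical stand-ins for /home/mc and /export/home/mc. The last case uses the source/. form, which copies the contents of a directory including dot files.

```shell
#!/bin/sh
# Demonstrates how cp -r treats a missing vs. an existing destination.
# All paths live under a throwaway directory created by mktemp.
work=$(mktemp -d)
mkdir -p "$work/home/mc" "$work/export/home"
echo "data"   > "$work/home/mc/notes.txt"
echo "hidden" > "$work/home/mc/.profile"

# Case 1: the destination does not exist, so it becomes a copy of the source.
cp -r "$work/home/mc" "$work/export/home/mc"
ls -A "$work/export/home/mc"

# Case 2: the destination now exists, so the source is copied *into* it,
# creating export/home/mc/mc.
cp -r "$work/home/mc" "$work/export/home/mc"
ls -A "$work/export/home/mc"

# Case 3: copy only the contents into an existing directory.
# "src/." includes dot files, which the shell glob "src/*" would skip.
mkdir "$work/contents"
cp -r "$work/home/mc/." "$work/contents"
ls -A "$work/contents"
```

Running the second cp twice in a row is an easy way to end up with a nested mc/mc directory, which is why it is worth checking whether the target exists before copying.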
One very useful option with the cp tool is the -p command-line option, which ensures that the permissions, ownership, and modification times of each file are retained where your privileges allow.
The tar command was originally developed for archiving files to tape (literally, tape archive). For example, you might copy the files in the current directory to a tape using the command in Listing 4.
Listing 4. Copying the files in the current directory to tape using tar
$ tar cf /dev/rmt0 .
Listing 4 can be dissected as follows:
- The c option creates a new archive.
- The f option uses the next argument on the command line as the name of the destination. In this case, it is the first raw tape device (/dev/rmt0). You could instead name a file, creating a tar file with all of the information in it.
- The . tells tar to add every file and directory (and all the files and directories below the current one) to the archive.
However, rather than copy files and a directory structure to a tape, you can also use tar to copy into a file. Even more usefully, you can write the archive to the standard output and then, using a pipe, extract the files from the standard input, copying the files from one location to another. The tar command is also generally more reliable at copying and recreating non-standard file types on systems where the cp command does not support the -R command-line option.
For example, Listing 5 shows how to copy the files from the current directory to an existing directory.
Listing 5. Copying the files from the current directory to an existing directory
$ tar cf - . | (cd DIR; tar xf - )
Listing 5 can be dissected as follows:
- tar cf - . creates a new archive, written to standard output, of the files in the current directory.
- cd DIR changes the directory. Note that this directory should exist before you start copying files into it.
- tar xf - extracts the files from the standard input.
- Placing the last two commands in parentheses runs them together in a subshell, so they are effectively treated as one command, rather than two, with the change directory command occurring before the archive is extracted.
- The pipe between the two (|) feeds the standard output from the first tar into the standard input of the second, effectively copying the files into, and then out of, a non-existent archive file.
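One detail worth guarding against: with a semicolon between cd and tar, the extraction still runs if the cd fails, and the files land in whatever directory the subshell started in. A minimal sketch of a more defensive variant follows, assuming a POSIX shell; the temporary directories stand in for the real source and DIR.

```shell
#!/bin/sh
# Copy a tree with a tar pipe, but only extract if the cd succeeded.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir "$src/sub"
echo "one" > "$src/a.txt"
echo "two" > "$src/sub/b.txt"

# "cd ... && tar ..." skips the extraction when the destination is missing,
# instead of unpacking into the directory the subshell started in.
(cd "$src" && tar cf - .) | (cd "$dest" && tar xf -)

# A missing destination now fails and extracts nothing.
(cd "$src" && tar cf - .) | (cd "$dest/missing" 2>/dev/null && tar xf -)
status=$?
echo "exit status for missing destination: $status"
```

The exit status of a pipeline is that of its last command, so a script can test it to decide whether the copy happened at all.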
The tar command retains the full path of the files included in the archive, if you specify the path explicitly. Listing 6 copies files into the archive with their explicit path, which means that they cannot be extracted anywhere but back to their original location.
Listing 6. Specifying the path explicitly
$ tar cf myhome.tar /home/mc
Some tar variants include support to strip off the leading forward slash, enabling you to extract the files anywhere. To ensure you can always put the files where you want, you should add files from the current directory, using Listing 7.
Listing 7. Adding files from the current directory
$ cd /home/mc
$ tar cf myhome.tar .
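To see what difference the starting point makes, you can list the member names that tar stores. The sketch below assumes a POSIX shell and a tar that understands the t (list) option; the paths under $work are hypothetical. The archives are written outside the directory being archived so that tar never tries to include its own output.

```shell
#!/bin/sh
# Compare the member names tar stores for an explicit path and for ".".
work=$(mktemp -d)
mkdir -p "$work/home/mc" "$work/restore"
echo "data" > "$work/home/mc/notes.txt"

# Naming the directory explicitly stores the path as part of each member name.
# (Some tar variants warn here that they strip the leading slash.)
tar cf "$work/bypath.tar" "$work/home/mc" 2>/dev/null
tar tf "$work/bypath.tar"

# Archiving "." from inside the directory stores names relative to it.
(cd "$work/home/mc" && tar cf "$work/relative.tar" .)
tar tf "$work/relative.tar"

# The relative archive can be unpacked into any directory.
(cd "$work/restore" && tar xf "$work/relative.tar")
ls -A "$work/restore"
```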
The tar command has an advantage over cp, in that you can monitor the transfer of files as they are copied between the source and destination by adding the v command-line option to switch on verbose mode. Generally, it is best to use this on the portion of the command that is extracting files instead of creating them, as it confirms that the files have been written to the destination, rather than only that they have been read from the source (see Listing 8).
Listing 8. Adding the v command-line option
$ tar cf - .|(cd /tmp/mc; tar xvf -)
./
./.bash_aliases
./.bash_history
./.bash_path
./.bash_profile
./.bash_vars
./.bashrc
./xmlsimple.pl
./rest.xml
...
Note that if the tar supported on your system has problems with long pathnames, then it might not support the newer tar format. GNU tar supports the new tar format and has no problems with long, or very deep, pathnames.
By default, most tar variants correctly copy and recreate files and directories with the same ownership and permission information. However, some variants adapt this information depending on whether you are running as the root user, and change the ownership when the files are extracted. You can ensure that permissions and ownership are preserved using the p option (see Listing 9).
Listing 9. Using the p option
$ tar cpf - .|(cd /tmp/mc; tar xvpf -)
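A quick way to confirm that permissions survive the copy is to compare the mode string of a file before and after. This is a minimal sketch, assuming a POSIX shell; it uses a group-writable file because that is the kind of mode an extraction without p would typically trim according to the umask when run as an ordinary user.

```shell
#!/bin/sh
# Check that a group-writable mode survives a tar pipe when p is used.
umask 022
src=$(mktemp -d)
dest=$(mktemp -d)
echo "shared" > "$src/shared.txt"
chmod 664 "$src/shared.txt"

(cd "$src" && tar cpf - .) | (cd "$dest" && tar xpf -)

# ls -l is used because stat(1) options differ between UNIX variants.
# The first ten characters hold the file type and permission bits.
src_mode=$(ls -l "$src/shared.txt" | cut -c1-10)
dest_mode=$(ls -l "$dest/shared.txt" | cut -c1-10)
echo "source: $src_mode  copy: $dest_mode"
```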
Finally, you can also create a new directory for the files to be copied into, by extending the second half of the command (see Listing 10).
Listing 10. Creating a new directory for the files to be copied into
$ tar cpf - .|(mkdir /tmp/mc; cd /tmp/mc; tar xvpf -)
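Listing 10 can be made a little more forgiving, and the result checked afterwards. The sketch below assumes a POSIX shell with diff available; the directory names are hypothetical. It uses mkdir -p so that missing parent directories are created and an existing target is not an error, and then compares the two trees.

```shell
#!/bin/sh
# Create the target, copy with a tar pipe, then compare both trees.
src=$(mktemp -d)
parent=$(mktemp -d)
dest="$parent/new/mc"            # does not exist yet
mkdir "$src/sub"
echo "one" > "$src/a.txt"
echo "two" > "$src/sub/b.txt"

# mkdir -p creates missing parents and does not fail if the directory exists.
(cd "$src" && tar cpf - .) | (mkdir -p "$dest" && cd "$dest" && tar xpf -)

# diff -r exits 0 only when both trees hold the same file names and contents.
if diff -r "$src" "$dest"; then
    result="identical"
else
    result="different"
fi
echo "trees are $result"
```

Note that diff -r compares names and contents only; it says nothing about ownership or permissions.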
On its own, tar is a very useful tool for copying files and directory structures. However, it really comes into its own when you use it to copy files over a network. Before you look at that trick, you'll use the same basic method with another archiving tool, cpio.
The cpio tool is similar to the tar tool, but rather than accepting a file or directory specification, you must supply it with a list of files. This can be more practical if you only want to copy specific files. For example, to create a cpio archive containing specific directories, you might use Listing 11.
Listing 11. Creating a cpio archive containing specific directories
$ ls -d ./dira/* ./dirc/* |cpio -ov > diranc.cpio
The ls portion of this command outputs a list of the files (in this case, the contents of the two directories) to be copied. The latter half is the cpio command that copies them into the archive. Dissecting this, you get two options:
- The o option copies files out to an archive.
- The v option displays a list of files as they are copied, which is useful for verification.
The actual archive is created by redirecting the output of cpio into a new file.
The above command is limited, in that it will only copy in files that are explicitly listed. The best way to copy in an entire directory is to use the find command (see Listing 12).
Listing 12. Using the find command to copy in an entire directory
$ find . |cpio -ov >archive.cpio
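Because cpio simply reads path names from its standard input, any command that prints one path per line can decide what goes into the archive. The following is a minimal sketch, assuming a POSIX shell and find; the file names are made up for the example, and the cpio step is skipped if cpio is not installed.

```shell
#!/bin/sh
# Build the file list with find, then hand it to cpio if it is installed.
src=$(mktemp -d)
out=$(mktemp -d)
mkdir "$src/scripts"
echo "print 1;" > "$src/scripts/tool.pl"
echo "<x/>"     > "$src/rest.xml"

# The list is plain text: one path per line, relative to the source directory.
(cd "$src" && find . -name '*.pl' -print) > "$out/filelist"
cat "$out/filelist"

if command -v cpio >/dev/null 2>&1; then
    # -o copies the listed files out to an archive on standard output.
    (cd "$src" && cpio -o < "$out/filelist") > "$out/scripts.cpio"
    # -t lists the archive contents without extracting anything.
    cpio -it < "$out/scripts.cpio"
else
    echo "cpio not installed; skipping archive step"
fi
```

Saving the list to a file first also gives you a record of exactly what was requested, which can be compared with the verbose output later.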
To extract files from a cpio archive, use the i command-line option. You should also use the d option to ensure that any directories in the archive that do not exist in the destination structure are recreated. By using the two together, you can copy from one directory to another, as shown in Listing 13.
Listing 13. Using the i and d options together
$ find . |cpio -ov |(cd /tmp/mc; cpio -idv)
.
./.bash_aliases
./.bash_history
./.bash_path
./.bash_profile
./.bash_vars
./.bashrc
./xmlsimple.pl
./rest.xml
46 blocks
.
.bash_aliases
.bash_history
.bash_path
.bash_profile
.bash_vars
.bashrc
xmlsimple.pl
rest.xml
46 blocks
Because you use verbose mode in both portions of the command, you can confirm whether the size of the archive created and extracted is identical. In this case, both operations used 46 blocks.
Note that cpio will not overwrite files on the destination if they have the same, or newer, modification time.
Copying over a network
An obvious way of transferring files over a network within UNIX is to use Network File System (NFS) to mount the remote directory and copy between them. That is a straightforward solution, but it is not always possible, or practical, for all situations.
One of the simplest ways to copy files over a network is to use tar or cpio to create an archive file, which you can then transfer over a network. The method has some advantages, such as the flexibility of how and when you copy the files, but also has disadvantages, including the complexity of the copy process and the disk space requirement to store a complete duplicate of the files, both when you create the archive on the source and when you copy the archive to the destination.
As you've seen, it's straightforward to create an archive:
Listing 14. Creating an archive
$ tar cf mydir.tar .
You can then copy the file over using whatever method is appropriate, for example, copy the file over with cp and NFS, or transfer to a remote system with FTP or SFTP.
The archive file method, however, is not a particularly efficient method. You can improve the efficiency by using compression.
If you are creating an archive with cpio or tar and are copying the file to a destination over a slow link (for example, a WAN or the Internet, rather than a LAN environment), then you can save time by compressing the archive file before transfer. Which compression format is right depends on the level of compression you want.
Compressing the archive is straightforward. You can do it after creating the archive, as shown in Listing 15.
Listing 15. Compressing after archive creation
$ tar cf mydir.tar .
$ bzip2 mydir.tar
You can also do it by using a pipe to generate a compressed version of the archive (see Listing 16).
Listing 16. Using a pipe to generate a compressed version of the archive
$ tar cf - .| bzip2 >mydir.tar.bz2
The method in Listing 16 has the benefit that it works with all versions of tar, cpio, or any other archive tool. It also works across a range of different platforms, where different variants of tar might or might not support inline compression. If you have a version of GNU tar installed, you can compress using Gzip by using the z command-line option to tar (see Listing 17).
Listing 17. Using the z command-line option to tar
$ tar zcf mydir.tar.gz .
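Extraction works the same way in reverse: decompress to standard output and pipe the result into tar. This is a minimal sketch, assuming a POSIX shell with gzip installed; gzip is used here only because it is very widely available, and bzip2 accepts the same -dc options.

```shell
#!/bin/sh
# Round trip: compress a tar stream through a pipe, then unpack it again.
src=$(mktemp -d)
out=$(mktemp -d)
dest=$(mktemp -d)
echo "payload" > "$src/data.txt"

# Create: tar writes to stdout, gzip compresses the stream into a file.
(cd "$src" && tar cf - .) | gzip > "$out/mydir.tar.gz"

# Extract: gzip -dc decompresses to stdout, tar reads the archive from stdin.
gzip -dc "$out/mydir.tar.gz" | (cd "$dest" && tar xf -)

cat "$dest/data.txt"
```

Because neither tar invocation needs to know about the compression, the same pair of commands works on systems whose tar has no z option.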
Another alternative for copying directories between systems is to use the pipe solution shown in Listing 16, but then use a remote shell tool as the destination.
Copying directly over a network
You can copy directly over a network by piping the output from a typical tar or cpio command through a remote shell tool, such as rsh (remote shell) or ssh (secure shell). Which one you use depends on what is available in your environment. The former, rsh, is a standard remote shell system that offers basic authentication but no encryption, while the latter, ssh, offers both authentication and encryption of the data.
Both methods use the same basic command-line structure (see Listing 18).
Listing 18. Copying directly over a network
$ tar cf - ./*|rsh remotehost tar xf - -C /remotedir
This command is similar to the local tar pipe, except that the destination tar command is being executed on the remote system. The system works because of the pipe between the two commands.
Remember that depending on your remote shell configuration, you might need to enter a password during the process to authenticate on the remote machine. The same process can also be used with ssh. Listing 19 specifies a user/host combination.
Listing 19. Specifying a user/host combination for authentication on the remote machine
$ tar cf - ./*|ssh user@remotehost tar xf - -C /remotedir
For better performance over slow links, you should use compression, as shown in Listing 20.
Listing 20. Using compression when copying directly over a network
$ tar czf - ./*|ssh user@remotehost tar xzf - -C /remotedir
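Not every tar supports the -C option on the receiving side. One alternative is to let the remote shell change directory before tar runs. The sketch below assumes a POSIX shell; REMOTE is a hypothetical variable holding whatever command runs a command line on the target. For a real transfer that might be ssh user@remotehost, but here sh -c stands in for it so that the pipeline can be tried on a single machine.

```shell
#!/bin/sh
# REMOTE is the command that runs a shell command line on the target.
# For a real transfer it might be: REMOTE="ssh user@remotehost"
# Here "sh -c" stands in for it so the sketch runs on one machine.
REMOTE="sh -c"
src=$(mktemp -d)
remotedir=$(mktemp -d)/incoming      # hypothetical target directory
echo "payload" > "$src/data.txt"

# The quoted string is executed by the shell on the receiving side, so the
# mkdir and cd happen there; the receiving tar needs no -C support.
(cd "$src" && tar cf - .) |
    $REMOTE "mkdir -p '$remotedir' && cd '$remotedir' && tar xf -"

ls -A "$remotedir"
```

Archiving . from inside the source directory, rather than ./*, also picks up files whose names begin with a dot.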
Both rsh and ssh also have simpler command-line cousins that can make the process of copying from a remote system even more straightforward. For example, with rcp, the cousin to rsh, you would use Listing 21.
Listing 21. Copying to a remote system with rcp
$ rcp -r filename remotehost:/remotedir
You must use the -r command-line option to copy directories recursively.
The scp command, cousin to ssh, uses the same structure (see Listing 22).
Listing 22. Using scp
$ scp -r filename remotehost:/remotedir
Synchronizing over a network
All of the above solutions have been concerned with copying files, both locally and over a network. However, they all rely on copying an entire directory structure each time the copy is made when it might not always be necessary. Sometimes, you only need to copy the files that have changed since the last time you performed a copy, essentially synchronization rather than a complete re-copy.
If you are using tar or cpio, then you can achieve a time-based synchronization by explicitly specifying the files that you want to include in the archive. For example, if you are running a synchronization job through cron, then you can use a command like this to create an archive that only contains files changed within the last day (see Listing 23).
Listing 23. Creating an archive that only contains the files changed within the last day
$ tar cf archive.tar `find . -mtime -1 -type f`
The find command finds files whose modification time falls within the last day. I select only files (-type f), because if you include directories, then tar includes all of the files within each directory and puts more in the archive than you want.
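Passing the file list through backticks breaks on names that contain spaces, and can exceed the command-line length limit on large trees. A minimal sketch of an alternative follows. It assumes a find that supports -print0 and a tar that supports --null and -T (GNU tar and bsdtar do; other variants may not), and the file names are made up for the example.

```shell
#!/bin/sh
# Archive only files modified within the last day, reading names from find.
# Assumes find -print0 and a tar with --null and -T (GNU tar, bsdtar).
src=$(mktemp -d)
out=$(mktemp -d)
echo "old" > "$src/old file.txt"
echo "new" > "$src/new file.txt"
touch -t 200001010000 "$src/old file.txt"   # push mtime back to the year 2000

# -print0 and --null pass names NUL-separated, so spaces are safe.
# -T - tells tar to read the list of names from standard input.
# The archive is written outside the tree so find never lists it.
(cd "$src" && find . -type f -mtime -1 -print0 |
    tar --null -T - -cf "$out/archive.tar")

tar tf "$out/archive.tar"
```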
For a more robust synchronization, you can use the rsync tool, a free software utility that can efficiently exchange files over the network. The rsync tool can be an effective way of copying and synchronizing files, especially over slower links.
There is a wide range of tools and choices available to you when copying files and directory trees in UNIX, whether on the same system or between systems over any kind of network. Which tool you use depends on your exact situation and environment. I tend to use tar, because it is the most compatible tool across the range of different UNIX systems that I use. For users in Linux® environments, the scp tool, which is a standard component on most Linux distributions, might be more appropriate.