Learn Linux, 101: File and directory management

Getting comfortable with Linux files and directories

You've probably heard that everything in Linux is a file, so start on the right path with a solid grounding in file and directory management -- finding, listing, moving, copying, and archiving. You can use the material in this article to study for the LPI® 101 exam for Linux system administrator certification, or just to learn for fun.


Ian Shields, Senior Programmer, IBM

Ian Shields works on a multitude of Linux projects for the developerWorks Linux zone. He is a Senior Programmer at IBM at the Research Triangle Park, NC. He joined IBM in Canberra, Australia, as a Systems Engineer in 1973, and has since worked on communications systems and pervasive computing in Montreal, Canada, and RTP, NC. He has several patents and has published several papers. His undergraduate degree is in pure mathematics and philosophy from the Australian National University. He has an M.S. and Ph.D. in computer science from North Carolina State University. Learn more about Ian in Ian's profile on developerWorks Community.



06 October 2009


About this series

This series of articles helps you learn Linux system administration tasks. You can also use the material in these articles to prepare for Linux Professional Institute Certification level 1 (LPIC-1) exams.

See our series roadmap for a description of and link to each article in this series. The roadmap is in progress and reflects the latest (April 2009) objectives for the LPIC-1 exams: as we complete articles, we add them to the roadmap. In the meantime, though, you can find earlier versions of similar material, supporting previous LPIC-1 objectives prior to April 2009, in our LPI certification exam prep tutorials.

Prerequisites

To get the most from the articles in this series, you should have a basic knowledge of Linux and a working Linux system on which you can practice the commands covered in this article. Sometimes different versions of a program will format output differently, so your results may not always look exactly like the listings and figures shown here.

Overview

This article grounds you in the basic Linux commands for manipulating files and directories. Learn to:

  • List directory contents
  • Copy, move, or remove files and directories
  • Manipulate multiple files and directories recursively
  • Use wildcard patterns for manipulating files
  • Use the find command to locate and act on files based on type, size, or time
  • Compress and decompress files using gzip and bzip2
  • Archive files using tar, cpio and dd

This article helps you prepare for Objective 103.2 in Topic 103 of the Linux Professional Institute's Junior Level Administration (LPIC-1) exam 101. The objective has a weight of 4.


Listing directories


All files on Linux and UNIX® systems are accessed as part of a single large tree-structured filesystem that is rooted at /. You may add more branches to this tree by mounting them and remove them by unmounting them. Mounting and unmounting will be covered in the article on Mounting and unmounting of filesystems (see the series roadmap).

Listing directory entries

In this article, we will practice the commands using the files created in the article "Learn Linux 101: Text streams and filters." If you followed along in that article, you created a directory, lpi103-2, in your home directory. If you didn't, then you can use another directory on your system to practice the commands discussed in this article.

File and directory names are either absolute, meaning they begin with a /, or they are relative to the current working directory, meaning they do not begin with a /. The absolute path to a file or directory consists of a / followed by a series of zero or more directory names, each followed by another /, and then a final name.

Given a file or directory name that is relative to the current working directory, simply concatenate the absolute name of the working directory, a /, and the relative name. For example, the directory, lpi103-2, that we created in the earlier article was created in my home directory, /home/ian, so its full, or absolute, path is /home/ian/lpi103-2.
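
The concatenation rule is easy to try in the shell. The following is a minimal sketch rather than one of the article's listings: the directory name lpi-demo and the variable names are made up for illustration, and a scratch directory from mktemp stands in for your home directory.

```shell
# Work in a scratch directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir lpi-demo

rel="lpi-demo"            # relative name: does not begin with /
abs="$PWD/$rel"           # working directory + / + relative name

# Classify each name by its first character.
for name in "$rel" "$abs"; do
    case "$name" in
        /*) echo "$name is absolute" ;;
        *)  echo "$name is relative" ;;
    esac
done

# Both names refer to the same directory.
ls -d "$rel" "$abs"
```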

You can display the name of the current working directory with the pwd command. It is also usually available in the PWD environment variable. Listing 1 shows the use of the pwd command, and three different ways to use the ls command to list the files in this directory.

Listing 1. Listing directory entries
[ian@echidna lpi103-2]$ pwd
/home/ian/lpi103-2
[ian@echidna lpi103-2]$ echo "$PWD"
/home/ian/lpi103-2
[ian@echidna lpi103-2]$ ls
sedtab  text1  text2  text3  text4  text5  text6  xaa  xab  yaa  yab
[ian@echidna lpi103-2]$ ls "$PWD"
sedtab  text1  text2  text3  text4  text5  text6  xaa  xab  yaa  yab
[ian@echidna lpi103-2]$ ls /home/ian/lpi103-2
sedtab  text1  text2  text3  text4  text5  text6  xaa  xab  yaa  yab

As you can see, you can give a relative or absolute directory name as a parameter to the ls command, and it will list the contents of that directory.

Listing details

On a storage device, a file or directory is contained in a collection of blocks. Information about a file is contained in an inode which records information such as the owner, when the file was last accessed, how large it is, whether it is a directory or not, and who can read from or write to it. The inode number is also known as the file serial number and is unique within a particular filesystem. We can use the -l (or --format=long) option to display some of the information stored in the inode.

By default, the ls command does not list special files, those whose names start with a dot (.). Every directory has at least two special entries: the directory itself (.) and the parent directory (..). The root directory has no parent, so its .. entry refers to the root directory itself.

Listing 2 uses the -l and -a options to display a long format listing of all files including the . and .. directory entries.

Listing 2. Displaying a long directory listing
[ian@echidna lpi103-2]$ ls -al
total 52
drwxrwxr-x.  2 ian ian 4096 2009-08-11 21:21 .
drwx------. 35 ian ian 4096 2009-08-12 10:55 ..
-rw-rw-r--.  1 ian ian    8 2009-08-11 21:17 sedtab
-rw-rw-r--.  1 ian ian   24 2009-08-11 14:02 text1
-rw-rw-r--.  1 ian ian   25 2009-08-11 14:27 text2
-rw-rw-r--.  1 ian ian   63 2009-08-11 15:41 text3
-rw-rw-r--.  1 ian ian   26 2009-08-11 15:42 text4
-rw-rw-r--.  1 ian ian   24 2009-08-11 18:47 text5
-rw-rw-r--.  1 ian ian   98 2009-08-11 21:21 text6
-rw-rw-r--.  1 ian ian   15 2009-08-11 14:41 xaa
-rw-rw-r--.  1 ian ian    9 2009-08-11 14:41 xab
-rw-rw-r--.  1 ian ian   17 2009-08-11 14:41 yaa
-rw-rw-r--.  1 ian ian    8 2009-08-11 14:41 yab

In Listing 2, the first line shows the total number of disk blocks (52) used by the listed files. The remaining lines tell you about the directory entries.

  • The first field (drwxrwxr-x or -rw-rw-r-- in this case) tells us whether the file is a directory (d) or a regular file (-). You may also see symbolic links (l) or other values for some special files (such as files in the /dev filesystem). You will learn more about symbolic links in the article Create and change hard and symbolic links (see the series roadmap). The type is followed by three sets of permissions (such as rwx or r--) for the owner, the members of the owner's group, and everyone. The three values, respectively, indicate whether the user, group, or everyone has read (r), write (w), or execute (x) permission. Other uses such as setuid will be covered in the article Manage file permissions and ownership (see the series roadmap).
  • The next field is a number that tells us the number of hard links to the file. We said that the inode contains information about the file. The file's directory entry contains a hard link (or pointer) to the inode for the file, so every entry listed should have at least one hard link. Directories have an additional link for their own . entry and one more for the .. entry of each subdirectory. So we can see from Listing 2 that my home directory, represented by .., has quite a few subdirectories, as it has 35 hard links.
  • The next two fields are the file's owner and the owner's primary group. Some systems, such as Red Hat or Fedora systems, default to providing a separate group for each user. On other systems, all users may be in one or perhaps a few groups.
  • The next field contains the length of the file in bytes.
  • The penultimate field contains the timestamp of the last modification.
  • And the final field contains the name of the file or directory.

The -i option of the ls command will display the inode numbers for you. You will see inodes again later in this article and also in the article Create and change hard and symbolic links (see the series roadmap).
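
If you want to use these fields in a script, one possible approach is to split the ls output with awk. This is only a sketch: the file name sample and the variable names are invented for the example, and splitting ls output on whitespace is reliable only when names, owners, and groups contain no spaces.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'abcdefg\n' > sample      # 7 characters plus a newline: 8 bytes

# Long listing: type and permissions, link count, owner, group, size, time, name.
ls -l sample

# Pick out individual fields by position.
perms=$(ls -l sample | awk '{print $1}')
links=$(ls -l sample | awk '{print $2}')
size=$(ls -l sample | awk '{print $5}')
inode=$(ls -i sample | awk '{print $1}')

# The first character of the permissions field is the file type.
type=$(printf '%s' "$perms" | cut -c1)
echo "type: $type  links: $links  size: $size  inode: $inode"
```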

Multiple files

You can also specify multiple parameters to the ls command, where each name is either that of a file or directory. For directory names, the ls command lists the contents of the directory rather than information about the directory itself. In our example, suppose we wanted information about the lpi103-2 directory entry itself as it is listed in the parent directory. The command ls -l ../lpi103-2 would give us a listing like the previous example. Listing 3 shows how to add the -d option to list information about directory entries rather than the contents of directories and also how to list entries for multiple files or directories.

Listing 3. Using ls -d
[ian@echidna lpi103-2]$ ls -ld ../lpi103-2 sedtab xaa
drwxrwxr-x. 2 ian ian 4096 2009-08-12 15:31 ../lpi103-2
-rw-rw-r--. 1 ian ian    8 2009-08-11 21:17 sedtab
-rw-rw-r--. 1 ian ian   15 2009-08-11 14:41 xaa

Note that the modification time for lpi103-2 is different from that in the previous listing. Also, as in the previous listing, it is different from the timestamps of any of the files in the directory. Is this what you would expect? Not normally. However, in developing this article, I created some extra examples and then deleted them, so the directory time stamps reflect that fact. We will talk more about file times later under Handling multiple files and directories.

Sorting the output

By default, ls lists files alphabetically. There are a number of options for sorting the output. For example, ls -t will sort by modification time (newest to oldest) while ls -lS will produce a long listing sorted by size (largest to smallest). Adding -r will reverse the sort order. For example, use ls -lrt to produce a long listing sorted from oldest to newest. Consult the man page for other ways you can list files and directories.
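
The sort options are easiest to compare on files whose names, sizes, and times all sort differently. The sketch below sets that up in a scratch directory; the file names are made up, and it uses touch -t, which is covered later in this article, to set the modification times.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three files whose names, sizes, and modification times sort differently.
printf '%s' 'xxxxxxxxxx' > a_medium             # 10 bytes
printf '%s' 'x' > b_small                       #  1 byte
printf '%s' 'xxxxxxxxxxxxxxxxxxxx' > c_large    # 20 bytes
touch -t 200908110900 c_large                   # oldest
touch -t 200908120900 a_medium
touch -t 200908130900 b_small                   # newest

ls        # alphabetical:  a_medium b_small c_large
ls -S     # largest first: c_large a_medium b_small
ls -t     # newest first:  b_small a_medium c_large
ls -rt    # oldest first:  c_large a_medium b_small

# Capture the orderings on one line each for later comparison.
by_size=$(ls -S | tr '\n' ' ')
by_time=$(ls -t | tr '\n' ' ')
oldest_first=$(ls -rt | tr '\n' ' ')
```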


Copying, moving, and deleting files

We have now learned some ways to create files, but suppose we want to make copies of files, rename files, move them around the filesystem hierarchy, or even delete them. We use three short commands for these purposes.

cp
is used to make a copy of one or more files or directories. You must give one (or more) source names and one target name. Source or target names may include a path specification. If the target is an existing directory, then all sources are copied into the target. If the target is a directory that does not exist, then the (single) source must also be a directory and a copy of the source directory and its contents is made with the target name as the new name. If the target is a file, then the (single) source must also be a file and a copy of the source file is made with the target name as the new name, replacing any existing file of the same name. Note that there is no default assumption of the target being the current directory as in DOS and Windows operating systems.
mv
is used to move or rename one or more files or directories. In general, the names you may use follow the same rules as for copying with cp; you can rename a single file or move a set of files into a new directory. Because the name is only a directory entry that links to an inode, it should be no surprise that the inode number does not change unless the file is moved to another filesystem, in which case moving it behaves more like a copy followed by deleting the original.
rm
is used to remove one or more files. We will see how to remove directories shortly.
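
The difference between moving and copying can be checked by comparing inode numbers. The following sketch assumes the scratch directory and everything in it are on a single filesystem; the file names are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "some text" > original
mkdir subdir

before=$(ls -i original | awk '{print $1}')
mv original subdir/renamed          # move and rename within one filesystem
after=$(ls -i subdir/renamed | awk '{print $1}')

cp subdir/renamed copy              # a copy is a new file with its own inode
copy_inode=$(ls -i copy | awk '{print $1}')

echo "before mv: $before  after mv: $after  copy: $copy_inode"
```

The first two numbers should be the same, while the copy gets a different one.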

Where's the rename command?

If you are used to a DOS or Windows® system, you may find it strange to use mv to rename a file. Linux does have a rename command, but it has different syntax from the DOS and Windows commands of the same name. See the man page for details on how to use it.

Listing 4 illustrates the use of cp and mv to make some backup copies of our text files. We also use ls -i to show inodes for some of our files.

  1. We first make a copy of our text1 file as text1.bkp.
  2. We then decide to create a backup subdirectory using the mkdir command.
  3. We make a second backup copy of text1, this time in the backup directory, and show that all three files have different inodes.
  4. We then move our text1.bkp to the backup directory and after that rename it to be more consistent with the second backup. While we could have done this with a single command, we use two here for illustration.
  5. We check the inodes again and confirm that text1.bkp with inode 934193 is no longer in our lpi103-2 directory, but that the inode is that of text1.bkp.1 in the backup directory.

Listing 4. Copying and moving files
[ian@echidna lpi103-2]$ cp text1 text1.bkp
[ian@echidna lpi103-2]$ mkdir backup
[ian@echidna lpi103-2]$ cp text1 backup/text1.bkp.2
[ian@echidna lpi103-2]$ ls -i text1 text1.bkp backup
933892 text1  934193 text1.bkp

backup:
934195 text1.bkp.2
[ian@echidna lpi103-2]$ mv text1.bkp backup
[ian@echidna lpi103-2]$ mv backup/text1.bkp backup/text1.bkp.1
[ian@echidna lpi103-2]$ ls -i text1 text1.bkp backup
ls: cannot access text1.bkp: No such file or directory
933892 text1

backup:
934193 text1.bkp.1  934195 text1.bkp.2

Normally, the cp command will copy a file over an existing copy if the existing file is writable. The mv command will likewise replace an existing target; if the target is not writable, it asks for confirmation first when run from a terminal. There are several useful options relevant to this behavior of cp and mv.

-f or --force
will cause cp to attempt to remove an existing target file even if it is not writable.
-i or --interactive
will ask for confirmation before attempting to replace an existing file.
-b or --backup
will make a backup of any files that would be replaced.

As usual, consult the man pages for full details on these and other options for copying and moving.

Listing 5 illustrates copying with backup and then file deletion.

Listing 5. Making backup copies and deleting files
[ian@echidna lpi103-2]$ cp text2 backup
[ian@echidna lpi103-2]$ cp --backup=t text2 backup
[ian@echidna lpi103-2]$ ls backup
text1.bkp.1  text1.bkp.2  text2  text2.~1~
[ian@echidna lpi103-2]$ rm backup/text2 backup/text2.~1~
[ian@echidna lpi103-2]$ ls backup
text1.bkp.1  text1.bkp.2

Note that the rm command also accepts the -i (interactive) and -f (force) options. Once you remove a file using rm, the filesystem no longer has access to it. Some systems default to setting the alias rm='rm -i' for the root user to help prevent inadvertent file deletion. This is also a good idea for ordinary users if you are nervous about what you might accidentally delete.

Before we leave this discussion, it should be noted that the cp command defaults to creating a new timestamp for the new file or files. The owner and group are also set to the owner and group of the user doing the copying. The -p option may be used to preserve selected attributes. Note that the root user may be the only user who can preserve ownership. See the man page for details.
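
The effect of -p on timestamps can be seen with a short experiment. This is a sketch with made-up file names; it gives the source an old modification time with touch -t (covered later in this article) and then uses find -newer, which lists files modified more recently than a reference file.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "some text" > source
touch -t 200908121510 source        # give the source an old modification time

cp source plain_copy                # new timestamp: the time of the copy
cp -p source preserved_copy         # timestamp carried over from source

ls -l source plain_copy preserved_copy

# Only the plain copy should be newer than the source.
newer_than_source=$(find . -type f -newer source | sort | tr '\n' ' ')
echo "newer than source: $newer_than_source"
```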


Creating and removing directories

We have already seen how to create a directory with mkdir. Now we will look further at mkdir and introduce rmdir, its analog for removing directories.

Mkdir

Suppose we are in our lpi103-2 directory and we wish to create subdirectories dir1 and dir2. mkdir, like the commands we have just been reviewing, will handle multiple directory creation requests in one pass as shown in Listing 6.

Listing 6. Creating multiple directories
[ian@echidna lpi103-2]$ mkdir dir1 dir2

Note that there is no output on successful completion, although you could use echo $? to confirm that the exit code is really 0.

If, instead, you wanted to create a nested subdirectory, such as d1/d2/d3, this would fail because the d1 and d2 directories do not exist. Fortunately, mkdir has a -p option that allows it to create any required parent directories, as shown in Listing 7.

Listing 7. Creating parent directories
[ian@echidna lpi103-2]$ mkdir d1/d2/d3
mkdir: cannot create directory `d1/d2/d3': No such file or directory
[ian@echidna lpi103-2]$ echo $?
1
[ian@echidna lpi103-2]$ mkdir -p d1/d2/d3
[ian@echidna lpi103-2]$ echo $?
0
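
In a script you would normally test the exit status directly rather than echo it. The sketch below repeats the commands from Listing 7 in a scratch directory and records the outcomes in variables whose names are invented for the example. It also shows that repeating mkdir -p on directories that already exist is not an error.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Without -p the missing parents make mkdir fail with a non-zero status.
if mkdir d1/d2/d3 2>/dev/null; then
    plain_status="created"
else
    plain_status="failed"
fi

# With -p the parents are created as needed.
mkdir -p d1/d2/d3
first_status=$?

# Repeating the command succeeds too: -p accepts existing directories.
mkdir -p d1/d2/d3
second_status=$?

echo "plain: $plain_status  first -p: $first_status  second -p: $second_status"
```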

Rmdir

Removing directories using the rmdir command is the opposite of creating them. Again, there is a -p option to remove parents as well. You can remove a directory with rmdir only if it is empty as there is no option to force removal. We'll see another way to accomplish that particular trick when we look at recursive manipulation. Once you learn this, you will probably seldom use rmdir on the command line, but it is still good to know about it.

To illustrate directory removal, we copied our text1 file into the directory d1/d2 so that it is no longer empty. We then used rmdir to remove all the directories we just created with mkdir. As you can see, d1 and d2 were not removed because d2 was not empty. The other directories were removed. Once we remove the copy of text1 from d2, we can remove d1 and d2 with a single invocation of rmdir -p.

Listing 8. Removing directories
[ian@echidna lpi103-2]$ cp text1 d1/d2
[ian@echidna lpi103-2]$ rmdir -p d1/d2/d3 dir1 dir2
rmdir: failed to remove directory `d1/d2': Directory not empty
[ian@echidna lpi103-2]$ ls . d1/d2
.:
backup  sedtab  text2  text4  text6  xab  yab
d1      text1   text3  text5  xaa    yaa

d1/d2:
text1
[ian@echidna lpi103-2]$ rm d1/d2/text1
[ian@echidna lpi103-2]$ rmdir -p d1/d2

Handling multiple files and directories

Up to now the commands we have used have operated on a single file or perhaps a few individually named files. For the rest of this article, we will look at various operations for handling multiple files, recursively manipulating part of a directory tree, and saving or restoring multiple files or directories.


Recursive manipulation

Recursive listing

The ls command has a -R (note upper case "R") option for listing a directory and all its subdirectories. The recursive option applies only to directory names; it will not find all the files called 'text1', for example, in a directory tree. You may use other options that we have seen already along with -R. A recursive listing of our lpi103-2 directory, including inode numbers, is shown in Listing 9.

Listing 9. Displaying directory listings recursively
[ian@echidna lpi103-2]$ ls -iR
.:
934194 backup  933892 text1  933898 text3  933900 text5  933894 xaa  933896 yaa
933901 sedtab  933893 text2  933899 text4  933902 text6  933895 xab  933897 yab

./backup:
934193 text1.bkp.1  934195 text1.bkp.2

Recursive copy

You can use the -r (or -R or --recursive) option to cause the cp command to descend into source directories and copy the contents recursively. To prevent infinite recursion, a source directory may not be copied into itself. Listing 10 shows how to copy everything in our lpi103-2 directory to a copy1 subdirectory. We use ls -R to show the resulting directory tree.

Listing 10. Copying recursively
[ian@echidna lpi103-2]$ cp -pR . copy1
cp: cannot copy a directory, `.', into itself, `copy1'
[ian@echidna lpi103-2]$ ls -R
.:
backup  copy1  sedtab  text1  text2  text3  text4  text5  text6  xaa  xab  yaa  yab

./backup:
text1.bkp.1  text1.bkp.2

./copy1:
text2  text3  text5  xaa  yaa  yab
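
One way to avoid the "into itself" complaint is to make the target a sibling of the source rather than a directory inside it. The following sketch uses a made-up tree called project in a scratch directory and checks the result with diff -r, which exits with status 0 when two trees have identical contents.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small tree to copy.
mkdir -p project/backup
echo "one" > project/text1
echo "two" > project/backup/text1.bkp

# The target does not exist yet and lies outside the source tree,
# so cp creates it as a copy of the whole source directory.
cp -pR project project-copy

ls -R project-copy

# Compare the two trees.
if diff -r project project-copy >/dev/null; then
    result="identical"
else
    result="different"
fi
echo "$result"
```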

Recursive deletion

We mentioned earlier that rmdir only removes empty directories. We can use the -r (or -R or --recursive) option to cause the rm command to remove both files and directories as shown in Listing 11 where we remove the copy1 directory that we just created, along with its contents, including the backup subdirectory and its contents.

Listing 11. Deleting recursively
[ian@echidna lpi103-2]$ rm -r copy1
[ian@echidna lpi103-2]$ ls -R
.:
backup  sedtab  text1  text2  text3  text4  text5  text6  xaa  xab  yaa  yab

./backup:
text1.bkp.1  text1.bkp.2

If you have files that are not writable by you, you may need to add the -f option to force removal. This is often done by the root user when cleaning up, but be warned that you can lose valuable data if you are not careful.
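
Because rm -rf asks no questions, it is worth previewing what it will remove. The sketch below works on a made-up directory called scratch inside a temporary directory, lists the tree with find first, and only then removes it.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p scratch/sub
echo "keep me?" > scratch/sub/readonly
chmod a-w scratch/sub/readonly      # the file is no longer writable

# Preview what would be removed before committing to it.
find scratch

# -r descends into the directory; -f suppresses the confirmation that rm
# would otherwise ask for on the write-protected file.
rm -rf scratch

if [ -e scratch ]; then
    echo "scratch still exists"
else
    echo "scratch removed"
fi
```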


Wildcards and globbing

Often, you may need to perform a single operation on many filesystem objects, without operating on the entire tree as we just did with recursive operations. For example, you might want to find the modification times of all the text files we created in lpi103-2, without listing the split files. Although this is easy with our small directory, it is much harder in a large filesystem.

To solve this problem, use the wildcard support that is built into the bash shell. This support, also called "globbing" (because it was originally implemented as a program called /etc/glob), lets you specify multiple files using a wildcard pattern.

A string containing any of the characters '?', '*' or '[', is a wildcard pattern. Globbing is the process by which the shell (or possibly another program) expands these patterns into a list of pathnames matching the pattern. The matching is done as follows:

?
matches any single character.
*
matches any string, including an empty string.
[
introduces a character class. A character class is a non-empty string, terminated by a ']'. A match means matching any single character enclosed by the brackets. There are a few special considerations.
  • Within a character class, the '*' and '?' characters match themselves. If you use these in filenames, you need to be careful about appropriate quoting or escaping.
  • Because the string must be non-empty and terminated by ']', you must put ']' first in the string if you want to match it.
  • The '-' character between two others represents a range that includes the two other characters and all characters between them in the collating sequence. For example, [0-9a-fA-F] represents any upper or lower case hexadecimal digit. You can match a '-' by putting it either first or last within a range.
  • The '!' character specified as the first character of a range complements the range so that it matches any character except the remaining characters. For example, [!0-9] means any character except the digits 0 through 9. A '!' in any position other than the first matches itself. Remember that '!' is also used with the shell history function, so you need to be careful to properly escape it.

Note: Wildcard patterns and regular expression patterns share some characteristics, but they are not the same. Pay careful attention.
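
The shell's case statement uses the same pattern notation, so it offers a way to try a pattern against candidate names without touching any files. The helper function matches below is made up for this sketch. One caveat: case compares plain strings, so unlike filename expansion it gives no special treatment to '/' or to a leading dot.

```shell
# matches NAME PATTERN: succeed if NAME matches the wildcard PATTERN.
# (matches is a helper invented for this sketch.)
matches() {
    # $2 is deliberately left unquoted so that it is treated as a pattern.
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Report every name/pattern pair that matches.
for name in text1 text3 text10 xaa; do
    for pattern in 'text?' 'text[2-4]' 'text[!2-4]' '*a*'; do
        if matches "$name" "$pattern"; then
            echo "$name matches $pattern"
        fi
    done
done
```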

Globbing is applied separately to each component of a path name. You cannot match a '/', nor include one in a range. You can use wildcard patterns anywhere that you might specify multiple file or directory names, for example in the ls, cp, mv, or rm commands. In Listing 12, we first create a couple of oddly named files and then use the ls and rm commands with wildcard patterns.

Listing 12. Wildcard pattern examples
[ian@echidna lpi103-2]$ echo odd1>'text[*?!1]'
[ian@echidna lpi103-2]$ echo odd2>'text[2*?!]'
[ian@echidna lpi103-2]$ ls
backup  text1       text2       text3  text5  xaa  yaa
sedtab  text[*?!1]  text[2*?!]  text4  text6  xab  yab
[ian@echidna lpi103-2]$ ls text[2-4]
text2  text3  text4
[ian@echidna lpi103-2]$ ls text[!2-4]
text1  text5  text6
[ian@echidna lpi103-2]$ ls text*[2-4]*
text2  text[2*?!]  text3  text4
[ian@echidna lpi103-2]$ ls text*[!2-4]* # Surprise!
text1  text[*?!1]  text[2*?!]  text5  text6
[ian@echidna lpi103-2]$ ls text*[!2-4] # Another surprise!
text1  text[*?!1]  text[2*?!]  text5  text6
[ian@echidna lpi103-2]$ echo text*>text10
[ian@echidna lpi103-2]$ ls *\!*
text[*?!1]  text[2*?!]
[ian@echidna lpi103-2]$ ls *[x\!]*
text1  text[*?!1]  text10  text2  text[2*?!]  text3  text4  text5  text6  xaa  xab
[ian@echidna lpi103-2]$ ls *[y\!]*
text[*?!1]  text[2*?!]  yaa  yab
[ian@echidna lpi103-2]$ ls tex?[[]*
text[*?!1]  text[2*?!]
[ian@echidna lpi103-2]$ rm tex?[[]*
[ian@echidna lpi103-2]$ ls *b*
sedtab  xab  yab

backup:
text1.bkp.1  text1.bkp.2
[ian@echidna lpi103-2]$ ls backup/*2
backup/text1.bkp.2
[ian@echidna lpi103-2]$ ls -d .*
.  ..

Notes:

  1. Complementation in conjunction with '*' can lead to some surprises. The pattern '*[!2-4]' matches the longest part of a name that does not have 2, 3, or 4 following it, which is matched by both text[*?!1] and text[2*?!]. So now both surprises should be clear.
  2. As with earlier examples of ls, if pattern expansion results in a name that is a directory name and the -d option is not specified, then the contents of that directory will be listed (as in our example above for the pattern '*b*').
  3. If a filename starts with a period (.), then that character must be matched explicitly. Notice that only the last ls command listed the two special directory entries (. and ..).

Remember that any wildcard characters in a command are liable to be expanded by the shell, which may lead to unexpected results. Furthermore, if you specify a pattern that does not match any filesystem objects, then POSIX requires that the original pattern string be passed to the command. Some earlier implementations passed a null list to the command, so you may run into old scripts that give unusual behavior. We illustrate these points in Listing 13.

Listing 13. Wildcard pattern surprises
[ian@echidna lpi103-2]$ echo text*
text1 text10 text2 text3 text4 text5 text6
[ian@echidna lpi103-2]$ echo "text*"
text*
[ian@echidna lpi103-2]$ echo text[[\!?]z??
text[[!?]z??

For more information on globbing, look at man 7 glob. You will need the section number, as there is also glob information in section 3. The best way to understand all the various shell interactions is by practice, so try these wildcards out whenever you have a chance. Remember to try ls to check your wildcard pattern before allowing cp, mv, or worse, rm to do something unexpectedly.
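
Both pieces of advice, previewing a pattern and allowing for a pattern that matches nothing, can be combined in a script. The sketch below assumes bash with its default settings, where an unmatched pattern is passed through unchanged; the file names are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch text1 text2 text3 notes

# Preview: echo shows exactly what the shell will hand to the command.
echo rm text[1-2]

# A pattern with no match is passed through unchanged, so guard the loop
# against processing the literal pattern string.
removed=0
for f in text[1-2] nomatch*; do
    [ -e "$f" ] || continue      # skips the unexpanded "nomatch*"
    rm "$f"
    removed=$((removed + 1))
done

echo "removed $removed files; remaining: $(ls | tr '\n' ' ')"
```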


Touching files

We will now look at the touch command, which can update file access and modification times or create empty files. In the next part, we will see how to use this information for finding files and directories. We will continue using the lpi103-2 directory for our examples. We will also look at the various ways you may specify timestamps.

touch

The touch command with no options takes one or more filenames as parameters and updates the modification time of the files. This is the same timestamp normally displayed with a long directory listing. In Listing 14, we use our old friend echo to create a small file called f1, and then use a long directory listing to display the modification time (or mtime). In this case, it happens also to be the time the file was created. We then use the sleep command to wait for 60 seconds, touch the file, and run ls again. Note that the timestamp for the file has changed by a minute.

Listing 14. Updating modification time with touch
[ian@echidna lpi103-2]$ echo xxx>f1; ls -l f1; sleep 60; touch f1; ls -l f1
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:24 f1
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1

If you specify a filename for a file that does not exist, then touch will normally create an empty file for you, unless you specify the -c or --no-create option. Listing 15 illustrates both these commands. Note that only f2 is created.

Listing 15. Creating empty files with touch
[ian@echidna lpi103-2]$ touch f2; touch -c f3; ls -l f*
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:27 f2

The touch command can also set a file's modification time (also known as mtime) to a specific date and time using either the -d or -t options. The -d is very flexible in the date and time formats that it will accept, while the -t option needs at least an MMDDhhmm time with optional year and seconds values. Listing 16 shows some examples.

Listing 16. Setting mtime with touch
[ian@echidna lpi103-2]$ touch -t 200908121510.59 f3
[ian@echidna lpi103-2]$ touch -d 11am f4
[ian@echidna lpi103-2]$ touch -d "last fortnight" f5
[ian@echidna lpi103-2]$ touch -d "yesterday 6am" f6
[ian@echidna lpi103-2]$ touch -d "2 days ago 12:00" f7
[ian@echidna lpi103-2]$ touch -d "tomorrow 02:00" f8
[ian@echidna lpi103-2]$ touch -d "5 Nov" f9
[ian@echidna lpi103-2]$ ls -lrt f*
-rw-rw-r--. 1 ian ian 0 2009-07-31 18:31 f5
-rw-rw-r--. 1 ian ian 0 2009-08-12 12:00 f7
-rw-rw-r--. 1 ian ian 0 2009-08-12 15:10 f3
-rw-rw-r--. 1 ian ian 0 2009-08-13 06:00 f6
-rw-rw-r--. 1 ian ian 0 2009-08-14 11:00 f4
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:27 f2
-rw-rw-r--. 1 ian ian 0 2009-08-15 02:00 f8
-rw-rw-r--. 1 ian ian 0 2009-11-05 00:00 f9

If you're not sure what date a date expression might resolve to, you can use the date command to find out. It also accepts the -d option and can resolve the same kind of date formats that touch can.

You can use the -r (or --reference) option along with a reference filename to indicate that touch (or date) should use the timestamp of an existing file. Listing 17 shows some examples.

Listing 17. Timestamps from reference files
[ian@echidna lpi103-2]$ date
Fri Aug 14 18:33:48 EDT 2009
[ian@echidna lpi103-2]$ date -r f1
Fri Aug 14 18:25:50 EDT 2009
[ian@echidna lpi103-2]$ touch -r f1 f1a
[ian@echidna lpi103-2]$ ls -l f1*
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 f1a
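
The -d and -r options of date and touch can be combined to set a timestamp and read it back. The following sketch assumes the GNU versions of both commands, which accept free-form date strings; the file names stamped and follower are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# See what a relative expression resolves to before handing it to touch.
date -d "yesterday 6am"

# Set a known modification time, then read it back from the file with -r.
touch -d "2009-08-12 15:10" stamped
mtime=$(date -r stamped "+%Y-%m-%d %H:%M")
echo "stamped was last modified at $mtime"

# Copy that timestamp to a second file with touch -r.
touch -r stamped follower
follower_mtime=$(date -r follower "+%Y-%m-%d %H:%M")
echo "follower was last modified at $follower_mtime"
```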

A Linux system records both a file modification time and a file access time. These are also known respectively as the mtime and atime. Both timestamps are set to the same value when a file is created, and both are reset when it is modified. If a file is accessed at all, then the access time is updated, even if the file is not modified. For our last example with touch, we will look at file access times. The -a (or --time=atime, --time=access or --time=use) option specifies that the access time should be updated. Listing 18 uses the cat command to access the f1 file and display its contents. We then use ls -l and ls -lu to display the modification and access times respectively for f1 and f1a, which we created using f1 as a reference file. We then reset the access time of f1 to that of f1a using touch -a and verify that it was reset.

Listing 18. Access time and modification time
[ian@echidna lpi103-2]$ ls -lu f1*
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:39 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 f1a
[ian@echidna lpi103-2]$ ls -l f1*
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 f1a
[ian@echidna lpi103-2]$ touch -a -r f1a f1
[ian@echidna lpi103-2]$ ls -lu f1*
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 f1a

For more complete information on the many allowable time and date specifications, see the man or info pages for the touch and date commands.


Finding files

Now that we've covered the file and directory topic with the big recursive hammer that hits everything, and the globbing hammer that hits more selectively, let's look at the find command, which can be more like a surgeon's knife. The find command is used to find files in one or more directory trees, based on criteria such as name, timestamp, or size. Again, we will use the lpi103-2 directory.

find

The find command will search for files or directories using all or part of the name, or by other search criteria, such as size, type, file owner, modification date, or last access date. The most basic find is a search by name or part of a name. Listing 19 shows an example from our lpi103-2 directory where we first search for all files that have either a '1' or a 'k' in their name, then perform some path searches that are explained in the notes below.

Listing 19. Finding files by name
[ian@echidna lpi103-2]$ find . -name "*[1k]*"
./f1a
./f1
./text10
./backup
./backup/text1.bkp.1
./backup/text1.bkp.2
./text1
[ian@echidna lpi103-2]$ find . -ipath "*ACK*1"
./backup/text1.bkp.1
[ian@echidna lpi103-2]$ find . -ipath "*ACK*/*1"
./backup/text1.bkp.1

Notes:

  1. The patterns that you may use are shell wildcard patterns like those we saw earlier under Wildcards and globbing.
  2. You can use -path instead of -name to match full paths instead of just base file names. In this case, the pattern may span path components, unlike ordinary wildcard matches, which match only a single part of a path.
  3. If you want a case-insensitive search, as shown in the use of -ipath above, precede the find options that search on a string or pattern with an 'i'.
  4. On some versions of find, a pattern must include an explicit leading dot to match a file or directory whose name begins with a dot, such as .bashrc or the current directory (.); otherwise, name searches ignore these files or directories. Recent GNU versions of find let wildcards such as * match a leading dot in name searches.
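
The difference between -name, -path, and the case-insensitive forms can be tried out in a scratch directory. This is only a sketch, assuming GNU find; the directory layout below is made up to resemble our lpi103-2 directory.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/backup"
touch "$workdir/text1" "$workdir/backup/text1.bkp.1" "$workdir/backup/text1.bkp.2"

# -name tests only the final component of each path
by_name=$(cd "$workdir" && find . -name "*.bkp.*" | sort)

# -path tests the whole path, so one pattern can span the directory separator
by_path=$(cd "$workdir" && find . -path "*back*/*1" | sort)

# -iname is the case-insensitive form of -name
by_iname=$(cd "$workdir" && find . -iname "TEXT1")

printf '%s\n' "-name:" "$by_name" "-path:" "$by_path" "-iname:" "$by_iname"
```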

In the first example above, we found both files and a directory (./backup). Use the -type parameter along with a one-letter type code to restrict the search. Use 'f' for regular files, 'd' for directories, and 'l' for symbolic links. See the man page for find for other possible types. Listing 20 shows the result of searching for directories (-type d) alone and with a file name (*, or everything, in this case).

Listing 20. Finding files by type
[ian@echidna lpi103-2]$ find . -type d
.
./backup
[ian@echidna lpi103-2]$ find . -type d -name "*"
.
./backup

Note that the -type d specification without any form of name specification displays directories that have a leading dot in their names (only the current directory in this case), as does the wildcard "*".

We can also search by file size, either for a specific size (n) or for files that are either larger (+n) or smaller than a given value (-n). By using both upper and lower size bounds, we can find files whose size is within a given range. By default, the -size option of find assumes a unit of 'b' for 512-byte blocks. Among other choices, specify 'c' for bytes, or 'k' for kilobytes. In Listing 21, we first find all files with size 0, and then all with size of either 24 or 25 bytes. Note that specifying -empty instead of -size 0 also finds empty files.

Listing 21. Finding files by size
[ian@echidna lpi103-2]$ find . -size 0
./f1a
./f6
./f8
./f2
./f3
./f7
./f4
./f9
./f5
[ian@echidna lpi103-2]$ find . -size -26c -size +23c -print
./text2
./text5
./backup/text1.bkp.1
./backup/text1.bkp.2
./text1

The second example in Listing 21 introduces the -print option, which is an example of an action that may be taken on the results returned by the search. In the bash shell, this is the default action if no action is specified. On some systems and some shells, an action is required; otherwise, there is no output.
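
To experiment with size tests on files of known length, you might use something like the following sketch. It assumes GNU find and a shell whose printf supports zero padding; the file names are hypothetical.

```shell
workdir=$(mktemp -d)
: > "$workdir/empty_a"                    # 0 bytes
printf '%024d' 0 > "$workdir/size24"      # exactly 24 bytes
printf '%025d' 0 > "$workdir/size25"      # exactly 25 bytes
printf '%040d' 0 > "$workdir/size40"      # exactly 40 bytes

# -type f keeps directories out of the -empty results
empty_files=$(cd "$workdir" && find . -type f -empty)

# Two -size tests are ANDed: larger than 23 bytes and smaller than 26 bytes
in_range=$(cd "$workdir" && find . -type f -size +23c -size -26c | sort)

echo "empty: $empty_files"
echo "24 or 25 bytes:"
echo "$in_range"
```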

Other actions include -ls, which prints file information equivalent to that from the ls -lids command, and -exec, which executes a command for each file. The -exec must be terminated by a semicolon, which must be escaped to avoid the shell interpreting it first. Also specify {} wherever you want the returned file used in the command. Remember that curly braces also have meaning to the shell and must be escaped (or quoted). Listing 22 shows how the -ls and the -exec options can be used to list file information. Notice that the second form does not list the inode information.

Listing 22. Finding and acting on files
[ian@echidna lpi103-2]$ find . -size -26c -size +23c -ls
933893    4 -rw-rw-r--   1 ian      ian            25 Aug 11 14:27 ./text2
933900    4 -rw-rw-r--   1 ian      ian            24 Aug 11 18:47 ./text5
934193    4 -rw-rw-r--   1 ian      ian            24 Aug 12 15:36 ./backup/text1.bkp.1
934195    4 -rw-rw-r--   1 ian      ian            24 Aug 12 15:36 ./backup/text1.bkp.2
933892    4 -rw-rw-r--   1 ian      ian            24 Aug 11 14:02 ./text1
[ian@echidna lpi103-2]$ find . -size -26c -size +23c -exec ls -l '{}' \;
-rw-rw-r--. 1 ian ian 25 2009-08-11 14:27 ./text2
-rw-rw-r--. 1 ian ian 24 2009-08-11 18:47 ./text5
-rw-rw-r--. 1 ian ian 24 2009-08-12 15:36 ./backup/text1.bkp.1
-rw-rw-r--. 1 ian ian 24 2009-08-12 15:36 ./backup/text1.bkp.2
-rw-rw-r--. 1 ian ian 24 2009-08-11 14:02 ./text1

The -exec option can be used for as many purposes as your imagination can dream up. For example:

find . -empty -exec rm '{}' \;

removes all the empty files in a directory tree, while

find . -name "*.htm" -exec mv '{}' '{}l' \;

renames all .htm files to .html files.
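
Because -exec can change or remove many files at once, it is worth rehearsing a command first. The sketch below is one possible approach, not the only one: it previews the rename with echo and then performs it through sh, which avoids relying on find to substitute {} inside a longer argument such as '{}l' (behavior that we assume here is specific to GNU find). The file names are hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/sub"
touch "$workdir/index.htm" "$workdir/sub/page.htm" "$workdir/keep.txt"

# Dry run: put echo in front of the command to see what would be executed
(cd "$workdir" && find . -name "*.htm" -exec echo mv '{}' '{}l' \;)

# Rename for real; each file name is passed to sh as $1
(cd "$workdir" && find . -name "*.htm" -exec sh -c 'mv "$1" "${1}l"' sh '{}' \;)

renamed=$(cd "$workdir" && find . -name "*.html" | sort)
echo "$renamed"
```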

For our final examples of find, we use the timestamps described with the touch command to locate files having particular timestamps. Listing 23 shows three examples:

  1. When used with -mtime -2, the find command finds all files modified within the last two days. A day in this case is a 24-hour period relative to the current date and time. Note that you would use -atime if you wanted to find files based on access time rather than modification time.
  2. Adding the -daystart option means that we want to consider days as calendar days, starting at midnight. Now the f6 file is excluded from the list.
  3. Finally, we show how to use a time range in minutes rather than days to find files modified between one hour (60 minutes) and 10 hours (600 minutes) ago.

Listing 23. Finding files by timestamp
[ian@echidna lpi103-2]$ date
Sat Aug 15 00:27:36 EDT 2009
[ian@echidna lpi103-2]$ find . -mtime -2 -type f -exec ls -l '{}' \;
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 ./f1a
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 ./f1
-rw-rw-r--. 1 ian ian 0 2009-08-13 06:00 ./f6
-rw-rw-r--. 1 ian ian 0 2009-08-15 02:00 ./f8
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:27 ./f2
-rw-rw-r--. 1 ian ian 58 2009-08-14 17:30 ./text10
-rw-rw-r--. 1 ian ian 0 2009-08-14 11:00 ./f4
-rw-rw-r--. 1 ian ian 0 2009-11-05 00:00 ./f9
[ian@echidna lpi103-2]$ find . -daystart -mtime -2 -type f -exec ls -l '{}' \;
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 ./f1a
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 ./f1
-rw-rw-r--. 1 ian ian 0 2009-08-15 02:00 ./f8
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:27 ./f2
-rw-rw-r--. 1 ian ian 58 2009-08-14 17:30 ./text10
-rw-rw-r--. 1 ian ian 0 2009-08-14 11:00 ./f4
-rw-rw-r--. 1 ian ian 0 2009-11-05 00:00 ./f9
[ian@echidna lpi103-2]$ find . -mmin -600 -mmin +60 -type f -exec ls -l '{}' \;
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 ./f1a
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 ./f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:27 ./f2
-rw-rw-r--. 1 ian ian 58 2009-08-14 17:30 ./text10

The man pages for the find command can help you learn the extensive range of options that we cannot cover in this brief introduction.
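
Timestamp searches are easier to explore when you control the timestamps yourself. The following sketch assumes GNU touch (for relative dates such as "3 hours ago") and GNU find; the file names are hypothetical.

```shell
workdir=$(mktemp -d)
touch -d "30 minutes ago" "$workdir/half_hour"
touch -d "3 hours ago"    "$workdir/three_hours"
touch -d "5 days ago"     "$workdir/five_days"

# Modified less than two 24-hour periods ago
recent=$(cd "$workdir" && find . -type f -mtime -2 | sort)

# Modified more than 60 minutes but less than 600 minutes ago
hour_range=$(cd "$workdir" && find . -type f -mmin +60 -mmin -600)

echo "within two days:"
echo "$recent"
echo "between 1 and 10 hours: $hour_range"
```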


Identifying files

File names often have a suffix such as gif, jpeg, or html that gives a hint of what the file might contain. Linux does not require such suffixes and generally does not use them to identify a file type. Knowing what type of file you are dealing with helps you know what program to use to display or manipulate it. The file command tells you something about the type of data in one or more files. Listing 24 shows some examples of using the file command.

Listing 24. Identifying file contents
[ian@echidna lpi103-2]$ file backup text1 f2 ../p-ishields.jpg /bin/echo
backup:            directory
text1:             ASCII text
f2:                empty
../p-ishields.jpg: JPEG image data, JFIF standard 1.02
/bin/echo:         ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically 
linked (uses shared libs), for GNU/Linux 2.6.18, stripped

The file command attempts to classify each file using three types of test. Filesystem tests use the results of the stat command to determine whether a file is empty or a directory, for example. So-called magic tests check a file for specific contents that identify it. These signatures are also known as magic numbers. Finally, language tests look at the content of text files to attempt to determine if a file is an XML file, C or C++ language source, a troff file, or some other file that is considered source for some kind of language processor. The first type that is found is reported unless the -k or --keep-going option is specified.

The file command has many options that you may learn about using the man pages. Listing 25 shows how to use the -i (or --mime) option to display the file type as a MIME string instead of the normal human-readable output.

Listing 25. Identifying file contents as MIME
[ian@echidna lpi103-2]$ file -i backup text1 f2 ../p-ishields.jpg /bin/echo
backup:            application/x-directory; charset=binary
text1:             text/plain; charset=us-ascii
f2:                application/x-empty; charset=binary
../p-ishields.jpg: image/jpeg; charset=binary
/bin/echo:         application/x-executable; charset=binary

The magic number files are also managed by the file command. Again, see the man pages for more information.

Note: The identify command, which is part of the ImageMagick package, is an additional tool that provides more detail when identifying image file types.


Compressing files

When you are backing up, archiving, or transmitting files, it is common to compress the files. In a Linux environment, two popular compression programs are gzip and bzip2. The gzip command uses the Lempel-Ziv algorithm, while bzip2 uses the Burrows-Wheeler block sorting algorithm.

Using gzip and gunzip

Compression generally works well on text files. Many image formats already compress the data, so compression may not work well on these or other binary files. To illustrate compression on a reasonably large text file, let's copy /etc/services to the directory we have been using and compress it using gzip as shown in Listing 26. We use the -p option of cp to preserve the timestamp of /etc/services. Note that the compressed file has the same timestamp and has a .gz suffix.

Listing 26. Compressing with gzip
[ian@echidna lpi103-2]$ cp -p /etc/services .
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services
[ian@echidna lpi103-2]$ gzip services
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 124460 2009-04-10 04:42 services.gz

You decompress a gzipped file using the -d option of gzip or, more commonly, using the gunzip command. Listing 27 shows the first of these choices. Note that the uncompressed file now has the original file name and timestamp.

Listing 27. Decompressing with gzip
[ian@echidna lpi103-2]$ gzip -d services.gz
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services

Using bzip2 and bunzip2

The bzip2 command operates in a similar manner to gzip as shown in Listing 28.

Listing 28. Compressing with bzip2
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services
[ian@echidna lpi103-2]$ bzip2 services
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 113444 2009-04-10 04:42 services.bz2
[ian@echidna lpi103-2]$ bunzip2 services.bz2
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services

Differences between gzip and bzip2

By design, many of the bzip2 options are the same as those of gzip, but the two commands do not have identical options. You may have noted that in both our examples, the uncompressed file had the same name and timestamp as the original. However, renaming or touching the compressed file can change this behavior. The gzip command has a -N or --name option to force the name and timestamp to be preserved, but bzip2 does not. The gzip command also has a -l option to display information about the compressed file, including the name that will be used when it is decompressed. Listing 29 illustrates some of these differences between the two commands.

Listing 29. Some differences between gzip and bzip2
[ian@echidna ~]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services
[ian@echidna ~]$ gzip -N services
[ian@echidna ~]$ touch services.gz
[ian@echidna ~]$ mv services.gz services-x.gz
[ian@echidna ~]$ ls -l serv*
-rw-r--r--. 1 ian ian 124460 2009-09-23 14:08 services-x.gz
[ian@echidna ~]$ gzip -l services-x.gz
         compressed        uncompressed  ratio uncompressed_name
             124460              630983  80.3% services-x
[ian@echidna ~]$ gzip -lN services-x.gz
         compressed        uncompressed  ratio uncompressed_name
             124460              630983  80.3% services
[ian@echidna ~]$ gunzip -N services-x.gz
[ian@echidna ~]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services
[ian@echidna ~]$
[ian@echidna ~]$ bzip2 services
[ian@echidna ~]$ mv services.bz2 services-x.bz2
[ian@echidna ~]$ touch services-x.bz2
[ian@echidna ~]$ ls -l serv*
-rw-r--r--. 1 ian ian 113444 2009-09-23 14:10 services-x.bz2
[ian@echidna ~]$ bunzip2 services-x.bz2
[ian@echidna ~]$ ls -l serv*
-rw-rw-r--. 1 ian ian 630983 2009-09-23 14:10 services-x
[ian@echidna ~]$ rm services-x # Don't need this any more

Both gzip and bzip2 will accept input from stdin. Both support the -c option to direct output to stdout.

There are two other commands associated with bzip2:

  1. The bzcat command decompresses files to stdout and is equivalent to bzip2 -dc.
  2. The bzip2recover command attempts to recover data from damaged bzip2 files.

The man pages will help you learn more about the other options of gzip and bzip2.
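
As a small illustration of the stdin and -c behavior mentioned above, the following sketch compresses a file to stdout, so that the original is kept, and then decompresses from stdin. It uses gzip only; the file names are hypothetical, and seq is used merely to generate some compressible text.

```shell
workdir=$(mktemp -d)
seq 1 5000 > "$workdir/numbers.txt"

# -c writes to stdout, so the original file is left in place
gzip -c "$workdir/numbers.txt" > "$workdir/numbers.txt.gz"

# With no file argument, gzip reads stdin; -d decompresses
gzip -dc < "$workdir/numbers.txt.gz" > "$workdir/numbers.copy"

# cmp is silent and returns 0 when the two files are identical
cmp "$workdir/numbers.txt" "$workdir/numbers.copy" && roundtrip=ok
ls -l "$workdir"
```

Unlike the plain gzip services form shown earlier, this leaves both the original and the compressed file on disk.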

Other compression tools

Two older programs, compress and uncompress, are still frequently found on Linux and UNIX systems.

In addition, the zip and unzip commands from the Info-ZIP project are implemented for Linux. These provide cross-platform compression functions that are available on a wide range of hardware and operating systems. Be aware that not all operating systems support the same file attributes or filesystem capabilities. If you download a zipped product file and unzip it on a Windows system, then transfer the resulting files to a CD or DVD for installation on Linux, you may experience problems installing because, for example, the Windows system does not support the symbolic links that were part of the original uncompressed file set.

For more information on these or other compression programs, see their respective man pages.


Archiving files

The tar, cpio, and dd commands are commonly used for backing up groups of files or even whole partitions, either for archiving or for transmission to another user or site. Exam 201, which is part of the LPIC-2 certification, focuses on backup considerations in greater detail.

There are three general approaches to backup:

  1. A full backup is a complete backup, usually of a whole filesystem, directory, or group of related files. This takes the longest time to create, so it is usually used with one of the other two approaches.
  2. A differential or cumulative backup is a backup of all things that have changed since the last full backup. Recovery requires the last full backup plus the latest differential backup.
  3. An incremental backup is a backup of only those changes since the last incremental backup. Recovery requires the last full backup plus all of the incremental backups (in order) since the last full backup.

These commands, along with the other commands you have learned about in this article, give you the tools to perform any of these backup tasks.

Using tar

The tar command (originally from Tape ARchive) creates an archive file, or tarfile or tarball, from a set of input files or directories; it also restores files from such an archive. If a directory is given as input to tar, all files and subdirectories are automatically included, which makes tar very convenient for archiving subtrees of your directory structure.

Output can be to a file, a device such as tape or diskette, or stdout. The output location is specified with the -f option. Other common options are -c to create an archive, -x to extract an archive, -v for verbose output, which lists the files being processed, -z to use gzip compression, and -j to use bzip2 compression. Most tar options have a short form using a single hyphen and a long form using a pair of hyphens. The short forms are illustrated here. See the man pages for the long form and for additional options.

Listing 30 shows how to create a backup of our lpi103-2 directory using tar.

Listing 30. Backing up our lpi103-2 directory using tar
[ian@echidna lpi103-2]$ tar -cvf ../lpitar1.tar .
./
./text3
./yab
...
./f5

Usually you will want to compress archive files to save space or reduce transmission time. The GNU version of the tar command allows you to do this with a single option: -z for compression using gzip and -j for compression using bzip2. Listing 31 illustrates the use of the -z option and the difference in size between the two archives.

Listing 31. Compressing the tar archive with gzip
[ian@echidna lpi103-2]$ tar -zcvf ../lpitar2.tar ~/lpi103-2/
tar: Removing leading `/' from member names
/home/ian/lpi103-2/
/home/ian/lpi103-2/text3
/home/ian/lpi103-2/yab
...
/home/ian/lpi103-2/f5
[ian@echidna lpi103-2]$ ls -l ../lpitar*
-rw-rw-r--. 1 ian ian 30720 2009-09-24 15:38 ../lpitar1.tar
-rw-rw-r--. 1 ian ian   881 2009-09-24 15:39 ../lpitar2.tar

Listing 31 also shows another important feature of tar. We used an absolute directory path, and the first line of output tells you that tar is removing the leading slash (/) from member names. This allows files to be restored to some other location for verification and can be particularly important if you are trying to restore system files. If you really want to store absolute names, use the -P option. It is also a good idea to avoid mixing absolute path names with relative path names when creating an archive, since all will be relative when restoring from the archive.
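
The following sketch shows, under the assumption of GNU tar, how relative member names let you unpack an archive somewhere other than where it was created, and how a compressed archive compares in size with an uncompressed one. The project and restore directory names are hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/project" "$workdir/restore"
seq 1 2000 > "$workdir/project/numbers.txt"
echo "hello" > "$workdir/project/note.txt"

# The same tree, archived without and with gzip compression
tar -cf  "$workdir/plain.tar"     -C "$workdir" project
tar -czf "$workdir/packed.tar.gz" -C "$workdir" project
ls -l "$workdir/plain.tar" "$workdir/packed.tar.gz"

# Member names are relative, so the archive can be unpacked anywhere
tar -xzf "$workdir/packed.tar.gz" -C "$workdir/restore"
restored=$(cd "$workdir/restore" && find . -type f | sort)
echo "$restored"
```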

The tar command can append additional files to an archive using the -r or --append option. This may cause multiple copies of a file in the archive. In such a case, the last one will be restored during a restore operation. You can use the --occurrence option to select a specific file among multiples. If the archive is on a regular filesystem instead of tape, you may use the -u or --update option to update an archive. This works like appending to an archive, except that the time stamps of the files in the archive are compared with those on the filesystem, and only files that have been modified since the archived version are appended. As mentioned, this does not work for tape archives.

The tar command can also compare archives with the current filesystem and restore files from archives. Use the -d, --compare, or --diff option to perform comparisons. The output will show files whose contents differ, as well as files whose time stamps differ. Normally, only files that differ, if any, are listed. Use the -v option discussed earlier for verbose output. The -C or --directory option tells tar to perform an operation starting from the specified directory rather than the current directory.

Listing 32 shows some examples. We use touch to modify the timestamp of the f1 file, then illustrate comparison operations of tar before restoring f1 from one of our archives. We use a variety of option forms for illustration.

Listing 32. Comparing and restoring using tar
[ian@echidna lpi103-2]$ touch f1
[ian@echidna lpi103-2]$ tar --diff --file ../lpitar1.tar .
./f1: Mod time differs
[ian@echidna lpi103-2]$ tar -df ../lpitar2.tar -C /
home/ian/lpi103-2/f1: Mod time differs
[ian@echidna lpi103-2]$ tar -xvf ../lpitar1.tar ./f1 # See below
./f1
[ian@echidna lpi103-2]$ tar --compare -f ../lpitar2.tar --directory /

The files or directories you specify for restoration must match the name in the archive. Attempting to restore just f1 rather than ./f1 in this case would not work. You can use globbing, but you need to be careful to avoid restoring more or less than you want. You can use the --list or -t option to list archive contents if you are unsure what is in an archive. Listing 33 shows an example of a wildcard specification that would have restored more files than just ./f1.

Listing 33. Listing archive contents with tar
[ian@echidna lpi103-2]$ tar -tf ../lpitar1.tar "*f1*"
./f1a
./f1
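
To see why the member name matters, you can try the sequence of listing, comparing, and restoring on a throwaway tree. This is a sketch that assumes GNU tar; the tree directory and its files are hypothetical stand-ins for our lpi103-2 directory.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/tree"
echo "v1" > "$workdir/tree/f1"
echo "x"  > "$workdir/tree/f1a"
(cd "$workdir/tree" && tar -cf ../tree.tar .)   # members are stored as ./f1, ./f1a

# List the archive to learn the exact member names
members=$(tar -tf "$workdir/tree.tar")

# Change f1, then compare the archive with the filesystem
echo "v2, longer" > "$workdir/tree/f1"
diff_report=$(cd "$workdir/tree" && tar -df ../tree.tar 2>&1)

# Restore only f1, using the name exactly as it appears in the archive
(cd "$workdir/tree" && tar -xf ../tree.tar ./f1)
restored_f1=$(cat "$workdir/tree/f1")

printf '%s\n' "$members" "$diff_report" "restored content: $restored_f1"
```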

You can use the find command to select the files for archiving and then pipe the result to tar. We'll discuss this technique as part of the discussion of cpio, but the same method works for tar.
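
As a hedged illustration of piping find output to tar, the sketch below archives only the files changed since a marker file was last touched. It assumes GNU tar (for --null and -T -) and uses the -newer test of find, which is not otherwise covered in this article; the data directory and the last_backup marker are hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/data"
echo "one" > "$workdir/data/a.txt"
echo "two" > "$workdir/data/b.txt"
touch -d "2 hours ago" "$workdir/data/a.txt" "$workdir/data/b.txt"

# Full backup, plus a marker file recording when it was taken
tar -cf "$workdir/full.tar" -C "$workdir" data
touch -d "1 hour ago" "$workdir/last_backup"

# Something changes after the backup
echo "two, edited" > "$workdir/data/b.txt"

# Archive only the files modified more recently than the marker
(cd "$workdir" && find data -type f -newer last_backup -print0 |
    tar --null -T - -cf changed.tar)

changed=$(tar -tf "$workdir/changed.tar")
echo "$changed"
```

If the marker is refreshed only after each full backup, this behaves like a differential backup; if it is refreshed after every run, it behaves like an incremental one.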

As with the other commands you have studied here, there are many options that are not covered in this brief introduction. See the man or info pages for more details.

Using cpio

The cpio command operates in copy-out mode to create an archive, copy-in mode to restore an archive, or copy-pass mode to copy a set of files from one location to another. You use the -o or --create option for copy-out mode, the -i or --extract option for copy-in mode, and the -p or --pass-through option for copy-pass mode. Input is a list of files provided on stdin. Output is either to stdout or to a device or file specified with the -F or --file option.

Listing 34 shows how to generate a list of files using the find command and then pipe the list to cpio. Note the use of the -print0 option on find to generate null-terminated strings for file names, and the corresponding --null option on cpio to read this format. This will correctly handle file names that have embedded blank or newline characters. The -depth option tells find to list directory entries before the directory name. In this example, we simply create two archives of our lpi103-2 directory, one with relative names and one with absolute names. We do not use the many capabilities of find to restrict the selected files, such as finding only the files modified this week.

Listing 34. Backing up a directory using cpio
[ian@echidna lpi103-2]$ find . -depth -print0 | cpio --null -o > ../lpicpio.1
3 blocks
[ian@echidna lpi103-2]$ find ~/lpi103-2/ -depth -print0 | cpio --null -o > ../lpicpio.2
4 blocks

If you'd like to see the files listed as they are archived, add the -v option to cpio.

The cpio command in copy-in mode (option -i or --extract) can list the contents of an archive or restore selected files. When you list the files, specifying the --absolute-filenames option reduces the number of extraneous messages that some older versions of cpio will otherwise issue as they strip any leading / characters from each path that has one. This option is quietly ignored on many current implementations. Output from selectively listing our previous archives is shown in Listing 35.

Listing 35. Listing and restoring selected files using cpio
[ian@echidna lpi103-2]$ cpio  -i --list  "*backup*" < ../lpicpio.1
backup
backup/text1.bkp.1
backup/text1.bkp.2
3 blocks
[ian@echidna lpi103-2]$ cpio  -i --list --absolute-filenames "*text1*" < ../lpicpio.2
/home/ian/lpi103-2/text10
/home/ian/lpi103-2/backup/text1.bkp.1
/home/ian/lpi103-2/backup/text1.bkp.2
/home/ian/lpi103-2/text1
4 blocks

Listing 36 shows how to restore all the files with "text1" in their path into a temporary subdirectory. Some of these are in subdirectories. Unlike tar, you will need to specify the -d or --make-directories option explicitly if your directory tree does not exist. Furthermore, cpio will not replace any newer files on the filesystem with archive copies unless you specify the -u or --unconditional option.

Listing 36. Restoring selected files using cpio
[ian@echidna lpi103-2]$ mkdir temp
[ian@echidna lpi103-2]$ cd temp
[ian@echidna temp]$ cpio  -idv "*f1*" "*.bkp.1" < ../../lpicpio.1
f1a
f1
backup/text1.bkp.1
3 blocks
[ian@echidna temp]$ cpio  -idv "*.bkp.1" < ../../lpicpio.1
cpio: backup/text1.bkp.1 not created: newer or same age version exists
backup/text1.bkp.1
3 blocks
[ian@echidna temp]$ cpio  -id --no-absolute-filenames "*text1*" < ../../lpicpio.2
cpio: Removing leading `/' from member names
4 blocks
./home/ian/lpi103-2/backup/text1.bkp.1
./home/ian/lpi103-2/backup/text1.bkp.2
./home/ian/lpi103-2/text1
./backup/text1.bkp.1
[ian@echidna temp]$ cd ..
[ian@echidna lpi103-2]$ rm -rf temp # You may remove these after you have finished

For details on other options, see the man page.
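
The copy-out and copy-in steps can be rehearsed together on a scratch tree. The sketch below assumes GNU cpio and skips itself if cpio is not installed; the src and restore directories are hypothetical.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/src/backup" "$workdir/restore"
echo "one" > "$workdir/src/text1"
echo "bkp" > "$workdir/src/backup/text1.bkp.1"

if command -v cpio >/dev/null 2>&1; then
    # Copy-out: null-terminated names on stdin, archive on stdout
    (cd "$workdir/src" && find . -depth -print0 | cpio --null -o > ../demo.cpio)

    # Copy-in with -t (--list) only lists the matching members
    cpio -it "*backup*" < "$workdir/demo.cpio"

    # Copy-in with -d creates the backup directory under restore as needed
    (cd "$workdir/restore" && cpio -id "*.bkp.1" < ../demo.cpio)
else
    echo "cpio is not installed; skipping the demonstration"
fi
```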

The dd command

In its simplest form, the dd command copies an input file to an output file. You have already seen the cp command, so you may wonder why have another command to copy files. The dd command can do a couple of things that regular cp cannot. In particular, it can perform conversions on the file, such as converting lowercase to uppercase or ASCII to EBCDIC. It can also reblock a file, which may be desirable when transferring it to tape. It can skip or include only selected blocks of a file. And finally, it can read and write to raw devices, such as /dev/sda, which allows you to create or restore a file that is a whole partition image. Writing to devices usually requires root authority.

We will start with a simple example of converting a file to upper case using the conv option as shown in Listing 37. We use the if option to specify the input file rather than using the default of stdin. A similar option, of, is available to override the default output to stdout. For purposes of illustration, we have specified different input and output block sizes using the ibs and obs options. For large files it can be handy to use larger block sizes to speed up operations when transferring disk to disk. Otherwise, block sizes are mostly used with magnetic tapes. Note the three status lines at the end of the listing showing how many complete and partial blocks were read and written and the total amount of data transferred.

Listing 37. Converting text to upper case using dd
[ian@echidna lpi103-2]$ cat text6
1 apple
2 pear
3 banana
9       plum
3       banana
10      apple
1 apple
2 pear
3 banana
9       plum
3       banana
10      apple
[ian@echidna lpi103-2]$ dd if=text6 conv=ucase ibs=20 obs=30
1 APPLE
2 PEAR
3 BANANA
9       PLUM
3       BANANA
10      APPLE
1 APPLE
2 PEAR
3 BANANA
9       PLUM
3       BANANA
10      APPLE
4+1 records in
3+1 records out
98 bytes (98 B) copied, 0.00210768 s, 46.5 kB/s
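
Listing 37 writes to stdout. As a companion sketch, the following uses the of option to send the converted text to a file instead, and discards the status lines that dd writes to stderr. The file names are hypothetical.

```shell
workdir=$(mktemp -d)
printf '1 apple\n2 pear\n3 banana\n' > "$workdir/fruit.txt"

# if= names the input and of= names the output; conv=ucase converts
# lowercase to uppercase. The record counts go to stderr.
dd if="$workdir/fruit.txt" of="$workdir/fruit.upper" conv=ucase ibs=20 obs=30 2>/dev/null

first_line=$(head -n 1 "$workdir/fruit.upper")
cat "$workdir/fruit.upper"
```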

Either file may be a raw device. This will usually be the case for magnetic tape, but a whole disk partition, such as /dev/hda1 or /dev/sda2, can be backed up to a file or tape. Ideally, the filesystem on the device should be unmounted, or at least mounted read only, to ensure that data does not change during the backup. Listing 38 shows an example where the input file is a raw device, /dev/sda2, and the output file is a file, backup-1, in the root user's home directory. To dump the file to tape or floppy disk, you would specify something like of=/dev/fd0 or of=/dev/st0.

Listing 38. Backing up a partition using dd
[root@echidna ~]# dd if=/dev/sda2 of=backup-1
1558305+0 records in
1558305+0 records out
797852160 bytes (798 MB) copied, 24.471 s, 32.6 MB/s

Note that 797,852,160 bytes of data were copied and the output file is indeed that large, even though only about 3% of this particular partition is actually used. Unless you are copying to a tape with hardware compression, you will probably want to compress the data. Listing 39 shows one way to accomplish this, along with the output of ls and df commands, which show you the file sizes and the usage percentage of the filesystem on /dev/sda2.

Listing 39. Backing up with compression using dd
[root@echidna ~]# dd if=/dev/sda2 |gzip >backup-2
1558305+0 records in
1558305+0 records out
797852160 bytes (798 MB) copied, 23.4617 s, 34.0 MB/s
[root@echidna ~]# ls -l backup-[12]
-rw-r--r--. 1 root root 797852160 2009-09-25 17:13 backup-1
-rw-r--r--. 1 root root    995223 2009-09-25 17:14 backup-2
[root@echidna ~]# df -h /dev/sda2
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             755M   18M  700M   3% /grubfile

The gzip compression reduced the file size to roughly 0.1% of the uncompressed size (995,223 bytes compared with 797,852,160 bytes). In general, however, unused blocks may contain arbitrary data, so even a compressed backup may be much larger than the total data on the partition.

If you divide the total bytes copied by the number of records processed, you will see that dd is writing 512-byte blocks of data. When copying to a raw output device such as tape, this can result in a very inefficient operation. As we mentioned above, specify the obs option to change the output size or the ibs option to specify the input block size. You can also specify just bs to set both input and output block sizes to a common value. When using tape, remember to use the same block size for reading the tape as you used for writing it.

If you need multiple tapes or other removable storage to store your backup, you will need to break it into smaller pieces using a utility such as split. If you need to skip blocks such as disk or tape labels, you can do so with dd. See the man page for examples.
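
The block arithmetic and the effect of the bs, skip, and count options can be checked safely on an ordinary file. This is a sketch only; sample.img and the other file names are hypothetical, and /dev/zero simply supplies filler data.

```shell
workdir=$(mktemp -d)

# 8 blocks of 512 bytes = 4096 bytes of sample data
dd if=/dev/zero of="$workdir/sample.img" bs=512 count=8 2>/dev/null

# Total bytes divided by the block size gives the number of blocks
size=$(wc -c < "$workdir/sample.img")
blocks=$(( size / 512 ))

# bs sets the input and output block sizes together: same data, fewer records
dd if="$workdir/sample.img" of="$workdir/copy.img" bs=4096 2>/dev/null

# skip passes over input blocks: skip the first 2 blocks, then copy 3
dd if="$workdir/sample.img" of="$workdir/part.img" bs=512 skip=2 count=3 2>/dev/null
part_size=$(wc -c < "$workdir/part.img")

echo "blocks=$blocks part_size=$part_size"
```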

The dd command is not filesystem aware, so you will need to restore a dump of a partition to find out what is on it. Listing 40 shows how to restore the partition that was dumped in Listing 39 to a partition, /dev/sdc7, that was specially created on a removable USB drive just for this purpose.

Listing 40. Restoring a partition using dd
[root@echidna ~]# gunzip backup-2 -c | dd  of=/dev/sdc7
1558305+0 records in
1558305+0 records out
797852160 bytes (798 MB) copied, 30.624 s, 26.1 MB/s

You may be interested to know that some CD- and DVD-burning applications use the dd command under the covers to do the actual device writing. If the utility you use provides a log of commands actually used, you may find it instructive to look at the log now that you know a little more about dd. Indeed, if you burn an ISO image to a CD or DVD disc, one way to verify that there were no errors is to use dd to read the disc back and pipe the result through the cmp utility. Listing 41 illustrates the general technique using the backup file that we created in this article rather than an ISO image. Note that we calculate the number of blocks to read using the file size of the image.

Listing 41. Comparing an image with a filesystem
[root@echidna ~]# ls -l backup-1
-rw-r--r--. 1 root root 797852160 2009-09-25 17:13 backup-1
[root@echidna ~]# echo $(( 797852160 / 512 )) # calculate number of 512 byte blocks
1558305
[root@echidna ~]# dd if=/dev/sdc7 bs=512 count=1558305 | cmp - backup-1
1558305+0 records in
1558305+0 records out
797852160 bytes (798 MB) copied, 26.7942 s, 29.8 MB/s
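
If you want to practice this verification technique without a spare partition or disc, the following sketch uses an ordinary file as a stand-in for the device. The names image and device are hypothetical; the stand-in device deliberately holds extra trailing data, which is why the block count is needed.

```shell
workdir=$(mktemp -d)

# A 16-block image of random data
dd if=/dev/urandom of="$workdir/image" bs=512 count=16 2>/dev/null

# Stand-in for a device: the image followed by unrelated trailing blocks
cp "$workdir/image" "$workdir/device"
dd if=/dev/zero bs=512 count=4 2>/dev/null >> "$workdir/device"

# Number of 512-byte blocks to read back, calculated from the image size
size=$(wc -c < "$workdir/image")
blocks=$(( size / 512 ))

# Read back only that many blocks and compare them with the image
if dd if="$workdir/device" bs=512 count="$blocks" 2>/dev/null | cmp - "$workdir/image"; then
    verify=match
else
    verify=differ
fi
echo "$blocks blocks compared: $verify"
```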
