This article grounds you in the basic Linux commands for manipulating files and directories. Learn to:
- List directory contents
- Copy, move, or remove files and directories
- Manipulate multiple files and directories recursively
- Use wildcard patterns for manipulating files
- Use the find command to locate and act on files based on type, size, or time
- Compress and decompress files using gzip and bzip2
- Archive files using tar, cpio, and dd
This article helps you prepare for Objective 103.2 in Topic 103 of the Linux Professional Institute's Junior Level Administration (LPIC-1) exam 101. The objective has a weight of 4.
All files on Linux and UNIX® systems are accessed as part of a single large tree-structured filesystem that is rooted at /. You may add more branches to this tree by mounting them and remove them by unmounting them. Mounting and unmounting will be covered in the article on Mounting and unmounting of filesystems (see the series roadmap).
In this article, we will practice the commands using the files created in the article "Learn Linux 101: Text streams and filters." If you followed along in that article, you created a directory, lpi103-2, in your home directory. If you didn't, then you can use another directory on your system to practice the commands discussed in this article.
File and directory names are either absolute, meaning they begin with a /, or they are relative to the current working directory, meaning they do not begin with a /. The absolute path to a file or directory consists of a / followed by a series of zero or more directory names, each followed by another / and then a final name.
Given a file or directory name that is relative to the current working directory, simply concatenate the absolute name of the working directory, a /, and the relative name. For example, the directory, lpi103-2, that we created in the earlier article was created in my home directory, /home/ian, so its full, or absolute, path is /home/ian/lpi103-2.
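As a minimal sketch of this rule, the following fragment builds an absolute name by concatenation. The scratch directory created with mktemp and the variable names are assumptions made for illustration; they are not part of the original example.

```shell
# Sketch: resolve a relative name against the current working directory.
# The scratch directory is an assumption so the example can run anywhere.
workdir=$(mktemp -d)
mkdir "$workdir/lpi103-2"
cd "$workdir" || exit 1

rel_name="lpi103-2"           # relative: does not begin with /
abs_name="$PWD/$rel_name"     # working directory + / + relative name
echo "$abs_name"

# Classify the result by its first character
case "$abs_name" in
  /*) kind=absolute ;;
  *)  kind=relative ;;
esac
echo "$kind"
```

Both names refer to the same directory, so ls "$rel_name" and ls "$abs_name" list the same contents.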
You can display the name of the current working directory with the
pwd command. It is also usually available in
the PWD environment variable. Listing 1 shows the use of the
pwd command, and three different ways to use
the ls command to list the files in this
directory.
Listing 1. Listing directory entries
[ian@echidna lpi103-2]$ pwd
/home/ian/lpi103-2
[ian@echidna lpi103-2]$ echo "$PWD"
/home/ian/lpi103-2
[ian@echidna lpi103-2]$ ls
sedtab text1 text2 text3 text4 text5 text6 xaa xab yaa yab
[ian@echidna lpi103-2]$ ls "$PWD"
sedtab text1 text2 text3 text4 text5 text6 xaa xab yaa yab
[ian@echidna lpi103-2]$ ls /home/ian/lpi103-2
sedtab text1 text2 text3 text4 text5 text6 xaa xab yaa yab
As you can see, you can give a relative or absolute directory name as a
parameter to the ls command, and it will list
the contents of that directory.
On a storage device, a file or directory is contained in a collection of
blocks. Information about a file is contained in an
inode which records information such as the owner, when the
file was last accessed, how large it is, whether it is a directory or not,
and who can read from or write to it. The inode number is also known as
the file serial number and is unique within a particular
filesystem. We can use the -l (or
--format=long) option to display some of the
information stored in the inode.
By default, the ls command does not list
hidden files, those whose names start with a dot (.). Every directory
has at least two such entries: the
directory itself (.) and the parent directory (..). The root directory
has no parent, so its .. entry refers to the root directory itself.
Listing 2 uses the -l and
-a options to display a long format listing of
all files including the . and .. directory entries.
Listing 2. Displaying a long directory listing
[ian@echidna lpi103-2]$ ls -al
total 52
drwxrwxr-x. 2 ian ian 4096 2009-08-11 21:21 .
drwx------. 35 ian ian 4096 2009-08-12 10:55 ..
-rw-rw-r--. 1 ian ian 8 2009-08-11 21:17 sedtab
-rw-rw-r--. 1 ian ian 24 2009-08-11 14:02 text1
-rw-rw-r--. 1 ian ian 25 2009-08-11 14:27 text2
-rw-rw-r--. 1 ian ian 63 2009-08-11 15:41 text3
-rw-rw-r--. 1 ian ian 26 2009-08-11 15:42 text4
-rw-rw-r--. 1 ian ian 24 2009-08-11 18:47 text5
-rw-rw-r--. 1 ian ian 98 2009-08-11 21:21 text6
-rw-rw-r--. 1 ian ian 15 2009-08-11 14:41 xaa
-rw-rw-r--. 1 ian ian 9 2009-08-11 14:41 xab
-rw-rw-r--. 1 ian ian 17 2009-08-11 14:41 yaa
-rw-rw-r--. 1 ian ian 8 2009-08-11 14:41 yab
In Listing 2, the first line shows the total number of disk blocks (52) used by the listed files. The remaining lines tell you about the directory entries.
- The first field (drwxrwxr-x or -rw-rw-r-- in this case) tells us whether the file is a directory (d) or a regular file (-). You may also see symbolic links (l) or other values for some special files (such as files in the /dev filesystem). You will learn more about symbolic links in the article Create and change hard and symbolic links (see the series roadmap). The type is followed by three sets of permissions (such as rwx or r--) for the owner, the members of the owner's group, and everyone. The three values, respectively, indicate whether the user, group, or everyone has read (r), write (w), or execute (x) permission. Other uses such as setuid will be covered in the article Manage file permissions and ownership (see the series roadmap).
- The next field is a number that tells us the number of hard links to the file. We said that the inode contains information about the file. The file's directory entry contains a hard link (or pointer) to the inode for the file, so every entry listed should have at least one hard link. Directories have an additional link for their own . entry and one more for the .. entry in each subdirectory. So we can see from Listing 2 that my home directory, represented by .., has quite a few subdirectories, as it has 35 hard links.
- The next two fields are the file's owner and the owner's primary group. Some systems, such as Red Hat or Fedora systems, default to providing a separate group for each user. On other systems, all users may be in one or perhaps a few groups.
- The next field contains the length of the file in bytes.
- The penultimate field contains the timestamp of the last modification.
- And the final field contains the name of the file or directory.
The -i option of the
ls
command will display the inode numbers for
you. You will see inodes again later in this article and also in the
article Create and change hard and symbolic links (see
the series roadmap).
You can also specify multiple parameters to the
ls command, where each name is either that of a
file or directory. For directory names, the ls
command lists the contents of the directory rather than information about
the directory itself. In our example, suppose we wanted information about
the lpi103-2 directory entry itself as it is listed in the parent
directory. The command ls -l ../lpi103-2 would
give us a listing like the previous example. Listing 3 shows how to add
the -d option to list information about
directory entries rather than the contents of directories and also how to
list entries for multiple files or directories.
Listing 3. Using ls -d
[ian@echidna lpi103-2]$ ls -ld ../lpi103-2 sedtab xaa
drwxrwxr-x. 2 ian ian 4096 2009-08-12 15:31 ../lpi103-2
-rw-rw-r--. 1 ian ian 8 2009-08-11 21:17 sedtab
-rw-rw-r--. 1 ian ian 15 2009-08-11 14:41 xaa
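The difference between listing a directory's contents and listing its own entry can be sketched as follows. The scratch directory and the names docs, a, and b are hypothetical.

```shell
# Sketch: ls on a directory name versus ls -d on the same name.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir docs
touch docs/a docs/b

contents=$(ls docs)     # lists what is inside docs
entry=$(ls -d docs)     # lists the docs entry itself
echo "contents: $contents"
echo "entry: $entry"
```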
Note that the modification time for lpi103-2 is different from that in the previous listing. Also, as in the previous listing, it is different from the timestamps of any of the files in the directory. Is this what you would expect? Not normally. However, in developing this article, I created some extra examples and then deleted them, so the directory time stamps reflect that fact. We will talk more about file times later under Handling multiple files and directories.
By default, ls lists files alphabetically. There
are a number of options for sorting the output. For example,
ls -t will sort by modification time (newest to
oldest) while ls -lS will produce a long
listing sorted by size (largest to smallest). Adding
-r will reverse the sort order. For example,
use ls -lrt to produce a long listing sorted
from oldest to newest. Consult the man page for other ways you can list
files and directories.
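The sort options can be tried safely in a scratch directory. This is only a sketch: the file names big and small are hypothetical, and touch -t (covered later in this article) is used to give the files known modification times.

```shell
# Sketch: compare ls sort orders on two files with known size and mtime.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
printf 'aaaaaaaaaa' > big       # 10 bytes
printf 'a' > small              # 1 byte
touch -t 200908111200 big       # older modification time
touch -t 200908121200 small     # newer modification time

newest_first=$(ls -t | head -n 1)    # newest to oldest
oldest_first=$(ls -rt | head -n 1)   # -r reverses the order
largest_first=$(ls -S | head -n 1)   # largest to smallest
echo "$newest_first $oldest_first $largest_first"
```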
Copying, moving, and deleting files
We have now learned some ways to create files, but suppose we want to make copies of files, rename files, move them around the filesystem hierarchy, or even delete them. We use three short commands for these purposes.
- cp is used to make a copy of one or more files or directories. You must give one (or more) source names and one target name. Source or target names may include a path specification. If the target is an existing directory, then all sources are copied into the target. If the target is a directory that does not exist, then the (single) source must also be a directory and a copy of the source directory and its contents is made with the target name as the new name. If the target is a file, then the (single) source must also be a file and a copy of the source file is made with the target name as the new name, replacing any existing file of the same name. Note that there is no default assumption of the target being the current directory as in DOS and Windows operating systems.
- mv is used to move or rename one or more files or directories. In general, the names you may use follow the same rules as for copying with cp; you can rename a single file or move a set of files into a new directory. Because the name is only a directory entry that links to an inode, it should be no surprise that the inode number does not change unless the file is moved to another filesystem, in which case moving it behaves more like a copy followed by deleting the original.
- rm is used to remove one or more files. We will see how to remove directories shortly.
Listing 4 illustrates the use of cp and
mv to make some backup copies of our text
files. We also use ls -i to show inodes for
some of our files.
- We first make a copy of our text1 file as text1.bkp.
- We then decide to create a backup subdirectory using the mkdir command.
- We make a second backup copy of text1, this time in the backup directory, and show that all three files have different inodes.
- We then move our text1.bkp to the backup directory and after that rename it to be more consistent with the second backup. While we could have done this with a single command, we use two here for illustration.
- We check the inodes again and confirm that text1.bkp with inode 934193 is no longer in our lpi103-2 directory, but that the inode is that of text1.bkp.1 in the backup directory.
Listing 4. Copying and moving files
[ian@echidna lpi103-2]$ cp text1 text1.bkp
[ian@echidna lpi103-2]$ mkdir backup
[ian@echidna lpi103-2]$ cp text1 backup/text1.bkp.2
[ian@echidna lpi103-2]$ ls -i text1 text1.bkp backup
933892 text1 934193 text1.bkp
backup:
934195 text1.bkp.2
[ian@echidna lpi103-2]$ mv text1.bkp backup
[ian@echidna lpi103-2]$ mv backup/text1.bkp backup/text1.bkp.1
[ian@echidna lpi103-2]$ ls -i text1 text1.bkp backup
ls: cannot access text1.bkp: No such file or directory
933892 text1
backup:
934193 text1.bkp.1 934195 text1.bkp.2
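The inode behavior shown in Listing 4 can also be checked from a script. The following is a minimal sketch that assumes a scratch directory on a single filesystem; the helper name inode_of is hypothetical.

```shell
# Sketch: a copy gets a new inode, a move within one filesystem keeps it.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
echo "sample" > text1
mkdir backup

# Hypothetical helper: print only the inode number reported by ls -i
inode_of() { ls -i "$1" | awk '{print $1}'; }

cp text1 text1.bkp
orig_inode=$(inode_of text1)
copy_inode=$(inode_of text1.bkp)

mv text1.bkp backup/text1.bkp.1          # only the directory entry changes
moved_inode=$(inode_of backup/text1.bkp.1)

echo "original=$orig_inode copy=$copy_inode moved=$moved_inode"
```

The copy and the original report different inode numbers, while the moved file reports the same number as before the move.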
Normally, the cp command will copy a file over
an existing copy, if the existing file is writable. The
mv command also replaces an existing target by default; it asks for
confirmation only if the target is not writable and standard input is a
terminal. There are several useful options relevant to this behavior of
cp and mv.
- -f or --force will cause cp to attempt to remove an existing target file even if it is not writable.
- -i or --interactive will ask for confirmation before attempting to replace an existing file.
- -b or --backup will make a backup of any files that would be replaced.
As usual, consult the man pages for full details on these and other options for copying and moving.
Listing 5 illustrates copying with backup and then file deletion.
Listing 5. Making backup copies and deleting files
[ian@echidna lpi103-2]$ cp text2 backup
[ian@echidna lpi103-2]$ cp --backup=t text2 backup
[ian@echidna lpi103-2]$ ls backup
text1.bkp.1 text1.bkp.2 text2 text2.~1~
[ian@echidna lpi103-2]$ rm backup/text2 backup/text2.~1~
[ian@echidna lpi103-2]$ ls backup
text1.bkp.1 text1.bkp.2
Note that the rm command also accepts the
-i (interactive) and
-f (force) options. Once you remove a file
using rm, the filesystem no longer has access
to it. Some systems default to setting an alias
alias rm='rm -i' for the root user to help
prevent inadvertent file deletion. This is also a good idea for ordinary
users if you are nervous about what you might accidentally delete.
Before we leave this discussion, it should be noted that the
cp command defaults to creating a new timestamp
for the new file or files. The owner and group are also set to the owner
and group of the user doing the copying. The -p
option may be used to preserve selected attributes. Note that the root
user may be the only user who can preserve ownership. See the man page for
details.
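The effect of -p on timestamps can be sketched as follows. The file names are hypothetical, and the -nt test ("newer than") is a shell feature assumed to be available, as it is in bash.

```shell
# Sketch: cp creates a new timestamp, cp -p preserves the source timestamp.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
echo data > original
touch -t 200908111402 original     # give the source an old modification time

cp original plain_copy             # gets the current time
cp -p original preserved_copy      # keeps the source timestamps and mode

ls -l original plain_copy preserved_copy

# -nt is true when the left file was modified more recently than the right
if [ plain_copy -nt original ]; then plain_is_newer=yes; else plain_is_newer=no; fi
if [ preserved_copy -nt original ]; then preserved_is_newer=yes; else preserved_is_newer=no; fi
echo "$plain_is_newer $preserved_is_newer"
```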
Creating and removing directories
We have already seen how to create a directory with
mkdir. Now we will look further at
mkdir and introduce
rmdir, its analog for removing directories.
Suppose we are in our lpi103-2 directory and we wish to create
subdirectories dir1 and dir2. mkdir, like the
commands we have just been reviewing, will handle multiple directory
creation requests in one pass as shown in Listing 6.
Listing 6. Creating multiple directories
[ian@echidna lpi103-2]$ mkdir dir1 dir2
Note that there is no output on successful completion, although you could
use echo $? to confirm that the exit code is
really 0.
If, instead, you wanted to create a nested subdirectory, such as d1/d2/d3,
this would fail because the d1 and d2 directories do not exist.
Fortunately, mkdir has a
-p option that allows it to create any required
parent directories, as shown in Listing 7.
Listing 7. Creating parent directories
[ian@echidna lpi103-2]$ mkdir d1/d2/d3
mkdir: cannot create directory `d1/d2/d3': No such file or directory
[ian@echidna lpi103-2]$ echo $?
1
[ian@echidna lpi103-2]$ mkdir -p d1/d2/d3
[ian@echidna lpi103-2]$ echo $?
0
Removing directories using the rmdir command is
the opposite of creating them. Again, there is a
-p option to remove parents as well. You can
remove a directory with rmdir only if it is
empty as there is no option to force removal. We'll see another way to
accomplish that particular trick when we look at
recursive manipulation. Once you learn this, you
will probably seldom use rmdir on the command
line, but it is still good to know about it.
To illustrate directory removal, we copied our text1 file into the
directory d1/d2 so that it is no longer empty. We then used
rmdir to remove all the directories we just
created with mkdir. As you can see, d1 and d2
were not removed because d2 was not empty. The other directories were
removed. Once we remove the copy of text1 from d2, we can remove d1 and d2
with a single invocation of rmdir -p.
Listing 8. Removing directories
[ian@echidna lpi103-2]$ cp text1 d1/d2
[ian@echidna lpi103-2]$ rmdir -p d1/d2/d3 dir1 dir2
rmdir: failed to remove directory `d1/d2': Directory not empty
[ian@echidna lpi103-2]$ ls . d1/d2
.:
backup sedtab text2 text4 text6 xab yab
d1 text1 text3 text5 xaa yaa
d1/d2:
text1
[ian@echidna lpi103-2]$ rm d1/d2/text1
[ian@echidna lpi103-2]$ rmdir -p d1/d2
Handling multiple files and directories
Up to now the commands we have used have operated on a single file or perhaps a few individually named files. For the rest of this article, we will look at various operations for handling multiple files, recursively manipulating part of a directory tree, and saving or restoring multiple files or directories.
The ls command has a
-R (note upper case "R") option for listing a
directory and all its subdirectories. The recursive option applies only to
directory names; it will not find all the files called 'text1', for
example, in a directory tree. You may use other options that we have seen
already along with -R. A recursive listing of
our lpi103-2 directory, including inode numbers, is shown in Listing 9.
Listing 9. Displaying directory listings recursively
[ian@echidna lpi103-2]$ ls -iR
.:
934194 backup 933892 text1 933898 text3 933900 text5 933894 xaa 933896 yaa
933901 sedtab 933893 text2 933899 text4 933902 text6 933895 xab 933897 yab
./backup:
934193 text1.bkp.1 934195 text1.bkp.2
You can use the -r (or
-R or --recursive)
option to cause the cp command to descend into
source directories and copy the contents recursively. To prevent an
infinite recursion, the source directory itself may not be copied. Listing
10 shows how to copy everything in our lpi103-2 directory to a copy1
subdirectory. We use ls -R to show the
resulting directory tree.
Listing 10. Copying recursively
[ian@echidna lpi103-2]$ cp -pR . copy1
cp: cannot copy a directory, `.', into itself, `copy1'
[ian@echidna lpi103-2]$ ls -R
.:
backup copy1 sedtab text1 text2 text3 text4 text5 text6 xaa xab yaa yab
./backup:
text1.bkp.1 text1.bkp.2
./copy1:
text2 text3 text5 xaa yaa yab
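One way to avoid the "into itself" message is to copy to a target outside the source tree. The sketch below assumes a scratch directory and hypothetical names project and project-copy.

```shell
# Sketch: recursive copy to a sibling directory rather than a subdirectory.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir -p project/backup
echo one > project/text1
echo two > project/backup/text1.bkp

# The target does not exist, so it becomes a copy of the source directory
cp -pR project project-copy
ls -R project-copy
```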
We mentioned earlier that rmdir only removes
empty directories. We can use the -r (or
-R or --recursive)
option to cause the rm command to remove both
files and directories as shown in Listing 11 where we remove the
copy1 directory that we just created, along with its contents, including
the backup subdirectory and its contents.
Listing 11. Deleting recursively
[ian@echidna lpi103-2]$ rm -r copy1
[ian@echidna lpi103-2]$ ls -R
.:
backup sedtab text1 text2 text3 text4 text5 text6 xaa xab yaa yab
./backup:
text1.bkp.1 text1.bkp.2
If you have files that are not writable by you, you may need to add the
-f option to force removal. This is often done
by the root user when cleaning up, but be warned that you can lose
valuable data if you are not careful.
Often, you may need to perform a single operation on many filesystem objects, without operating on the entire tree as we just did with recursive operations. For example, you might want to find the modification times of all the text files we created in lpi103-2, without listing the split files. Although this is easy with our small directory, it is much harder in a large filesystem.
To solve this problem, use the wildcard support that is built in to the bash shell. This support, also called "globbing" (because it was originally implemented as a program called /etc/glob), lets you specify multiple files using a wildcard pattern.
A string containing any of the characters '?', '*' or '[', is a wildcard pattern. Globbing is the process by which the shell (or possibly another program) expands these patterns into a list of pathnames matching the pattern. The matching is done as follows:
- ? matches any single character.
- * matches any string, including an empty string.
- [ introduces a character class. A character class is a non-empty string, terminated by a ']'. A match means matching any single character enclosed by the brackets. There are a few special considerations.
  - Within a character class, the '*' and '?' characters match themselves. If you use these in filenames, you need to be careful about appropriate quoting or escaping.
  - Because the string must be non-empty and terminated by ']', you must put ']' first in the string if you want to match it.
  - The '-' character between two others represents a range that includes the two other characters and all characters between them in the collating sequence. For example, [0-9a-fA-F] represents any upper or lower case hexadecimal digit. You can match a '-' by putting it either first or last within a range.
  - The '!' character specified as the first character of a range complements the range so that it matches any character except the remaining characters. For example, [!0-9] means any character except the digits 0 through 9. A '!' in any position other than the first matches itself. Remember that '!' is also used with the shell history function, so you need to be careful to properly escape it.
Note: Wildcard patterns and regular expression patterns share some characteristics, but they are not the same. Pay careful attention.
Globbing is applied separately to each component of a path name. You cannot
match a '/', nor include one in a range. You can use it anywhere that you
might specify multiple file or directory names, for example in the
ls, cp,
mv, or rm commands.
In Listing 12, we first create a couple of oddly named files and then use
the ls and rm
commands with wildcard patterns.
Listing 12. Wildcard pattern examples
[ian@echidna lpi103-2]$ echo odd1>'text[*?!1]'
[ian@echidna lpi103-2]$ echo odd2>'text[2*?!]'
[ian@echidna lpi103-2]$ ls
backup text1 text2 text3 text5 xaa yaa
sedtab text[*?!1] text[2*?!] text4 text6 xab yab
[ian@echidna lpi103-2]$ ls text[2-4]
text2 text3 text4
[ian@echidna lpi103-2]$ ls text[!2-4]
text1 text5 text6
[ian@echidna lpi103-2]$ ls text*[2-4]*
text2 text[2*?!] text3 text4
[ian@echidna lpi103-2]$ ls text*[!2-4]* # Surprise!
text1 text[*?!1] text[2*?!] text5 text6
[ian@echidna lpi103-2]$ ls text*[!2-4] # Another surprise!
text1 text[*?!1] text[2*?!] text5 text6
[ian@echidna lpi103-2]$ echo text*>text10
[ian@echidna lpi103-2]$ ls *\!*
text[*?!1] text[2*?!]
[ian@echidna lpi103-2]$ ls *[x\!]*
text1 text[*?!1] text10 text2 text[2*?!] text3 text4 text5 text6 xaa xab
[ian@echidna lpi103-2]$ ls *[y\!]*
text[*?!1] text[2*?!] yaa yab
[ian@echidna lpi103-2]$ ls tex?[[]*
text[*?!1] text[2*?!]
[ian@echidna lpi103-2]$ rm tex?[[]*
[ian@echidna lpi103-2]$ ls *b*
sedtab xab yab
backup:
text1.bkp.1 text1.bkp.2
[ian@echidna lpi103-2]$ ls backup/*2
backup/text1.bkp.2
[ian@echidna lpi103-2]$ ls -d .*
. ..
Notes:
- Complementation in conjunction with '*' can lead to some surprises. The pattern 'text*[!2-4]*' matches any name that has at least one character other than 2, 3, or 4 somewhere after text, and the pattern 'text*[!2-4]' matches any name beginning with text whose last character is not 2, 3, or 4. Both text[*?!1] and text[2*?!] satisfy each pattern, so now both surprises should be clear.
- As with earlier examples of ls, if pattern expansion results in a name that is a directory name and the -d option is not specified, then the contents of that directory will be listed (as in our example above for the pattern '*b*').
- If a filename starts with a period (.), then that character must be matched explicitly. Notice that only the last ls command listed the two special directory entries (. and ..).
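The rule about a leading period can be sketched in a scratch directory. The file names .hidden and visible are hypothetical, and the pattern .[!.]* is just one common way to match dot files while skipping the . and .. entries.

```shell
# Sketch: a leading dot must be matched explicitly.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch .hidden visible

plain=$(echo *)           # * does not match names that start with a dot
dotted=$(echo .[!.]*)     # explicit dot, then any character except another dot
echo "plain: $plain"
echo "dotted: $dotted"
```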
Remember that any wildcard characters in a command are liable to be expanded by the shell, which may lead to unexpected results. Furthermore, if you specify a pattern that does not match any filesystem objects, then POSIX requires that the original pattern string be passed to the command. Some earlier implementations passed a null list to the command, so you may run into old scripts that give unusual behavior. We illustrate these points in Listing 13.
Listing 13. Wildcard pattern surprises
[ian@echidna lpi103-2]$ echo text*
text1 text10 text2 text3 text4 text5 text6
[ian@echidna lpi103-2]$ echo "text*"
text*
[ian@echidna lpi103-2]$ echo text[[\!?]z??
text[[!?]z??
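Because an unmatched pattern is passed through as a literal string, a script that loops over wildcard results may want to check that each name really exists. The following is a sketch under the assumption that bash is running with its default globbing settings; the pattern nomatch* is hypothetical.

```shell
# Sketch: guard a loop against patterns that match nothing.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch text1 text2

matched=0
for name in text* nomatch*; do
  # An unmatched pattern arrives as the literal pattern string,
  # so test for existence before acting on the name
  if [ -e "$name" ]; then
    matched=$((matched + 1))
  else
    echo "skipping unmatched pattern: $name"
  fi
done
echo "$matched"
```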
For more information on globbing, look at
man 7 glob. You will need the section number,
as there is also glob information in section 3. The best way to understand
all the various shell interactions is by practice, so try these wildcards
out whenever you have a chance. Remember to try
ls to check your wildcard pattern before
allowing cp, mv, or
worse, rm to do something unexpectedly.
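A minimal sketch of that habit follows, using echo to preview what the shell will hand to rm. The scratch directory and file names are assumptions for illustration.

```shell
# Sketch: preview a wildcard expansion before using it with rm.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch text1 text2 text10 xaa

preview=$(echo text?)      # the names rm would receive for this pattern
echo "would remove: $preview"

rm text?                   # ? matches exactly one character, so text10 stays
remaining=$(echo *)
echo "remaining: $remaining"
```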
We will now look at the touch command, which can
update file access and modification times or create empty files. In the
next part, we will see how to use this information for finding files and
directories. We will continue using the lpi103-2 directory for our
examples. We will also look at the various ways you may specify
timestamps.
The touch command with no options takes one or
more filenames as parameters and updates the modification time of
the files. This is the same timestamp normally displayed with a long
directory listing. In Listing 14, we use our old friend
echo to create a small file called f1, and then
use a long directory listing to display the modification time (or
mtime). In this case, it happens also to be the time the file
was created. We then use the sleep command to
wait for 60 seconds, touch the file, and run ls again. Note that
the timestamp for the file has changed by a minute.
Listing 14. Updating modification time with touch
[ian@echidna lpi103-2]$ echo xxx>f1; ls -l f1; sleep 60; touch f1; ls -l f1
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:24 f1
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1
If you specify a filename for a file that does not exist, then
touch will normally create an empty file for
you, unless you specify the -c or
--no-create option. Listing 15 illustrates both
these commands. Note that only f2 is created.
Listing 15. Creating empty files with touch
[ian@echidna lpi103-2]$ touch f2; touch -c f3; ls -l f*
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:27 f2
The touch command can also set a file's
modification time (also known as mtime) to a specific date and time
using either the -d or
-t options. The -d option
is very flexible in the date and time formats that it will accept, while
the -t option needs at least an MMDDhhmm time
with optional year and seconds values. Listing 16 shows some examples.
Listing 16. Setting mtime with touch
[ian@echidna lpi103-2]$ touch -t 200908121510.59 f3
[ian@echidna lpi103-2]$ touch -d 11am f4
[ian@echidna lpi103-2]$ touch -d "last fortnight" f5
[ian@echidna lpi103-2]$ touch -d "yesterday 6am" f6
[ian@echidna lpi103-2]$ touch -d "2 days ago 12:00" f7
[ian@echidna lpi103-2]$ touch -d "tomorrow 02:00" f8
[ian@echidna lpi103-2]$ touch -d "5 Nov" f9
[ian@echidna lpi103-2]$ ls -lrt f*
-rw-rw-r--. 1 ian ian 0 2009-07-31 18:31 f5
-rw-rw-r--. 1 ian ian 0 2009-08-12 12:00 f7
-rw-rw-r--. 1 ian ian 0 2009-08-12 15:10 f3
-rw-rw-r--. 1 ian ian 0 2009-08-13 06:00 f6
-rw-rw-r--. 1 ian ian 0 2009-08-14 11:00 f4
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:27 f2
-rw-rw-r--. 1 ian ian 0 2009-08-15 02:00 f8
-rw-rw-r--. 1 ian ian 0 2009-11-05 00:00 f9
If you're not sure what date a date expression might resolve to, you can
use the date command to find out. It also
accepts the -d option and can resolve the same
kind of date formats that touch can.
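For example, the following sketch asks date to resolve a few expressions before they are used with touch. It assumes the GNU version of date; the output format string is chosen only for illustration.

```shell
# Sketch: check what a date expression resolves to (GNU date assumed).
date -d "yesterday 6am"        # relative expressions depend on the current time
date -d "2 days ago 12:00"

# A fixed date and time resolves the same way on every run
resolved=$(date -d "2009-08-14 11:00" "+%Y-%m-%d %H:%M")
echo "$resolved"
```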
You can use the -r (or
--reference) option along with a reference
filename to indicate that touch (or
date) should use the timestamp of an existing
file. Listing 17 shows some examples.
Listing 17. Timestamps from reference files
[ian@echidna lpi103-2]$ date
Fri Aug 14 18:33:48 EDT 2009
[ian@echidna lpi103-2]$ date -r f1
Fri Aug 14 18:25:50 EDT 2009
[ian@echidna lpi103-2]$ touch -r f1 f1a
[ian@echidna lpi103-2]$ ls -l f1*
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 f1a
A Linux system records both a file modification time and a file
access time. These are also known respectively as the
mtime and atime. Both timestamps are set to the same
value when a file is created, and both are reset when it is modified. If a
file is accessed at all, then the access time is updated, even if the file
is not modified. For our last example with
touch, we will look at file access
times. The -a (or
--time=atime,
--time=access or
--time=use) option specifies that the access time
should be updated. Listing 18 uses the cat
command to access the f1 file and display its contents. We then use
ls -l and ls -lu to
display the modification and access times respectively for f1 and f1a,
which we created using f1 as a reference file. We then reset the access
time of f1 to that of f1a using touch -a and
verify that it was reset.
Listing 18. Access time and modification time
[ian@echidna lpi103-2]$ ls -lu f1*
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:39 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 f1a
[ian@echidna lpi103-2]$ ls -l f1*
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 f1a
[ian@echidna lpi103-2]$ touch -a -r f1a f1
[ian@echidna lpi103-2]$ ls -lu f1*
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 f1a
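The two timestamps can also be read numerically. The sketch below assumes the GNU stat command, whose %X and %Y formats print the access and modification times in seconds since the epoch; the times chosen simply mirror those in the listing above.

```shell
# Sketch: set atime independently of mtime and compare them (GNU stat assumed).
scratch=$(mktemp -d)
cd "$scratch" || exit 1
echo xxx > f1
touch -t 200908141825 f1        # sets both access and modification time
touch -a -t 200908141839 f1     # -a changes only the access time

mtime_epoch=$(stat -c %Y f1)
atime_epoch=$(stat -c %X f1)
difference=$((atime_epoch - mtime_epoch))
echo "$difference"              # 840 seconds, that is, 14 minutes
```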
For more complete information on the many allowable time and date
specifications, see the man or info pages for the
touch and date
commands.
Now that we've covered the file and directory topic with the big recursive
hammer that hits everything, and the globbing hammer that hits more
selectively, let's look at the find command,
which can be more like a surgeon's knife. The
find command is used to find files in one or
more directory trees, based on criteria such as name, timestamp, or size.
Again, we will use the lpi103-2 directory.
The find command will search for files or
directories using all or part of the name, or by other search criteria,
such as size, type, file owner, creation date, or last access date. The
most basic find is a search by name or part of a name. Listing 19 shows an
example from our lpi103-2 directory where we first search for all files
that have either a '1' or a 'k' in their name, then perform some path
searches that are explained in the notes below.
Listing 19. Finding files by name
[ian@echidna lpi103-2]$ find . -name "*[1k]*"
./f1a
./f1
./text10
./backup
./backup/text1.bkp.1
./backup/text1.bkp.2
./text1
[ian@echidna lpi103-2]$ find . -ipath "*ACK*1"
./backup/text1.bkp.1
[ian@echidna lpi103-2]$ find . -ipath "*ACK*/*1"
Notes:
- The patterns that you may use are shell wildcard patterns like those we saw earlier under Wildcards and globbing.
- You can use -path instead of -name to match full paths instead of just base file names. In this case, the pattern may span path components, unlike ordinary wildcard matches, which match only a single part of a path.
- If you want case-insensitive search as shown in the use of -ipath above, precede the find options that search on a string or pattern with an 'i'.
- If you want to find a file or directory whose name begins with a dot, such as .bashrc or the current directory (.), then you must specify a leading dot as part of the pattern. Otherwise, name searches will ignore these files or directories.
In the first example above, we found both files and a directory (./backup).
Use the -type parameter along with one-letter
type to restrict the search. Use 'f' for regular files, 'd' for
directories, and 'l' for symbolic links. See the man page for
find for other possible types. Listing 20 shows
the result of searching for directories
(-type d) alone and with a file name (*, or
everything, in this case).
Listing 20. Finding files by type
[ian@echidna lpi103-2]$ find . -type d
.
./backup
[ian@echidna lpi103-2]$ find . -type d -name "*"
.
./backup
Note that the -type d specification without any
form of name specification displays directories that have a leading dot in
their names (only the current directory in this case), as does the
wildcard "*".
We can also search by file size, either for a specific size (n) or for
files that are either larger (+n) or smaller than a given value (-n). By
using both upper and lower size bounds, we can find files whose size is
within a given range. By default, the -size
option of find assumes a unit of 'b' for
512-byte blocks. Among other choices, specify 'c' for bytes, or 'k' for
kilobytes. In Listing 21, we first find all files with size 0, and then
all with size of either 24 or 25 bytes. Note that specifying
-empty instead of
-size 0 also finds empty files.
Listing 21. Finding files by size
[ian@echidna lpi103-2]$ find . -size 0
./f1a
./f6
./f8
./f2
./f3
./f7
./f4
./f9
./f5
[ian@echidna lpi103-2]$ find . -size -26c -size +23c -print
./text2
./text5
./backup/text1.bkp.1
./backup/text1.bkp.2
./text1
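The size bounds are exclusive, which is easy to confirm with files of known length. This is a sketch in a scratch directory; the file names and the use of head -c with /dev/zero to create files of a given size are assumptions for illustration.

```shell
# Sketch: find files whose size lies strictly between two bounds.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
for size in 23 24 25 26; do
  head -c "$size" /dev/zero > "file$size"   # a file of exactly $size bytes
done
touch empty

# +23c means more than 23 bytes, -26c means fewer than 26 bytes
in_range=$(find . -type f -size +23c -size -26c | sort)
empties=$(find . -type f -empty)
echo "$in_range"
echo "$empties"
```

Only the 24-byte and 25-byte files fall inside the range, and only the file created by touch is reported as empty.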
The second example in Listing 21 introduces the
-print option, which is an example of an
action that may be taken on the results returned by the search.
In the bash shell, this is the default action if no action is specified.
On some systems and some shells, an action is required; otherwise, there
is no output.
Other actions include -ls, which prints file
information equivalent to that from the
ls -lids command, and
-exec, which executes a command for each file.
The -exec must be terminated by a semicolon,
which must be escaped to avoid the shell interpreting it first. Also
specify {} wherever you want the returned file used in the command.
Remember that curly braces also have meaning to the shell and must be
escaped (or quoted). Listing 22 shows how the
-ls and the -exec
options can be used to list file information. Notice that the second form
does not list the inode information.
Listing 22. Finding and acting on files
[ian@echidna lpi103-2]$ find . -size -26c -size +23c -ls
933893 4 -rw-rw-r-- 1 ian ian 25 Aug 11 14:27 ./text2
933900 4 -rw-rw-r-- 1 ian ian 24 Aug 11 18:47 ./text5
934193 4 -rw-rw-r-- 1 ian ian 24 Aug 12 15:36 ./backup/text1.bkp.1
934195 4 -rw-rw-r-- 1 ian ian 24 Aug 12 15:36 ./backup/text1.bkp.2
933892 4 -rw-rw-r-- 1 ian ian 24 Aug 11 14:02 ./text1
[ian@echidna lpi103-2]$ find . -size -26c -size +23c -exec ls -l '{}' \;
-rw-rw-r--. 1 ian ian 25 2009-08-11 14:27 ./text2
-rw-rw-r--. 1 ian ian 24 2009-08-11 18:47 ./text5
-rw-rw-r--. 1 ian ian 24 2009-08-12 15:36 ./backup/text1.bkp.1
-rw-rw-r--. 1 ian ian 24 2009-08-12 15:36 ./backup/text1.bkp.2
-rw-rw-r--. 1 ian ian 24 2009-08-11 14:02 ./text1
|
The -exec option can be used for as many
purposes as your imagination can dream up. For example:
find . -empty -exec rm '{}' \;
removes all the empty files in a directory tree, while
find . -name "*.htm" -exec mv '{}' '{}l' \;
renames all .htm files to .html files.
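To try the rename safely, run it in a scratch directory first. The sketch below assumes GNU find, which substitutes {} even when it is part of a larger argument such as '{}l'. The file names are invented for the example.

```shell
set -eu
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
mkdir sub
touch index.htm sub/page.htm notes.txt

# For each match, run: mv <file> <file>l
# The braces are quoted and the terminating semicolon is escaped for the shell.
find . -name "*.htm" -exec mv '{}' '{}l' \;
renamed=$(find . -type f | sort)
printf '%s\n' "$renamed"
```

The renamed files are not picked up a second time, because a name ending in .html no longer matches the *.htm pattern.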
For our final examples of find, we use the
timestamps described with the touch command to
locate files having particular timestamps. Listing 23 shows three
examples:
- When used with -mtime -2, the find command finds all files modified within the last two days. A day in this case is a 24-hour period relative to the current date and time. Note that you would use -atime if you wanted to find files based on access time rather than modification time.
- Adding the -daystart option means that we want to consider days as calendar days, starting at midnight. Now the f6 file is excluded from the list.
- Finally, we show how to use a time range in minutes rather than days to find files modified between one hour (60 minutes) and 10 hours (600 minutes) ago.
Listing 23. Finding files by timestamp
[ian@echidna lpi103-2]$ date
Sat Aug 15 00:27:36 EDT 2009
[ian@echidna lpi103-2]$ find . -mtime -2 -type f -exec ls -l '{}' \;
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 ./f1a
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 ./f1
-rw-rw-r--. 1 ian ian 0 2009-08-13 06:00 ./f6
-rw-rw-r--. 1 ian ian 0 2009-08-15 02:00 ./f8
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:27 ./f2
-rw-rw-r--. 1 ian ian 58 2009-08-14 17:30 ./text10
-rw-rw-r--. 1 ian ian 0 2009-08-14 11:00 ./f4
-rw-rw-r--. 1 ian ian 0 2009-11-05 00:00 ./f9
[ian@echidna lpi103-2]$ find . -daystart -mtime -2 -type f -exec ls -l '{}' \;
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 ./f1a
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 ./f1
-rw-rw-r--. 1 ian ian 0 2009-08-15 02:00 ./f8
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:27 ./f2
-rw-rw-r--. 1 ian ian 58 2009-08-14 17:30 ./text10
-rw-rw-r--. 1 ian ian 0 2009-08-14 11:00 ./f4
-rw-rw-r--. 1 ian ian 0 2009-11-05 00:00 ./f9
[ian@echidna lpi103-2]$ find . -mmin -600 -mmin +60 -type f -exec ls -l '{}' \;
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:25 ./f1a
-rw-rw-r--. 1 ian ian 4 2009-08-14 18:25 ./f1
-rw-rw-r--. 1 ian ian 0 2009-08-14 18:27 ./f2
-rw-rw-r--. 1 ian ian 58 2009-08-14 17:30 ./text10
|
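The timestamp searches can be reproduced with files whose modification times you set yourself. This sketch assumes GNU touch, whose -d option accepts relative dates such as '3 days ago'. The file names old, recent, and now are hypothetical.

```shell
set -eu
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
touch -d '3 days ago' old        # GNU touch: relative date strings
touch -d '2 hours ago' recent
touch now                        # modified at the current time

# Modified less than two 24-hour periods ago.
last_two_days=$(find . -type f -mtime -2 | sort)
# Modified more than 60 minutes ago but less than 600 minutes ago.
one_to_ten_hours=$(find . -type f -mmin +60 -mmin -600 | sort)
printf 'last two days:\n%s\nbetween 1 and 10 hours ago:\n%s\n' "$last_two_days" "$one_to_ten_hours"
```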
The man pages for the find command can help you
learn the extensive range of options that we cannot cover in this brief
introduction.
File names often have a suffix such as gif, jpeg, or html that gives a hint
of what the file might contain. Linux does not require such suffixes and
generally does not use them to identify a file type. Knowing what type of
file you are dealing with helps you know what program to use to display or
manipulate it. The file command tells you
something about the type of data in one or more files. Listing 24 shows
some examples of using the file command.
Listing 24. Identifying file contents
[ian@echidna lpi103-2]$ file backup text1 f2 ../p-ishields.jpg /bin/echo
backup: directory
text1: ASCII text
f2: empty
../p-ishields.jpg: JPEG image data, JFIF standard 1.02
/bin/echo: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically
linked (uses shared libs), for GNU/Linux 2.6.18, stripped
|
The file command attempts to classify each file
using three types of test. Filesystem tests use the results of the
stat command to determine whether a file is
empty or a directory, for example. So-called magic tests check a
file for specific contents that identify it. These signatures are also
known as magic numbers. Finally, language tests look at the content
of text files to attempt to determine if a file is an XML file, C or C++
language source, a troff file, or some other file that is considered
source for some kind of language processor. The first type that is found
is reported unless the -k or
--keep-going option is specified.
The file command has many options that you may
learn about using the man pages. Listing 25 shows how to use the
-i (or --mime)
option to display the file type as a MIME string instead of the normal
human-readable output.
Listing 25. Identifying file contents as MIME
[ian@echidna lpi103-2]$ file -i backup text1 f2 ../p-ishields.jpg /bin/echo
backup: application/x-directory; charset=binary
text1: text/plain; charset=us-ascii
f2: application/x-empty; charset=binary
../p-ishields.jpg: image/jpeg; charset=binary
/bin/echo: application/x-executable; charset=binary
|
The magic number files are also managed by the
file command. Again, see the man pages for more
information.
Note: The identify command, which is part
of the ImageMagick package, is an additional tool that provides more
detail when identifying image file types.
When you are backing up, archiving, or transmitting files, it is common to
compress the files. In a Linux environment, two popular compression
programs are gzip and
bzip2. The gzip
command uses the Lempel-Ziv algorithm, while
bzip2 uses the Burrows-Wheeler block sorting
algorithm.
Compression generally works well on text files. Many image formats already
compress the data, so compression may not work well on these or other
binary files. To illustrate compression on a reasonably large text file,
let's copy /etc/services to the directory we have been using and compress
it using gzip as shown in Listing 26. We use the
-p option of cp to
preserve the timestamp of /etc/services. Note that the compressed file has
the same timestamp and has a .gz suffix.
Listing 26. Compressing with gzip
[ian@echidna lpi103-2]$ cp -p /etc/services .
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services
[ian@echidna lpi103-2]$ gzip services
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 124460 2009-04-10 04:42 services.gz
|
You decompress a gzipped file using the -d
option of gzip or, more commonly, using the
gunzip command. Listing 27 shows the first of
these choices. Note that the uncompressed file now has the original file
name and timestamp.
Listing 27. Decompressing with gzip
[ian@echidna lpi103-2]$ gzip -d services.gz
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services
|
The bzip2 command operates in a similar manner
to gzip as shown in Listing 28.
Listing 28. Compressing with bzip2
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services
[ian@echidna lpi103-2]$ bzip2 services
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 113444 2009-04-10 04:42 services.bz2
[ian@echidna lpi103-2]$ bunzip2 services.bz2
[ian@echidna lpi103-2]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services
|
Differences between gzip and bzip2
By design, many of the bzip2 options are the
same as those of gzip, but the two commands do
not have identical options. You may have noted that in both our examples,
the uncompressed file had the same name and timestamp as the original.
However, renaming or touching the compressed file can change this
behavior. The gzip command has the
-N or --name option
to force the name and timestamp to be preserved, but
bzip2 does not. The
gzip command also has a
-l option to display information about the
compressed file, including the name that will be used when it is
decompressed. Listing 29 illustrates some of these differences between the
two commands.
Listing 29. Some differences between gzip and bzip2
[ian@echidna ~]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services
[ian@echidna ~]$ gzip -N services
[ian@echidna ~]$ touch services.gz
[ian@echidna ~]$ mv services.gz services-x.gz
[ian@echidna ~]$ ls -l serv*
-rw-r--r--. 1 ian ian 124460 2009-09-23 14:08 services-x.gz
[ian@echidna ~]$ gzip -l services-x.gz
compressed uncompressed ratio uncompressed_name
124460 630983 80.3% services-x
[ian@echidna ~]$ gzip -lN services-x.gz
compressed uncompressed ratio uncompressed_name
124460 630983 80.3% services
[ian@echidna ~]$ gunzip -N services-x.gz
[ian@echidna ~]$ ls -l serv*
-rw-r--r--. 1 ian ian 630983 2009-04-10 04:42 services
[ian@echidna ~]$
[ian@echidna ~]$ bzip2 services
[ian@echidna ~]$ mv services.bz2 services-x.bz2
[ian@echidna ~]$ touch services-x.bz2
[ian@echidna ~]$ ls -l serv*
-rw-r--r--. 1 ian ian 113444 2009-09-23 14:10 services-x.bz2
[ian@echidna ~]$ bunzip2 services-x.bz2
[ian@echidna ~]$ ls -l serv*
-rw-rw-r--. 1 ian ian 630983 2009-09-23 14:10 services-x
[ian@echidna ~]$ rm services-x # Don't need this any more
|
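The gzip half of Listing 29 can be repeated on a small generated file. This sketch assumes GNU gzip and GNU coreutils (seq, touch -d, date -r). The file named services here is just generated numbers, not a copy of /etc/services.

```shell
set -eu
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
seq 1 2000 > services                 # stand-in data, not the real /etc/services
touch -d '2009-04-10 04:42' services  # give it a recognizable timestamp

gzip -N services                      # store the original name and timestamp
mv services.gz services-x.gz          # rename the compressed file ...
touch services-x.gz                   # ... and give it a new timestamp
gzip -lN services-x.gz                # list using the stored name
gunzip -N services-x.gz               # restore the stored name and timestamp

restored_name=$(ls)
restored_date=$(date -r services '+%Y-%m-%d %H:%M')
printf '%s %s\n' "$restored_name" "$restored_date"
```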
Both gzip and bzip2
will accept input from stdin. Both support the
-c option to direct output to stdout.
There are two other commands associated with
bzip2:
- The bzcat command decompresses files to stdout and is equivalent to bzip2 -dc.
- The bzip2recover command attempts to recover data from damaged bzip2 files.
The man pages will help you learn more about the other options of
gzip and bzip2.
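As a brief sketch of stream use, the following compresses to stdout with -c (which leaves the input file in place), compresses data arriving on stdin, and decompresses to stdout for a comparison. Only gzip is shown. The file names are hypothetical.

```shell
set -eu
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
seq 1 5000 > numbers.txt

gzip -c numbers.txt > numbers.txt.gz   # -c writes to stdout; numbers.txt is kept
seq 1 5000 | gzip > piped.gz           # no file argument: read stdin, write stdout

# Decompress to stdout and compare with the original, byte for byte.
if gzip -dc numbers.txt.gz | cmp -s - numbers.txt; then
    roundtrip=identical
else
    roundtrip=different
fi
orig_size=$(wc -c < numbers.txt)
gz_size=$(wc -c < numbers.txt.gz)
echo "round trip: $roundtrip, $orig_size bytes compressed to $gz_size bytes"
```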
Two older programs, compress and
uncompress, are still frequently found on Linux
and UNIX systems.
In addition, the zip and
unzip commands from the Info-ZIP project are
implemented for Linux. These provide cross-platform compression functions
that are available on a wide range of hardware and operating systems. Be
aware that not all operating systems support the same file attributes or
filesystem capabilities. If you download a zipped product file and unzip
it on a Windows system, then transfer the resulting files to a CD or DVD
for installation on Linux, you may experience problems installing because,
for example, the Windows system does not support the symbolic links that
were part of the original uncompressed file set.
For more information on these or other compression programs, see their respective man pages.
The tar, cpio, and
dd commands are commonly used for backing up
groups of files or even whole partitions, either for archiving or for
transmission to another user or site. Exam 201, which is part of the
LPIC-2 certification, focuses on backup considerations in greater
detail.
There are three general approaches to backup:
- A differential or cumulative backup is a backup of all things that have changed since the last full backup. Recovery requires the last full backup plus the latest differential backup.
- An incremental backup is a backup of only those changes since the last incremental backup. Recovery requires the last full backup plus all of the incremental backups (in order) since the last full backup.
- A full backup is a complete backup, usually of a whole
filesystem, directory, or group of related files. This takes the
longest time to create, so it is usually used with one of the other
two approaches.
These commands, along with the other commands you have learned about in this article, give you the tools to perform any of these backup tasks.
The tar command (originally from Tape ARchive)
creates an archive file, or tarfile or tarball, from a set
of input files or directories; it also restores files from such an
archive. If a directory is given as input to
tar, all files and subdirectories are
automatically included, which makes tar very
convenient for archiving subtrees of your directory structure.
Output can be to a file, a device such as tape or diskette, or stdout. The
output location is specified with the -f
option. Other common options are -c to create
an archive, -x to extract an archive,
-v for verbose output, which lists the files
being processed, -z to use gzip compression,
and -j to use bzip2 compression. Most
tar options have a short form using a single
hyphen and a long form using a pair of hyphens. The short forms are
illustrated here. See the man pages for the long form and for additional
options.
Listing 30 shows how to create a backup of our lpi103-2 directory using
tar.
Listing 30. Backing up our lpi103-2 directory using tar
[ian@echidna lpi103-2]$ tar -cvf ../lpitar1.tar .
./
./text3
./yab
...
./f5
|
Usually you will want to compress archive files to save space or reduce
transmission time. The GNU version of the tar
command allows you to do this with a single option:
-z for compression using
gzip and -j for
compression using bzip2. Listing 31 illustrates
the use of the -z option and the difference in
size between the two archives.
Listing 31. Compressing the tar archive with gzip
[ian@echidna lpi103-2]$ tar -zcvf ../lpitar2.tar ~/lpi103-2/
tar: Removing leading `/' from member names
/home/ian/lpi103-2/
/home/ian/lpi103-2/text3
/home/ian/lpi103-2/yab
...
/home/ian/lpi103-2/f5
[ian@echidna lpi103-2]$ ls -l ../lpitar*
-rw-rw-r--. 1 ian ian 30720 2009-09-24 15:38 ../lpitar1.tar
-rw-rw-r--. 1 ian ian 881 2009-09-24 15:39 ../lpitar2.tar
|
Listing 31 also shows another important feature of
tar. We used an absolute directory path, and
the first line of output tells you that tar is
removing the leading slash (/) from member names. This allows files to be
restored to some other location for verification and can be particularly
important if you are trying to restore system files. If you really want to
store absolute names, use the -P option. It is
also a good idea to avoid mixing absolute path names with relative path
names when creating an archive, since all will be relative when restoring
from the archive.
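The effect of the stripped leading slash is easy to see by archiving with an absolute path and restoring under a different directory. This is a sketch that assumes GNU tar. The src and restore directories are made up for the example.

```shell
set -eu
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
mkdir -p "$scratch/src/backup" "$scratch/restore"
echo hello > "$scratch/src/text1"
echo copy > "$scratch/src/backup/text1.bkp.1"

# Absolute input path: tar stores the names without the leading /
# (the "Removing leading" notice goes to stderr and is discarded here).
tar -czf "$scratch/abs.tar.gz" "$scratch/src" 2>/dev/null
members=$(tar -tzf "$scratch/abs.tar.gz")
printf '%s\n' "$members"

# Because member names are relative, -C restores the tree under another directory.
tar -xzf "$scratch/abs.tar.gz" -C "$scratch/restore"
restored="$scratch/restore$scratch/src/text1"
cat "$restored"
```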
The tar command can append additional files to
an archive using the -r or
--append option. This may cause multiple copies
of a file in the archive. In such a case, the last one will be
restored during a restore operation. You can use the
--occurrence option to select a specific file
among multiples. If the archive is on a regular filesystem instead of
tape, you may use the -u or
--update option to update an archive. This
works like appending to an archive, except that the time stamps of the
files in the archive are compared with those on the filesystem, and only
files that have been modified since the archived version are appended. As
mentioned, this does not work for tape archives.
The tar command can also compare archives with
the current filesystem and restore files from archives. Use the
-d, --compare, or
--diff option to perform comparisons. The
output will show files whose contents differ, as well as files whose time
stamps differ. Normally, only files that differ, if any, are listed. Use
the -v option discussed earlier for verbose
output. The -C or
--directory option tells
tar to perform an operation starting from the
specified directory rather than the current directory.
Listing 32 shows some examples. We use touch to
modify the timestamp of the f1 file, then illustrate comparison operations
of tar before restoring f1 from one of our
archives. We use a variety of option forms for illustration.
Listing 32. Comparing and restoring using tar
[ian@echidna lpi103-2]$ touch f1
[ian@echidna lpi103-2]$ tar --diff --file ../lpitar1.tar .
./f1: Mod time differs
[ian@echidna lpi103-2]$ tar -df ../lpitar2.tar -C /
home/ian/lpi103-2/f1: Mod time differs
[ian@echidna lpi103-2]$ tar -xvf ../lpitar1.tar ./f1 # See below
./f1
[ian@echidna lpi103-2]$ tar --compare -f ../lpitar2.tar --directory /
|
The files or directories you specify for restoration must match the name in
the archive. Attempting to restore just f1 rather than ./f1 in this case
would not work. You can use globbing, but you need to be careful to avoid
restoring more or less than you want. You can use the
--list or -t option
to list archive contents if you are unsure what is in an archive. Listing
33 shows an example of a wildcard specification that would have restored
more files than just ./f1.
Listing 33. Listing archive contents with tar
[ian@echidna lpi103-2]$ tar -tf ../lpitar1.tar "*f1*"
./f1a
./f1
|
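The compare, list, and restore steps can be rehearsed on a tiny archive. The following sketch assumes GNU tar. Recent GNU tar versions generally need --wildcards before a member name is treated as a pattern, so that option is added here. The directory work and archive demo.tar are hypothetical.

```shell
set -eu
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
mkdir work
echo one > work/f1
echo two > work/f1a
touch -d '2009-08-14 18:25' work/f1 work/f1a
( cd work && tar -cf ../demo.tar . )

touch -d '2009-08-15 02:00' work/f1        # change only the timestamp of f1
# tar -d exits non-zero when it finds differences, so tolerate that status.
diff_out=$( cd work && tar -df ../demo.tar . 2>&1 ) || true
printf '%s\n' "$diff_out"

# List the members a wildcard would select before restoring anything.
listed=$(tar -tf demo.tar --wildcards '*f1*' | sort)
printf '%s\n' "$listed"

# Restore one member; the name must match the archive entry, ./f1 and not f1.
( cd work && tar -xf ../demo.tar ./f1 )
restored_date=$(date -r work/f1 '+%Y-%m-%d %H:%M')
echo "f1 restored with timestamp $restored_date"
```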
You can use the find command to select the files
for archiving and then pipe the result to tar. We'll discuss this
technique as part of the discussion of cpio,
but the same method works for tar.
As with the other commands you have studied here, there are many options that are not covered in this brief introduction. See the man or info pages for more details.
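Here is a minimal sketch of piping a find selection into tar. It assumes GNU tar, where -T - reads the list of names from stdin and --null (given before -T) expects the null-terminated names produced by find -print0. The project tree is invented for the example.

```shell
set -eu
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
mkdir -p project/docs
echo a > "project/notes one.txt"     # a name with an embedded blank
echo b > project/docs/readme.txt
echo c > project/image.dat

# Select only the .txt files and archive exactly that list.
find project -type f -name "*.txt" -print0 | tar --null -T - -cf selected.tar
archived=$(tar -tf selected.tar | sort)
printf '%s\n' "$archived"
```

Any other find test, such as the size or timestamp tests shown earlier, can replace -name in the selection.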
The cpio command operates in copy-out
mode to create an archive, copy-in mode to restore an archive, or
copy-pass mode to copy a set of files from one location to
another. You use the -o or
--create option for copy-out mode, the
-i or --extract
option for copy-in mode, and the -p or
--pass-through option for copy-pass mode. Input
is a list of files provided on stdin. Output is either to stdout or to a
device or file specified with the -F or
--file option.
Listing 34 shows how to generate a list of files using the
find command and then pipe the list to
cpio. Note the use of the
-print0 option on
find to generate null-terminated strings for
file names, and the corresponding --null option
on cpio to read this format. This will
correctly handle file names that have embedded blank or newline
characters. The -depth option tells
find to list directory entries before the
directory name. In this example, we simply create two archives of our
lpi103-2 directory, one with relative names and one with absolute names.
We do not use the many capabilities of find to
restrict the selected files, such as finding only the files modified this
week.
Listing 34. Backing up a directory using cpio
[ian@echidna lpi103-2]$ find . -depth -print0 | cpio --null -o > ../lpicpio.1
3 blocks
[ian@echidna lpi103-2]$ find ~/lpi103-2/ -depth -print0 | cpio --null -o > ../lpicpio.2
4 blocks
|
If you'd like to see the files listed as they are archived, add the
-v option to
cpio.
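The three cpio modes can be compared side by side on a small tree. This sketch assumes GNU cpio and skips itself if cpio is not installed. The src and dest directories and the archive name demo.cpio are hypothetical.

```shell
set -eu
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
mkdir -p src/backup dest
echo one > src/text1
echo two > "src/backup/text1 copy"    # a name with an embedded blank

if command -v cpio >/dev/null 2>&1; then
    # Copy-out: archive the tree using relative names (block count goes to stderr).
    ( cd src && find . -depth -print0 | cpio --null -o > ../demo.cpio ) 2>/dev/null
    # Copy-in with --list: show the archive contents without extracting anything.
    cpio -i --list < demo.cpio 2>/dev/null
    # Copy-pass: copy the same tree directly into dest; -d creates directories.
    ( cd src && find . -depth -print0 | cpio --null -pd ../dest ) 2>/dev/null
    cpio_demo=done
else
    cpio_demo=skipped                 # cpio is not installed on every system
fi
echo "cpio demo: $cpio_demo"
```

Because -depth lists a directory after its contents, the -d option is what lets copy-pass mode create backup before the directory entry itself is processed.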
The cpio command in copy-in mode (option
-i or --extract) can
list the contents of an archive or restore selected files. When you list
the files, specifying the --absolute-filenames
option reduces the number of extraneous messages that some older versions
of cpio will otherwise issue as they strip any
leading / characters from each path that has one. This option is quietly
ignored on many current implementations. Output from selectively listing
our previous archives is shown in Listing 35.
Listing 35. Listing and restoring selected files using cpio
[ian@echidna lpi103-2]$ cpio -i --list "*backup*" < ../lpicpio.1
backup
backup/text1.bkp.1
backup/text1.bkp.2
3 blocks
[ian@echidna lpi103-2]$ cpio -i --list --absolute-filenames "*text1*" < ../lpicpio.2
/home/ian/lpi103-2/text10
/home/ian/lpi103-2/backup/text1.bkp.1
/home/ian/lpi103-2/backup/text1.bkp.2
/home/ian/lpi103-2/text1
4 blocks
|
Listing 36 shows how to restore all the files with "text1" in their path
into a temporary subdirectory. Some of these are in subdirectories. Unlike
tar, you will need to specify the
-d or
--make-directories option explicitly if your
directory tree does not exist. Furthermore,
cpio will not replace any newer files on the
filesystem with archive copies unless you specify the
-u or
--unconditional option.
Listing 36. Restoring selected files using cpio
[ian@echidna lpi103-2]$ mkdir temp
[ian@echidna lpi103-2]$ cd temp
[ian@echidna temp]$ cpio -idv "*f1*" "*.bkp.1" < ../../lpicpio.1
f1a
f1
backup/text1.bkp.1
3 blocks
[ian@echidna temp]$ cpio -idv "*.bkp.1" < ../../lpicpio.1
cpio: backup/text1.bkp.1 not created: newer or same age version exists
backup/text1.bkp.1
3 blocks
[ian@echidna temp]$ cpio -id --no-absolute-filenames "*text1*" < ../../lpicpio.2
cpio: Removing leading `/' from member names
4 blocks
./home/ian/lpi103-2/backup/text1.bkp.1
./home/ian/lpi103-2/backup/text1.bkp.2
./home/ian/lpi103-2/text1
./backup/text1.bkp.1
[ian@echidna temp]$ cd ..
[ian@echidna lpi103-2]$ rm -rf temp # You may remove these after you have finished
|
For details on other options, see the man page.
In its simplest form, the dd command copies an
input file to an output file. You have already seen the
cp command, so you may wonder why there is another
command to copy files. The dd command can do a
couple of things that regular cp cannot. In
particular, it can perform conversions on the file, such as converting
lowercase to uppercase or ASCII to EBCDIC. It can also reblock a file,
which may be desirable when transferring it to tape. It can skip or
include only selected blocks of a file. And finally, it can read and write
to raw devices, such as /dev/sda, which allows you to create or restore a
file that is a whole partition image. Writing to devices usually requires
root authority.
We will start with a simple example of converting a file to upper case
using the conv option as shown in Listing 37.
We use the if option to specify the input file
rather than using the default of stdin. A similar
of option is available to override the default
output to stdout. For purposes of illustration, we have specified
different input and output block sizes using the
ibs and obs options.
For large files it can be handy to use larger block sizes to speed up
operations when transferring disk to disk. Otherwise, block sizes are
mostly used with magnetic tapes. Note the three status lines at the end of
the listing showing how many complete and partial blocks were read and
written and the total amount of data transferred.
Listing 37. Converting text to upper case using dd
[ian@echidna lpi103-2]$ cat text6
1 apple
2 pear
3 banana
9 plum
3 banana
10 apple
1 apple
2 pear
3 banana
9 plum
3 banana
10 apple
[ian@echidna lpi103-2]$ dd if=text6 conv=ucase ibs=20 obs=30
1 APPLE
2 PEAR
3 BANANA
9 PLUM
3 BANANA
10 APPLE
1 APPLE
2 PEAR
3 BANANA
9 PLUM
3 BANANA
10 APPLE
4+1 records in
3+1 records out
98 bytes (98 B) copied, 0.00210768 s, 46.5 kB/s
|
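When you use dd in a script, it helps to know that the status lines go to stderr, so stdout carries only the converted data. This is a small sketch with a hypothetical three-line file named fruit.

```shell
set -eu
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
printf '1 apple\n2 pear\n3 banana\n' > "$scratch/fruit"

# No of= operand: converted data goes to stdout; status lines are discarded.
upper=$(dd if="$scratch/fruit" conv=ucase 2>/dev/null)
printf '%s\n' "$upper"

# With of=, the data goes to a file and the status lines can be kept separately.
dd if="$scratch/fruit" of="$scratch/fruit.upper" conv=ucase ibs=20 obs=30 2>"$scratch/status"
cat "$scratch/status"
```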
Either file may be a raw device. This will usually be the case for
magnetic tape, but a whole disk partition, such as /dev/hda1 or /dev/sda2,
can be backed up to a file or tape. Ideally, the filesystem on the device
should be unmounted, or at least mounted read only, to ensure that data
does not change during the backup. Listing 38 shows an example where the
input file is a raw device, /dev/sda2, and the output file is a file,
backup-1, in the root user's home directory. To dump the file to tape or
floppy disk, you would specify something like
of=/dev/fd0 or
of=/dev/st0.
Listing 38. Backing up a partition using dd
[root@echidna ~]# dd if=/dev/sda2 of=backup-1
1558305+0 records in
1558305+0 records out
797852160 bytes (798 MB) copied, 24.471 s, 32.6 MB/s
|
Note that 797,852,160 bytes of data were copied and the output file is
indeed that large, even though only about 3% of this particular partition
is actually used. Unless you are copying to a tape with hardware
compression, you will probably want to compress the data. Listing 39 shows
one way to accomplish this, along with the output of
ls and df commands,
which show you the file sizes and the usage percentage of the filesystem
on /dev/sda2.
Listing 39. Backing up with compression using dd
[root@echidna ~]# dd if=/dev/sda2 |gzip >backup-2
1558305+0 records in
1558305+0 records out
797852160 bytes (798 MB) copied, 23.4617 s, 34.0 MB/s
[root@echidna ~]# ls -l backup-[12]
-rw-r--r--. 1 root root 797852160 2009-09-25 17:13 backup-1
-rw-r--r--. 1 root root 995223 2009-09-25 17:14 backup-2
[root@echidna ~]# df -h /dev/sda2
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 755M 18M 700M 3% /grubfile
|
The gzip compression reduced the file size to roughly 0.1% of the uncompressed size (995,223 bytes compared with 797,852,160 bytes). However, unused blocks may contain arbitrary data, so even the compressed backup may be much larger than the total data on the partition.
If you divide the total bytes copied by the number of records processed,
you will see that dd is writing 512-byte blocks
of data. When copying to a raw output device such as tape, this can result
in a very inefficient operation. As we mentioned above, specify the
obs option to change the output size or the
ibs option to specify the input block size. You
can also specify just bs to set both input and
output block sizes to a common value. When using tape, remember to use the
same block size for reading the tape as you used for writing it.
If you need multiple tapes or other removable storage to store your backup,
you will need to break it into smaller pieces using a utility such as
split. If you need to skip blocks such as disk
or tape labels, you can do so with dd. See the
man page for examples.
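Both ideas can be tried on an ordinary file before you work with real media. In this sketch, backup.img is a hypothetical stand-in for a large backup. The piece size and the one-block "label" are arbitrary choices for illustration.

```shell
set -eu
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
# 2500 blocks of 1024 bytes = 2,560,000 bytes of stand-in backup data.
dd if=/dev/zero of=backup.img bs=1024 count=2500 2>/dev/null

# Break the backup into pieces of at most 1,000,000 bytes: piece.aa, piece.ab, ...
split -b 1000000 backup.img piece.
set -- piece.*
pieces=$#

# Reassemble the pieces in order and confirm nothing was lost.
cat piece.* > rejoined.img
if cmp -s backup.img rejoined.img; then rejoined=identical; else rejoined=different; fi

# Skip the first 1024-byte block, as you might skip a label.
dd if=backup.img of=nolabel.img bs=1024 skip=1 2>/dev/null
skipped_size=$(wc -c < nolabel.img)
echo "$pieces pieces, rejoined copy is $rejoined, $skipped_size bytes after skipping one block"
```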
The dd command is not filesystem aware, so you
will need to restore a dump of a partition to find out what is on it.
Listing 40 shows how to restore the partition that was dumped in Listing
39 to a partition, /dev/sdc7, that was specially created on a removable
USB drive just for this purpose.
Listing 40. Restoring a partition using dd
[root@echidna ~]# gunzip backup-2 -c | dd of=/dev/sdc7
1558305+0 records in
1558305+0 records out
797852160 bytes (798 MB) copied, 30.624 s, 26.1 MB/s
|
You may be interested to know that some CD- and DVD-burning applications
use the dd command under the covers to do the
actual device writing. If the utility you use provides a log of commands
actually used, you may find it instructive to look at the log now that you
know a little more about dd. Indeed, if you
burn an ISO image to a CD or DVD disc, one way to verify that there were
no errors is to use dd to read the disc back
and pipe the result through the cmp utility.
Listing 41 illustrates the general technique using the backup file that we
created in this article rather than an ISO image. Note that we calculate
the number of blocks to read using the file size of the image.
Listing 41. Comparing an image with a filesystem.
[root@echidna ~]# ls -l backup-1
-rw-r--r--. 1 root root 797852160 2009-09-25 17:13 backup-1
[root@echidna ~]# echo $(( 797852160 / 512 )) # calculate number of 512 byte blocks
1558305
[root@echidna ~]# dd if=/dev/sdc7 bs=512 count=1558305 | cmp - backup-1
1558305+0 records in
1558305+0 records out
797852160 bytes (798 MB) copied, 26.7942 s, 29.8 MB/s
|
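The same verification can be rehearsed without root authority by using ordinary files. In this sketch, image stands in for the backup file and device for a larger target partition. Both names are hypothetical, and /dev/urandom is assumed to exist.

```shell
set -eu
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
cd "$scratch"
# A 100-block image, and a "device" that holds the image plus 20 extra blocks.
dd if=/dev/urandom of=image bs=512 count=100 2>/dev/null
cat image > device
dd if=/dev/zero bs=512 count=20 2>/dev/null >> device

# Work out how many 512-byte blocks to read back from the image size.
image_bytes=$(wc -c < image)
blocks=$(( image_bytes / 512 ))

# Read back only that many blocks and compare them with the image.
if dd if=device bs=512 count="$blocks" 2>/dev/null | cmp -s - image; then
    verify=match
else
    verify=mismatch
fi
echo "read $blocks blocks: $verify"
```

Limiting the read with count matters because the target is usually larger than the image. Without it, cmp would report a difference as soon as it reached the extra data.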
Ian Shields works on a multitude of Linux projects for the developerWorks Linux zone. He is a Senior Programmer at IBM at the Research Triangle Park, NC. He joined IBM in Canberra, Australia, as a Systems Engineer in 1973, and has since worked on communications systems and pervasive computing in Montreal, Canada, and RTP, NC. He has several patents and has published several papers. His undergraduate degree is in pure mathematics and philosophy from the Australian National University. He has an M.S. and Ph.D. in computer science from North Carolina State University. Learn more about Ian in Ian's profile on My developerWorks.




