Learn Linux, 101: File and directory management

Getting comfortable with Linux files and directories

You've probably heard that everything in Linux® is a file, so start on the right path with a solid grounding in file and directory management: finding, listing, moving, copying, and archiving. Use the material in this tutorial to study for the Linux Professional Institute LPIC-1: Linux Server Professional Certification exam 101, or just to learn for fun.

Ian Shields, Linux Author, Freelance

Ian Shields is a freelance Linux writer. He retired from IBM at the Research Triangle Park, NC. Ian joined IBM in Canberra, Australia, as a systems engineer in 1973, and has worked in Montreal, Canada, and RTP, NC in both systems engineering and software development. He has been using, developing on, and writing about Linux since the late 1990s. His undergraduate degree is in pure mathematics and philosophy from the Australian National University. He has an M.S. and Ph.D. in computer science from North Carolina State University. He enjoys orienteering and likes to travel.


21 March 2016 (First published 15 June 2010)

Overview

This tutorial grounds you in the basic Linux commands for manipulating files and directories. Learn to:

  • List directory contents
  • Copy, move, or remove files and directories
  • Manipulate multiple files and directories recursively
  • Use wildcard patterns for manipulating files
  • Use the find command to locate and act on files based on type, size, or time
  • Compress and decompress files using gzip, bzip2, and xz
  • Archive files using tar, cpio, and dd

Linux files and directories

All files on Linux and UNIX® systems are accessed as part of a single large tree-structured filesystem that is rooted at /. You can add more branches to this tree by mounting them and remove them by unmounting them. Mounting and unmounting is covered in the tutorial on Mounting and unmounting of filesystems.

In this tutorial, we practice the commands using the files created in the tutorial "Learn Linux 101: Text streams and filters." If you followed along in that tutorial, you created a directory, lpi103-2, in your home directory. If you didn't, then you can use another directory on your system to practice the commands discussed in this tutorial.

About this series

This series of tutorials helps you learn Linux system administration tasks. You can also use the material in these tutorials to prepare for the Linux Professional Institute's LPIC-1: Linux Server Professional Certification exams.

See "Learn Linux, 101: A roadmap for LPIC-1" for a description of and link to each tutorial in this series. The roadmap is in progress and reflects the version 4.0 objectives of the LPIC-1 exams as updated April 15th, 2015. As tutorials are completed, they will be added to the roadmap.

This tutorial helps you prepare for Objective 103.3 in Topic 103 of the Linux Server Professional (LPIC-1) exam 101. The objective has a weight of 3.

Prerequisites

To get the most from the tutorials in this series, you should have a basic knowledge of Linux and a working Linux system on which you can practice the commands covered in this tutorial. Sometimes different versions of a program format output differently, so your results might not always look exactly like the listings and figures shown here.


Listing directories

File and directory names are either absolute, meaning they begin with a /, or they are relative to the current working directory, meaning they do not begin with a /. The absolute path to a file or directory consists of a / followed by a series of zero or more directory names, each followed by another /, and then a final name.

Unless otherwise noted, the examples in this tutorial use Ubuntu 14.04.2 LTS, with a 3.16 kernel. Your results on other systems may differ.

Listing directory entries

Given a file or directory name that is relative to the current working directory, simply concatenate the absolute name of the working directory, a /, and the relative name. For example, the directory, lpi103-2, that we created in the earlier tutorial was created in my home directory, /home/ian, so its full, or absolute, path is /home/ian/lpi103-2.

You can display the name of the current working directory with the pwd command. It is also usually available in the PWD environment variable. Listing 1 shows the use of the pwd command, and three different ways to use the ls command to list the files in this directory.

Listing 1. Listing directory entries
ian@Z61t-u14:~/lpi103-2$ echo "$PWD"
/home/ian/lpi103-2
ian@Z61t-u14:~/lpi103-2$ ls
sedtab  text1  text2  text3  text4  text5  text6  xaa  xab  yaa  yab
ian@Z61t-u14:~/lpi103-2$ ls "$PWD"
sedtab  text1  text2  text3  text4  text5  text6  xaa  xab  yaa  yab
ian@Z61t-u14:~/lpi103-2$ ls /home/ian/lpi103-2
sedtab  text1  text2  text3  text4  text5  text6  xaa  xab  yaa  yab

As you can see, you can give a relative or absolute directory name as a parameter to the ls command, and it will list the contents of that directory.
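
The concatenation rule described above can be sketched in a few lines of shell. This is only an illustration: the scratch directory, the demo subdirectory, and the variable names are assumptions made for this example and are not part of the tutorial's lpi103-2 setup.

```shell
# Work in a throwaway directory (hypothetical names, for illustration only).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/demo"
cd "$workdir/demo" || exit 1
touch text1

relative=text1
absolute="$PWD/$relative"    # working directory + "/" + relative name

printf 'relative: %s\nabsolute: %s\n' "$relative" "$absolute"

# Both names refer to the same file, so ls accepts either one.
ls -l "$relative" "$absolute"
```

Because the absolute name does not depend on the current working directory, it keeps working after you cd somewhere else, while the relative name does not.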

Listing details

On a storage device, a file or directory is contained in a collection of blocks. Information about a file is contained in an inode, which records information such as the owner, when the file was last accessed, how large it is, whether it is a directory or not, and who can read from or write to it. The inode number is also known as the file serial number and is unique within a particular filesystem. You can use the -l (or --format=long) option to display some of the information stored in the inode.

By default, the ls command does not list special files, those whose names start with a dot (.). Every directory has at least two special entries: the directory itself (.) and the parent directory (..). The root directory does not have a parent directory, so its .. entry refers to the root directory itself.

Listing 2 uses the -l and -a options to display a long format listing of all files including the . and .. directory entries.

Listing 2. Displaying a long directory listing
ian@Z61t-u14:~/lpi103-2$ ls -al
total 52
drwxrwxr-x  2 ian ian 4096 Jun  8 17:09 .
drwxr-xr-x 15 ian ian 4096 Jun  8 13:26 ..
-rw-rw-r--  1 ian ian    8 Jun  8 17:02 sedtab
-rw-rw-r--  1 ian ian   24 Jun  8 13:26 text1
-rw-rw-r--  1 ian ian   25 Jun  8 13:36 text2
-rw-rw-r--  1 ian ian   63 Jun  8 16:19 text3
-rw-rw-r--  1 ian ian   26 Jun  8 16:19 text4
-rw-rw-r--  1 ian ian   24 Jun  8 16:42 text5
-rw-rw-r--  1 ian ian   98 Jun  8 17:09 text6
-rw-rw-r--  1 ian ian   15 Jun  8 15:48 xaa
-rw-rw-r--  1 ian ian    9 Jun  8 15:48 xab
-rw-rw-r--  1 ian ian   17 Jun  8 15:48 yaa
-rw-rw-r--  1 ian ian    8 Jun  8 15:48 yab

In Listing 2, the first line shows the total number of disk blocks (52) used by the listed files. The remaining lines tell you about the directory entries.

  • The first field (drwxrwxr-x or -rw-rw-r-- in this case) tells us whether the file is a directory (d) or a regular file (-). You may also see symbolic links (l) or other values for some special files (such as files in the /dev filesystem). You learn more about symbolic links in the tutorial Create and change hard and symbolic links (see the series roadmap). The type is followed by three sets of permissions (such as rwx or r--) for the owner, the members of the owner's group, and everyone. The three values, respectively, indicate whether the user, group, or everyone has read (r), write (w), or execute (x) permission. Other uses such as setuid are covered in the tutorial Manage file permissions and ownership.
  • The next field is a number that tells you the number of hard links to the file. I said that the inode contains information about the file. The file's directory entry contains a hard link (or pointer) to the inode for the file, so every entry listed should have at least one hard link. Directories have an additional link for their own . entry and one more for the .. entry in each subdirectory. So we can see from Listing 2 that my home directory, represented by .., has quite a few subdirectories, as it has 15 hard links.
  • The next two fields are the file's owner and the owner's primary group. Some systems, such as Red Hat or Fedora systems, default to providing a separate group for each user. On other systems, all users can be in one or perhaps a few groups.
  • The next field contains the length of the file in bytes.
  • The penultimate field contains the time stamp of the last modification. The time stamp format depends on your locale and the date itself. In my locale, time stamps for dates from the beginning of the current year to the present show a three character month abbreviation, the day of the month and a time in HH:MM format. Older files or files with dates in the future have the year substituted for the HH:MM component. You can see more examples under Touching files below.
  • And the final field contains the name of the file or directory.

The -i option of the ls command will display the inode numbers for you. You will see inodes again later in this tutorial and also in the tutorial Create and change hard and symbolic links (see the series roadmap).
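
As a rough illustration of the link count and inode fields, the following sketch creates a file, adds a second name for it with ln, and reads the fields back with ls. The ln command is covered properly in the tutorial on hard and symbolic links; here it is used only to make the count change. The file names and the scratch directory are assumptions for this example.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo data > original
links_before=$(ls -l original | awk '{print $2}')   # second field: hard link count

ln original second_name      # a second directory entry for the same inode
links_after=$(ls -l original | awk '{print $2}')

inode_a=$(ls -i original | awk '{print $1}')        # first field of ls -i: inode number
inode_b=$(ls -i second_name | awk '{print $1}')

echo "links before: $links_before, after: $links_after"
echo "inodes: $inode_a $inode_b"
```

The two names should report the same inode number, and the link count should rise from 1 to 2.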

Multiple files

You can also specify multiple parameters to the ls command, where each name is either that of a file or directory. For directory names, the ls command lists the contents of the directory rather than information about the directory itself. In our example, suppose we wanted information about the lpi103-2 directory entry itself as it is listed in the parent directory. The command ls -l ../lpi103-2 would give us a listing like the previous example. Listing 3 shows how to add the -d option to list information about directory entries rather than the contents of directories and also how to list entries for multiple files or directories.

Listing 3. Using ls -d
ian@Z61t-u14:~/lpi103-2$ ls -ld ../lpi103-2 sedtab xaa
drwxrwxr-x 2 ian ian 4096 Jun  9 13:01 ../lpi103-2
-rw-rw-r-- 1 ian ian    8 Jun  8 17:02 sedtab
-rw-rw-r-- 1 ian ian   15 Jun  8 15:48 xaa

Note that the modification time for lpi103-2 is different from that in the previous listing. Also, as in the previous listing, it is different from the time stamps of any of the files in the directory. Is this what you would expect? Not normally. However, in developing this tutorial, I created some extra examples and then deleted them, so the directory time stamps reflect that fact. You learn more about file times later under Handling multiple files and directories.

Sorting the output

By default, ls lists files alphabetically. There are a number of options for sorting the output. For example, ls -t sorts by modification time (newest to oldest) while ls -lS produces a long listing sorted by size (largest to smallest). Adding -r reverses the sort order. For example, use ls -lrt to produce a long listing sorted from oldest to newest. Consult the man page for other ways you can list files and directories.
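
A small sketch of these sort options follows. It assumes a scratch directory and three hypothetical files (alpha, beta, gamma) whose sizes and modification times are set explicitly so that each ordering differs.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Known sizes: alpha 1 byte, beta 10 bytes, gamma 5 bytes.
printf 'a'          > alpha
printf 'bbbbbbbbbb' > beta
printf 'ccccc'      > gamma

# Known modification times: beta is oldest, gamma is newest.
touch -t 201506010000 beta
touch -t 201506020000 alpha
touch -t 201506030000 gamma

by_name=$(ls | paste -sd ' ' -)          # alphabetical
newest_first=$(ls -t | paste -sd ' ' -)  # by modification time
oldest_first=$(ls -rt | paste -sd ' ' -) # reversed time order
largest_first=$(ls -S | paste -sd ' ' -) # by size

echo "name:    $by_name"
echo "-t:      $newest_first"
echo "-rt:     $oldest_first"
echo "-S:      $largest_first"
```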


Copying, moving, and deleting files

You have now learned some ways to create files, but suppose you want to make copies of files, rename files, move them around the filesystem hierarchy, or even delete them. You use three short commands for these purposes.

cp
is used to make a copy of one or more files or directories. You must give one (or more) source names and one target name. Source or target names can include a path specification. If the target is an existing directory, then all sources are copied into the target. If the target is a directory that does not exist, then the (single) source must also be a directory and a copy of the source directory and its contents is made with the target name as the new name. If the target is a file, then the (single) source must also be a file and a copy of the source file is made with the target name as the new name, replacing any existing file of the same name. Note that there is no default assumption of the target being the current directory as in DOS and Windows® operating systems.
mv
is used to move or rename one or more files or directories. In general, the names you can use follow the same rules as for copying with cp; you can rename a single file or move a set of files into a new directory. Because the name is only a directory entry that links to an inode, it should be no surprise that the inode number does not change unless the file is moved to another filesystem, in which case moving it behaves more like a copy followed by deleting the original.
rm
is used to remove one or more files. I show how to remove directories shortly.
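
The difference between copying and renaming can be sketched as follows, assuming a scratch directory on a single filesystem. The inode_of helper and the file names are defined here purely for illustration.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical helper: print the inode number of one file.
inode_of() { ls -i "$1" | awk '{print $1}'; }

echo hello > text1
mkdir backup

cp text1 text1.bkp            # file to file: a new copy under a new name
cp text1 text1.bkp backup     # several sources into an existing directory
inode_src=$(inode_of text1)
inode_copy=$(inode_of text1.bkp)

mv text1.bkp renamed.bkp      # rename within the same filesystem
inode_moved=$(inode_of renamed.bkp)

in_backup=$(ls backup | paste -sd ' ' -)
echo "source=$inode_src copy=$inode_copy after-rename=$inode_moved"
echo "backup contains: $in_backup"
```

The copy gets its own inode, while the renamed file keeps the inode it had before the mv.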

Where's the rename command?

If you are used to a DOS or Windows system, you might find it strange to use mv to rename a file. Linux does have a rename command, but it has different syntax from the DOS and Windows commands of the same name. See the man page for details on how to use it.

Listing 4 illustrates the use of cp and mv to make some backup copies of our text files. You also use ls -i to show inodes for some of your files.

  1. You first make a copy of your text1 file as text1.bkp.
  2. You then decide to create a backup subdirectory using the mkdir command.
  3. You make a second backup copy of text1, this time in the backup directory, and show that all three files have different inodes.
  4. You then move your text1.bkp to the backup directory and after that rename it to be more consistent with the second backup. While you could have done this with a single command, you use two here for illustration.
  5. You check the inodes again and confirm that text1.bkp with inode 787445 is no longer in your lpi103-2 directory, but that the inode is that of text1.bkp.1 in the backup directory.

Listing 4. Copying and moving files
ian@Z61t-u14:~/lpi103-2$ cp text1 text1.bkp
ian@Z61t-u14:~/lpi103-2$ mkdir backup
ian@Z61t-u14:~/lpi103-2$ cp text1 backup/text1.bkp.2
ian@Z61t-u14:~/lpi103-2$ ls -i text1 text1.bkp backup
787425 text1  787445 text1.bkp

backup:
787447 text1.bkp.2
ian@Z61t-u14:~/lpi103-2$ mv text1.bkp backup
ian@Z61t-u14:~/lpi103-2$ mv backup/text1.bkp backup/text1.bkp.1
ian@Z61t-u14:~/lpi103-2$ ls -i text1 text1.bkp backup
ls: cannot access text1.bkp: No such file or directory
787425 text1

backup:
787445 text1.bkp.1  787447 text1.bkp.2

Normally, the cp command copies a file over an existing copy, if the existing file is writable. By default, the mv command likewise replaces an existing target file without asking. There are several useful options relevant to this behavior of cp and mv.

-f or --force
causes cp to attempt to remove an existing target file even if it is not writable.
-i or --interactive
asks for confirmation before attempting to replace an existing file.
-b or --backup
makes a backup of any files that would be replaced.

As usual, consult the man pages for full details on these and other options for copying and moving.

Listing 5 illustrates copying with backup and then file deletion.

Listing 5. Making backup copies and deleting files
ian@Z61t-u14:~/lpi103-2$ cp text2 backup
ian@Z61t-u14:~/lpi103-2$ cp --backup=t text2 backup
ian@Z61t-u14:~/lpi103-2$ ls backup
text1.bkp.1  text1.bkp.2  text2  text2.~1~
ian@Z61t-u14:~/lpi103-2$ rm backup/text2 backup/text2.~1~
ian@Z61t-u14:~/lpi103-2$ ls backup
text1.bkp.1  text1.bkp.2

Note that the rm command also accepts the -i (interactive) and -f (force) options. Once you remove a file using rm, the filesystem no longer has access to it. Some systems default to setting an alias, alias rm='rm -i', for the root user to help prevent inadvertent file deletion. This is also a good idea for ordinary users if you are nervous about what you might accidentally delete.

Before this discussion concludes, you should note that the cp command defaults to creating a new time stamp for the new file or files. The owner and group are also set to the owner and group of the user doing the copying. You can use the -p option to preserve selected attributes. Note that only the root user may be able to preserve ownership. See the man page for details.
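
The time stamp behavior can be checked with a short sketch. It assumes a scratch directory, hypothetical file names, and a shell whose test command supports the -nt ("newer than") operator, as bash does.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo data > source
touch -t 201501010000 source    # give the source an old modification time

cp source plain_copy            # gets a new time stamp
cp -p source preserved_copy     # keeps the time stamp of the source

if [ plain_copy -nt source ]; then plain_is_newer=yes; else plain_is_newer=no; fi

if [ preserved_copy -nt source ] || [ source -nt preserved_copy ]; then
    preserved_differs=yes
else
    preserved_differs=no
fi

echo "plain copy newer than source: $plain_is_newer"
echo "preserved copy differs in mtime: $preserved_differs"
ls -l source plain_copy preserved_copy
```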


Creating and removing directories

You have already seen how to create a directory with mkdir. Now, let's look further at mkdir and introduce rmdir, its analog for removing directories.

Mkdir

Suppose you are in our lpi103-2 directory and you want to create subdirectories dir1 and dir2. mkdir, like the commands you have just been reviewing, handles multiple directory creation requests in one pass as shown in Listing 6.

Listing 6. Creating multiple directories
ian@Z61t-u14:~/lpi103-2$ mkdir dir1 dir2

Note that there is no output on successful completion, although you could use echo $? to confirm that the exit code is really 0.

If, instead, you wanted to create a nested subdirectory, such as d1/d2/d3, this would fail because the d1 and d2 directories do not exist. Fortunately, mkdir has a -p option that allows it to create any required parent directories, as shown in Listing 7.

Listing 7. Creating parent directories
ian@Z61t-u14:~/lpi103-2$ mkdir d1/d2/d3
mkdir: cannot create directory ‘d1/d2/d3’: No such file or directory
ian@Z61t-u14:~/lpi103-2$ echo $?
1
ian@Z61t-u14:~/lpi103-2$ mkdir -p d1/d2/d3
ian@Z61t-u14:~/lpi103-2$ echo $?
0

Rmdir

Removing directories using the rmdir command is the opposite of creating them. Again, there is a -p option to remove parents as well. You can remove a directory with rmdir only if it is empty as there is no option to force removal. You'll see another way to accomplish that particular trick when you look at recursive manipulation. Once you learn this, you will probably seldom use rmdir on the command line, but it is still good to know about it.

To illustrate directory removal, Listing 8 first copies your text1 file into the directory d1/d2 so that it is no longer empty. It then uses rmdir to try to remove all the directories you just created with mkdir. As you can see, d1 and d2 are not removed because d2 is not empty. The other directories are removed. Once you remove the copy of text1 from d2, you can remove d1 and d2 with a single invocation of rmdir -p.

Listing 8. Removing directories
ian@Z61t-u14:~/lpi103-2$ cp text1 d1/d2
ian@Z61t-u14:~/lpi103-2$ rmdir -p d1/d2/d3 dir1 dir2
rmdir: failed to remove directory ‘d1/d2’: Directory not empty
ian@Z61t-u14:~/lpi103-2$ ls . d1/d2
.:
backup  sedtab  text2  text4  text6  xab  yab
d1      text1   text3  text5  xaa    yaa

d1/d2:
text1
ian@Z61t-u14:~/lpi103-2$ rm d1/d2/text1
ian@Z61t-u14:~/lpi103-2$ rmdir -p d1/d2
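
The mkdir and rmdir behavior shown in Listings 7 and 8 can also be captured in a script by saving exit codes. This is a sketch only; the scratch directory and the variable names are assumptions for this example.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir d1/d2/d3 2>/dev/null
status_plain=$?        # non-zero: d1 and d2 do not exist yet

mkdir -p d1/d2/d3
status_parents=$?      # 0: missing parents are created as needed

mkdir -p d1/d2/d3
status_repeat=$?       # 0: with -p an existing directory is not an error

rmdir -p d1/d2/d3      # removes d3, then d2, then d1 as each becomes empty
if [ -e d1 ]; then d1_left=yes; else d1_left=no; fi

echo "plain=$status_plain parents=$status_parents repeat=$status_repeat d1_left=$d1_left"
```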

Handling multiple files and directories

Up to now, the commands you have used have operated on a single file or perhaps a few individually named files. For the rest of this tutorial, you look at various operations for handling multiple files, recursively manipulating part of a directory tree, and saving or restoring multiple files or directories.


Recursive manipulation

Recursive listing

The ls command has a -R (note uppercase "R") option for listing a directory and all its subdirectories. The recursive option applies only to directory names; it does not find all the files called 'text1', for example, in a directory tree. You can use other options that you have seen already along with -R. A recursive listing of our lpi103-2 directory, including inode numbers, is shown in Listing 9.

Listing 9. Displaying directory listings recursively
ian@Z61t-u14:~/lpi103-2$ ls -iR
.:
787446 backup  787425 text1  787431 text3  787433 text5  787427 xaa  787429 yaa
787434 sedtab  787426 text2  787432 text4  787435 text6  787428 xab  787430 yab

./backup:
787445 text1.bkp.1  787447 text1.bkp.2

Recursive copy

You can use the -r (or -R or --recursive) option to cause the cp command to descend into source directories and copy the contents recursively. To prevent an infinite recursion, you cannot copy a source directory into itself. Listing 10 shows how to copy everything in your lpi103-2 directory to a copy1 subdirectory. You use ls -R to show the resulting directory tree.

Listing 10. Copying recursively
ian@Z61t-u14:~/lpi103-2$ cp -pR . copy1
cp: cannot copy a directory, ‘.’, into itself, ‘copy1’
ian@Z61t-u14:~/lpi103-2$ ls -R
.:
backup  sedtab  text2  text4  text6  xab  yab
copy1   text1   text3  text5  xaa    yaa

./backup:
text1.bkp.1  text1.bkp.2

./copy1:
backup  text2  text4  text5  text6  xaa  yaa

./copy1/backup:
text1.bkp.1  text1.bkp.2

Recursive deletion

I mentioned earlier that rmdir only removes empty directories. You can use the -r (or -R or --recursive) option to cause the rm command to remove both files and directories as shown in Listing 11, where you remove the copy1 directory that you just created, along with its contents, including the backup subdirectory and its contents.

Listing 11. Deleting recursively
ian@Z61t-u14:~/lpi103-2$ rm -r copy1
ian@Z61t-u14:~/lpi103-2$ ls -R
.:
backup  sedtab  text1  text2  text3  text4  text5  text6  xaa  xab  yaa  yab

./backup:
text1.bkp.1  text1.bkp.2

If you have files that are not writable by you, you might need to add the -f option to force removal. This is often done by the root user when cleaning up, but be warned that you can lose valuable data if you are not careful.
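
One way to avoid the "into itself" message from Listing 10 is to copy to a sibling directory instead of a subdirectory of the source. The sketch below does that and then removes the copy recursively. The project and project-copy names are hypothetical and exist only in a scratch directory.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir -p project/backup
echo one > project/text1
echo two > project/backup/text1.bkp.1

# The target is beside the source, not inside it, so nothing is skipped.
cp -pR project project-copy
if [ -f project-copy/text1 ] && [ -f project-copy/backup/text1.bkp.1 ]; then
    copy_complete=yes
else
    copy_complete=no
fi

# Look before you delete, then remove the copy and everything below it.
ls -R project-copy
rm -r project-copy

if [ -e project-copy ]; then copy_left=yes; else copy_left=no; fi
if [ -f project/backup/text1.bkp.1 ]; then original_intact=yes; else original_intact=no; fi
echo "complete=$copy_complete left=$copy_left original_intact=$original_intact"
```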


Wildcards and globbing

Often, you might need to perform a single operation on many filesystem objects, without operating on the entire tree as you just did with recursive operations. For example, you might want to find the modification times of all the text files you created in lpi103-2, without listing the split files. Although this is easy with our small directory, it is much harder in a large filesystem.

To solve this problem, use the wildcard support that is built in to the bash shell. This support, also called "globbing" (because it was originally implemented as a program called /etc/glob), lets you specify multiple files using a wildcard pattern.

A string containing any of the characters '?', '*' or '[', is a wildcard pattern. Globbing is the process by which the shell (or possibly another program) expands these patterns into a list of pathnames matching the pattern. The matching is done as follows:

?
matches any single character.
*
matches any string, including an empty string.
[
introduces a character class. A character class is a non-empty string, terminated by a ']'. A match means matching any single character enclosed by the brackets. There are a few special considerations:
  • The '*' and '?' characters match themselves. If you use these in filenames, you need to be careful about appropriate quoting or escaping.
  • Because the string must be non-empty and terminated by ']', you must put ']' first in the string if you want to match it.
  • The '-' character between two others represents a range that includes the two other characters and all characters between them in the collating sequence. For example, [0-9a-fA-F] represents any upper or lower case hexadecimal digit. You can match a '-' by putting it either first or last within a range.
  • The '!' character specified as the first character of a range complements the range so that it matches any character except the remaining characters. For example, [!0-9] means any character except the digits 0 through 9. A '!' in any position other than the first matches itself. Remember that '!' is also used with the shell history function, so you need to be careful to properly escape it.

Note: Wildcard patterns and regular expression patterns share some characteristics, but they are not the same. Pay careful attention.

Globbing is applied separately to each component of a path name. You cannot match a '/', nor include one in a range. You can use it anywhere that you might specify multiple file or directory names, for example in the ls, cp, mv, or rm commands. In Listing 12, you first create a couple of oddly named files and then use the ls and rm commands with wildcard patterns.

Listing 12. Wildcard pattern examples
ian@Z61t-u14:~/lpi103-2$ echo odd1>'text[*?!1]'
ian@Z61t-u14:~/lpi103-2$ echo odd2>'text[2*?!]'
ian@Z61t-u14:~/lpi103-2$ ls
backup  text1       text2       text3  text5  xaa  yaa
sedtab  text[*?!1]  text[2*?!]  text4  text6  xab  yab
ian@Z61t-u14:~/lpi103-2$ ls text[2-4]
text2  text3  text4
ian@Z61t-u14:~/lpi103-2$ ls text[!2-4]
text1  text5  text6
ian@Z61t-u14:~/lpi103-2$ ls text*[2-4]*
text2  text[2*?!]  text3  text4
ian@Z61t-u14:~/lpi103-2$ ls text*[!2-4]* # Surprise!
text1  text[*?!1]  text[2*?!]  text5  text6
ian@Z61t-u14:~/lpi103-2$ ls text*[!2-4] # Another surprise!
text1  text[*?!1]  text[2*?!]  text5  text6
ian@Z61t-u14:~/lpi103-2$ echo text*>text10
ian@Z61t-u14:~/lpi103-2$ ls *\!*
text[*?!1]  text[2*?!]
ian@Z61t-u14:~/lpi103-2$ ls *[x\!]*
text1       text10  text[2*?!]  text4  text6  xab
text[*?!1]  text2   text3       text5  xaa
ian@Z61t-u14:~/lpi103-2$ ls *[y\!]*
text[*?!1]  text[2*?!]  yaa  yab
ian@Z61t-u14:~/lpi103-2$ ls tex?[[]*
text[*?!1]  text[2*?!]
ian@Z61t-u14:~/lpi103-2$ rm tex?[[]*
ian@Z61t-u14:~/lpi103-2$ ls *b*
sedtab  xab  yab

backup:
text1.bkp.1  text1.bkp.2
ian@Z61t-u14:~/lpi103-2$ ls backup/*2
backup/text1.bkp.2
ian@Z61t-u14:~/lpi103-2$ ls -d .*
.  ..

Notes:

  1. Complementation in conjunction with '*' can lead to some surprises. The pattern '*[!2-4]' matches the longest part of a name that does not have 2, 3, or 4 following it, which is matched by both text[*?!1] and text[2*?!]. So now both surprises should be clear.
  2. As with earlier examples of ls, if pattern expansion results in a name that is a directory name and the -d option is not specified, then the contents of that directory are listed (as in our example above for the pattern '*b*').
  3. If a filename starts with a period (.), then that character must be matched explicitly. Notice that only the last ls command listed the two special directory entries (. and ..).
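
The matching rules can be tried safely with echo, which simply prints whatever the shell expands. The sketch below assumes a scratch directory and a handful of hypothetical file names similar to those in lpi103-2.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

touch text1 text2 text3 text4 text5 text10 xaa xab .hidden

single_char=$(echo text?)         # ? matches exactly one character
any_string=$(echo text1*)         # * matches any string, including the empty one
in_range=$(echo text[2-4])        # one character from the range 2-4
not_in_range=$(echo text[!2-4])   # one character outside the range 2-4
dot_files=$(echo .h*)             # a leading dot must be matched explicitly
no_dot=$(echo *)                  # so * on its own skips .hidden

echo "text?       -> $single_char"
echo "text1*      -> $any_string"
echo "text[2-4]   -> $in_range"
echo "text[!2-4]  -> $not_in_range"
echo ".h*         -> $dot_files"
echo "*           -> $no_dot"
```

Note that text10 matches text1* but not text?, because ? and a bracket expression each stand for exactly one character.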

Remember that any wildcard characters in a command are liable to be expanded by the shell, which can lead to unexpected results. Furthermore, if you specify a pattern that does not match any filesystem objects, then POSIX requires that the original pattern string be passed to the command. Some earlier implementations passed a null list to the command, so you might run into old scripts that give unusual behavior. Listing 13 illustrates these points.

Listing 13. Wildcard pattern surprises
ian@Z61t-u14:~/lpi103-2$ echo text*
text1 text10 text2 text3 text4 text5 text6
ian@Z61t-u14:~/lpi103-2$ echo "text*"
text*
ian@Z61t-u14:~/lpi103-2$ echo text[[\!?]z??
text[[!?]z??

For more information on globbing, look at man 7 glob. You need the section number, as there is also glob information in section 3. The best way to understand all the various shell interactions is by practice, so try these wildcards out whenever you have a chance. Remember to try ls to check your wildcard pattern before allowing cp, mv, or worse, rm to do something unexpectedly.
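
Because an unmatched pattern is passed through unchanged, a loop over a wildcard can end up processing the literal pattern string. A common guard, sketched here with hypothetical names in a scratch directory and assuming default shell options, is to test that each name actually exists.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
touch text1 text2

expanded=$(echo text*)       # the shell expands the pattern
quoted=$(echo "text*")       # quoting suppresses expansion
unmatched=$(echo nomatch*)   # nothing matches, so the pattern is passed as is

count=0
for f in nomatch*; do
    [ -e "$f" ] || continue  # skip the literal, unmatched pattern
    count=$((count + 1))
done

echo "expanded:  $expanded"
echo "quoted:    $quoted"
echo "unmatched: $unmatched"
echo "files processed by the loop: $count"
```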


Touching files

Now, let's look at the touch command, which can update file access and modification times or create empty files. In the next part, you can see how to use this information for finding files and directories. You continue using the lpi103-2 directory for the examples. You also look at the various ways you can specify time stamps.

touch

The touch command with no options takes one or more filenames as parameters and updates the modification time of the files. This is the same time stamp normally displayed with a long directory listing. Listing 14 uses our old friend echo to create a small file called f1, and then uses a long directory listing to display the modification time (or mtime). In this case, it happens also to be the time the file was created. It then uses the sleep command to wait for 60 seconds, runs touch on the file, and runs ls again. Note that the time stamp for the file has changed by a minute.

Listing 14. Updating modification time with touch
ian@Z61t-u14:~/lpi103-2$ echo xxx>f1; ls -l f1; sleep 60; touch f1; ls -l f1
-rw-rw-r-- 1 ian ian 4 Jun  9 17:03 f1
-rw-rw-r-- 1 ian ian 4 Jun  9 17:04 f1

If you specify a filename for a file that does not exist, then touch normally creates an empty file for you, unless you specify the -c or --no-create option. Listing 15 illustrates both these commands. Note that only f2 is created.

Listing 15. Creating empty files with touch
ian@Z61t-u14:~/lpi103-2$ touch f2; touch -c f3; ls -l f*
-rw-rw-r-- 1 ian ian 4 Jun  9 17:04 f1
-rw-rw-r-- 1 ian ian 0 Jun  9 17:17 f2

The touch command can also set a file's modification time (also known as mtime) to a specific date and time using either the -d or -t options. The -d is very flexible in the date and time formats that it accepts, while the -t option needs at least an MMDDhhmm time with optional year and seconds values. Listing 16 shows some examples.

Listing 16. Setting mtime with touch
[ian@atticf20 lpic-1]$ touch -t 201408121510.59 f3
[ian@atticf20 lpic-1]$ touch -d 11am f4
[ian@atticf20 lpic-1]$ touch -d "last fortnight" f5
[ian@atticf20 lpic-1]$ touch -d "yesterday 6am" f6
[ian@atticf20 lpic-1]$ touch -d "380 days ago 12:00" f7
[ian@atticf20 lpic-1]$ touch -d "tomorrow 02:00" f8
[ian@atticf20 lpic-1]$ touch -d "5 Nov" f9
[ian@atticf20 lpic-1]$ ls -lrt f*
-rw-rw-r--. 1 ian ian 0 May 25  2014 f7
-rw-rw-r--. 1 ian ian 0 Aug 12  2014 f3
-rw-rw-r--. 1 ian ian 0 May 26 17:22 f5
-rw-rw-r--. 1 ian ian 0 Jun  8 06:00 f6
-rw-rw-r--. 1 ian ian 0 Jun  9 11:00 f4
-rw-rw-r--. 1 ian ian 0 Jun 10  2015 f8
-rw-rw-r--. 1 ian ian 0 Nov  5  2015 f9

If you're not sure what date a date expression might resolve to, you can use the date command to find out. Use the -d option with a date string to resolve the same kind of date formats that touch can. Note the different date formats in the listing for dates from a prior year or dates in the future.
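
The following sketch sets time stamps with touch and reads them back with date. It assumes the GNU versions of touch and date (for the -d and -r options used here), sets TZ to UTC so the results do not depend on your local time zone, and uses hypothetical file names in a scratch directory.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
export TZ=UTC

touch -t 201408121510.59 f3                  # CCYYMMDDhhmm.ss
from_t=$(date -r f3 '+%Y-%m-%d %H:%M:%S')    # date -r reads the file's mtime

touch -d '2015-06-09 11:00' f4               # free-form date string
from_d=$(date -r f4 '+%Y-%m-%d %H:%M')

# Check what a relative expression resolves to before giving it to touch.
date -d 'yesterday 6am'
resolved_time=$(date -d 'yesterday 6am' '+%H:%M')

echo "f3: $from_t"
echo "f4: $from_d"
echo "yesterday 6am resolves to a time of $resolved_time"
```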

You can use the -r (or --reference) option along with a reference filename to indicate that touch (or date) should use the time stamp of an existing file. Listing 17 shows some examples.

Listing 17. Timestamps from reference files
ian@Z61t-u14:~/lpi103-2$ date
Tue Jun  9 17:35:02 EDT 2015
ian@Z61t-u14:~/lpi103-2$ date -r f1
Tue Jun  9 17:04:04 EDT 2015
ian@Z61t-u14:~/lpi103-2$ touch -r f1 f1a
ian@Z61t-u14:~/lpi103-2$ ls -l f1*
-rw-rw-r-- 1 ian ian 4 Jun  9 17:04 f1
-rw-rw-r-- 1 ian ian 0 Jun  9 17:04 f1a

A Linux system records both a file modification time and a file access time. These are also known respectively as the mtime and atime. Both time stamps are set to the same value when a file is created, and both are reset when it is modified. If a file is accessed at all, then the access time is updated, even if the file is not modified. For our last example with touch, you look at file access times. The -a (or --time=atime, --time=access or --time=use) option specifies that the access time should be updated. Listing 18 uses the cat command to access the f1 file and display its contents. You then use ls -l and ls -lu to display the modification and access times respectively for f1 and f1a, which you created using f1 as a reference file. You then reset the access time of f1 to that of f1a using touch -a and verify that it was reset.

Listing 18. Access time and modification time
ian@Z61t-u14:~/lpi103-2$ cat f1
xxx
ian@Z61t-u14:~/lpi103-2$ ls -lu f1*
-rw-rw-r-- 1 ian ian 4 Jun  9 17:39 f1
-rw-rw-r-- 1 ian ian 0 Jun  9 17:04 f1a
ian@Z61t-u14:~/lpi103-2$ ls -l f1*
-rw-rw-r-- 1 ian ian 4 Jun  9 17:04 f1
-rw-rw-r-- 1 ian ian 0 Jun  9 17:04 f1a
ian@Z61t-u14:~/lpi103-2$ touch -a -r f1a f1
ian@Z61t-u14:~/lpi103-2$ ls -lu f1*
-rw-rw-r-- 1 ian ian 4 Jun  9 17:04 f1
-rw-rw-r-- 1 ian ian 0 Jun  9 17:04 f1a

For more complete information on the many allowable time and date specifications, see the man or info pages for the touch and date commands.
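
To see the two time stamps as numbers rather than formatted dates, you can use the stat command. This is a sketch that assumes GNU stat, whose -c option accepts %X (access time) and %Y (modification time) in seconds since the epoch; the file name and scratch directory are hypothetical.

```shell
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo xxx > f1
touch -t 201506091704 f1       # set both the access and modification times
touch -a -t 201506091739 f1    # change only the access time

atime=$(stat -c %X f1)
mtime=$(stat -c %Y f1)
difference=$((atime - mtime))  # 35 minutes, expressed in seconds

ls -l f1     # shows the modification time
ls -lu f1    # shows the access time
echo "atime is $difference seconds later than mtime"
```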


Finding files

Now that we've covered the file and directory topic with the big recursive hammer that hits everything, and the globbing hammer that hits more selectively, let's look at the find command, which can be more like a surgeon's knife. The find command is used to find files in one or more directory trees, based on criteria such as name, time stamp, or size. Again, you will use the lpi103-2 directory.

find

The find command searches for files or directories using all or part of the name, or by other search criteria, such as size, type, file owner, creation date, or last access date. The most basic find is a search by name or part of a name. Listing 19 shows an example from your lpi103-2 directory where you first search for all files that have either a '1' or a 'k' in their name, then perform some path searches that are explained in the following notes.

Listing 19. Finding files by name
ian@Z61t-u14:~/lpi103-2$ find . -name "*[1k]*"
./text10
./f1a
./backup
./backup/text1.bkp.2
./backup/text1.bkp.1
./f1
./text1
ian@Z61t-u14:~/lpi103-2$ find . -iwholename "*ACK*1"
./backup/text1.bkp.1
ian@Z61t-u14:~/lpi103-2$ find . -iwholename "*ACK*/*1"
./backup/text1.bkp.1

Notes:

  1. The patterns that you can use are shell wildcard patterns like those you saw earlier under Wildcards and globbing. Be careful with wildcards. You need to quote them, so that the shell passes the string to find, rather than passing a list of files that match the string.
  2. You can use -wholename instead of -name to match full paths instead of just base file names. In this case, the pattern may span path components, unlike ordinary wildcard matches, which match only a single part of a path.
  3. If you want a case-insensitive search as shown in the use of -iwholename previously, precede the find options that search on a string or pattern with an 'i'. Note: You might also see -path and -ipath used instead of -wholename and -iwholename. They behave the same way, and -path is the form specified by POSIX, so it is the more portable choice.
  4. Unlike shell globbing, name patterns in current versions of GNU find match a leading dot, so a pattern such as "*" also finds names like .bashrc and the current directory (.). Some older versions of find ignored such files or directories unless you specified the leading dot as part of the pattern.
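
The notes above can be tried safely in a scratch directory. This is a minimal sketch that assumes GNU find; the file names are made up for the illustration, and -ipath is used as the case-insensitive whole-path test.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir backup
touch text1 text2 f1 backup/text1.bkp.1

# Quote the pattern so the shell hands it to find unexpanded.
by_name=$(find . -name '*[1k]*' | LC_ALL=C sort)

# -ipath matches against the whole path, ignoring case.
by_path=$(find . -ipath '*ACK*/*1')

printf 'by name:\n%s\n' "$by_name"
printf 'by path:\n%s\n' "$by_path"
```

The name search reports the backup directory as well as three files, because -name alone does not distinguish between file types.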

In the first example above, you found both files and a directory (./backup). Use the -type parameter along with one-letter type to restrict the search. Use 'f' for regular files, 'd' for directories, and 'l' for symbolic links. See the man page for find for other possible types. Listing 20 shows the result of searching for directories (-type d) alone and with a file name (*, or everything, in this case).

Listing 20. Finding files by type
ian@Z61t-u14:~/lpi103-2$ find . -type d
.
./backup
ian@Z61t-u14:~/lpi103-2$ find . -type d -name "*"
.
./backup

Note that the -type d specification without any form of name specification displays directories that have a leading dot in their names (only the current directory in this case), as does the wildcard "*".
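
Here is a small sketch of the -type test with the three types mentioned above. The directory layout and the link name are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir backup
touch text1 backup/text1.bkp.1
ln -s text1 link1                        # symbolic link to a regular file

dirs=$(find . -type d | LC_ALL=C sort)   # "." and "./backup"
files=$(find . -type f | LC_ALL=C sort)  # regular files only
links=$(find . -type l)                  # the link itself, not its target

printf 'directories:\n%s\n' "$dirs"
printf 'regular files:\n%s\n' "$files"
printf 'symbolic links:\n%s\n' "$links"
```

Note that link1 is reported only by -type l, even though it points to a regular file.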

You can also search by file size, either for a specific size (n) or for files that are either larger (+n) or smaller than a given value (-n). By using both upper and lower size bounds, you can find files whose size is within a given range. By default, the -size option of find assumes a unit of 'b' for 512-byte blocks. Among other choices, specify 'c' for bytes, or 'k' for kilobytes. In Listing 21, you first find all files with size 0, and then all with size of either 24 or 25 bytes. Note that specifying -empty instead of -size 0 also finds empty files.

Listing 21. Finding files by size
ian@Z61t-u14:~/lpi103-2$ find . -size 0
./f1a
./f2
ian@Z61t-u14:~/lpi103-2$ find . -size -26c -size +23c -print
./text2
./text5
./backup/text1.bkp.2
./backup/text1.bkp.1
./text1

The second example in Listing 21 introduces the -print option, which simply prints the name of each matching file to stdout. This is an example of an action that can be taken on the results returned by the search. With GNU find, -print is the default action if no action is specified. On some older systems, an action is required; otherwise, there is no output.
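
To see how the size bounds behave at the edges, you can create files of exactly known sizes. The following sketch uses invented file names and assumes the GNU versions of head and find.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
: > empty                              # 0 bytes
head -c 24 /dev/zero > b24             # exactly 24 bytes
head -c 25 /dev/zero > b25
head -c 26 /dev/zero > b26

# More than 23 bytes and fewer than 26 bytes: only 24 and 25 qualify.
in_range=$(find . -type f -size +23c -size -26c | LC_ALL=C sort)

# -empty is an alternative to -size 0 for regular files.
no_data=$(find . -type f -empty)

printf 'in range:\n%s\n' "$in_range"
printf 'empty:\n%s\n' "$no_data"
```

Both bounds are exclusive, which is why the 26-byte file is not listed.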

Other actions include -ls, which prints file information equivalent to that from the ls -lids command, and -exec, which executes a command for each file. The -exec has to be terminated by a semicolon, which must be escaped to avoid the shell interpreting it first. Also specify {} wherever you want the returned file used in the command. Remember that curly braces also have meaning to the shell and must be escaped (or quoted). Listing 22 shows how the -ls and the -exec options can be used to list file information. Notice that the second form does not list the inode information.

Listing 22. Finding and acting on files
ian@Z61t-u14:~/lpi103-2$ find . -size -26c -size +23c -ls
787426    4 -rw-rw-r--   1 ian      ian            25 Jun  8 13:36 ./text2
787433    4 -rw-rw-r--   1 ian      ian            24 Jun  8 16:42 ./text5
787447    4 -rw-rw-r--   1 ian      ian            24 Jun  9 13:09 ./backup/text1.bkp.2
787445    4 -rw-rw-r--   1 ian      ian            24 Jun  9 13:09 ./backup/text1.bkp.1
787425    4 -rw-rw-r--   1 ian      ian            24 Jun  8 13:26 ./text1
ian@Z61t-u14:~/lpi103-2$ find . -size -26c -size +23c -exec ls -l '{}' \;
-rw-rw-r-- 1 ian ian 25 Jun  8 13:36 ./text2
-rw-rw-r-- 1 ian ian 24 Jun  8 16:42 ./text5
-rw-rw-r-- 1 ian ian 24 Jun  9 13:09 ./backup/text1.bkp.2
-rw-rw-r-- 1 ian ian 24 Jun  9 13:09 ./backup/text1.bkp.1
-rw-rw-r-- 1 ian ian 24 Jun  8 13:26 ./text1

You can use the -exec option for as many purposes as your imagination can dream up. For example:

find . -empty -exec rm '{}' \;

removes all the empty files in a directory tree, while

find . -name "*.htm" -exec mv '{}' '{}l' \;

renames all .htm files to .html files.
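
Before running commands like these on real data, it is worth trying them in a scratch directory. The sketch below is one cautious variant: it passes each path to a small sh -c script instead of embedding {} inside a larger argument, because POSIX leaves that embedded form implementation-defined. The file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir docs
echo '<p>one</p>' > index.htm
echo '<p>two</p>' > docs/page.htm
: > docs/stale.tmp                     # an empty file to clean up

# Run one small shell per file; "$1" is the path that find substitutes for {}.
find . -type f -name '*.htm' -exec sh -c 'mv "$1" "${1}l"' sh '{}' \;

# With "+" instead of "\;", find passes many paths to a single rm invocation.
find . -type f -empty -exec rm '{}' +

remaining=$(find . -type f | LC_ALL=C sort)
printf '%s\n' "$remaining"
```

Adding -type f to the second command keeps rm from being handed empty directories, which it would refuse to remove.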

For our final examples of find, you use the time stamps described with the touch command to locate files having particular time stamps whose names start with 't'. You start by setting the time stamp of text2 to the day before yesterday, so that you can see differences in the output. Listing 23 shows three examples:

  1. When used with -mtime -2, the find command finds all files modified within the last two days. A day in this case is a 24-hour period relative to the current date and time. Note that you would use -atime if you wanted to find files based on access time rather than modification time.
  2. Adding the -daystart option means that you want to consider days as calendar days, starting at midnight. Now the text2 file is excluded from the list.
  3. Finally, you show how to use a time range in minutes rather than days to find files modified between one hour (60 minutes) and 12 hours (720 minutes) ago.

Listing 23. Finding files by time stamp
ian@Z61t-u14:~/lpi103-2$ date
Tue Jun  9 23:25:31 EDT 2015
ian@Z61t-u14:~/lpi103-2$ touch -d "2 days ago 23:45" text2
ian@Z61t-u14:~/lpi103-2$ find . -mtime -2 -type f -name "t*" -exec ls -l '{}' \;
-rw-rw-r-- 1 ian ian 25 Jun  7 23:45 ./text2
-rw-rw-r-- 1 ian ian 26 Jun  8 16:19 ./text4
-rw-rw-r-- 1 ian ian 58 Jun  9 16:46 ./text10
-rw-rw-r-- 1 ian ian 98 Jun  8 17:09 ./text6
-rw-rw-r-- 1 ian ian 24 Jun  8 16:42 ./text5
-rw-rw-r-- 1 ian ian 24 Jun  9 13:09 ./backup/text1.bkp.2
-rw-rw-r-- 1 ian ian 24 Jun  9 13:09 ./backup/text1.bkp.1
-rw-rw-r-- 1 ian ian 63 Jun  8 16:19 ./text3
-rw-rw-r-- 1 ian ian 24 Jun  8 13:26 ./text1
ian@Z61t-u14:~/lpi103-2$ find . -daystart -mtime -2 -type f -name "t*" -exec ls -l '{}' \;
-rw-rw-r-- 1 ian ian 26 Jun  8 16:19 ./text4
-rw-rw-r-- 1 ian ian 58 Jun  9 16:46 ./text10
-rw-rw-r-- 1 ian ian 98 Jun  8 17:09 ./text6
-rw-rw-r-- 1 ian ian 24 Jun  8 16:42 ./text5
-rw-rw-r-- 1 ian ian 24 Jun  9 13:09 ./backup/text1.bkp.2
-rw-rw-r-- 1 ian ian 24 Jun  9 13:09 ./backup/text1.bkp.1
-rw-rw-r-- 1 ian ian 63 Jun  8 16:19 ./text3
-rw-rw-r-- 1 ian ian 24 Jun  8 13:26 ./text1
ian@Z61t-u14:~/lpi103-2$ find . -mmin -720 -mmin +60 -type f -name "t*" -exec ls -l '{}' \;
-rw-rw-r-- 1 ian ian 58 Jun  9 16:46 ./text10
-rw-rw-r-- 1 ian ian 24 Jun  9 13:09 ./backup/text1.bkp.2
-rw-rw-r-- 1 ian ian 24 Jun  9 13:09 ./backup/text1.bkp.1

The man pages for the find command can help you learn the extensive range of options that this brief introduction cannot cover.
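
The time tests are easier to reason about with files whose ages you control. This sketch assumes GNU touch (for the relative date strings) and GNU find (for -mmin), and the file names are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch -d '3 days ago' t_old
touch -d '3 hours ago' t_hours
touch t_now

# Modified less than two 24-hour periods ago.
recent=$(find . -type f -name 't*' -mtime -2 | LC_ALL=C sort)

# Modified more than 60 minutes but less than 720 minutes ago.
window=$(find . -type f -name 't*' -mmin +60 -mmin -720)

printf 'last two days:\n%s\n' "$recent"
printf 'between 1 and 12 hours ago:\n%s\n' "$window"
```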


Identifying files

File names often have a suffix such as gif, jpeg, or html that gives a hint of what the file might contain. Linux does not require such suffixes and generally does not use them to identify a file type. Knowing what type of file you are dealing with helps you know what program to use to display or manipulate it. The file command tells you something about the type of data in one or more files. Listing 24 shows some examples of using the file command.

Listing 24. Identifying file contents
ian@Z61t-u14:~/lpi103-2$ file backup text1 f2 ~/p-ishields.jpg /bin/echo
backup:                   directory 
text1:                    ASCII text
f2:                       empty 
/home/ian/p-ishields.jpg: JPEG image data, JFIF standard 1.02
/bin/echo:                ELF 32-bit LSB  executable, Intel 80386, version 1 (SY
SV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=
6aea7ab0e5ad49166cc6c53e121a79174a59ae50, stripped

The file command attempts to classify each file using three types of test. Filesystem tests use the results of the stat command to determine whether a file is empty or a directory, for example. So-called magic tests check a file for specific contents that identify it. These signatures are also known as magic numbers. Finally, language tests look at the content of text files to attempt to determine if a file is an XML file, C or C++ language source, a troff file, or some other file that is considered source for some kind of language processor. The first type that is found is reported unless the -k or --keep-going option is specified.
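
To make the idea of a magic test concrete, the following sketch reads a signature directly from a file. It only illustrates the principle and is not how the file command is implemented. It relies on the fact that gzip data begins with the two bytes 1f 8b, and the misleading file name is invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo 'some sample text' > sample.txt
gzip -c sample.txt > renamed.jpg       # gzip data behind a misleading suffix

# Read the first two bytes as hex, then strip the spacing that od adds.
magic=$(od -An -tx1 -N2 renamed.jpg | tr -d ' \n')
echo "first two bytes: $magic"
```

The suffix says JPEG, but the leading bytes identify the content as gzip data, which is the kind of evidence a magic test relies on.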

The file command has many options that you can learn about using the man pages. Listing 25 shows how to use the -i (or --mime) option to display the file type as a MIME string instead of the normal human-readable output.

Listing 25. Identifying file contents as MIME
ian@Z61t-u14:~/lpi103-2$ file -i backup text1 f2 ~/p-ishields.jpg /bin/echo
backup:                   inode/directory; charset=binary
text1:                    text/plain; charset=us-ascii
f2:                       inode/x-empty; charset=binary
/home/ian/p-ishields.jpg: image/jpeg; charset=binary
/bin/echo:                application/x-executable; charset=binary

The magic number files are also managed by the file command. Again, see the man pages for more information.

Note: The identify command, which is part of the ImageMagick package, is an additional tool that provides more detail when identifying image file types. Listing 26 shows an example.

Listing 26. Using ImageMagick's identify command
ian@Z61t-u14:~/lpi103-2$ identify ~/p-ishields.jpg
/home/ian/p-ishields.jpg JPEG 64x80 64x80+0+0 8-bit DirectClass 3.58KB 0.000u 0:00.019

Compressing files

When you are backing up, archiving, or transmitting files, it is common to compress the files. In a Linux environment, three popular compression programs are gzip, bzip2, and xz. The gzip command uses the Lempel-Ziv algorithm, bzip2 uses the Burrows-Wheeler block sorting algorithm, and xz uses the Lempel–Ziv–Markov chain algorithm (LZMA). All three are lossless compression tools.

Using gzip and gunzip

Compression generally works well on text files. Many image formats already compress the data, so compression might not work well on these or other binary files. To illustrate compression on a reasonably large text file, let's copy /etc/services to the directory you have been using and compress it using gzip as shown in Listing 27. You use the -p option of cp to preserve the time stamp of /etc/services. Note that the compressed file has the same time stamp and has a .gz suffix.

Listing 27. Compressing with gzip
ian@Z61t-u14:~/lpi103-2$ cp -p /etc/services .
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 19558 Dec 30  2013 services
ian@Z61t-u14:~/lpi103-2$ gzip services
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 7538 Dec 30  2013 services.gz

You decompress a gzipped file using the -d option of gzip or, more commonly, using the gunzip command. Listing 28 shows the first of these choices. Note that the uncompressed file now has the original file name and time stamp.

Listing 28. Decompressing with gzip
ian@Z61t-u14:~/lpi103-2$ gzip -d services.gz
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 19558 Dec 30  2013 services
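
If you would rather not experiment on a copy of /etc/services, the same round trip can be sketched with a generated file. The file contents here are made up, and stat -c is assumed to be the GNU coreutils version.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a repetitive text file as a stand-in for /etc/services.
i=1
while [ "$i" -le 500 ]; do
  echo "service$i $i/tcp # sample entry"
  i=$((i + 1))
done > services
touch -d '2013-12-30 12:00:00' services
cp -p services services.orig           # keep a copy for comparison

size_before=$(wc -c < services)
mtime_before=$(stat -c %Y services)

gzip services                          # replaces services with services.gz
size_gz=$(wc -c < services.gz)
mtime_gz=$(stat -c %Y services.gz)

gunzip services.gz                     # same effect as gzip -d services.gz
mtime_after=$(stat -c %Y services)

echo "$size_before bytes compressed to $size_gz bytes"
```

The compressed file and the decompressed file should both carry the time stamp of the original.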

Using bzip2 and bunzip2

The bzip2 command operates in a similar manner to gzip as shown in Listing 29.

Listing 29. Compressing with bzip2
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 19558 Dec 30  2013 services
ian@Z61t-u14:~/lpi103-2$ bzip2 services
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 7208 Dec 30  2013 services.bz2
ian@Z61t-u14:~/lpi103-2$ bunzip2 services.bz2 
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 19558 Dec 30  2013 services

Using xz and unxz

The xz command is a newer compression command that uses a Lempel–Ziv–Markov chain algorithm (LZMA), first used in the 7-Zip archiver. LZMA2 is a container format that can hold data compressed with possibly different LZMA parameters and plain uncompressed data. The xz command's native format (.xz) is a container for a single compressed stream, making it similar to both gzip and bzip2 in this regard. See Resources for more information.

Not surprisingly, the xz command operates in a similar manner to gzip and bzip2 as shown in Listing 30.

Listing 30. Compressing with xz
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 19558 Dec 30  2013 services
ian@Z61t-u14:~/lpi103-2$ xz services 
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 7144 Dec 30  2013 services.xz
ian@Z61t-u14:~/lpi103-2$ unxz services.xz 
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 19558 Dec 30  2013 services

Differences among compressors

Speed: higher compression ratios often require more time to do the compression.

By design, many of the bzip2 and xz options are the same as those of gzip, but the commands do not all have identical options. You might have noted that in all our examples, the uncompressed file had the same name and time stamp as the original. However, renaming or touching the compressed file can change this behavior. The gzip command has the -N or --name option to force the name and time stamp to be preserved, but bzip2 and xz do not. The gzip and xz commands also have a -l option to display information about the compressed file, including the name that will be used when it is decompressed. Listing 31 illustrates some of these differences between the commands. For all three commands, the -v or --verbose option provides information about the compression.

Listing 31. Some differences between gzip, bzip2 and xz
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 19558 Dec 30  2013 services
ian@Z61t-u14:~/lpi103-2$ gzip -Nv services
services:	 61.6% -- replaced with services.gz
ian@Z61t-u14:~/lpi103-2$ touch services.gz
ian@Z61t-u14:~/lpi103-2$ mv services.gz services-x.gz
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 7538 Jun 12 13:11 services-x.gz
ian@Z61t-u14:~/lpi103-2$ gzip -l services-x.gz
         compressed        uncompressed  ratio uncompressed_name
               7538               19558  61.6% services-x
ian@Z61t-u14:~/lpi103-2$ gzip -lN services-x.gz
         compressed        uncompressed  ratio uncompressed_name
               7538               19558  61.6% services
ian@Z61t-u14:~/lpi103-2$ gunzip -N services-x.gz
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 19558 Dec 30  2013 services
ian@Z61t-u14:~/lpi103-2$ bzip2 -v services
  services:  2.713:1,  2.948 bits/byte, 63.15% saved, 19558 in, 7208 out.
ian@Z61t-u14:~/lpi103-2$ mv services.bz2 services-x.bz2
ian@Z61t-u14:~/lpi103-2$ touch services-x.bz2
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 7208 Jun 12 13:12 services-x.bz2
ian@Z61t-u14:~/lpi103-2$ bunzip2 services-x.bz2
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 19558 Jun 12 13:12 services-x
ian@Z61t-u14:~/lpi103-2$ ls -l serv* # New date and name
-rw-r--r-- 1 ian ian 19558 Jun 12 13:12 services-x
ian@Z61t-u14:~/lpi103-2$ rm services-x; cp -p /etc/services . # get fresh copy
ian@Z61t-u14:~/lpi103-2$ ls -l serv* 
-rw-r--r-- 1 ian ian 19558 Dec 30  2013 services
ian@Z61t-u14:~/lpi103-2$ xz -v services
services (1/1)
  100 %          7,144 B / 19.1 KiB = 0.365                                    
ian@Z61t-u14:~/lpi103-2$ mv services.xz services-x.xz
ian@Z61t-u14:~/lpi103-2$ touch services-x.xz
ian@Z61t-u14:~/lpi103-2$ ls -l serv*
-rw-r--r-- 1 ian ian 7144 Jun 12 13:15 services-x.xz
ian@Z61t-u14:~/lpi103-2$ unxz services-x.xz
ian@Z61t-u14:~/lpi103-2$ ls -l serv*  # New date and name
-rw-r--r-- 1 ian ian 19558 Jun 12 13:15 services-x
ian@Z61t-u14:~/lpi103-2$ rm services-x # Don't need this any more
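
The gzip part of this behavior can be reduced to a few lines. This is a sketch under the assumption that GNU gzip is installed; the file contents are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo 'sample data' > services
touch -d '2013-12-30 12:00:00' services
orig_mtime=$(stat -c %Y services)

gzip -N services                       # store the name and time stamp in the header
mv services.gz services-x.gz           # rename the compressed file
touch services-x.gz                    # and give it a new time stamp

gunzip -N services-x.gz                # restore the stored name and time stamp
restored_mtime=$(stat -c %Y services)
ls -l
```

Without -N on the gunzip command, the result would instead be named services-x and carry the newer time stamp.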

All three compressors accept input from stdin, using either redirection or the - parameter in a pipeline. All support the -c or --stdout option to direct output to stdout. If input is from stdin, then output goes to stdout by default. Listing 32 shows some examples.

Listing 32. Using compressors with stdin and stdout
ian@Z61t-u14:~/lpi103-2$ cat /etc/services | gzip - -c >servg.gz
ian@Z61t-u14:~/lpi103-2$ gzip -l servg.gz 
         compressed        uncompressed  ratio uncompressed_name
               7529               19558  61.6% servg
ian@Z61t-u14:~/lpi103-2$ xz - </etc/services > servx.xz
ian@Z61t-u14:~/lpi103-2$ xz -l servx.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1       1      7,144 B     19.1 KiB  0.365  CRC64   servx.xz
ian@Z61t-u14:~/lpi103-2$ unxz servx.xz
ian@Z61t-u14:~/lpi103-2$ ls *serv*
servg.gz  servx
ian@Z61t-u14:~/lpi103-2$ ls -l *serv*
-rw-rw-r-- 1 ian ian  7529 Jun 12 13:39 servg.gz
-rw-rw-r-- 1 ian ian 19558 Jun 12 13:40 servx
ian@Z61t-u14:~/lpi103-2$ rm serv* # Don't need these now
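
A common use of stdin and stdout is to compress or inspect data without replacing the original file. Here is a minimal sketch with gzip; the file names are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line %s\n' 1 2 3 4 5 > data.txt

# Compress from stdin to stdout; no file is replaced.
gzip -c < data.txt > data.gz

# Decompress to stdout and feed the result straight into another command.
line_count=$(gzip -dc data.gz | wc -l)

# Compare the decompressed stream with the untouched original.
gzip -dc data.gz | cmp -s - data.txt && round_trip=ok

echo "lines: $line_count, round trip: $round_trip"
```

Because gzip reads only the redirected stream, data.txt is still present afterward.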

There are two other commands associated with bzip2:

  1. The bzcat command decompresses files to stdout and is equivalent to bzip2 -dc.
  2. The bzip2recover command attempts to recover data from damaged bzip2 files.

There are also four other commands associated with xz:

  1. The xzcat command is equivalent to xz --decompress --stdout.
  2. The lzma command is equivalent to xz --format=lzma.
  3. The unlzma command is equivalent to xz --format=lzma --decompress.
  4. The lzcat command is equivalent to xz --format=lzma --decompress --stdout.

The man pages can help you learn more about the other options of gzip, bzip2, and xz.

Other compression tools

Two older programs, compress and uncompress, are still frequently found on Linux and UNIX systems.

In addition, the zip and unzip commands from the Info-ZIP project are implemented for Linux. These provide cross-platform compression functions that are available on a wide range of hardware and operating systems. Be aware that not all operating systems support the same file attributes or filesystem capabilities. If you download a zipped product file and unzip it on a Windows system and then transfer the resulting files to a CD or DVD for installation on Linux, you might experience problems installing because, for example, the Windows system does not support the symbolic links that were part of the original uncompressed file set.

For more information on these or other compression programs, see their respective man pages.


Archiving files

The tar, cpio, and dd commands are commonly used for backing up groups of files or even whole partitions, either for archiving or for transmission to another user or site. Exam 201, which is part of the LPIC-2: Linux Network Professional Certification, focuses on backup considerations in greater detail.

There are three general approaches to backup:

  • A full backup is a complete backup, usually of a whole filesystem, directory, or group of related files. This takes the longest time to create, so it is usually used with one of the other two approaches.
  • A differential or cumulative backup is a backup of all things that have changed since the last full backup. Recovery requires the last full backup plus the latest differential backup.
  • An incremental backup is a backup of only those changes since the last incremental backup. Recovery requires the last full backup plus all of the incremental backups (in order) since the last full backup.

These commands, along with the other commands you have learned about in this tutorial, give you the tools to perform any of these backup tasks.
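
One way to picture the difference between differential and incremental backups is to select files with find and its -newer test. The sketch below assumes a convention that is not part of any tool: an empty marker file is touched each time a backup is taken. The marker names and dates are hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/data" "$workdir/stamps"
cd "$workdir/data" || exit 1

touch -d '2015-06-01 09:00:00' a b c             # state at the full backup
touch -d '2015-06-05 09:00:00' ../stamps/full    # marker: full backup taken
touch -d '2015-06-07 09:00:00' b                 # b changes
touch -d '2015-06-08 09:00:00' ../stamps/incr    # marker: incremental backup taken
touch -d '2015-06-09 09:00:00' a                 # a changes

# Differential: everything modified since the last full backup.
differential=$(find . -type f -newer ../stamps/full | LC_ALL=C sort)

# Incremental: only what changed since the previous incremental backup.
incremental=$(find . -type f -newer ../stamps/incr | LC_ALL=C sort)

printf 'differential:\n%s\n' "$differential"
printf 'incremental:\n%s\n' "$incremental"
```

The differential list keeps growing until the next full backup, while the incremental list resets each time a new marker is written.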

Using tar

The tar command (the name comes from Tape ARchive) creates an archive file, also called a tarfile or tarball, from a set of input files or directories; it also restores files from such an archive. If a directory is given as input to tar, all files and subdirectories are automatically included, which makes tar very convenient for archiving subtrees of your directory structure.

Output can be to a file, a device such as tape or diskette, or stdout. The output location is specified with the -f option. Other common options are -c to create an archive, -x to extract an archive, -v for verbose output, which lists the files being processed, -z to use gzip compression, and -j to use bzip2 compression. Most tar options have a short form using a single hyphen and a long form using a pair of hyphens. The short forms are illustrated here. See the man pages for the long form and for additional options.

Listing 33 shows how to create a backup of the lpi103-2 directory using tar.

Listing 33. Backing up our lpi103-2 directory using tar
ian@Z61t-u14:~/lpi103-2$ tar -cvf ../lpitar1.tar .
./
./text2
./text4
./text10
...
./text1
./xab

Usually you want to compress archive files to save space or reduce transmission time. With the GNU version of the tar command, you can do this with a single option:

  • -z (or --gzip, --gunzip, --ungzip) for compression using gzip
  • -j (or --bzip2) for compression using bzip2
  • -J (or --xz) for compression using xz

Listing 34 illustrates the use of the -z option and the difference in size between the two archives.

Listing 34. Compressing the tar archive with gzip
ian@Z61t-u14:~/lpi103-2$ tar -zcvf ../lpitar2.tar ~/lpi103-2/
tar: Removing leading `/' from member names
/home/ian/lpi103-2/
/home/ian/lpi103-2/text2
/home/ian/lpi103-2/text4
/home/ian/lpi103-2/text10
...
/home/ian/lpi103-2/text1
/home/ian/lpi103-2/xab
ian@Z61t-u14:~/lpi103-2$ ls -l ../lpitar*
-rw-rw-r-- 1 ian ian 20480 Jun 12 16:39 ../lpitar1.tar
-rw-rw-r-- 1 ian ian   703 Jun 12 16:51 ../lpitar2.tar

Note: Tar files that have been gzipped frequently have a .tgz ending rather than the plain .tar, which is used more for uncompressed files. This is a convention — the tar command does not depend on the file extension.
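
The effect of the -z option can be reproduced on a small scratch tree. This sketch assumes GNU tar with gzip available, and the directory and archive names are invented.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src"
cd "$workdir/src" || exit 1
i=1
while [ "$i" -le 200 ]; do
  echo "line $i of some repetitive sample text"
  i=$((i + 1))
done > text1
cp text1 text2

tar -cf "$workdir/plain.tar" .         # uncompressed archive
tar -czf "$workdir/packed.tgz" .       # same content, compressed with gzip

plain_size=$(wc -c < "$workdir/plain.tar")
packed_size=$(wc -c < "$workdir/packed.tgz")
members=$(tar -tzf "$workdir/packed.tgz" | LC_ALL=C sort)

echo "plain: $plain_size bytes, gzipped: $packed_size bytes"
printf '%s\n' "$members"
```

Both archives are written outside the directory being archived, so tar never tries to include an archive in itself.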

Listing 34 also shows another important feature of tar. It uses an absolute directory path, and the first line of output tells you that tar is removing the leading slash (/) from member names. This allows files to be restored to some other location for verification and can be particularly important if you are trying to restore system files. If you really want to store absolute names, use the -P (or --absolute-names) option. It is also a good idea to avoid mixing absolute path names with relative path names when creating an archive, because all will be relative when restoring from the archive.

The tar command can append additional files to an archive using the -r or --append option. This can cause multiple copies of a file in the archive. In such a case, the last one will be restored during a restore operation. You can use the --occurrence option to select a specific file among multiples. If the archive is on a regular filesystem instead of tape, you can use the -u or --update option to update an archive. This works like appending to an archive, except that the time stamps of the files in the archive are compared with those on the filesystem, and only files that have been modified since the archived version are appended. As mentioned, this does not work for tape archives.

The tar command can also compare archives with the current filesystem and restore files from archives. Use the -d, --compare, or --diff option to perform comparisons. The output shows files whose contents differ, as well as files whose time stamps differ. Normally, only files that differ, if any, are listed. Use the -v option discussed earlier for verbose output. The -C or --directory option tells tar to perform an operation starting from the specified directory rather than the current directory.

Listing 35 shows some examples. It uses touch to modify the time stamp of the f1 file, then illustrates comparison operations of tar before restoring f1 from one of your archives. Listing 35 uses a variety of option forms for illustration.

Listing 35. Comparing and restoring using tar
ian@Z61t-u14:~/lpi103-2$ touch f1
ian@Z61t-u14:~/lpi103-2$ tar --diff --file ../lpitar1.tar .
./f1: Mod time differs
ian@Z61t-u14:~/lpi103-2$ tar -df ../lpitar2.tar -C /
home/ian/lpi103-2/f1: Mod time differs
ian@Z61t-u14:~/lpi103-2$ tar -xvf ../lpitar1.tar ./f1 # See below
./f1
ian@Z61t-u14:~/lpi103-2$ tar --compare -f ../lpitar2.tar --directory /
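
The comparison can also be scripted, because tar reports differences through its exit status as well as its output. The following is a sketch that assumes GNU tar; the paths are placeholders in a scratch directory.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src"
echo 'hello' > "$workdir/src/f1"
touch -d '2015-06-09 17:04:00' "$workdir/src/f1"
tar -C "$workdir/src" -cf "$workdir/a.tar" .

# Nothing has changed yet, so the comparison is silent and succeeds.
clean=$(LC_ALL=C tar -C "$workdir/src" -df "$workdir/a.tar" 2>&1)
clean_status=$?

touch -d '2015-06-10 08:00:00' "$workdir/src/f1"   # change only the time stamp
changed=$(LC_ALL=C tar -C "$workdir/src" -df "$workdir/a.tar" 2>&1)
changed_status=$?

echo "before: status=$clean_status output='$clean'"
echo "after:  status=$changed_status output='$changed'"
```

A nonzero status after the touch makes it possible to use the comparison in a script without parsing the messages.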

The files or directories you specify for restoration must match the name in the archive. Attempting to restore just f1 rather than ./f1 in this case would not work. You can use globbing, but you need to be careful to avoid restoring more or less than you want. You can use the --list or -t option to list archive contents if you are unsure what is in an archive. Use the --wildcards option if you want to use wildcard file names. Listing 36 shows an example of a wildcard specification that would have restored more files than just ./f1.

Listing 36. Listing archive contents with tar
ian@Z61t-u14:~/lpi103-2$ tar -tf  ../lpitar1.tar --wildcards "*f1*"
./f1a
./f1
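
Listing before extracting is a cheap way to check what a pattern will match. The sketch below, which assumes GNU tar and uses invented names, lists with a wildcard and then restores a single member by its exact name into a separate directory.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/restore"
cd "$workdir/src" || exit 1
echo 'one' > f1
echo 'two' > f1a
echo 'three' > text1
tar -cf "$workdir/a.tar" .

# List first: the pattern matches more than just ./f1.
matches=$(tar -tf "$workdir/a.tar" --wildcards '*f1*' | LC_ALL=C sort)

# Extract by the exact member name, into a different directory.
tar -xf "$workdir/a.tar" -C "$workdir/restore" ./f1
restored=$(cd "$workdir/restore" && find . -type f)

printf 'pattern matches:\n%s\n' "$matches"
printf 'restored:\n%s\n' "$restored"
```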

You can use the find command to select the files for archiving and then pipe the result to tar. I discuss this technique as part of the discussion of cpio, but the same method works for tar.
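
As a sketch of that technique with tar, the following pipeline hands a null-terminated list of names from find to GNU tar, which reads it with the --null and -T - (--files-from) options. The file names are invented, and one of them contains a space on purpose.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src"
cd "$workdir/src" || exit 1
touch text1 'text with space' f1

# Select only regular files starting with "t" and archive exactly those.
find . -type f -name 't*' -print0 |
  tar --null -T - -cf "$workdir/selected.tar"

archived=$(tar -tf "$workdir/selected.tar" | LC_ALL=C sort)
printf '%s\n' "$archived"
```

Restricting find to -type f avoids listing directories, which tar would otherwise descend into and archive a second time.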

As with the other commands you have studied here, there are many options that are not covered in this brief introduction. See the man or info pages for more details.

Using cpio

The cpio command operates in copy-out mode to create an archive, copy-in mode to restore an archive, or copy-pass mode to copy a set of files from one location to another. You use the -o or --create option for copy-out mode, the -i or --extract option for copy-in mode, and the -p or --pass-through option for copy-pass mode. Input is a list of files provided on stdin. Output is either to stdout or to a device or file specified with the -F or --file option.

Listing 37 shows how to generate a list of files using the find command and then pipe the list to cpio. Note the use of the -print0 option on find to generate null-terminated strings for file names, and the corresponding --null option on cpio to read this format. This correctly handles file names that have embedded blank or newline characters. The -depth option tells find to list directory entries before the directory name. In this example, you simply create two archives of our lpi103-2 directory, one with relative names and one with absolute names. You do not use the many capabilities of find to restrict the selected files, such as finding only the files modified this week.

Listing 37. Backing up a directory using cpio
ian@Z61t-u14:~/lpi103-2$ find . -depth -print0 | cpio --null -o > ../lpicpio.1
3 blocks
ian@Z61t-u14:~/lpi103-2$ find ~/lpi103-2/ -depth -print0 | cpio --null -o > ../lpicpio.2
3 blocks

If you'd like to see the files listed as they are archived, add the -v option to cpio.

The cpio command in copy-in mode (option -i or --extract) can list the contents of an archive or restore selected files. When you list the files, specifying the --absolute-filenames option reduces the number of extraneous messages that some older versions of cpio otherwise issue as they strip any leading / characters from each path that has one. This option is quietly ignored on many current implementations. Output from selectively listing your previous archives is shown in Listing 38.

Listing 38. Listing and restoring selected files using cpio
ian@Z61t-u14:~/lpi103-2$ cpio  -i --list  "*backup*" < ../lpicpio.1
backup/text1.bkp.2
backup/text1.bkp.1
backup
3 blocks
ian@Z61t-u14:~/lpi103-2$ cpio  -i --list --absolute-filenames "*text1*" < ../lpicpio.2
/home/ian/lpi103-2/text10
/home/ian/lpi103-2/backup/text1.bkp.2
/home/ian/lpi103-2/backup/text1.bkp.1
/home/ian/lpi103-2/text1
3 blocks

Listing 39 shows how to restore all the files with "text1" in their path into a temporary subdirectory. Some of these are in subdirectories. Unlike tar, you need to specify the -d or --make-directories option explicitly if your directory tree does not exist. Furthermore, cpio does not replace any newer files on the filesystem with archive copies unless you specify the -u or --unconditional option.

Listing 39. Restoring selected files using cpio
ian@Z61t-u14:~/lpi103-2$ mkdir temp
ian@Z61t-u14:~/lpi103-2$ cd temp
ian@Z61t-u14:~/lpi103-2/temp$ cpio  -idv "*f1*" "*.bkp.1" < ../../lpicpio.1
f1a
backup/text1.bkp.1
f1
3 blocks
ian@Z61t-u14:~/lpi103-2/temp$ cpio  -idv "*.bkp.1" < ../../lpicpio.1
cpio: backup/text1.bkp.1 not created: newer or same age version exists
backup/text1.bkp.1
3 blocks
ian@Z61t-u14:~/lpi103-2/temp$ cpio  -id -v --no-absolute-filenames "*text1*" < ../../lpicpio.2
cpio: Removing leading `/' from member names
home/ian/lpi103-2/text10
home/ian/lpi103-2/backup/text1.bkp.2
home/ian/lpi103-2/backup/text1.bkp.1
home/ian/lpi103-2/text1
3 blocks
ian@Z61t-u14:~/lpi103-2/temp$ find .
.
./home
./home/ian
./home/ian/lpi103-2
./home/ian/lpi103-2/text10
./home/ian/lpi103-2/backup
./home/ian/lpi103-2/backup/text1.bkp.2
./home/ian/lpi103-2/backup/text1.bkp.1
./home/ian/lpi103-2/text1
./f1a
./backup
./backup/text1.bkp.1
./f1
ian@Z61t-u14:~/lpi103-2/temp$ cd ..
ian@Z61t-u14:~/lpi103-2$ rm -rf temp # You may remove these after you have finished

For details on other options, see the man page.

The dd command

In its simplest form, the dd command copies an input file to an output file. You have already seen the cp command, so you might wonder why have another command to copy files. The dd command can do a couple of things that regular cp cannot. In particular, it can perform conversions on the file, such as converting lowercase to uppercase or ASCII to EBCDIC. It can also reblock a file, which can be desirable when transferring it to tape. It can skip or include only selected blocks of a file. And finally, it can read and write to raw devices, such as /dev/sda, which allows you to create or restore a file that is a whole partition image. Writing to devices usually requires root authority.

Let's start with a simple example of converting a file to uppercase using the conv option as shown in Listing 40. You use the if option to specify the input file rather than using the default of stdin. A similar option, of, is available to specify an output file rather than the default of stdout. For purposes of illustration, you specify different input and output block sizes using the ibs and obs options. For large files, it can be handy to use larger block sizes to speed up operations when transferring disk to disk. Otherwise, block sizes are mostly used with magnetic tapes. Note the three status lines at the end of the listing showing how many complete and partial blocks were read and written and the total amount of data transferred.

Listing 40. Converting text to upper case using dd
ian@Z61t-u14:~/lpi103-2$ cat text6
1 apple
2 pear
3 banana
9	plum
3	banana
10	apple
1 apple
2 pear
3 banana
9	plum
3	banana
10	apple
ian@Z61t-u14:~/lpi103-2$ dd if=text6 conv=ucase ibs=20 obs=30
1 APPLE
2 PEAR
3 BANANA
9	PLUM
3	BANANA
10	APPLE
1 APPLE
2 PEAR
3 BANANA
9	PLUM
3	BANANA
10	APPLE
4+1 records in
3+1 records out
98 bytes (98 B) copied, 0.000421841 s, 232 kB/s
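
The record counts follow directly from the block sizes: 98 bytes is four full 20-byte input blocks plus one partial block, and three full 30-byte output blocks plus one partial block. The following sketch reproduces this with a generated 98-byte file (the file names are invented) and captures the statistics, which dd writes to stderr.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
yes 'apple' | head -c 98 > in.txt      # 98 bytes of sample text

# Send the converted data to a file and the statistics to another file.
LC_ALL=C dd if=in.txt of=out.txt conv=ucase ibs=20 obs=30 2> stats.txt

records_in=$(grep 'records in' stats.txt)     # 98 = 4 x 20 + 18
records_out=$(grep 'records out' stats.txt)   # 98 = 3 x 30 + 8
first_line=$(head -n 1 out.txt)

echo "$records_in"
echo "$records_out"
echo "first line of output: $first_line"
```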

Either file can be a raw device. This is usually the case for magnetic tape, but a whole disk partition, such as /dev/hda1 or /dev/sda2, can be backed up to a file or tape. Ideally, the filesystem on the device should be unmounted, or at least mounted read only, to ensure that data does not change during the backup. Listing 41 shows an example where the input file is a raw device, /dev/sda2, and the output file is a file, backup-1, in the root user's home directory. To dump the file to tape or floppy disk, you would specify something like of=/dev/fd0 or of=/dev/st0.

Listing 41. Backing up a partition using dd
ian@Z61t-u14:~/lpi103-2$ sudo -s
[sudo] password for ian: 
root@Z61t-u14:~/lpi103-2# cd /root
root@Z61t-u14:/root# dd if=/dev/sda2 of=backup-1
9450000+0 records in
9450000+0 records out
4838400000 bytes (4.8 GB) copied, 144.189 s, 33.6 MB/s

Note that 4.8GB of data were copied and the output file is indeed that large, even though only about 3.9GB of this particular partition is actually used. Unless you are copying to a tape with hardware compression, you probably want to compress the data. Listing 42 shows one way to accomplish this by piping the dd output through xz. You then repeat the backup using gzip as the compressor for comparison. Next, you use ls to show the three backup file sizes, and finally you mount the partition and use the df command to show the usage percentage of the filesystem on /dev/sda2. Note that gzip and xz assume input from stdin if no file is specified, so you do not need to explicitly specify the - option.

Listing 42. Backing up with compression using dd with xz and gzip
ian@Z61t-u14:~/lpi103-2$ sudo -s
[sudo] password for ian: 
root@Z61t-u14:~/lpi103-2# cd /root
root@Z61t-u14:/root# dd if=/dev/sda2 of=backup-1
9450000+0 records in
9450000+0 records out
4838400000 bytes (4.8 GB) copied, 144.189 s, 33.6 MB/s
root@Z61t-u14:/root# dd if=/dev/sda2 |xz >backup-2
9450000+0 records in
9450000+0 records out
4838400000 bytes (4.8 GB) copied, 4637.01 s, 1.0 MB/s
root@Z61t-u14:/root# dd if=/dev/sda2 |gzip >backup-3
9450000+0 records in
9450000+0 records out
4838400000 bytes (4.8 GB) copied, 474.107 s, 10.2 MB/s
root@Z61t-u14:/root# ls -l backup-*
-rw-r--r-- 1 root root 4838400000 Jun 12 18:02 backup-1
-rw-r--r-- 1 root root 3721409720 Jun 12 19:49 backup-2
-rw-r--r-- 1 root root 3790762349 Jun 12 21:56 backup-3
root@Z61t-u14:/root# mkdir /mnt/sda2
root@Z61t-u14:/root# mount /dev/sda2 /mnt/sda2
root@Z61t-u14:/root# df -h /dev/sda2
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       4.5G  3.9G  613M  87% /mnt/sda2

The gzip compression reduced the file size to about 78% of the uncompressed size and xz achieved about 77%, albeit taking considerably longer to compress. However, unused blocks can contain arbitrary data, so even the compressed backup can be larger than you expect if you consider only the data on the partition.
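
The percentages quoted above can be checked from the ls output in Listing 42. This is a small sketch of the calculation, assuming a shell with 64-bit integer arithmetic. The pct helper is a hypothetical name introduced here for illustration.

```shell
# Sizes in bytes, copied from the ls -l output in Listing 42.
raw=4838400000
xz_size=3721409720
gz_size=3790762349

# pct is a hypothetical helper: integer percentage of $1 relative to $2,
# rounded to the nearest whole number. Requires 64-bit shell arithmetic.
pct() { echo $(( ($1 * 100 + $2 / 2) / $2 )); }

xz_pct=$(pct "$xz_size" "$raw")
gz_pct=$(pct "$gz_size" "$raw")
echo "xz: ${xz_pct}%  gzip: ${gz_pct}%"   # xz: 77%  gzip: 78%
```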

If you divide the total bytes copied by the number of records processed, you see that dd is writing 512-byte blocks of data. When copying to a raw output device such as tape, this can result in a very inefficient operation. As mentioned previously, specify the obs option to change the output size or the ibs option to specify the input block size. You can also specify just bs to set both input and output block sizes to a common value. When using tape, remember to use the same block size for reading the tape as you used for writing it.
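
To see the effect of the bs option without touching a real device, you can copy a scratch file twice with different block sizes. The sketch below assumes a temporary 8192-byte file created just for the demonstration. The file name and sizes are arbitrary choices, not values from the listings.

```shell
# Copy the same 8192-byte scratch file with two block sizes and compare
# the record counts that dd reports on stderr.
export LC_ALL=C                      # keep dd's status messages in English
work=$(mktemp -d)
dd if=/dev/zero of="$work/sample" bs=8192 count=1 2>/dev/null

# dd writes its status to stderr; keep only the first line ("N+M records in").
small=$(dd if="$work/sample" of=/dev/null bs=512  2>&1 | sed -n 1p)
large=$(dd if="$work/sample" of=/dev/null bs=4096 2>&1 | sed -n 1p)

echo "bs=512:  $small"   # 16+0 records in
echo "bs=4096: $large"   # 2+0 records in
rm -rf "$work"
```

The same number of bytes is transferred in both cases. Only the number of read and write operations changes, which is what matters for devices such as tape.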

If you need multiple tapes or other removable storage to store your backup, you need to break it into smaller pieces using a utility such as split. If you need to skip blocks such as disk or tape labels, you can do so with dd. See the man page for examples.
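
As a rough illustration of both ideas, the following sketch skips a block with dd and then breaks a file into pieces with split. It works on a 12-byte scratch file, so the block size of 4 and the piece size of 8 are arbitrary values chosen for the demonstration, as are the file names.

```shell
work=$(mktemp -d)
printf 'AAAABBBBCCCC' > "$work/data"      # three 4-byte blocks

# skip=1 passes over the first input block; count=1 copies just one block.
middle=$(dd if="$work/data" bs=4 skip=1 count=1 2>/dev/null)
echo "$middle"                            # BBBB

# split cuts the file into pieces of at most 8 bytes: part.aa and part.ab.
split -b 8 "$work/data" "$work/part."
pieces=$(ls "$work"/part.* | wc -l)

# Concatenating the pieces in name order rebuilds the original file.
rejoined=no
cat "$work"/part.* | cmp - "$work/data" && rejoined=yes
echo "pieces: $pieces, rejoined: $rejoined"
rm -rf "$work"
```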

The dd command is not filesystem aware, so you need to restore a dump of a partition to find out what is on it. Listing 43 shows how to restore the partition that was dumped in Listing 42. You restore it to an empty partition (for example, /dev/sda7) so you can check it.

Listing 43. Restoring a partition using dd
root@Z61t-u14:/root# gunzip backup-3 -c | dd  of=/dev/sda7
9450000+0 records in
9450000+0 records out
4838400000 bytes (4.8 GB) copied, 445.272 s, 10.9 MB/s
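
If you want to practice the backup and restore cycle without root authority or a spare partition, you can use regular files in place of the devices. The following sketch assumes a 64 KiB file of random data as a stand-in for the partition. The file names partition.img, backup, and restored.img are hypothetical.

```shell
work=$(mktemp -d)
# Stand-in for a partition: 64 KiB of pseudo-random data in a regular file.
dd if=/dev/urandom of="$work/partition.img" bs=1024 count=64 2>/dev/null

# Back up through gzip, in the style of Listing 42.
dd if="$work/partition.img" 2>/dev/null | gzip > "$work/backup"

# Restore, in the style of Listing 43, to a second file that stands in
# for the empty target partition.
gunzip -c < "$work/backup" | dd of="$work/restored.img" 2>/dev/null

# cmp is silent and returns 0 when the two files are identical.
if cmp "$work/partition.img" "$work/restored.img"; then
    restore_ok=yes
else
    restore_ok=no
fi
echo "restore identical: $restore_ok"
rm -rf "$work"
```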

You might be interested to know that some CD- and DVD-burning applications use the dd command under the covers to do the actual device writing. If the utility you use provides a log of commands actually used, you might find it instructive to look at the log now that you know a little more about dd. Indeed, if you burn an ISO image to a CD or DVD disc, one way to verify that there were no errors is to use dd to read the disc back and pipe the result through the cmp utility. Listing 44 illustrates the general technique using the backup file that you created in this tutorial rather than an ISO image. Note that you calculate the number of blocks to read using the file size of the image.

Listing 44. Comparing an image with a filesystem
root@Z61t-u14:/root# ls -l backup-1
-rw-r--r-- 1 root root 4838400000 Jun 12 18:02 backup-1
root@Z61t-u14:/root#  echo $(( 4838400000 / 512 )) # calculate number of 512 byte blocks
9450000
root@Z61t-u14:/root# dd if=/dev/sda7 bs=512 count=9450000 | cmp - backup-1
9450000+0 records in
9450000+0 records out
4838400000 bytes (4.8 GB) copied, 264.092 s, 18.3 MB/s
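
The count option matters because the device is often larger than the image written to it. The sketch below shows the same technique on regular files, assuming a 4096-byte image and a slightly larger stand-in for the device. Both file names are hypothetical.

```shell
work=$(mktemp -d)
# backup.img: a 4096-byte stand-in for the image (a multiple of 512 bytes).
dd if=/dev/urandom of="$work/backup.img" bs=512 count=8 2>/dev/null
# device.img: stand-in for a larger target, holding the image plus extra data.
cat "$work/backup.img" > "$work/device.img"
printf 'leftover data beyond the image' >> "$work/device.img"

# Work out how many 512-byte blocks the image occupies.
size=$(wc -c < "$work/backup.img")
blocks=$(( size / 512 ))          # 8

# Read only as many blocks as the image holds, then compare with cmp.
if dd if="$work/device.img" bs=512 count="$blocks" 2>/dev/null | cmp - "$work/backup.img"; then
    verify=match
else
    verify=differ
fi
echo "$blocks blocks compared: $verify"
rm -rf "$work"
```

Without count, dd would also read the data that follows the image, and cmp would report that its input is longer than the image file.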

One thing that's not obvious from these examples is that I restored a FAT32 partition over what had been a larger EXT4 partition. To correct this situation, you need to use some of the partition management tools discussed in another tutorial in this series, Learn Linux, 101: Create partitions and filesystems.

That completes your introduction to file and directory management.
