Level: Intermediate
Martin Streicher (martin.streicher@gmail.com), Chief Technology Officer, McClatchy Interactive
05 Sep 2006
Discover three essential UNIX® utilities that deliver the entire Internet to your command line.
The UNIX® command line is a WYTIWYG interface -- that is, What you type is what
you get. UNIX provides hundreds, if not thousands, of commands with which you can
manipulate a large variety of resources available in the kernel and user space.
Need to monitor CPU usage? Try top or ps.
Need to remove all files ending in .bak? Try rm *.bak. Want
help with a new command? Run man.
But what do you do when the resources you need reside on a remote system on your wide-area
network (WAN) or on the global Internet? To quote The Hitchhiker's Guide to the Galaxy,
"Don't panic!" The UNIX command line readily downloads and uploads files, connects to
remote computers, and interrogates the state of far-flung servers and networks. Grab
your towel: It's time for a trip to extra-solar systems.
Work locally, transfer globally
In Part 1 and Part 2 of this series, you learned how much you can accomplish with the UNIX command
line. With just a few keystrokes -- including a pipe (|) or redirection -- you can
create an impromptu data processing machine more powerful than the sum of its
parts.
While some of the resources you use daily are likely to be local -- that is,
resident on your own workstation -- a significant and growing number of
assets -- files, e-mail messages, and tools -- are likely to be stored at a
distance (say, on a machine connected to your WAN or to the Internet). Web
browsers provide almost universal access to such resources, with one caveat:
Point-and-click can quickly become tedious, even onerous, especially if you
must retrieve more than one handful of items. Moreover, if you want to
script -- essentially, capture and replay -- repetitive or error-prone tasks,
a windowing browser is a difficult ally.
Just as ls, cp, mail, uptime, and du manage and query local resources,
UNIX has a suite of command-line tools to access remote resources, too. This
article introduces you to a few of those tools, including a useful technique that
both facilitates access to remote systems and protects your authentication
credentials. Specifically, you'll learn wget,
curl, and Secure Shell (ssh).
The wget and curl tools
transfer files; using ssh, you can securely
log in to remote systems and transfer files quickly and easily.
The trouble with Telnet (and others)
If any of your systems run rsh (or its variants -- rcp,
rexec, rlogin, or
rdist) or telnet, disable
and remove them and the accompanying daemons immediately. In addition, if you
don't permit anonymous File Transfer Protocol (FTP), disable the FTP software, too.
Although rsh and telnet are
longtime UNIX stalwarts, attackers can leverage either utility to (easily)
compromise your system. You or your system administrator should halt and remove
this software wherever it's found running and replace the features of those
packages with ssh.
For privileged FTP access, use sftp. Replace
rdist with the more advanced rsync.
Or, if you must provide anonymous FTP (or downloads over HTTP), be sure to use
firewall hardware and software to isolate all publicly accessible computers from
sensitive internal servers.
But first, let's discuss the pesky problems that passwords present.
"You don't need no stinkin' passwords!"
Access to most computers and services is typically protected. In some cases,
authenticating your identity (and hence your privilege to access the system) might
require a complex challenge-response exchange, a Secure Sockets Layer (SSL)
certificate, or even a biometric scan. Typically, however, a password suffices
to gain access. Much like your personal identification number (PIN), your password
is your secret; if you choose your password well, it's likely difficult for others
to guess it at random. The combination of your name and a strong password provides
sufficient corroboration.
Of course, strong passwords can be difficult to remember, and the strain only worsens
as you collect and memorize yet another eight-character key (of, say, numbers,
punctuation, and mixed case). Typing a password over and over again can be downright
annoying -- worse, it presents a significant obstacle to hands-off automation.
Recognizing this encumbrance, many command-line utilities allow you to provide your
username and password as command-line arguments. For example, you can log in to
an FTP site without intervention by using a command such as:
ftp ftp://joe:passwd@www.example.com
However, using such a facility can reveal your credentials to other users sharing
your computer. (Try ps -Aeww, for example, to see the
complete command line and environment of every process on the system.)
To provide the same convenience as command-line options without the inherent risks,
many programs can read your credentials from a special file called a .netrc
(pronounced net-r-c) file, which typically resides in ~/.netrc. Your .netrc file
must be owner read-write mode (mode 0600 or
-rw-------) only, and each entry in the file must adhere
to this simple syntax:
machine ftp.example.com login zaphod password I()Trillian!
machine www.magazine.com login abner password MmG8y*tr
default login anonymous password zaphod@heartofgold.com
The first two lines provide the machine keyword and the
computer's domain name, the login keyword and your
login name on the computer, and the password
keyword followed by the password associated with your login. The credentials on the last
line provide a default for any system not specifically named. The
default line must be the last line in your .netrc file.
(For the full extent of .netrc file configuration options, type
man 5 netrc to see the .netrc man page.)
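The syntax and permission rules above can be sketched as a short shell session. This is only an illustration: it writes to a scratch file created with mktemp rather than to your real ~/.netrc, and the variable names netrc_file and netrc_mode are placeholders chosen for the example.

```shell
# Demo only: write to a scratch file; the real file lives at ~/.netrc.
netrc_file=$(mktemp)

# Write the entries inside a subshell whose umask blocks group and other access.
(
  umask 077
  cat > "$netrc_file" <<'EOF'
machine ftp.example.com login zaphod password I()Trillian!
default login anonymous password zaphod@heartofgold.com
EOF
)

# Make the owner-only mode explicit, then inspect it.
chmod 600 "$netrc_file"
netrc_mode=$(ls -l "$netrc_file" | cut -c1-10)
echo "$netrc_mode"    # the mode column starts with -rw-------
```

The quoted 'EOF' delimiter keeps the shell from interpreting the punctuation in the password.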
Obviously, any file that contains identity data deserves protection. Secure it with
user read-write only mode (mode 600) so that no one else can read it, or with user
read-only mode (mode 400) to also keep yourself from accidentally overwriting it.
You might also want to protect your home directory with mode
700.
Now, whenever you launch a .netrc-enabled application, including those applications
I discuss next, the appropriate login name and password are passed automatically to
the desired service, with nary a single peck at the keyboard. You can typically
disable this auto-login feature; the standard ftp client, for example, uses the -n option for that.
The process of transference
Along with HTTP and HTTP over SSL (HTTPS) for Web pages, FTP is one of the most
often used Internet application protocols. Through FTP, a client can connect to a
server, acquire a list of directories and files, and either download a file (that
is, request a file from the server) or upload a file (that is, send the file to a
server to persist). A URL of the form ftp://ftp.example.com/path/to/anotherfile.zip
means: using the FTP protocol, connect to ftp.example.com and download the file
/path/to/anotherfile.zip. A URL of the form
ftp://user:password@ftp.example.com/path/to/file.zip works the same way but adds credentials for login.
On most desktop computers, such URLs launch the browser or the default FTP
application to download the specified file. However, you can use the same URLs
with the wget command-line utility -- a robust utility
for downloading files over HTTP, HTTPS, and FTP. It supports .netrc files and is
entirely non-interactive, making it ideal for automation. If your system does not
have wget, you can download its source code from the
GNU Project. It builds
readily on all UNIX variants with just a few simple commands, and you can place
the utility in your personal bin directory or in a central directory.
Assuming that you have a .netrc file in place, let's look at some examples of what
wget can do. (In the examples below, the line number
is provided for reference; you don't need to type the numbers.) Listing
1 shows how to use wget to download files without
leaving the comfort of the command line.
Listing 1. Using wget to download files at the command line
1 $ wget http://ftp.gnu.org/pub/gnu/wget/wget-1.10.2.tar.gz
--16:02:29-- http://ftp.gnu.org/pub/gnu/wget/wget-1.10.2.tar.gz
=> `wget-1.10.2.tar.gz'
Resolving ftp.gnu.org... 199.232.41.7
Connecting to ftp.gnu.org[199.232.41.7]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,213,056 [application/x-tar]
100%[=====================>] 1,213,056 531.22K/s
16:02:37 (529.57 KB/s) - `wget-1.10.2.tar.gz' saved [1213056/1213056]
2 $ wget -q ftp://mirror.linux.duke.edu/pub/centos/4.3/os\
/i386/RELEASE-NOTES-en.html
3 $ cat url_list.txt
http://www.wikipedia.com
http://valdez.barebones.com/pub/freeware/TextWrangler_2.1.3.dmg
4 $ wget -nv -i url_list.txt
16:06:00 URL:http://www.wikipedia.org/ [33606] -> "index.html" [1]
16:06:41 URL:http://valdez.barebones.com/pub/freeware/
TextWrangler_2.1.3.dmg [9488296/9488296] ->
"TextWrangler_2.1.3.dmg" [1]
FINISHED --16:06:41--
Downloaded: 9,521,902 bytes in 2 files
5 $ ls
RELEASE-NOTES-en.html index.html wget-1.10.2.tar.gz
TextWrangler_2.1.3.dmg url_list.txt
Command 1 downloads the most recent wget source code
from its project home page over HTTP. By default, wget
apprises you of progress. You can disable all messages with the
-q (for quiet mode) option. Command 2 retrieves a
version of the CentOS release notes over FTP, albeit very quietly.
Keeping URLs intact
Here's a tip: Many HTTP URLs contain characters that are also special to your
shell. For example, many URLs contain a question mark (?), which separates the
host name and path from a list of arguments. However, the shell interprets the
question mark as a wildcard.
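To bypass interpretation by your shell, simply put the URL in single quotation
marks. To avoid strange and long filenames, use wget -O (an uppercase letter O)
to name the output file. Keep the quoted URL on one line, because a backslash
inside single quotation marks does not continue the line. Here's an example:
$ wget -O sharkey \
'http://www.example.com/download.cgi?proj=science&file=sharkey'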
If you have a long list of URLs to download, you need not place each one on the
command line. Instead, you can create (or rather, generate) a list of URLs to
download. Command 3 shows url_list.txt, a simple text catalog containing two URLs;
Command 4 downloads the two URLs. Use the -i option
when you provide a list. The -nv option -- short
for --no-verbose -- provides more concise messages.
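As a rough sketch of generating such a list, the loop below writes three URLs to a scratch file. The base URL, the report file names, and the variable names are all hypothetical, and the wget call itself is left as a comment so that the snippet runs without network access.

```shell
# Hypothetical download location; substitute a real one.
base_url='http://www.example.com/pub/reports'
url_list=$(mktemp)    # stand-in for url_list.txt

# Generate one URL per line, the format that wget -i expects.
for month in 01 02 03; do
  printf '%s/report-2006-%s.pdf\n' "$base_url" "$month"
done > "$url_list"

cat "$url_list"

# To fetch everything in the list:
#   wget -nv -i "$url_list"
```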
Unless you provide a file name for the download file (using the -O
option), wget creates a new, local file with the same
name as the remote file, omitting the entire leading URL. Command 5 shows the
four files downloaded in commands 1, 2, and 4.
The wget utility has many options and features. It can
spider an FTP or Web site and download a complete hierarchy of files. You can
also set a quota for automatic downloads, provide cookies, and continue a previous
download that was interrupted. Read the wget man page
to learn about the tool's many tricks.
Going up
The wget utility is invaluable for hands-off downloads,
but it can't upload files. Nor can it interoperate with secure FTP,
telnet, and a host of other (older and less-used)
Internet protocols. For those kinds of transfers, you must turn to the veritable
Swiss Army knife of networking: curl.
The curl command-line utility can get and put data,
so it's ideal for transferring local files to remote servers. Better yet, the
underpinning of curl -- the libcurl library -- has a
rich application programming interface (API) that allows you to integrate all the
features of curl directly into your own applications.
The C, C++, PHP, and Perl
programming languages are just four of the many languages that can leverage
libcurl. If your system lacks curl and libcurl, you
can download the source code from the libcurl
home page.
Because curl can copy local files to remote servers,
it's ideal for small backups. For example, Listing 2 shows
a shell script that copies a directory full of database dumps to a remote FTP
server for safekeeping.
Listing 2. Using curl to store database dumps remotely
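# csh syntax: dump each database into the dbs directory
foreach db (mydns mysql cms tv radio)
  /usr/bin/mysqldump -ppassword --add-drop-table -Q --complete-insert $db > dbs/$db.sql
end
# Upload every dump modified within the last day
foreach file (`find dbs -mtime -1 -type f -name '*.sql' -print`)
  curl -n -T $file ftp://ftp1.archive.example.com/
end
The script uses csh syntax. The first loop writes one dump per database into the
dbs directory; the second loop uploads each dump modified within the last day.
The curl -n option tells curl
to read your .netrc file. The -T option tells curl
to upload the named file to the given URL. If the URL ends with a slash and omits
the target file name, curl simply reuses the name of the file being uploaded.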
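For readers who use a Bourne-style shell rather than csh, the upload loop might look like the sketch below. It is an illustration under stated assumptions, not the author's script: the scratch directory stands in for dbs, the placeholder files stand in for real dumps, and the DRY_RUN switch and upload function are hypothetical names added so that the example prints the curl command instead of contacting a server.

```shell
#!/bin/sh
dump_dir=$(mktemp -d)                       # stand-in for the dbs directory
target='ftp://ftp1.archive.example.com/'    # trailing slash: keep the local file name
: > "$dump_dir/cms.sql"                     # placeholder dump file
: > "$dump_dir/notes.txt"                   # not a .sql file, so find skips it

# Print the curl command by default; set DRY_RUN=0 to upload for real.
upload() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "curl -n -T $1 $target"
  else
    curl -n -T "$1" "$target"
  fi
}

# Upload every .sql file modified within the last day.
uploads=$(find "$dump_dir" -mtime -1 -type f -name '*.sql' -print |
  while IFS= read -r file; do upload "$file"; done)
echo "$uploads"
```

Reading file names line by line keeps names with spaces intact, which the backquoted find in a foreach list would split.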
As you might guess, curl has even more options than
wget. It's worthwhile to read the curl
man page and keep it in mind. The curl project also
maintains a list of uses,
including instructions on how to use the HTTP POST and
PUT commands, how to provide login credentials, how to
use SSL certificates, and how to debug your curl
requests. A quick tip: Try curl -v --trace-ascii ... to
generate tracing information.
Six addresses of separation
Modern computing depends largely on countless, spindly interconnections among
machines of all shapes, sizes, and services. Indeed, even in a small computing
environment, one computer might be dedicated to e-mail, another to serving Web
pages, and others to performing yet more specialized tasks. In this environment --
typically connected by a local area network (LAN), WAN, or Virtual Private Network
(VPN) -- it's quite common and necessary to log in to several computers per day.
Systems administrators bounce from one computer to another each and every hour,
but it's also common for developers and other users to require remote access
for a critical application.
The X Window System and current desktop software make remote access fairly
transparent: A window is a window, and the underlying application could be running
on any computer. But again, the command line has a special place, even in a
mouse-centric world. For example, how can you run the same command on multiple
computers painlessly? Or, more simply, how do you launch an xterm
window on the remote system?
Remote system access is the purview of ssh and its
derivatives, scp and sftp.
ssh is the secure version of rsh,
while scp and sftp are secure
replacements for rcp and FTP, respectively. Why is it
secure? ssh and its variants provide stronger
authentication mechanisms and encrypt all traffic using your choice of several
ciphers. Even if someone sniffed your network, ssh
traffic would look like so much gobbledygook.
The simplest use of ssh is ssh hostname.
This command connects to hostname and presents you with a login and
password prompt. Provide the right credentials, and you're in:
(www.joe.com) $ ssh web.example.com
Login: arthur
Password: ******
(web.example.com) $
If you simply want to run one command on a remote system, you don't need to log in.
Just provide the command as an argument to ssh. For
example, the command shown in Listing 3 runs
hostname -a -v on the remote computer.
Listing 3. Use ssh to run a command on a remote system
(www.joe.com) $ ssh db.linux-mag.com hostname -a -v
Login: vogon
Password: ******
db
gethostname()=`db.linux-mag.com'
Resolving `db.linux-mag.com' ...
Result: h_name=`db.linux-mag.com'
Result: h_aliases=`db'
Result: h_addr_list=`64.34.170.230'
ssh opened a connection to db.linux-mag.com and passed the
hostname -a -v arguments to the remote computer, which
ran the command and returned the output to the local computer.
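Because ssh accepts a command as an argument, running the same command on several computers reduces to a loop. The sketch below assumes a hypothetical host list and uses a made-up SSH_CMD variable so that, by default, it only prints the commands it would run.

```shell
# Hypothetical host list; replace with your own machines.
hosts='web.example.com db.example.com mail.example.com'

# Dry run by default: print each command instead of connecting.
# Set SSH_CMD=ssh in the environment to run the commands for real.
ssh_cmd=${SSH_CMD:-echo ssh}

for host in $hosts; do
  $ssh_cmd "$host" uptime
done
```

A for loop is used on purpose: ssh reads standard input, so it can swallow the remaining host names when it runs inside a while read loop.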
ssh also provides a convenient way to copy files and
entire directories from one computer to another. scp
is almost as easy to use as cp. Here's an example:
(www.joe.com) $ scp -p -r ~/myproject web.example.com:
This command copies the ~/myproject directory to web.example.com. If you omit a
destination path name, files are copied to your home directory. The
-p option preserves the date and time stamps on all
the files, while -r enables recursive mode, where
scp descends and copies all subdirectories, as well.
By the way, the previous scp command is roughly the equivalent
of running:
(www.joe.com) $ tar czf - -C ~ myproject | ssh web.example.com tar xvzf -
Login: deepthought
Password: ******
Yes, you can pipe the output of a local command to a remote command (or vice versa).
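To see the pipe at work without a remote host, the sketch below substitutes a local shell for the ssh connection. The remote function is a hypothetical stand-in; replacing its body with a real ssh call would send the same archive stream to another computer. The directory and file names are invented for the demonstration.

```shell
src=$(mktemp -d)     # plays the role of the local home directory
dest=$(mktemp -d)    # plays the role of the remote home directory
mkdir "$src/myproject"
echo 'int main(void) { return 0; }' > "$src/myproject/main.c"

# Stand-in for: ssh web.example.com "$@"
remote() { sh -c "$*"; }

# Pack on the sending side, unpack on the receiving side.
tar czf - -C "$src" myproject | remote "cd '$dest' && tar xzf -"

ls "$dest/myproject"
```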
Chances are, you're already tired of all those password prompts. Again, the repeated
prompts simply slow down work and prevent automation. You might also be tired of
typing long host names over and over again. Luckily, ssh
supports public/private key authentication and system aliases.
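The article does not show how an alias is defined. With OpenSSH, one common approach is a Host stanza in ~/.ssh/config; the sketch below writes such a stanza to a scratch file for inspection. The alias web and the login arthur are illustrative choices, and config_file is a placeholder name.

```shell
config_file=$(mktemp)    # demo only; the real file is ~/.ssh/config

cat > "$config_file" <<'EOF'
# "web" becomes a short alias for the full host name and login name
Host web
    HostName web.example.com
    User arthur
EOF
chmod 600 "$config_file"

cat "$config_file"
```

With that stanza in ~/.ssh/config, ssh web behaves like ssh arthur@web.example.com, and scp and sftp accept the same alias.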
Let's set up a public/private key pair using the DSA algorithm. To do so,
you must generate the key pair, copy the public key to the remote system, add it
to the list of authorized keys, and verify that everything works, as shown in
Listing 4.
Listing 4. Creating and installing a public/private key pair
1 $ cd ~
2 $ mkdir .ssh
3 $ chmod 700 .ssh
4 $ cd .ssh
5 $ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/mstreicher/.ssh/id_dsa): ./id_dsa
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in ./id_dsa.
Your public key has been saved in ./id_dsa.pub.
The key fingerprint is:
40:6c:26:e7:53:df:d1:7b:c4:79:c5:a8:cd:6b:fe:8e mstreicher@db.linux-mag.com
6 $ ls
id_dsa id_dsa.pub
7 $ chmod 600 *
8 $ scp id_dsa.pub www.example.com:
Login: marvin
Password: ******
id_dsa.pub 100% 668 0.7KB/s 00:00
9 $ ssh www.example.com
Login: marvin
Password: ******
A $ mkdir .ssh
B $ chmod 700 .ssh
C $ cd .ssh
D $ cat ../id_dsa.pub >> authorized_keys
E $ rm ../id_dsa.pub
F $ chmod 600 *
G $ logout
10 $ ssh www.example.com
a $ hostname
www.example.com
b $ logout
Commands 1 through 3 create a private local directory named .ssh in your home directory.
This directory must be mode 700, or
ssh won't use public/private key authentication. (You
can see the same sequence of commands run on the remote computer in steps A through C.)
Command 5 creates the key pair using DSA. For now, leave the passphrase blank at both prompts.
(A passphrase provides an extra level of security but adds another authentication step.)
ssh-keygen generates two files: id_dsa (the private key)
and id_dsa.pub (the public key). Step 6 shows the files, while Step 7 protects both
keys. Your keys must be mode 0600 or mode
0400.
Passing wildcards to remote shells
Let's say you want to list all the C source files in your
remote home directory. Locally, you'd type something like ls -l *.c,
so you try that through ssh:
$ ssh www.example.com ls -l *.c
Two things might happen: If you don't have any C files in
the current working directory on your local computer, the shell complains,
zsh: no matches found: *.c, or, if you have C
files in the current working directory that aren't in your home directory on the
remote computer, the shell on the remote computer might complain,
ls: whosit.c: No such file or directory. Scratching
your head?
The problem is that the wildcard * is being expanded
by the local shell first, before the ssh command is
sent. What you intended was for * to be expanded on the remote system.
To do that, you must prevent the local shell from interpreting the wildcard (again).
You can place the * in single quotation marks or use a backslash
(\) to escape the asterisk. Then, the asterisk is
passed as a regular character to the remote shell, where it is interpreted in the
context of the remote computer.
Here are those two approaches -- use the one that suits each situation:
$ ssh www.example.com ls -l \*.c
$ ssh www.example.com ls -l '*'.c
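You can watch the local shell do this without contacting a remote host by echoing the command line instead of running it. In the sketch below, the scratch directory and the file whosit.c are set up only for the demonstration, and the variable names are placeholders.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/whosit.c"    # one local C file for the wildcard to match

# Unquoted: the local shell replaces *.c before ssh would ever see it.
unquoted=$(cd "$demo_dir" && echo ssh www.example.com ls -l *.c)

# Quoted: the asterisk survives and would be expanded on the remote side.
quoted=$(cd "$demo_dir" && echo ssh www.example.com ls -l '*.c')

echo "$unquoted"
echo "$quoted"
```

The first line printed names whosit.c; the second still contains the literal *.c.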
Step 8 copies the public key to the remote computer. For now, you must type your
password, but this is the last time. Commands A through C create the private .ssh directory,
and Step D adds the public key to the list of authorized keys. The name of the
file -- authorized_keys -- is intentional. Do not name the file differently.
Step E removes the copy of the key; Step F protects the files, as in Step 7.
When you log out and log back in, a password is no longer required. ssh
(and scp and sftp) tests
your private key against the remote public key. If a match is found, your
credentials are sound and you can log in without further identification.
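Steps A through F can be folded into a small function that is safe to run more than once. The sketch below is one possible arrangement, not part of the original listing: install_key is a hypothetical helper, the key text is a placeholder rather than a real key, and the demonstration works in a scratch directory instead of a real ~/.ssh.

```shell
# usage: install_key SSH_DIR 'public key line'
install_key() {
  mkdir -p "$1" && chmod 700 "$1" || return 1
  touch "$1/authorized_keys"
  # Append the key only if that exact line is not already present.
  grep -qxF -e "$2" "$1/authorized_keys" ||
    printf '%s\n' "$2" >> "$1/authorized_keys"
  chmod 600 "$1/authorized_keys"
}

ssh_dir="$(mktemp -d)/.ssh"    # demo only; on a real host use "$HOME/.ssh"
pub_key='ssh-rsa AAAAEXAMPLEONLY marvin@www.example.com'    # placeholder key text

install_key "$ssh_dir" "$pub_key"
install_key "$ssh_dir" "$pub_key"    # the second run adds nothing

ls -ld "$ssh_dir" "$ssh_dir/authorized_keys"
```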
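Some systems will always require a password; other systems do not accept DSA keys and
expect RSA or Ed25519 keys instead (recent OpenSSH releases disable DSA by default), in
which case you generate the pair with ssh-keygen -t rsa or ssh-keygen -t ed25519.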
Check with your systems administrator to find out how to log in to a specific
computer. Systems administrators can set some global settings, too, to make a
system more accessible.
Online, everywhere, all the time
Nowadays, the Internet connects people far and wide in ways not experienced before
in human history. Whether sharing the details of your day in your blog or
downloading source code for your next project, wires have replaced tires as the
way to get around.
Web surfing is still a popular sport, but to make time for real surfing, developers
have created ways to automate file transfers of all kinds. Using scripts and a few
UNIX utilities, you can keep your external Web and download sites current. You can
download and upload files with just a few keystrokes, making the process quick and
easy. And if you create a .netrc file, you can hasten the effort even more. No more
passwords.
Now that your mind is clear, put the top down and take a road trip on the information
superhighway. See you at the Restaurant at the End of the Fiber. Last one there picks
up the tab!
About the author
Martin Streicher is the Chief Technology Officer of McClatchy Interactive and the Editor-in-Chief of Linux Magazine. Martin holds a Master of Science degree in computer science from Purdue University and has been programming UNIX-like systems since 1986. You can reach Martin at martin.streicher@gmail.com.