UNIX tips: Become a better blogger with UNIX

Use the benefits of UNIX to your blogging advantage

Did you know that blogging and UNIX® go hand in hand? The native Web and text-processing tools of UNIX enable you to create your blogs quickly and easily. Discover some handy tips for improving your UNIX blogging skills.

Michael Stutz, Author, Freelance Developer

Michael Stutz is author of The Linux Cookbook, which he also designed and typeset using only open source software. His research interests include digital publishing and the future of the book. He has used various UNIX operating systems for 20 years. You can reach him at stutz@dsl.org.



10 October 2006

UNIX® and weblogs, or blogs, have a lot in common. Besides being the native environment of most Web servers and the preferred environment for many Web developers, UNIX can be an ideal environment to blog with because of its Web and text-processing power. Take advantage of the command-line tools and features inherent to UNIX to make you a better blogger. Here are a few tips to help you do just that.

Serve fresh content constantly

The cardinal rule of blogging is to do it as much as you can. Your blog should resemble a scrolling ticker tape, or even the motion of television, more than a fragment of some etching pulled up in an archaeological dig. It should always be growing, and readers should get that sense of fresh motion when they visit. Visitors to a Web site don't just read it; they also watch it -- following the links, reloading, and returning. To succeed with such a site, you must accommodate this.

While you don't need to install any special software to do so, this is the quickest and most important way to improve your weblog: You must constantly add new content! Even if you start your blog today, you'll likely have more people reading it by the end of the week if you're updating it a dozen times a day than if you've had it for a year and updated it only when the mood strikes.

This tip relates to all the others that follow in this article, because they'll show you how your UNIX system helps serve that fresh-blogged content quicker and better than before. To do that, you must know which of your content is the most popular, know who's reading it and where they're coming from, make your pages load more quickly, and automate your blog updates. You'll also get a quick look at some UNIX-based content management solutions that might be better than what you've been using to produce your blog.

Look at your logs

Your logs are your lifeblood. They tell you who's looking and where, how many, and how often. If you actively publish a weblog, you should look at your logs no less than once a day. You've got the power to see who's reading your publication, exactly what they're reading, and when they're doing it. So, why ignore it?

You can use command-line tools to extract meaningful data from your logs, but special UNIX tools exist to automatically analyze logs in the most popular formats, including those written by the Apache Web server. One such tool is the popular open source analog command.
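As a rough sketch of the plain command-line approach, the following shell function tallies the HTTP status codes in a log, which gives a quick view of successful, redirected, and failed requests. It assumes Apache's common or combined log format, in which the status code is the ninth whitespace-separated field; the function name status_tally is made up for this example.

```shell
# status_tally: count requests per HTTP status code, most frequent first.
# Assumes Apache common/combined log format, where the status code
# is the 9th whitespace-separated field. The name is hypothetical.
status_tally() {
    awk '{ print $9 }' "$@" | sort | uniq -c | sort -rn
}

# Example: status_tally access.log
```

Each output line is a count followed by a status code, so a large number next to 404 points to broken links worth fixing.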

React to what's popular

Use analog to check your links and see what people are following. First, get a general report showing statistics -- how many unique requests are being made, whether there are any failed requests, how many distinct hosts are being served, and so on:

$ analog -A www.20060901 | lynx -stdin

This command produces output such as that shown in Listing 1.

Listing 1. Sample output of the analog tool
                  Web Server Statistics for BigBlog
                                                                    
  Program started at Mon-25-Sep-2006 14:46.                        
  Analyzed requests from Fri-01-Sep-2006 00:01 to Fri-01-Sep-2006 23:59 (1.00 days).
____________________________________________________________________________

General Summary

  (Go To: Top: General Summary)

  This report contains overall statistics.
                                           
  Successful requests: 3,400              
  Average successful requests per day: 3,403
  Successful requests for pages: 2,015
  Average successful requests for pages per day: 2,016
  Failed requests: 3
  Redirected requests: 963
  Distinct files requested: 101
  Distinct hosts served: 950
  Data transferred: 65.338 megabytes
  Average data transferred per day: 65.429 megabytes
____________________________________________________________________________

   This analysis was produced by analog 6.0.
   Running time: Less than 1 second.         
                                     
   (Go To: Top: General Summary)

Pay particular attention to the Search Word Report, which displays the most popular query words and the number of times they were requested, and the Directory Report, which shows you the most popular directories on your site. (It's always good to see which archived blog entries are currently of interest to your readers.) Finally, the Request Report shows the most requested files on the site. Your blog logo and any graphics that appear frequently throughout the site are bound to be near the top but, by looking at the actual content files (such as .html files), you can get a good idea of which pages or archived blog entries are most popular with your readers.
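If you only want the content pages from a report like the Request Report, a short pipeline can approximate it. This is a minimal sketch, not a replacement for analog: it assumes Apache-style logs in which the requested path is the seventh whitespace-separated field, it ignores requests that carry a query string, and the function name top_pages is hypothetical.

```shell
# top_pages: list the ten most requested .html pages, most popular first.
# Assumes Apache-style logs where the request path is the 7th
# whitespace-separated field. Requests with a query string are skipped.
top_pages() {
    awk '$7 ~ /\.html$/ { print $7 }' "$@" |
        sort | uniq -c | sort -rn | head -n 10
}

# Example: top_pages access.log
```

Filtering on the .html suffix keeps logos and other graphics out of the list, so only the pages themselves are ranked.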

Daily and periodic spikes can occur, and you should react to them. However, it's also wise to watch the longer-term trends. If you keep your daily logs in an archive directory, that's easy to do: Simply concatenate them and send them all to analog to process at once. Do this weekly, monthly, and even annually to track trends. Use zcat (which is named gzcat on some systems) to both decompress and concatenate any compressed logs. For example, to get complete reporting on all the logs for the month of September 2006, use the command:

$ zcat www.200609* | analog - | lynx -stdin
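For a quick look at a trend without a full report, you can count requests per day from the same concatenated stream. The sketch below assumes Apache-style logs whose fourth whitespace-separated field holds the timestamp (for example, [01/Sep/2006:00:01:00); the function name requests_per_day is an invention for this example.

```shell
# requests_per_day: read Apache-style log lines on standard input and
# print each day followed by its number of requests, in the order the
# days first appear in the input.
# Assumes the 4th field looks like [01/Sep/2006:00:01:00
requests_per_day() {
    awk '{
        split($4, t, ":")              # t[1] is "[01/Sep/2006"
        day = substr(t[1], 2)          # drop the leading bracket
        if (!(day in n)) order[++k] = day
        n[day]++
    }
    END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }'
}

# Example: zcat www.200609* | requests_per_day
```

Because the days are printed in first-seen order, feeding the logs in date order (as the shell's file name expansion does for names like www.20060901) yields a chronological list.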

Know who your readers are

It can be helpful to know where your readers are coming from -- what domains, IP addresses, and countries. To find all the hosts in your Web server logs, you can use a few command-line tools to get a quick report of every hostname. If you have Apache-style logging, for example, the requesting IP address is the first field of every line:

$ for i in $(cut -d " " -f1 www.200609* | sort -u); do host "$i"; done

If your log is compressed, first use zcat to decompress it and pipe the text to cut. Or, if your daily log is available in an access.log file, you can use the same principle to see whether your colleague at badblog.example.com has looked at your site yet:

$ for i in $(cut -d " " -f1 access.log | sort -u); do host "$i"; done \
> | fgrep badblog.example.com

You can output the total number of unique hosts (counted by IP address) that have visited your /blog directory based on the compressed logs you have in the web/logs/ directory:

$ zcat web/logs/* | fgrep "/blog" | cut -d " " -f1 | sort -u | wc -l

Know where they're coming from

If a site is sending a lot of readers your way, you want to acknowledge it. That means that you should pay close attention to your referrers -- the URLs of pages that link to yours, which browsers send in the Referer request header. This data is saved in your logs, and you can use analog to extract it. The analog tool lists the referrers in the Referrer Report, as shown in Listing 2. Use the +f flag to turn this report on.

Listing 2. Sample from a Referrer Report page
                             Web Server Statistics for BigBlog
Referrer Report

   (Go To: Top: General Summary: Monthly Report: Daily Summary: Hourly Summary: Domain
   Report: Organization Report: Referrer Report: Search Word Report: Operating System 
   Report: Status Code Report: File Size Report: File Type Report: Directory Report: 
   Request Report)
   
   This report lists the referrers (where people followed links from, or pages which
   included this site's images).                                                    
   
   Listing referring URLs with at least 20 requests, sorted by the number of requests.
reqs: URL
----: ---
 814: http://www-128.ibm.com/developerworks/
 359: http://www.google.com/search
 114: http://badblog.example.com/
 102: http://badblog.example.com/2006/09/01/
  81: http://www.google.co.uk/search             
 530: [not listed: 485 URLs]
     ________________________________________________________________________________

You can also bypass the use of reporting software and get the referrers right from the command line. In Apache-style logs, the referrers are enclosed in double quotation marks and come after the IP address, date and time, and actual request (also enclosed in quotation marks). Use awk to extract the referrers; with a double-quote character as a field separator, they'll be the fourth field of each line. Because Apache writes a hyphen as the referrer when a request has no referring URL, use grep with the -v option to omit those lines. As a final touch, count the unique referrers and sort them numerically by popularity:

$ awk 'BEGIN { FS="\"" } { print $4 }' log.daily | grep -v "^-$" | sort | uniq -c | sort -rn
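A site that links to you from many different pages shows up as many separate referrers. To see totals per referring site instead, you can reduce each URL to its host name before counting. This is a sketch under the same assumption as above (Apache combined log format, with the referrer as the fourth double-quote-delimited field); the function name referrer_sites is hypothetical.

```shell
# referrer_sites: count referrals per referring host name.
# Assumes Apache combined log format, where the referrer is the 4th
# field when each line is split on double quotes.
referrer_sites() {
    awk -F'"' '$4 != "-" && $4 != "" {
        split($4, u, "/")      # "http://host/path" -> u[3] is "host"
        print u[3]
    }' "$@" | sort | uniq -c | sort -rn
}

# Example: referrer_sites log.daily
```

With the data in Listing 2, the two badblog.example.com entries would be combined into a single line.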

Presize your images

The HEIGHT and WIDTH attributes of the Hypertext Markup Language (HTML) <img> tag are important. These attributes specify the dimensions of the given image. When they are present, most browsers automatically make room for the image in the window where the page is rendering before any of the image is loaded. Without them, the browser can't know how much room to reserve until the image data arrives, so the text around the image may be delayed or may shift as images load.

So, when you're putting images in your blogs, it's to your advantage to include these attributes in their <img> tags, especially when you begin to have a lot of images on a single page, as it improves the loading of your blog page dramatically. Visitors will be able to begin reading as soon as the page starts to load without waiting for the entire page and all its images to go over the wire.

But having to determine the exact HEIGHT and WIDTH values each time you use an image and then put them in the <img> tag itself is a horrible bother. Fortunately, a tool exists to automate the entire task for you. The imgsizer utility (see Resources) reads any .html files you give it, checks all the source images referenced in those files, determines their heights and widths, and writes the proper values in all the <img> tags contained in the given files:

$ imgsizer index.html

It's as easy as that -- you don't have to load any of the images or do anything else to them. After imgsizer has added these attributes, you'll be surprised at how much more quickly the page loads. Few bloggers use this simple technique, but it's one that readers will appreciate.
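To check which pages still need this treatment, a rough audit with standard tools can list the <img> tags that lack a WIDTH or HEIGHT attribute. The sketch below is not part of imgsizer; it assumes each <img> tag sits on a single line and that your grep supports the common -o option, and the function name unsized_imgs is made up for this example.

```shell
# unsized_imgs: print <img> tags that lack a WIDTH or HEIGHT attribute.
# A rough check only: it assumes each <img> tag fits on one line and
# matches attribute names case-insensitively.
unsized_imgs() {
    grep -io '<img[^>]*>' "$@" |
        awk '{ t = tolower($0)
               if (t !~ /width=/ || t !~ /height=/) print }'
}

# Example: unsized_imgs index.html
```

If the function prints nothing, every <img> tag it found already carries both attributes.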

Automate your updates

Rare is the blogger who produces a blog right on the live page itself. Most work on a local copy where new entries are first roughed out and polished. Then, when ready to go live with the new index.html file, the blogger uploads this file to the server hosting the actual site.

The process can take 30 seconds to a minute of constrained attention, as the blogger opens a File Transfer Protocol (FTP) connection, types the password, changes to the local weblog root directory, changes to the server root directory, uploads the file, and logs out (see Listing 3 for an example).

As you can imagine, this process is prone to user error. If you're aiming to be a big-shot, A-list blogger with a good 10 updates per day, then even at 30 seconds apiece this upload process takes a full five minutes out of your day -- or more than 30 hours per year! That's a lot of time that could be better spent building your information technology (IT) repertoire with developerWorks articles.

Listing 3. Manual update of a weblog root page
develbox$ ftp bigblog.example.com
Connected to bigblog.example.com.
220 bigblog.example.com NcFTPd Server (licensed copy) ready.
Name (bigblog.example.com:joe): joe_blogger
331 User joe_blogger okay, need password.
Password: secret
230 You are user #1 of 2 simultaneous users allowed.
230 Logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> lcd ~/blog
Local directory now /home/joe/blog
ftp> cd public_html
250 "/usr/www/users/joe_blogger" is new cwd.
ftp> put index.html
local: index.html remote: index.html
200 PORT command successful.
150 Opening BINARY mode data connection.
226 Transfer completed.
ftp> bye
221 Goodbye.
develbox$

A better way to do this is to use the Expect language, which is designed for scripting interactive sessions (see Resources). For bloggers who manually update their sites over FTP, it's a natural way to make an automated update script. Listing 4 shows an example that automates the session shown in Listing 3.

Listing 4. Expect program to automate weblog updates
#!/usr/bin/expect
# update a weblog index page
# puts ~/blog/index.html in remote ~/public_html/

# require Expect 5.0 or later
exp_version -exit 5.0

# this script takes no arguments
if {$argc != 0} {
	send_user "usage: bloggit\n"
	exit 1
}

# wait up to 60 seconds for each response; hide the FTP dialogue
set timeout 60
log_user 0
spawn ftp bigblog.example.com
expect "Name*:"
send "joe_blogger\r"
expect "Password:"
send "secret\r"
expect "ftp>"
send "lcd ~/blog/\r"
expect "ftp>"
send "cd public_html/\r"
expect "ftp>"
send "put index.html\r"
# 226 is the FTP reply code for a completed transfer
expect "226*ftp>"
send "bye\r"
send_user "blogged it.\n"
close

Now, when you're ready to go live with an update, it takes a lot less time to do:

$ bloggit
blogged it.
$
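You can go one step further and skip the upload entirely when nothing has changed. The following is a minimal sketch: it compares the page against a timestamp file left by the previous successful upload. The stamp file path is an assumption, the default upload command is the bloggit script from Listing 4, and the -nt file test, while widely supported, is not required by POSIX.

```shell
# blog_if_changed: run the upload command only when the local page is
# newer than a timestamp file left by the previous successful upload.
# Arguments (all optional): page, stamp file, upload command.
# The default paths and the function name are assumptions.
blog_if_changed() {
    page=${1:-$HOME/blog/index.html}
    stamp=${2:-$HOME/blog/.last-upload}
    uploader=${3:-bloggit}

    if [ ! -e "$stamp" ] || [ "$page" -nt "$stamp" ]; then
        # record the upload time only if the upload succeeded
        "$uploader" && touch "$stamp"
    else
        echo "no changes to upload."
    fi
}

# Example: blog_if_changed
```

Because the stamp file is touched only after the upload command succeeds, a failed upload is retried the next time you run the function.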

Use a content management system

When it comes to development and putting out products, UNIX people have a tendency toward rolling their own. But equally so, they are lazy and don't care to reinvent the wheel if a usable solution already exists; there are too many new ideas to develop.

In the early years of blogging, the most successful weblogs were hand-coded HTML -- that's far less common today. Now, most blogs are database-driven, hand-configured sites that are powered by a content management system (CMS).

If there's one essential weblog application, it's the CMS, and it can give you a considerable number of essential blogging features that are not trivial to program -- category sorting; archiving by date, category, and media type; easy collaborative accounts; layout templates and formatting; standard or rolling images or themes; and content availability in various formats and channels (such as RSS).

There are too many CMSs to even attempt a complete listing of them -- hundreds are currently in use, and some are described in detail elsewhere in developerWorks (see Resources). But it's worthwhile to list some of the better and more popular open source CMSs that work well on UNIX and can be configured to develop and run a weblog. These are listed in Table 1, but there are many others, so a solution is undoubtedly out there for your particular needs.

Table 1. Popular open source CMSs for UNIX
CMS          Description
Blosxom      Blosxom is a Perl-based weblog publishing system featuring a plug-in architecture and virtual directories.
Drupal       Drupal is a modular CMS for building weblogs with comments and trackbacks.
Textpattern  Textpattern is a document management system with attention to fine Web typography; uses PHP V4.3 or later and MySQL V3.23 or later.
WordPress    WordPress is one of the more popular open source CMS packages for publishing with UNIX.

Summary

The UNIX environment is really a natural for blogging. From the Web-friendly infrastructure to the powerful command-line tools, there's plenty in there to help you improve your blogging lot in life. This article showed some of the ways that you can use UNIX to make your blogging go better and faster.

Resources

Get products and technologies

  • Analog: Download a free copy of the Web log analysis tool.
  • imgsizer tool: Download a free copy of this tool. The identify tool is required and is available as part of the open source ImageMagick suite.
  • Open source CMSs: Download a free copy of the CMSs mentioned in this article (Blosxom, Drupal, Textpattern, and WordPress).
  • Expect language: Download a free copy of the Expect language from its main distribution site.
  • IBM trial software: Build your next development project with software for download directly from developerWorks.
