UNIX® and weblogs, or blogs, have a lot in common. Besides being the native environment of most Web servers and the preferred environment for many Web developers, UNIX can be an ideal environment to blog with because of its Web and text-processing power. Take advantage of the command-line tools and features inherent to UNIX to make you a better blogger. Here are a few tips to help you do just that.
Serve fresh content constantly
The cardinal rule of blogging is to do it as much as you can. Your blog should resemble a scrolling ticker tape, or even the motion of television, more than a fragment of an etching pulled up in an archaeological dig. It should always be growing, and readers should get that sense of fresh motion when they visit. Visitors to a Web site don't just read it; they also watch it -- following the links, reloading, and returning. To run a successful site, you must accommodate this.
While you don't need to install any special software to do so, this is the quickest and most important way to improve your weblog: You must constantly add new content! Even if you start your blog today, you'll have more people reading it by the end of the week if you're updating it a dozen times a day than if you've had it for a year and updated it only when the mood strikes.
This tip relates to all the others that follow in this article, because they'll show you how your UNIX system helps serve that fresh-blogged content quicker and better than before. You must know which of your content is the most popular, know who's reading it and where they're coming from, make your text load more quickly and better, and automate your blog updates. Here, you'll get a quick look at some UNIX-based content management solutions that might be better than what you've been using to produce your blog.
Your logs are your lifeblood. They tell you who's looking and where, how many, and how often. If you actively publish a weblog, you should look at your logs no less than once a day. You've got the power to see who's reading your publication, exactly what they're reading, and when they're doing it. So, why ignore it?
You can use command-line tools to extract meaningful data from your logs, but special UNIX tools exist to automatically analyze logs in the most popular formats, including those written by the Apache Web server. One such tool is the popular open source analog command.
Use analog to check your links and see what people are following. First, get a general report showing statistics -- how many unique requests are being made, whether there are any failed requests, how many distinct hosts are being served, and so on:
$ analog -A www.20060901 | lynx -stdin
This command produces output such as that shown in Listing 1.
Listing 1. Sample output of the analog tool
Web Server Statistics for BigBlog
Program started at Mon-25-Sep-2006 14:46.
Analyzed requests from Fri-01-Sep-2006 00:01 to Fri-01-Sep-2006 23:59 (1.00 days).
____________________________________________________________________________
General Summary
(Go To: Top: General Summary)
This report contains overall statistics.
Successful requests: 3,400
Average successful requests per day: 3,403
Successful requests for pages: 2,015
Average successful requests for pages per day: 2,016
Failed requests: 3
Redirected requests: 963
Distinct files requested: 101
Distinct hosts served: 950
Data transferred: 65.338 megabytes
Average data transferred per day: 65.429 megabytes
____________________________________________________________________________
This analysis was produced by analog 6.0.
Running time: Less than 1 second.
(Go To: Top: General Summary)
Pay particular attention to the Search Word Report, which displays the most popular query words and the number of times they were requested, and the Directory Report, which shows you the most popular directories on your site. (It's always good to see which archived blog entries are currently of interest to your readers.) Finally, the Request Report shows the most requested files on the site. Your blog logo and any graphics that appear frequently throughout the site are bound to be near the top but, by looking at the actual content files (such as .html files), you can get a good idea of which pages or archived blog entries are most popular with your readers.
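For a quick look at the most requested files without generating a full report, a rough command-line approximation of the Request Report can be sketched as follows. It assumes Apache-style logs in which the request path is the seventh space-separated field; the function name top_requests and the file name in the usage comment are illustrative, not part of analog or Apache.

```shell
#!/bin/sh
# top_requests N FILE...: print the N most requested paths.
# Assumption: Apache common or combined log format, where the
# request path is the 7th space-separated field of each line.
top_requests() {
    n=$1; shift
    awk '{ print $7 }' "$@" | sort | uniq -c | sort -rn | head -n "$n"
}

# Example (hypothetical file name):
# top_requests 5 access.log
```

Each output line is a request count followed by the path, with the most requested path first.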
Daily and periodic spikes can occur, and you should react to them. However, it's also wise to watch the longer-term trends. If you keep your daily logs in an archive directory, that's easy to do. Simply concatenate them and send them all to analog to process at once. Do this weekly, monthly, and even annually to track trends. Use zcat (which is named gzcat on some systems) to both decompress and concatenate any compressed logs. For example, to get complete reporting on all the logs for the month of September 2006, use the command:
$ zcat www.200609* | analog - | lynx -stdin
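To eyeball a trend straight from the concatenated logs, a per-day request count is often enough. The following is a minimal sketch, assuming Apache-style timestamps such as [01/Sep/2006:00:01:02 -0400] in the fourth space-separated field; the function name requests_per_day is made up for this example.

```shell
#!/bin/sh
# requests_per_day: read log lines on stdin, print "date count" per day.
# Assumption: the 4th space-separated field looks like
# [01/Sep/2006:00:01:02 (Apache common/combined log format).
requests_per_day() {
    awk '{
             split($4, t, ":")        # t[1] is "[01/Sep/2006"
             day = substr(t[1], 2)    # drop the leading "["
             count[day]++
         }
         END { for (d in count) print d, count[d] }' | sort
}

# Example, using the same logs as above:
# zcat www.200609* | requests_per_day
```

The output is sorted as plain text, which puts the days of a single month in order; logs that span several months would need a smarter sort.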
It can be helpful to know where your readers are coming from -- what domains, IP addresses, and countries. To find all the hosts in your weblogs, you can use a few command-line tools to get a quick report of every hostname. If you have Apache-style logging, for example, the requesting IP addresses are the first field of every line:
$ for i in `cut -d " " -f1 www.200609* | sort -u`; do host "$i"; done
If your log is compressed, first use zcat to pipe the decompressed text. Or, if your daily log is available in an access.log file, you can use the same principle to see whether your colleague at badblog.example.com has looked at your site yet:
$ for i in `cut -d " " -f1 access.log | sort -u`; \
> do host "$i"; done | fgrep badblog.example.com
You can output the total number of unique hosts that have visited your /blog directory based on the compressed logs you have in the web/logs/ directory:
$ zcat web/logs/* | fgrep "/blog" | cut -d " " -f1 | sort -u | wc -l
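Because fgrep matches "/blog" anywhere on a line, the count above also includes requests whose referrer or other fields happen to contain that string. A slightly stricter sketch matches only the request path; it assumes Apache-style logs with the path in the seventh space-separated field, and the function name unique_visitors is illustrative.

```shell
#!/bin/sh
# unique_visitors PREFIX: read log lines on stdin and count the distinct
# client addresses that requested a path beginning with PREFIX.
# Assumption: Apache common/combined format (address in field 1,
# request path in field 7).
unique_visitors() {
    awk -v p="$1" 'index($7, p) == 1 { print $1 }' | sort -u | wc -l
}

# Example, using the same logs as above:
# zcat web/logs/* | unique_visitors /blog
```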
Know where they're coming from
If a site is sending a lot of readers your way, you want to acknowledge it. That means that you should pay close attention to your referrers -- the URLs of the pages that link to yours, which browsers send in the request headers. This data is saved in your logs, and you can use analog to extract it. The analog tool lists the referrers in the Referrer Report, as shown in Listing 2. Use the +f flag to turn this report on.
Listing 2. Sample from a Referrer Report page
Web Server Statistics for BigBlog
Referrer Report
(Go To: Top: General Summary: Monthly Report: Daily Summary: Hourly Summary: Domain
Report: Organization Report: Referrer Report: Search Word Report: Operating System
Report: Status Code Report: File Size Report: File Type Report: Directory Report:
Request Report)
This report lists the referrers (where people followed links from, or pages which
included this site's images).
Listing referring URLs with at least 20 requests, sorted by the number of requests.
reqs: URL
----: ---
814: http://www-128.ibm.com/developerworks/
359: http://www.google.com/search
114: http://badblog.example.com/
102: http://badblog.example.com/2006/09/01/
81: http://www.google.co.uk/search
530: [not listed: 485 URLs]
________________________________________________________________________________
You can also bypass the use of reporting software and get the referrers right from the command line. In Apache-style logs, the referrers are enclosed in double quotation marks and come after the IP address, date and time, and actual request (also enclosed in quotation marks). Use awk to extract the referrers; with a double-quote character as a field separator, they'll be the fourth field of each line. Because Apache writes a hyphen as the referrer when a request has no referring URL, use grep with the -v option to omit those lines. As a final touch, sort it by popularity of unique referrers:
$ awk ' BEGIN { FS="\""}; {print $4}' log.daily|grep -v "^-$"|sort|uniq -c|sort -rn
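When one site links to you from many different pages, it can be more useful to total the referrers by site rather than by full URL. The following sketch extends the same awk approach; it assumes combined-format logs where the referrer is the fourth double-quote-delimited field and is a full URL with a scheme, and the function name referring_sites is hypothetical.

```shell
#!/bin/sh
# referring_sites FILE...: count referrers by host name, most popular first.
# Assumption: Apache combined log format, so the referrer is the 4th
# field when splitting each line on double quotes.
referring_sites() {
    awk -F'"' '$4 != "-" && $4 != "" {
                   split($4, u, "/")   # "http:", "", "host", "path", ...
                   print u[3]
               }' "$@" | sort | uniq -c | sort -rn
}

# Example:
# referring_sites log.daily
```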
The HEIGHT and WIDTH attributes of the Hypertext Markup Language (HTML) <img> tag are important. These attributes specify the dimensions of the given image. When they are present, most browsers automatically make room for the image in the window where the page is rendering before any of the image is loaded. Without these attributes, the image must be completely downloaded before the text around the image is displayed.
So, when you're putting images in your blogs, it's to your advantage to include these parameters in their <img> tags, especially when you begin to have a lot of images on a single page, as it improves the loading of your blog page dramatically. Visitors will be able to begin reading as soon as the page starts to load without waiting for the entire page and all its images to go over the wire.
But having to determine the exact HEIGHT and WIDTH values each time you use an image and then put them in the <img> tag itself is a horrible bother. Fortunately, a tool exists to automate the entire task for you. The imgsizer utility (see Resources) reads any .html files you give it, checks all the source images referenced in those files, determines their heights and widths, and writes the proper values in all the <img> tags contained in the given files:
$ imgsizer index.html
It's as easy as that -- you don't have to load any of the images or do anything else to them. After imgsizer has added these attributes, you'll be surprised at how much more quickly the page loads. Few bloggers use this simple technique, but it's one that readers will appreciate.
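To see which pages still need this treatment, you can list the <img> tags that lack either attribute. This is only a rough sketch: it assumes a grep that supports the -o option (GNU and BSD grep do), it misses tags that span more than one line, and the function name imgs_missing_size is invented for the example.

```shell
#!/bin/sh
# imgs_missing_size FILE: print each <img> tag in FILE that lacks a
# WIDTH or a HEIGHT attribute (matching is case-insensitive).
# Assumption: grep supports -o; tags split across lines are not found.
imgs_missing_size() {
    grep -io '<img[^>]*>' "$1" |
        awk '{ t = tolower($0)
               if (t !~ /width=/ || t !~ /height=/) print }'
}

# Example:
# imgs_missing_size index.html
```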
Rare is the blogger who produces a blog right on the live page itself. Most work on a local copy where new entries are first roughed out and polished. Then, when ready to go live with the new index.html file, the blogger uploads this file to the server hosting the actual site.
The process can take 30 seconds to a minute of constrained attention, as the blogger opens a File Transfer Protocol (FTP) connection, types the password, changes to the local weblog root directory, changes to the server root directory, uploads the file, and logs out (see Listing 3 for an example).
As you can imagine, this process is prone to user error. If you're aiming to be a big-shot, A-list blogger with a good 10 updates per day, this upload process takes a full five minutes out of your day -- or just over 30 hours per year! That's a lot of time that could be better spent building your information technology (IT) repertoire with developerWorks articles.
Listing 3. Manual update of a weblog root page
develbox$ ftp bigblog.example.com
Connected to bigblog.example.com.
220 bigblog.example.com NcFTPd Server (licensed copy) ready.
Name (bigblog.example.com:joe): joe_blogger
331 User joe_blogger okay, need password.
Password: secret
230 You are user #1 of 2 simultaneous users allowed.
230 Logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> lcd ~/blog
Local directory now /home/joe/blog
ftp> cd public_html
250 "/usr/www/users/joe_blogger" is new cwd.
ftp> put index.html
local: index.html remote: index.html
200 PORT command successful.
150 Opening BINARY mode data connection.
226 Transfer completed.
ftp> bye
221 Goodbye.
develbox$
A better way to do this is to use the Expect language, which is designed for scripting interactive sessions (see Resources). For bloggers who manually update their sites over FTP, it's a natural way to make an automated update script. Listing 4 shows an example that automates the session shown in Listing 3.
Listing 4. Expect program to automate weblog updates
#!/usr/bin/expect
# update a weblog index page
# puts ~/blog/index.html in remote ~/public_html/
exp_version -exit 5.0
if {$argc != 0} {
    send_user "usage: bloggit\n"
    exit
}
set timeout 60
log_user 0
spawn ftp bigblog.example.com
expect "Name*:"
send "joe_blogger\r"
expect "Password:"
send "secret\r"
expect "ftp>"
send "lcd ~/blog/\r"
expect "ftp>"
send "cd public_html/\r"
expect "ftp>"
send "put index.html\r"
expect "226*ftp>"
send "bye\r"
send_user "blogged it.\n"
close
Now, when you're ready to go live with an update, it takes a lot less time to do:
$ bloggit
blogged it.
$
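If you run the update from an editor hook or a scheduled job, you may want to skip the upload when the page hasn't changed since the last one. The wrapper below is a sketch of that idea under a few assumptions: the function name publish_if_changed and the .published suffix for the saved copy are made up for this example, and the upload command is expected to exit with a nonzero status on failure (the script in Listing 4 does not check for errors, so it would need extending for that).

```shell
#!/bin/sh
# publish_if_changed FILE COMMAND...: run COMMAND only when FILE differs
# from the copy saved after the last successful run.
publish_if_changed() {
    file=$1; shift
    stamp="$file.published"    # hypothetical name for the saved copy
    if [ -f "$stamp" ] && cmp -s "$file" "$stamp"; then
        echo "no changes."
        return 0
    fi
    # Save a copy only if the upload command reports success.
    "$@" && cp "$file" "$stamp"
}

# Example:
# publish_if_changed ~/blog/index.html bloggit
```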
Use a content management system
When it comes to development and putting out products, UNIX people have a tendency toward rolling their own. But equally so, they are lazy and don't care to reinvent the wheel if a usable solution already exists; there are too many new ideas to develop.
In the early years of blogging, the most successful weblogs were hand-coded HTML -- that's far less common today. Now, most blogs are database-driven, hand-configured sites that are powered by a content management system (CMS).
If weblogs have a defining application, it's the CMS. It can give you a considerable number of essential blogging features that are not trivial to program -- category sorting; archiving by date, category, and media type; collaborative accounts; layout templates and formatting; standard or rolling images or themes; and content availability in various formats and channels (such as RSS).
There are too many CMSs to even attempt a complete listing of them -- hundreds are currently in use, and some are described in detail elsewhere in developerWorks (see Resources). But it's worthwhile to list some of the better and more popular open source CMSs that work well on UNIX and can be configured to develop and run a weblog. These are listed in Table 1, but there are many others, so a solution is undoubtedly out there for your particular needs.
Table 1. Popular open source CMSs for UNIX
| CMS | Description |
|---|---|
| Blosxom | Blosxom is a Perl-based weblog publishing system featuring a plug-in architecture and virtual directories. |
| Drupal | Drupal is a modular CMS for building weblogs with comments and trackbacks. |
| Textpattern | Textpattern is a document management system with attention to fine Web typography; it uses PHP V4.3 or later and MySQL V3.23 or later. |
| WordPress | WordPress is one of the more popular open source CMS packages for publishing with UNIX. |
The UNIX environment is really a natural for blogging. From the Web-friendly infrastructure to the powerful command-line tools, there's plenty in there to help you improve your blogging lot in life. This article showed some of the ways that you can use UNIX to make your blogging go better and faster.
Learn
- "Getting started with an open source CMS" (developerWorks, August 2005): This is a multi-part tutorial series that shows you how to implement an open source CMS using Apache Tomcat and Jakarta Slide.
- "Domino blogging: Domino Blog" (developerWorks, September 2004): Learn how to develop a blog with Domino Blog.
- "Blogging in IBM Workplace Collaboration Services" (developerWorks, January 2006): Discover how to blog with IBM Workplace Collaboration Services.
- "Planet Blog" (developerWorks, January 2004): In this article, Edd Dumbill shows how to aggregate weblog RSS feeds.
- Technology bookstore: Browse this store for additional blogging books and other technical topics.
- AIX and UNIX: Visit the developerWorks AIX and UNIX zone to expand your UNIX skills.
- New to AIX and UNIX: Visit the New to AIX and UNIX page to learn more about AIX and UNIX.
- developerWorks technical events and webcasts: Stay current with developerWorks technical events and webcasts.
- Podcasts: Tune in and catch up with IBM technical experts.
Get products and technologies
- Analog: Download a free copy of the Web log analysis tool.
- imgsizer tool: Download a free copy of this tool. The identify tool is required and is available as part of the open source ImageMagick suite.
- Open source CMSs: Download a free copy of the open source CMSs mentioned in this article (listed in Table 1).
- Expect language: Download a free copy of the Expect language from its main distribution site.
- IBM trial software: Build your next development project with software for download directly from developerWorks.
Discuss
- Participate in the AIX and UNIX forums:
- AIX 5L -- technical forum
- AIX for Developers Forum
- Cluster Systems Management
- IBM Support Assistant
- Performance Tools -- technical
- Virtualization -- technical
- More AIX and UNIX forums
- Participate in the developerWorks blogs and get involved in the developerWorks community.
Michael Stutz is author of The Linux Cookbook, which he also designed and typeset using only open source software. His research interests include digital publishing and the future of the book. He has used various UNIX operating systems for 20 years. You can reach him at stutz@dsl.org.



