UNIX tips: Become a better blogger with UNIX

Use the benefits of UNIX to your blogging advantage

Michael Stutz, Author, Freelance Developer
Michael Stutz is author of The Linux Cookbook, which he also designed and typeset using only open source software. His research interests include digital publishing and the future of the book. He has used various UNIX operating systems for 20 years. You can reach him at stutz@dsl.org.

Summary:  Did you know that blogging and UNIX® go hand in hand? The native Web and text-processing tools of UNIX enable you to create your blogs quickly and easily. Discover some handy tips for improving your UNIX blogging skills.


Date:  10 Oct 2006
Level:  Intermediate
Also available in:   Chinese  Korean  Russian


UNIX® and weblogs, or blogs, have a lot in common. Besides being the native environment of most Web servers and the preferred environment for many Web developers, UNIX can be an ideal environment to blog with because of its Web and text-processing power. Take advantage of the command-line tools and features inherent to UNIX to become a better blogger. Here are a few tips to help you do just that.

Serve fresh content constantly

The cardinal rule of blogging is to do it as much as you can. Your blog should resemble a scrolling ticker tape, or even a television broadcast, more than a fragment of some etching pulled up in an archaeological dig. It should always be growing, and readers should get a sense of fresh motion when they visit. Visitors don't just read Web sites; they also watch them, following links, reloading pages, and returning later. To succeed with a site like this, you must accommodate that behavior.

While you don't need to install any special software to do so, this is the quickest and most important way to improve your weblog: You must constantly add new content! If you start your blog today and update it a dozen times a day, you'll have more people reading it by the end of the week than you would if you'd had it for a year and updated it only when the mood struck.

This tip relates to all the others in this article, because they show how your UNIX system can help you serve that fresh content faster and better than before. You must know which of your content is most popular, who's reading it and where they're coming from, how to make your pages load more quickly, and how to automate your blog updates. Finally, you'll get a quick look at some UNIX-based content management solutions that might be better than what you've been using to produce your blog.

Look at your logs

Your logs are your lifeblood. They tell you who's looking and where, how many, and how often. If you actively publish a weblog, you should look at your logs no less than once a day. You've got the power to see who's reading your publication, exactly what they're reading, and when they're doing it. So, why ignore it?

You can use command-line tools to extract meaningful data from your logs, but special UNIX tools exist to automatically analyze logs in the most popular formats, including those written by the Apache Web server. One such tool is the popular open source analog command.

React to what's popular

Use analog to check your links and see what people are following. First, get a general report showing statistics -- how many unique requests are being made, whether there are any failed requests, how many distinct hosts are being served, and so on:

$ analog -A www.20060901 | lynx -stdin

This command produces output such as that shown in Listing 1.


Listing 1. Sample output of the analog tool

                  Web Server Statistics for BigBlog
                                                                    
  Program started at Mon-25-Sep-2006 14:46.                        
  Analyzed requests from Fri-01-Sep-2006 00:01 to Fri-01-Sep-2006 23:59 (1.00 days).
____________________________________________________________________________

General Summary

  (Go To: Top: General Summary)

  This report contains overall statistics.
                                           
  Successful requests: 3,400              
  Average successful requests per day: 3,403
  Successful requests for pages: 2,015
  Average successful requests for pages per day: 2,016
  Failed requests: 3
  Redirected requests: 963
  Distinct files requested: 101
  Distinct hosts served: 950
  Data transferred: 65.338 megabytes
  Average data transferred per day: 65.429 megabytes
____________________________________________________________________________

   This analysis was produced by analog 6.0.
   Running time: Less than 1 second.         
                                     
   (Go To: Top: General Summary)     
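If analog isn't installed, you can approximate a few of these General Summary figures with awk. The following is a minimal sketch, not a replacement for analog: it assumes Apache common or combined log format (client address in field 1, status code in field 9), and the function name log_summary is made up for this example. Its grouping of status codes into successful (2xx), redirected (3xx), and failed (4xx and 5xx) is a simplification that may not match analog's own counting rules exactly.

```shell
# log_summary (hypothetical name): a rough, analog-free approximation of a
# few General Summary figures. Assumes Apache common or combined log format,
# where the client address is field 1 and the HTTP status code is field 9.
log_summary() {
    awk '
        { hosts[$1] = 1 }               # remember every distinct client
        $9 ~ /^2/    { ok++ }           # 2xx: successful
        $9 ~ /^3/    { redirected++ }   # 3xx: redirected
        $9 ~ /^[45]/ { failed++ }       # 4xx and 5xx: failed
        END {
            n = 0
            for (h in hosts) n++
            printf "Successful requests: %d\n", ok
            printf "Redirected requests: %d\n", redirected
            printf "Failed requests: %d\n", failed
            printf "Distinct hosts served: %d\n", n
        }' "$@"
}
```

Run it as log_summary www.20060901, or pipe decompressed logs into it with zcat www.200609* | log_summary (with no file arguments, awk reads standard input).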

Pay particular attention to the Search Word Report, which displays the most popular query words and the number of times they were requested, and the Directory Report, which shows you the most popular directories on your site. (It's always good to see which archived blog entries are currently of interest to your readers.) Finally, the Request Report shows the most requested files on the site. Your blog logo and any graphics that appear frequently throughout the site are bound to be near the top, but by looking at the actual content files (such as .html files), you can get a good idea of which pages or archived blog entries are most popular with your readers.
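To see which content pages are most popular without generating a full Request Report, you can count requests for page-like paths directly. This is a sketch under stated assumptions: the logs are in Apache common or combined format (request path in field 7, status code in field 9), only successful requests count, and a path ending in .html, .htm, or / is treated as a page. The name top_pages is hypothetical.

```shell
# top_pages (hypothetical name): list successfully served content pages,
# most requested first. Assumes Apache common or combined format (request
# path in field 7, status in field 9); paths ending in .html, .htm, or /
# count as pages, so images and other files are skipped.
top_pages() {
    awk '$9 ~ /^2/ && $7 ~ /(\.html?|\/)$/ { print $7 }' "$@" |
        sort | uniq -c | sort -rn
}
```

For example, top_pages www.20060901 | head lists the ten most requested pages for that day.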

Daily and periodic spikes can occur, and you should react to them. However, it's also wise to watch longer-term trends. If you keep your daily logs in an archive directory, that's easy to do: simply concatenate them and send them all to analog to process at once. Do this weekly, monthly, and even annually to track trends. Use zcat (named gzcat on some systems) to both decompress and concatenate any compressed logs. For example, to get a complete report on all the logs for the month of September 2006, use the command:

$ zcat www.200609* | analog - | lynx -stdin
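Before running a full monthly report, a quick request count for each daily log can show whether traffic is growing or where a spike occurred. The sketch below counts lines (one per request) in each file. The function name daily_counts is hypothetical, and it assumes GNU gzip, whose -f option makes gzip -dc pass uncompressed files through unchanged, so compressed and plain logs can be mixed.

```shell
# daily_counts (hypothetical name): print each log file name with its number
# of requests (lines), so growth and spikes stand out across archives.
# Assumes GNU gzip: with -f, "gzip -dc" copies non-gzip input through as-is.
daily_counts() {
    for f in "$@"; do
        printf '%s %s\n' "$f" "$(gzip -dc -f "$f" | wc -l | tr -d ' ')"
    done
}
```

For example, daily_counts www.200609* prints one line per day of September 2006.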

Know who your readers are

It can be helpful to know where your readers are coming from: what domains, IP addresses, and countries. To find all the hosts in your log files, you can use a few command-line tools to get a quick report of every hostname. If you have Apache-style logging, for example, the requesting IP addresses are the first field of every line:

$ for i in $(cut -d " " -f1 www.200609* | sort -u); do host "$i"; done

If your log is compressed, first use zcat to pipe the decompressed text. Or, if your daily log is available in an access.log file, you can use the same principle to see whether your colleague at badblog.example.com has looked at your site yet:

$ for i in $(cut -d " " -f1 access.log | sort -u); \
> do host "$i"; done | grep -F badblog.example.com
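These loops look up every distinct address once, but they don't show which hosts request the most. Counting requests per address first lets you run host on only the busiest ones. A minimal sketch, assuming the same Apache-style layout with the client address in the first space-separated field; top_hosts is a hypothetical name:

```shell
# top_hosts (hypothetical name): count requests per client address
# (field 1 of Apache-style logs) and list the busiest clients first.
# With no file arguments, cut reads standard input.
top_hosts() {
    cut -d ' ' -f1 "$@" | sort | uniq -c | sort -rn
}
```

For example, zcat www.200609* | top_hosts | head shows the ten most active addresses for September 2006.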

You can output the total number of unique hosts (distinct IP addresses, not domains) that have requested pages in your /blog directory, based on the compressed logs you have in the web/logs/ directory:

$ zcat web/logs/* | grep -F "/blog" | cut -d " " -f1 | sort -u | wc -l
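The "/blog" filter in this pipeline matches the string anywhere on a line, so a request for some other page whose referrer or user agent happens to contain /blog is counted too. If you want only hosts that actually requested pages in /blog, you can test the request path instead. A sketch, assuming Apache common or combined format with the request path in field 7; blog_hosts is a hypothetical name:

```shell
# blog_hosts (hypothetical name): count distinct client addresses whose
# request path (field 7 in Apache common or combined logs) is /blog or lies
# under /blog/. Matches in the referrer or user-agent fields are ignored.
blog_hosts() {
    awk '$7 ~ /^\/blog(\/|$)/ { print $1 }' "$@" | sort -u | wc -l | tr -d ' '
}
```

Use it the same way as the pipeline above: zcat web/logs/* | blog_hosts.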

Know where they're coming from

If a site is sending a lot of readers your way, you want to acknowledge it. That means you should pay close attention to your referrers: the URLs of the pages that link to yours, which browsers send in the HTTP Referer header of each request. This data is saved in your logs (if you use Apache's combined log format), and you can use analog to extract it. The analog tool lists the referrers in its Referrer Report, as shown in Listing 2. Use the +f flag to turn this report on.


Listing 2. Sample from a Referrer Report page

                             Web Server Statistics for BigBlog
Referrer Report

   (Go To: Top: General Summary: Monthly Report: Daily Summary: Hourly Summary: Domain
   Report: Organization Report: Referrer Report: Search Word Report: Operating System 
   Report: Status Code Report: File Size Report: File Type Report: Directory Report: 
   Request Report)
   
   This report lists the referrers (where people followed links from, or pages which
   included this site's images).                                                    
   
   Listing referring URLs with at least 20 requests, sorted by the number of requests.
reqs: URL
----: ---
 814: http://www-128.ibm.com/developerworks/
 359: http://www.google.com/search
 114: http://badblog.example.com/
 102: http://badblog.example.com/2006/09/01/
  81: http://www.google.co.uk/search             
 530: [not listed: 485 URLs]
     ________________________________________________________________________________


You can also bypass the use of reporting software and get the referrers right from the command line. In Apache-style logs, the referrers are enclosed in double quotation marks and come after the IP address, date and time, and actual request (also enclosed in quotation marks). Use awk to extract the referrers; with a double-quote character as a field separator, they'll be the fourth field of each line. Because Apache writes a hyphen as the referrer when a request has no referring URL, use grep with the -v option to omit those lines. As a final touch, sort it by popularity of unique referrers:

$ awk 'BEGIN { FS = "\"" }; { print $4 }' log.daily | grep -v '^-$' | sort | uniq -c | sort -rn
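Listing 2 shows badblog.example.com appearing twice, once for its home page and once for a dated archive page. To total referrals per site rather than per page, you can strip each referring URL down to its scheme and host. A sketch, assuming Apache combined log format and absolute referrer URLs; referring_sites is a hypothetical name:

```shell
# referring_sites (hypothetical name): group referrers by site (scheme plus
# host) so that many pages from one site count together. Assumes Apache
# combined format: with a double quote as separator, the referrer is $4.
referring_sites() {
    awk -F '"' '
        $4 != "" && $4 != "-" {
            split($4, part, "/")          # part[1] = scheme, part[3] = host
            print part[1] "//" part[3]
        }' "$@" | sort | uniq -c | sort -rn
}
```

For example, referring_sites log.daily | head lists the ten sites sending you the most traffic.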

Presize your images

The HEIGHT and WIDTH attributes of the Hypertext Markup Language (HTML) <img> tag are important. These attributes specify the dimensions of the given image. When they are present, most browsers reserve room for the image in the window where the page is rendering before any of the image has loaded. Without them, the browser doesn't know how much room the image needs until it starts downloading it, so the text around the image may not be displayed, or may shift around, until the image arrives.

So, when you're putting images in your blog, it's to your advantage to include these attributes in their <img> tags, especially once you have a lot of images on a single page, because doing so can dramatically improve how quickly your blog page becomes readable. Visitors can begin reading as soon as the page starts to load, without waiting for the entire page and all its images to come over the wire.

But having to determine the exact HEIGHT and WIDTH values each time you use an image and then put them in the <img> tag itself is a horrible bother. Fortunately, a tool exists to automate the entire task for you. The imgsizer utility (see Resources) reads any .html files you give it, checks all the source images referenced in those files, determines their heights and widths, and writes the proper values in all the <img> tags contained in the given files:

$ imgsizer index.html

It's as easy as that: you don't have to open any of the images or do anything else to them. After imgsizer has added these attributes, you'll be surprised at how much more quickly the page appears to load. Few bloggers use this simple technique, but readers will appreciate it.
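To find the images on a page that still lack dimensions, before or after running imgsizer, a rough check with grep and awk can help. This is a sketch, not an HTML parser: it assumes each <img> tag fits on a single line, it uses grep -o (available in GNU and BSD grep but not required by POSIX), and missing_dims is a hypothetical name.

```shell
# missing_dims (hypothetical name): print <img> tags that lack a width or
# height attribute, prefixed by file name. A rough check: assumes each tag
# sits on one line; grep -o is a GNU/BSD extension, not POSIX.
missing_dims() {
    for f in "$@"; do
        grep -io '<img[^>]*>' "$f" |
            awk -v file="$f" '
                { t = tolower($0) }
                t !~ /width=/ || t !~ /height=/ { print file ": " $0 }'
    done
}
```

For example, missing_dims index.html prints nothing once every image on the page has both attributes.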

Automate your updates

Rare is the blogger who produces a blog right on the live page itself. Most work on a local copy where new entries are first roughed out and polished. Then, when ready to go live with the new index.html file, the blogger uploads this file to the server hosting the actual site.

The process can take 30 seconds to a minute of focused attention, as the blogger opens a File Transfer Protocol (FTP) connection, types the password, changes to the local weblog root directory, changes to the server root directory, uploads the file, and logs out (see Listing 3 for an example).

As you can imagine, this process is prone to user error. It also adds up: if you're aiming to be a big-shot, A-list blogger with a good 10 updates per day, this upload process takes 5 to 10 minutes out of your day, or roughly 30 to 60 hours per year! That's a lot of time that could be better spent building your information technology (IT) repertoire with developerWorks articles.


Listing 3. Manual update of a weblog root page
develbox$ ftp bigblog.example.com
Connected to bigblog.example.com.
220 bigblog.example.com NcFTPd Server (licensed copy) ready.
Name (bigblog.example.com:joe): joe_blogger
331 User joe_blogger okay, need password.
Password: secret
230 You are user #1 of 2 simultaneous users allowed.
230 Logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> lcd ~/blog
Local directory now /home/joe/blog
ftp> cd public_html
250 "/usr/www/users/joe_blogger" is new cwd.
ftp> put index.html
local: index.html remote: index.html
200 PORT command successful.
150 Opening BINARY mode data connection.
226 Transfer completed.
ftp> bye
221 Goodbye.
develbox$

A better way to do this is to use the Expect language, which is designed for scripting interactive sessions (see Resources). For bloggers who manually update their sites over FTP, it's a natural way to make an automated update script. Listing 4 shows an example that automates the session shown in Listing 3.


Listing 4. Expect program to automate weblog updates
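#!/usr/bin/expect
# bloggit: update a weblog index page
# puts ~/blog/index.html in remote ~/public_html/

exp_version -exit 5.0

if {$argc != 0} {
	send_user "usage: bloggit\n"
	exit 1
}

set timeout 60
log_user 0

# Fail loudly instead of carrying on if a prompt never arrives
# or the FTP session ends early.
expect_after {
	timeout { send_user "bloggit: timed out\n"; exit 1 }
	eof     { send_user "bloggit: connection closed\n"; exit 1 }
}

spawn ftp bigblog.example.com
expect "Name*:"
send "joe_blogger\r"
expect "Password:"
send "secret\r"
expect "ftp>"
send "lcd ~/blog/\r"
expect "ftp>"
send "cd public_html/\r"
expect "ftp>"
send "put index.html\r"
# 226 is the FTP reply code for a completed transfer
expect "226*ftp>"
send "bye\r"
send_user "blogged it.\n"
close
wait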

Now, when you're ready to go live with an update, it takes a lot less time to do:

$ bloggit
blogged it.
$
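If you run bloggit often, you might want it to upload only when index.html has actually changed. One way, sketched here under assumptions, is to compare the page's modification time with a stamp file touched after each successful upload. The function name needs_upload and the stamp file ~/.bloggit-stamp are hypothetical, and the [ -nt ] test is supported by bash, ksh, and dash but is not part of POSIX.

```shell
# needs_upload (hypothetical name): succeed if the page is newer than the
# stamp file, or if no stamp exists yet (nothing has been uploaded).
# Usage: needs_upload PAGE STAMP
needs_upload() {
    page=$1
    stamp=$2
    [ ! -e "$stamp" ] || [ "$page" -nt "$stamp" ]
}

# Hypothetical wrapper: upload only when needed, and record the time of
# the upload only if bloggit itself exits successfully.
# needs_upload ~/blog/index.html ~/.bloggit-stamp &&
#     bloggit && touch ~/.bloggit-stamp
```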

Use a content management system

When it comes to development and putting out products, UNIX people tend to roll their own. But they are equally lazy and don't care to reinvent the wheel if a usable solution already exists; there are too many new ideas to develop.

In the early years of blogging, the most successful weblogs were hand-coded HTML; that's far less common today. Now, most blogs are database-driven, hand-configured sites powered by a content management system (CMS).

If there's one quintessential weblog application, it's the CMS, and it can give you a considerable number of essential blogging features that are not trivial to program: category sorting; archiving by date, category, and media type; easy collaborative accounts; layout templates and formatting; standard or rolling images or themes; and content availability in various formats and channels (such as RSS).

There are too many CMSs to even attempt a complete listing of them -- hundreds are currently in use, and some are described in detail elsewhere in developerWorks (see Resources). But it's worthwhile to list some of the better and more popular open source CMSs that work well on UNIX and can be configured to develop and run a weblog. These are listed in Table 1, but there are many others, so a solution is undoubtedly out there for your particular needs.


Table 1. Popular open source CMSs for UNIX

CMS          Description
Blosxom      A Perl-based weblog publishing system featuring a plug-in architecture and virtual directories.
Drupal       A modular CMS for building weblogs with comments and trackbacks.
Textpattern  A document management system with attention to fine Web typography; uses PHP V4.3 or later and MySQL V3.23 or later.
WordPress    One of the more popular open source CMS packages for publishing with UNIX.

Summary

The UNIX environment is a natural fit for blogging. From its Web-friendly infrastructure to its powerful command-line tools, there's plenty there to help you improve your blogging lot in life. This article showed some of the ways you can use UNIX to make your blogging better and faster.


Resources


Get products and technologies

  • Analog: Download a free copy of the Web log analysis tool.

  • imgsizer tool: Download a free copy of this tool. The identify tool is required and is available as part of the open source ImageMagick suite.

  • Download free copies of the open source CMSs mentioned in this article (Blosxom, Drupal, Textpattern, and WordPress).
  • Expect language: Download a free copy of the Expect language from its main distribution site.


