
Inheriting Web sites: Getting a Web site to a maintainable state

How to take someone else's Web site and maintain it, without overburdening yourself

Brett D. McLaughlin, Sr., Author and Editor, O'Reilly Media, Inc.
Brett McLaughlin has worked in computers since the Logo days. (Remember the little triangle?) In recent years, he's become one of the most well-known authors and programmers in the Java and XML communities. He's worked for Nextel Communications, implementing complex enterprise systems; at Lutris Technologies, actually writing application servers; and most recently at O'Reilly Media, Inc., where he continues to write and edit books that matter. Brett's upcoming book, Head Rush Ajax, brings the award-winning and innovative Head First approach to Ajax. His last book, Java 1.5 Tiger: A Developer's Notebook, was the first book available on the newest version of Java technology. And his classic Java and XML remains one of the definitive works on using XML technologies in the Java language.

Summary:  In a perfect world, you'd create every Web site you were ever assigned to maintain, improve, and redesign. Unfortunately, in the real world, you're often forced to take on a site someone else designed or constructed.

Date:  28 Feb 2008
Level:  Intermediate

New jobs, new projects, new assignments, and new responsibilities all carry risk. And while you may have a new boss or a new office or a new set of colleagues, nothing is quite so intimidating as being handed a Web site that is now yours to care for and feed, with no idea how that Web site was put together. More often than not (especially if you're taking over for a less-experienced Web designer, or worse yet, someone who was just "filling in"), the sites you'll inherit are a tangled jumble of content, presentation, and even some activity. And, of course, the pages are ugly. If there aren't blink tags, there are jumbled styles, font tags mixed with CSS styles, unclosed and misused HTML tags... it can be a mess.

Thankfully, there's a logical approach to turning these sorts of pages into finely tuned, easily maintainable Web sites. This two-part series provides a road map for turning messy and unruly pages into well-structured, organized, maintainable code. In Part 1, you'll learn how to make a site maintainable. In Part 2, you'll organize your site's layout, increase its efficiency, and make sure you're in control.

It turns out that some tried-and-true techniques make a task like this manageable and put you in a position not just to succeed, but to excel. They'll also ensure that you spend the least effort reworking a site, and the most time improving it and keeping it working. Then, if you later get the opportunity to refresh or even redesign a site like this, you can start from well-segmented HTML and CSS, not a mess of convoluted and style-laden HTML.

Evaluation: What do you have?

The first thing you should do when dealing with someone else's work is figure out exactly what you have. This flies in the face of instinct, which is to go in charging with notepad (the Windows® kind) and editor in hand. All you'll do at this point by making changes, though, is screw things up. You need to understand what you're dealing with, and a few minutes of time—five or ten, tops—will allow you to charge in with a lot more knowledge.

Look of the site

Open up the site in a Web browser. Don't break open the HTML and CSS, don't worry about whether images are stored as JPGs or GIFs, just crack open the site. Get a notepad next to your monitor, and scribble down what you see. But—and this is the hard part—focus only on the big picture. Don't get bent out of shape that a one-pixel blank space appears between two tables on Internet Explorer 7. Instead, jot down initial impressions: Banner across top overwhelms page. Horizontal scrolling is annoying (as it always is!), and so on.

The point here isn't to have a twenty-two step list of little things to change in your HTML. You just want big overall impressions. Did you like the site? Did you hate it? Was it too full of content? Are the images grainy? You should form a very general opinion here of what should be done on the design of the site. You'll deal with the implementation in a little bit, but this is about getting the big picture figured out.

Making sense of the specs

Two basic flavors of HTML are in use today: HTML and XHTML. HTML is what you've probably learned over the last ten years. XHTML is an XML-ized version of HTML that introduces several new rules and wrinkles. For HTML, you're dealing with 4.01. Within that version, you can go with transitional, which is a bit looser (intended for people moving to 4.01 from an older version) and strict, which makes sure you really get everything right. In XHTML, you have to use lots of XML-specific things like always having an end tag, things that HTML doesn't require. In XHTML, you've got 1.0 transitional again, for moving from HTML to XHTML, and 1.0 strict, which means everything is dead on with your XHTML. There are 1.1 specs, but they're not as useful. For tons more on all things HTML and XHTML, check out Head First HTML with CSS & XHTML, a book I can recommend because I edited it, referenced in the Resources section of the article.

You're also hoping at this stage that you don't have a lot of design work to do. If the site looks pretty good here, then your job just got a lot easier. It's also why I encourage you to check out the site before digging into HTML and CSS. Lots of sites are a disaster when it comes to implementation, yet the site itself looks pretty good. If you do it the other way around, seeing a mess of intermingled HTML and CSS and only then looking at the site, you can end up polluting your view and doing unnecessary redesign, all because the site's implementation left a bad taste in your mouth. Look at the site first, make a decision, and then dive into the HTML and CSS.

Standardization

You should take one more step before digging into your code (although this definitely brings things closer to implementation): Validate your Web page (or pages). Validation checks whether your page is HTML or XHTML, and which version of those specs it conforms to. In general, you'll want to stick to whatever format your Web page is already in.

Keep in mind here that the goal is to reduce effort, because you've got to get this site maintainable as quickly as possible, and to expend as little effort as is reasonable. If you're redesigning the site, then you might want to move to whatever you're most comfortable with. For example, if you're an XHTML guru, you could change the site to XHTML, because that's your standard of choice. But if the goal is efficiency, staying with the current format of your page is probably your best choice.

Find out what you've got

Hop over to the W3C Markup Validator (see Resources) and give it the URL for your own site (see Figure 1).


Figure 1. The W3C Validator
The Validator can handle existing sites as well as uploaded files.

The Validator will check over your page, and spit out something that looks like Figure 2.


Figure 2. Validation summary
The Validator figures out what spec your page (probably) conforms to

In this case, the Validator figured out what encoding and doctype your pages are, and then validated against them.

Bring your doctype in line with your page

The Validator actually does its best to detect what your page is, even if that doesn't match an existing doctype in your page. So if you've got a doctype that claims XHTML 1.0 strict, and your page really is HTML 4.01 transitional, the Validator will tell you that. Rather than trying to match an arbitrary doctype (perhaps entered in by an overzealous Web designer?), go with what the Validator suspects your page is. That's the most efficient approach, and will get you to a maintainable, valid site quickly.

The Validator also reports errors

In addition to figuring out what your page is, the Validator reports errors based on that assumption. In almost every case—especially if you're inheriting a site, meaning you're not sure how well the page was kept up before you—you'll have errors (Figure 3 shows several errors that weren't visible in Figure 2).


Figure 3. Validation errors
Each error in the source is displayed for you to view (and eventually fix)

Now, again, I'm going to give you hard-to-follow advice: Don't start fixing these errors! I know, I know . . . you're eager to get to it. But here's the thing: If the Web site is a mess, with CSS mixed into the HTML and line breaks (<br>) everywhere, then all of your hard work is going to get changed up anyway. You've got to split your CSS from your HTML, so why fix errors resulting from that now? Do the split, then come back and handle validation issues.

The goal at this stage is to figure out what you're aiming for. If your page was targeted at XHTML 1.0, you'll probably be able to easily get it back to XHTML 1.0. I'm also making an assumption: The designer probably chose XHTML 1.0 for a reason, so even if you don't know the reason, be safe and stick with that decision.

If your site comes back as HTML 4.01 transitional—basically, the loosest possible choice—then you're good, anyway, because you've got a low bar to meet. The idea here is to find the shortest possible path to a solid site—both in design and implementation—that meets whatever requirements your original designer was given. Sometimes you won't know what those requirements were, so validation is one way you can reverse-engineer them.

But I have 1,000 pages!

So far, the assumption is that you run each page in your site through the Validator by hand. Ultimately, that's the best way to ensure you catch every error on every page and give each page proper attention. As you work through the first few pages, you'll often make changes that apply to your entire site (perhaps reworking a header section you can do a massive search-and-replace on, for example), making later pages easier.

However, as the emphasis has been on efficiency, there are options for batch validation. In those cases, you're going to have to look beyond the W3C Validator and go with something like the WDG HTML Validator (which allows you to enter multiple URLs in a text box; better, although not great), or the CSE HTML Validator (an inexpensive program that does batch processing). (Find links to both in the Resources section.) The problem with batch validation is that while you save time going in, it's easier to miss details because you're often faced with hundreds, if not thousands, of lines of errors to contend with over a large site.
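If you go the multiple-URL route, you still need the list of URLs to paste into that text box. Here's a minimal Python sketch for building it, assuming you have a local copy of the site whose pages are .html or .htm files. The page_urls function, the folder name, and the staging address are hypothetical names made up for illustration; nothing here talks to any validator.

```python
from pathlib import Path

# File extensions treated as pages (an assumption; adjust for your site).
PAGE_SUFFIXES = {".html", ".htm"}


def page_urls(site_root, base_url):
    """Return one URL per HTML file found under site_root, sorted."""
    root = Path(site_root)
    urls = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in PAGE_SUFFIXES:
            # Turn the path relative to the site root into a URL path.
            relative = path.relative_to(root).as_posix()
            urls.append(base_url.rstrip("/") + "/" + relative)
    return sorted(urls)
```

Calling print("\n".join(page_urls("site-copy", "http://staging.my-company.com"))) would give you one URL per line, ready to paste. Working from a sorted list also makes it easier to validate the site in chunks and keep track of where you stopped.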

Why bother validating?

If you're like most of the rest of the Web development world, validation is at best a nuisance, and at worst, a full-blown nightmare. The first time you run your page through the Validator and see hundreds of red-shaded error messages, it's pretty overwhelming. But you get some benefits from validation that are well worth the effort it takes to get your pages validating.

First and foremost, a valid page is going to display properly. Invalid pages often have missing closing tags, improper nesting of your HTML tags, and even problems with CSS style tags. These aren't things that are "nice-to-have" items; they're the essence of a site that behaves like you'd expect it to. One b or h1 tag out of place or left open, and suddenly your entire page is in a heavy font that reads too big... and users are leaving your site in droves. So validation is a way of stamping your page with "predictable." If you use tags properly, your page will be structurally sound and easy to use and browse.
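To make "left open" concrete, here's a rough Python sketch of the kind of nesting check a validator performs. It's only an illustration built on the standard library's html.parser, and it assumes XHTML-style markup where every non-empty element gets an explicit closing tag; the class and function names are my own, and the real Validator checks far more than this.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input",
                 "link", "meta", "param"}


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and records tags that are never closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of (tag name, line where it opened)
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br /> open and close in one step.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if tag not in [name for name, _ in self.open_tags]:
            self.problems.append(f"line {self.getpos()[0]}: stray </{tag}>")
            return
        # Pop back to the matching tag; anything above it was left open.
        while self.open_tags:
            name, line = self.open_tags.pop()
            if name == tag:
                break
            self.problems.append(f"<{name}> opened on line {line} is never closed")


def check_nesting(html_text):
    """Return a list of human-readable nesting problems (empty if none)."""
    checker = NestingChecker()
    checker.feed(html_text)
    checker.close()
    for name, line in reversed(checker.open_tags):
        checker.problems.append(f"<{name}> opened on line {line} is never closed")
    return checker.problems
```

Feed it a page where an h1 is opened but never closed, and the h1 shows up in the returned list. That's exactly the sort of slip that turns the rest of a page into one giant heading.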

Additionally, validation provides a baseline functionality that you can always check changes and updates to your page against. If you get your page validating, and then it stops validating, you know you've got a change that's not only not been checked, but may really screw up the structure and even presentation of your site. So validation serves as a check—sort of like compiling your code every time you make a change. If compilation (or validation) fails, then something is not as it should be.

Last, but (maybe!) not least, there's a debate raging about whether valid pages are more easily indexable by search engines like Google, Yahoo!, and MSN. The algorithms used by these search engines are highly secretive, and constantly being changed, so you'll find materials that say validation matters, then that it doesn't matter, and then that it's not required, but increases your search score or index. The bottom line is that except for a few folks at each company, we don't know, but it's certainly true that a page that is structured and displayed properly (something that validation assists in) is going to be visited more, indexed more, and ultimately appear higher in any search rankings.


Implementation: Cleaning up your CSS

At this point, you should have a gut feel for the overall site design and a knowledge of what specs the site was designed for (HTML, XHTML, and so on). Now it's finally time to get into changing the site itself.

Get the CSS out

Backing up? Testing? Staging?

Each company and project will have different backup and development procedures. But you should have something in place allowing you to work on a site without worrying about cratering the existing site. That may mean a backup of the site, or you working on a duplicate of the site sitting on another machine (or on the same machine with a different host name, like http://staging.my-company.com). Just make sure that while you're improving, you're not killing the actual site at the same time.

Whether you're striving for HTML transitional, HTML strict, XHTML transitional, or XHTML strict, you need to pull any style out of your page and get it into external CSS stylesheets. There's simply no way to keep a consistent site that is easily maintainable when you've got presentation and style mixed in with your content.

Tons of articles on the Web cover splitting CSS and HTML, so I won't bore you with the tedious details (see the links in the Resources section). However, here are some things you should watch out for:

  • This should be a largely mechanical process. You're not trying to improve the look of the page. You're just trying to get the CSS out of the HTML.
  • Don't worry about optimization and condensing your CSS rules. Just get them out of the HTML. You'll do some finagling of the rules in the next step.
  • You're still not worrying about validation, so don't get bent out of shape over <br /> and unclosed <p> tags. You'll get to all that, just not yet.

When you're done, the top of your HTML should look more like this:

  <head>
    <title>The Starbuzz Bean Machine</title>
    <link rel="stylesheet" type="text/css" href="starbuzz.css" />
    <link rel="stylesheet" type="text/css" href="styledform.css" />
  </head>

This example has multiple stylesheets. That's optional, although I recommend it if you've got specialized sections of a site, in addition to a core look and feel.
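Because the split is mechanical, a script can at least tell you where the leftover presentation is hiding. The following is a minimal Python sketch, not a finished tool: it uses the standard library's html.parser to list style attributes, embedded style blocks, and a few presentational tags. The tag set and the find_inline_styles name are my own choices for illustration.

```python
from html.parser import HTMLParser

# A few tags that carry presentation rather than structure
# (an illustrative set, not an exhaustive one).
PRESENTATIONAL_TAGS = {"font", "center", "blink"}


class InlineStyleFinder(HTMLParser):
    """Collects (line, description) pairs for styling left in the HTML."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        line = self.getpos()[0]
        if tag in PRESENTATIONAL_TAGS:
            self.findings.append((line, f"<{tag}> tag"))
        if tag == "style":
            self.findings.append((line, "embedded <style> block"))
        for name, value in attrs:
            if name == "style":
                self.findings.append(
                    (line, f"style attribute on <{tag}>: {value}"))


def find_inline_styles(html_text):
    """Return every piece of inline presentation found in html_text."""
    finder = InlineStyleFinder()
    finder.feed(html_text)
    finder.close()
    return finder.findings
```

When find_inline_styles returns an empty list for a page, that page's styling has (as far as this rough check can tell) moved out of the HTML.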

Optimize your CSS

Once you've got all the styling out of your HTML, you should have one or more CSS stylesheets. (Yes, I realize that CSS stylesheets is redundant—cascading style sheet stylesheets. I have no idea how else to write it, though, so enjoy the humor.) Take each of those, and work through them bit by bit. If you have anything that looks like this:

h1 {
  color: #933;
  font-family: Georgia;
}

h2 {
  color: #933;
  font-family: Georgia;
}

...then consider condensing into something like this:

h1, h2 {
  color: #933;
  font-family: Georgia;
}

Not anything astounding, is it? It only looks easy, though. Consider the more realistic case, where the h1 and h2 statements are hundreds of lines apart in a massive CSS stylesheet. Then, you change one, and wonder why the fonts in another heading are wrong.

One way to get a handle on this is to alphabetize your rules by selector. Yes, it's a pain, but it will ensure that things like h1 and h2 end up together, and that table, td, and tr end up close to each other, as well. One caution: when two rules with equally specific selectors set the same property, the later one wins, so check the page after reordering. It's worth the trouble to bring similar elements together, and then condense the stylings.

Similarly, take declarations like this:

h1 {
  color: #933;
  font: 16px Georgia;
}

h2 {
  color: #933;
  font: 12px Georgia;
}

You should still strip out the common elements and centralize them:

h1, h2 {
  color: #933;
  font-family: Georgia;
}

h1 {
  font-size: 16px;
}

h2 {
  font-size: 12px;
}

Even though h1 and h2 still have their own declarations here, the common styling lives in a single place (in this case, the combined h1, h2 rule).

Work through all of your CSS stylesheets, and ensure that you've got them tuned as much as possible. Condense your rules so you don't have lots of duplication, and bubble up any rules you can. For example, if every high-level element has a font setting of Georgia, consider moving that rule into your body section, and remove all the individual rules. Simpler is better.
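In a big stylesheet, spotting the duplication is the hard part. Here's a small Python sketch that lists every declaration shared by two or more selectors, so you know where condensing will pay off. It assumes simple, flat CSS (no comments, no @media blocks, no semicolons inside values), and the function name is hypothetical; treat it as a starting point rather than a real CSS parser.

```python
import re
from collections import defaultdict

# Matches "selectors { declarations }" for flat rules only.
RULE_PATTERN = re.compile(r"([^{}]+)\{([^{}]*)\}")


def find_shared_declarations(css_text):
    """Map each declaration used by 2+ selectors to those selectors."""
    users = defaultdict(list)
    for selector_part, body in RULE_PATTERN.findall(css_text):
        selectors = [s.strip() for s in selector_part.split(",") if s.strip()]
        for declaration in body.split(";"):
            if ":" not in declaration:
                continue  # skip blank fragments between semicolons
            prop, value = declaration.split(":", 1)
            # Normalize case and whitespace so equal declarations compare equal.
            key = f"{prop.strip().lower()}: {' '.join(value.split())}"
            for selector in selectors:
                if selector not in users[key]:
                    users[key].append(selector)
    return {decl: sels for decl, sels in users.items() if len(sels) > 1}
```

Run against the 16px and 12px heading rules above, it reports the color declaration as shared by h1 and h2, and leaves the two different font declarations alone.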

Validate your CSS

Here's an optional step, but one that will really reduce your headaches. Once you've got your CSS ready to go, validate it with the W3C CSS Validator (see Figure 4; find links in the Resources section). Like the page validator, the CSS Validator will ensure your CSS is just like it should be: no syntax errors, no typos (see Figure 5).


Figure 4. The CSS Validator
The W3C CSS Validator is a complement to their Web page validator

Figure 5. Successful validation of CSS
Valid CSS means all your rules are correct, and your syntax is perfect.

CSS validation isn't technically as important as making sure your overall page is valid. In fact, you can have a valid HTML (or XHTML) Web page with invalid CSS (go figure). Still, CSS validation will ensure you've got everything nice and clean, and your manual work will ensure you've got your CSS optimized.


Implementation: Add a Doctype

Now go validate your site again. You should see a lot of errors go away, with your CSS pulled apart from the HTML (and cleaned up and validated). Now, don't forget to document what you've done. If you don't have a doctype, you need to add one, as well as a character encoding. These will make it much easier for the next person tasked with maintenance.

Add a doctype declaration

When you get a response from the Validator, focus on the doctype (see Figure 6). This tells you at what version—according to the Validator's best guess—your Web page is targeted.


Figure 6. The Validator works to figure out your doctype
For this Web page, the document is probably XHTML 1.0 Transitional

In this case, the page is most likely XHTML 1.0 transitional. This might be different from when you ran validation earlier, but is probably the same (CSS won't usually affect an HTML versus XHTML comparison).

Right now, the Validator is guessing at this (in most cases). You should add a doctype declaration and make it official. Open up your HTML and, at the top, add something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">

<!-- Rest of your HTML -->

This example is for XHTML 1.0 transitional. For XHTML 1.0 strict, you'd use this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">

Here's what you'd use for HTML 4.01 transitional:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
                      "http://www.w3.org/TR/html4/loose.dtd">
<html>

And here's HTML 4.01 strict:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
                      "http://www.w3.org/TR/html4/strict.dtd">

Note that with the HTML doctypes, you don't need the xmlns attribute on your html element. That's specific to XHTML (it's an XML namespace, but that's a subject for another article).

All of these declare to the Validator—and anyone else who may have to inherit your Web site—what the target for your document is. You can re-run the Validator, and you shouldn't notice a difference, unless you chose a doctype that didn't match what the Validator thought your page was aimed at. Then, you might see more (or fewer) errors.
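Once your pages carry a doctype, it's easy to confirm that every page declares the one you intended. Here's a minimal Python sketch that recognizes only the four declarations shown above; the declared_doctype name and the labels it returns are my own, and anything else is reported as missing or unrecognized rather than guessed at.

```python
import re

# Public identifiers for the four doctypes covered in this article.
KNOWN_DOCTYPES = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "XHTML 1.0 Transitional",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XHTML 1.0 Strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "HTML 4.01 Transitional",
    "-//W3C//DTD HTML 4.01//EN": "HTML 4.01 Strict",
}

# Captures the public identifier from a <!DOCTYPE ... PUBLIC "..."> line.
DOCTYPE_PATTERN = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def declared_doctype(html_text):
    """Return a label for the page's doctype, or None if none is declared."""
    match = DOCTYPE_PATTERN.search(html_text)
    if match is None:
        return None
    identifier = match.group(1)
    return KNOWN_DOCTYPES.get(identifier, "unrecognized: " + identifier)
```

Run it over each page's source and look for None or a label that differs from the rest of the site. Those are the pages to open first.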

For the XHTML crowd: content type

If you've used one of the XHTML doctypes, you've got one more simple step. You need to add in a meta tag declaring your document's content type. That looks like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>My Page Title</title>
 </head>
 <!-- etc. -->

This ensures the Validator (and other programs, like XHTML-compliant Web browsers) knows what to expect when reading your content. You can add in a content type with your HTML pages, too, although it's only required for HTML 4.01 strict. (Note that in HTML, you should lose the trailing slash at the end of the meta tag; that's an XHTML thing.)


Get your site to validate

With CSS and HTML separate, an understanding of what version of HTML or XHTML you're going for, and a doctype and content type, you're set to clean up any lingering errors.

A simple error

Validate again, now that you've specified what you're targeting, and start working through your errors. Figure 7 shows a typical problem you might find.


Figure 7. Working one error at a time
This particular error refers to a missing action attribute on the form element

This is easy enough to fix. This page has a form element that's missing an action attribute. Add in that attribute, and this error will go away. Easy enough, right?

Fix one error at a time

Once you've fixed your first error, validate again. This is a bit annoying—and will take some time, fixing an error, validating, fixing another error, and so on—but it's the only effective way to use the validator. Many errors result in many more errors, and lots of times you'll fix one thing and end up fixing nine or ten other things.

I'd urge you to have your HTML editor up, as well as two browser windows (or tabs): your site and the Validator. You can make a fix, save your file in the editor, and refresh both browser tabs to get a new look at the site and an updated validation report.

Get a baseline and lock it in

When you've got a valid page, you need to lock that version in. Back it up, and copy it to a ZIP drive and a USB keychain (just make sure it exists in several places and is clearly marked). You've just established a baseline for your site.

Now, anytime any changes are made, you should be able to do several things:

  • Make sure your page still validates. Since you started out valid, you can handle this incrementally, rather than dealing with both old code and your new code at the same time.
  • Make sure your page looks right. Again, with a baseline, it's trivial to isolate any problems.
  • If you run into a disaster, you don't have to repeat everything in this article. You can start with a valid, self-documented version.
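One lightweight way to make that baseline checkable is to record a fingerprint of each page when you lock it in, then compare later. This is only a sketch using Python's hashlib; the function names and the idea of passing pages in as a dictionary of file name to page source are assumptions for illustration, and it complements your backups rather than replacing them.

```python
import hashlib


def fingerprint(pages):
    """Map each page name to the SHA-256 hex digest of its source text."""
    return {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in pages.items()
    }


def changed_pages(baseline, current):
    """Return the sorted names of pages added, removed, or modified."""
    names = set(baseline) | set(current)
    return sorted(n for n in names if baseline.get(n) != current.get(n))
```

Save the result of fingerprint alongside your backup. When something breaks, changed_pages tells you which pages have drifted from the valid baseline, and those are the only ones you need to revalidate.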

When do I get to improve things?

This whole article has focused (by most accounts) on boring, tedious tasks—and very little of what you've done should have affected how a Web site actually looks. That's because the goal here is to deal with a site you have to maintain, one that you were given with no previous knowledge.

In those cases, you're often not looking to improve things. You're looking to reduce overhead. If you then want to go on and make design changes, you're welcome to. But now you've got a good starting place, rather than trying to mix in design with cleaning up the CSS and figuring out if you're writing XHTML or HTML strict or whatever else.

If you're not looking to improve the site, well, you've done that anyway. You've got a valid site. But more importantly, you've made the site easy to maintain, and if problems occur, it will be easy to isolate and fix them.


Conclusion

Obviously, there's not much appeal in dealing with someone else's Web site. You're stuck with their design choices, their (potentially bad) HTML and CSS, their color choices and markup choices and structures. The point, then, is to spend as little time as possible doing routine maintenance, allowing you time to either redesign the site or work on other projects that you do enjoy.

The best way to do that is to find the shortest path between the site you are given and one that is stable and understandable. That means figuring out what you have, identifying where you need to go, and then taking simple steps to get there. This article lays out an approach that won't turn your inherited sites into design winners, but it will free you up to work on design winners. Sticking with the site's existing format (HTML or XHTML, strict or transitional) and getting into a full-blown CSS and HTML implementation will save you a tremendous amount of time, both medium- and long-term.

In the second part of this article, I'll address speed, accessibility, and organization. There are a lot of different considerations, although the goal of simple maintenance is the same. So do a little work to make sure your Web sites won't cause you headaches when you could be working on cool new designs. Then, come back next month for more.


Resources

Learn

Get products and technologies

  • Head First HTML with CSS & XHTML (Elizabeth and Eric Freeman, O'Reilly Media, Inc.): Learn more about standardized HTML and XHTML, and how CSS can be applied to HTML.

  • JavaScript: The Definitive Guide (David Flanagan, O'Reilly Media, Inc.): Includes extensive instruction on working with JavaScript and dynamic Web pages; the upcoming edition adds two chapters on Ajax.

  • Download IBM product evaluation versions and get your hands on application development tools and middleware products from DB2®, Lotus®, Rational®, Tivoli®, and WebSphere®.

