HTML stands for Hypertext Markup Language and is the most widely used language on the web. In a markup language, parts of the text are "marked up" with annotations that separate the content itself from the instructions about how to treat that content.
To create an HTML page, open a text editor. Type
Hello World and save it with the name
HelloWorld.html. Check to ensure that the editor didn't save the file as
HelloWorld.html.txt by appending the .txt extension. Now, find the file in
your file system, right-click on it, and select Open
with... > [browser] to open it with your
browser.
The browser needs a way to determine which blocks of text should be large,
bold, italicized, and so on. You provide this information to the browser
by surrounding the text with opening and closing HTML tags. These tags are
enclosed within a less-than sign (<) and a greater-than sign (>);
for example, <p>. A closing tag is the
same as the opening tag with the addition of a forward slash before the
tag name; for example, </p>.
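As a minimal sketch of the pairing rule, the following JavaScript builds an opening tag, a closing tag, and the text between them. The wrapInTag name is a hypothetical helper used only for illustration, and the snippet can be run in Node.js or a browser console.

```javascript
// wrapInTag is a hypothetical helper name used only for this sketch.
function wrapInTag(tagName, text) {
  const openingTag = "<" + tagName + ">";
  const closingTag = "</" + tagName + ">"; // same name, preceded by a forward slash
  return openingTag + text + closingTag;
}

console.log(wrapInTag("p", "Hello World")); // <p>Hello World</p>
```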
Create a new file named test.html, and enter the markup shown in Listing 1.
Listing 1. Simple markup
<div>
  <h1>TODO List</h1>
  <p><span>I highly </span><strong>recommend</strong> we <em>walk</em>.</p>
</div>
The first tag used in Listing 1 is <div>,
which is typically used to divide content. Next is
<h1>, which is a heading. There are six
available heading tags, from <h1> to
<h6>, and the native browser styling
starts with <h1> as the largest and
<h6> as the smallest. The
<p> tag is a paragraph. Within it, the
<span> tag wraps a run of text without changing its appearance, the
<strong> tag is displayed natively as bold text, and the
<em> tag, which is short for emphasis, is
displayed natively as italicized text.
When you view the markup from Listing 1 in a browser, the result looks something like Figure 1.
Figure 1. The HTML code from Listing 1 rendered in a browser
Display styles: block and inline
There are many display styles with subtle differences, but there are two
basic types: block and inline. The block styles are for blocks of text,
such as headings or paragraphs, that start on a new line. The
<div>,
<h1>, and
<p> tags are block styles. The inline
styles are for styles within blocks, such as bold and italics.
The <span>,
<strong>, and
<em> tags are inline styles.
Semantics is the study of meaning and relationships between words or symbols. Applied to an HTML page, semantics means using the appropriate tag name to describe the content it contains.
When users read your web page, they don't need to know what tags you used.
But there are other things that read your web page besides humans, such as
search engines. When your page is being indexed for search, it is broken
into sections that are given different priority. An
<h1> tag is considered the highest
priority on your page, followed by <h2>,
and so on, down to the paragraphs.
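To picture the structure that a program can read from heading tags, here is a rough JavaScript sketch that lists the headings in a markup string along with their levels. The headingOutline name and the sample markup are assumptions made for illustration; a real indexer parses the whole document rather than using a regular expression.

```javascript
// Rough sketch: list the headings in a markup string with their levels.
// A lower level number corresponds to a higher-priority heading.
function headingOutline(markup) {
  const pattern = /<h([1-6])[^>]*>(.*?)<\/h\1>/gi;
  const outline = [];
  for (const match of markup.matchAll(pattern)) {
    outline.push({ level: Number(match[1]), text: match[2] });
  }
  return outline;
}

// Hypothetical sample page used only for this example.
const page = "<h1>TODO List</h1><h2>Today</h2><p>I highly recommend we walk.</p>";
console.log(headingOutline(page));
```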
Screen readers are assistive programs used by people who have vision
challenges. When a screen reader encounters a
<strong> or an
<em> tag, it can pronounce that content more
strongly or with more emphasis. This is why the use of
<strong> and
<em> is encouraged over the use of
<b> and
<i>, respectively.
To add an image to your test.html file, first you need an image. You can use one from somewhere on your computer or you can grab an image from the Internet by right-clicking on the image, selecting Save Image As..., and saving the image file to the same folder where your test file is located. Add the image to your test file using the code shown in Listing 2.
Listing 2. Image markup example
<img src="MyImage.jpg" />
This example tells the browser to render an image using the source found
in the path of the src attribute.
You might also notice that there is no closing tag in Listing 2. There are
two types of tags: paired and unpaired. Paired tags might contain
textual content. Unpaired tags never contain content. For
instance, you can't use
<img>An image is here</img> because
the <img> tag is used to display an
image, not text. Because an unpaired tag never contains content, a closing tag is
unnecessary, so you can simply finish it with a forward slash, as shown in
Listing 2.
Tags can contain attributes that tell the browser how to render the
content contained within the tags. For instance, in Listing 2, the src attribute defines the
path to MyImage.jpg to create the image. One of the more useful attributes
is id, which you can use to find and manipulate
an element in a page with the JavaScript language or to apply styles to an
element with Cascading Style Sheets (CSS).
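As a rough illustration of attributes as name and value pairs, the JavaScript sketch below collects them from a single tag. The readAttributes name and the id value "logo" are assumptions made for this example, and only double-quoted values are handled; it is not a full HTML parser.

```javascript
// Rough sketch: collect the name="value" pairs from a single tag.
// Only double-quoted values are handled.
function readAttributes(tag) {
  const pattern = /([a-zA-Z][a-zA-Z0-9-]*)="([^"]*)"/g;
  const attributes = {};
  for (const match of tag.matchAll(pattern)) {
    attributes[match[1]] = match[2];
  }
  return attributes;
}

// The id "logo" is a hypothetical value added for illustration.
const image = readAttributes('<img src="MyImage.jpg" id="logo" />');
console.log(image.src); // MyImage.jpg
console.log(image.id);  // logo
```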
An anchor tag (<a>) can connect to locations in multiple ways, such as a
hash (#), absolute Uniform Resource Locators
(URLs), or relative URLs.
The first way is to target a location within the same document by
referencing the id or
name attribute of the target, preceded by the
hash symbol, as shown in Listing 3:
Listing 3. Using a hash to reference another part of the document
<a href="#answerA">See Answer A</a>
<a href="#answerB">See Answer B</a>
<div id="answerA">The answer is 41</div>
<div id="answerB">The answer is 43</div>
When a user clicks on a link that contains a hash reference, the browser scrolls to that point in the document. If the document is too short to scroll, there is no noticeable change, except that the browser address changes to reflect the hash location, as shown in Figure 2:
Figure 2. The browser address bar showing the hash location
A link can, of course, also reference other pages on the Internet by using a URL, as shown in Listing 4:
Listing 4. Common Internet address link
<a href="http://www.ibm.com/developerworks/">developerWorks</a>
The link shown in Listing 4 is known as an absolute URL because the address is complete: it starts with the protocol and the domain of the website. A relative URL targets a page relative to the current page within the same site. Think of it in the same way that you access files within the folders on your computer. Figure 3 shows a simple website structure. The only difference between files and folders on your computer and those on the Internet is that a web server makes the files available on the Internet as a website.
Figure 3. A simple website structure
Listing 5 shows what relative URL links in pageA.html might look like.
Listing 5. Relative URL examples
<a href="pageB.html">See Page B</a>
<a href="subpages/subA.html">See Sub Page A</a>
<a href="subpages/subB.html#section3">See Sub Page B, Section 3</a>
Notice the first link in Listing 5 does not contain the parts you are
used to seeing in an Internet address, such as the
http protocol or a www domain name; it's
just a page name. The second link shows how you target a page within a
folder: you use the folder name, a forward slash, and the file name. The third link
shows how you can target a section within a page using a hash.
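To see how a relative URL turns into a full address, the following JavaScript sketch uses the standard URL constructor, which resolves a relative URL against a base address in the same way a browser does. The base address is a hypothetical location for pageA.html; the snippet runs in Node.js or a browser console.

```javascript
// The base address is a hypothetical location for pageA.html.
const base = "http://www.example.com/pageA.html";

// new URL(relative, base) resolves a relative URL against the base.
const pageB = new URL("pageB.html", base);
const subA = new URL("subpages/subA.html", base);
const subB = new URL("subpages/subB.html#section3", base);

console.log(pageB.href); // http://www.example.com/pageB.html
console.log(subA.href);  // http://www.example.com/subpages/subA.html
console.log(subB.href);  // http://www.example.com/subpages/subB.html#section3
console.log(subB.hash);  // #section3
```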
The examples shown in Listing 5 use page names in the URLs, but the absolute example in Listing 4 doesn't. To simplify web addresses, web servers have default pages. If no page name is given, one of the default pages is accessed. These default pages can have any name, but most often they are named index.html.
If the default page or the referenced page is not there, the server returns a 404 error, which indicates that the page was not found.
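The lookup a server performs can be modeled in a few lines of JavaScript. This is a simplified sketch, not how any particular web server is implemented: the file list, the findPage name, and the use of index.html as the only default page are all assumptions.

```javascript
// Hypothetical list of files that the site contains.
const siteFiles = new Set([
  "/index.html",
  "/pageA.html",
  "/pageB.html",
  "/subpages/subA.html",
  "/subpages/subB.html",
]);

// Simplified model of a server lookup (findPage is a hypothetical name).
function findPage(path) {
  // A path that ends with a slash names a folder, so try its default page.
  const file = path.endsWith("/") ? path + "index.html" : path;
  if (siteFiles.has(file)) {
    return { status: 200, file: file };
  }
  return { status: 404, file: null }; // page not found
}

console.log(findPage("/"));             // found: the default page /index.html
console.log(findPage("/subpages/"));    // 404: that folder has no default page
console.log(findPage("/missing.html")); // 404: no such page
```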
Well-formed HTML simply means HTML markup that follows the rules. The two basic rules you should adhere to are:
- If you open it, close it.
- Don't overlap tags.
The first rule means remember to close your tags. Listing 6 shows an example of an open tag.
Listing 6. Markup with an open tag
<strong>My markup <em>should be well-formed</strong>
The intention is probably to emphasize the word "should," but the browser
doesn't know that and will probably italicize everything in the page after
the <em> tag.
Listing 7 shows well-formed, properly nested markup.
Listing 7. Well-formed markup example
<p>I want my markup to be <strong>really</strong> well-<em>formed</em></p>
Listing 8 shows improperly formed markup with overlapping tags.
Listing 8. Improper markup with overlapping tags
<p>I want my markup to be <strong>really <em>well</strong>-formed</em></p>
You might wonder why Listing 8 is improper. If you try it, it might render as you expect in your browser, with the word "well" both bolded and italicized. HTML is designed to be forgiving, unlike stricter markup languages such as Extensible Markup Language (XML), where a parser rejects a document that is not well-formed. Therefore, browser implementors work hard to guess how improperly formed HTML should render. However, even if your improper HTML looks correct in your browser, it may look different in another browser or in a future version of your browser.
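The two rules can be checked mechanically with a stack: push each opening tag, and require each closing tag to match the most recently opened one. The JavaScript below is a minimal sketch of that idea applied to the markup from Listings 6, 7, and 8. The isWellFormed name and the partial list of unpaired tags are assumptions, and the check enforces this article's two rules rather than the full HTML specification.

```javascript
// Tags that never take a closing tag (a partial list for this sketch).
const UNPAIRED = new Set(["img", "br", "hr", "link", "meta", "input"]);

// Returns true when every opened tag is closed and no tags overlap.
function isWellFormed(markup) {
  const pattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  const open = []; // stack of tag names that are still open
  for (const match of markup.matchAll(pattern)) {
    const isClosing = match[1] === "/";
    const name = match[2].toLowerCase();
    if (UNPAIRED.has(name)) continue;
    if (!isClosing) {
      open.push(name);
    } else if (open.pop() !== name) {
      return false; // closes a tag that is not the most recently opened one
    }
  }
  return open.length === 0; // anything left on the stack was never closed
}

const listing6 = "<strong>My markup <em>should be well-formed</strong>";
const listing7 = "<p>I want my markup to be <strong>really</strong> well-<em>formed</em></p>";
const listing8 = "<p>I want my markup to be <strong>really <em>well</strong>-formed</em></p>";

console.log(isWellFormed(listing6)); // false
console.log(isWellFormed(listing7)); // true
console.log(isWellFormed(listing8)); // false
```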
It might just look like text to you, but the browser sees markup as objects, or elements. These elements form a parent-child hierarchical relationship: a parent can have multiple children, but a child can have only one parent. Figure 4 shows how the browser sees the well-formed markup from Listing 7.
Figure 4. How a browser views your markup
Trying to convert the improper markup from Listing 8
doesn't work because the strong and
em objects collide. What happens is the browser
rewrites your code to create its objects. It has to guess what you mean,
and, therefore, your page may not render as you intend.
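The parent-child structure can be sketched in JavaScript by turning well-formed markup into nested objects. This is only an approximation of what a browser builds: the buildTree name and the object shape are assumptions, and attributes and unpaired tags are ignored to keep the example short.

```javascript
// Rough sketch: turn well-formed markup into nested objects.
function buildTree(markup) {
  const root = { name: "#root", children: [] };
  const stack = [root]; // the last item is the current parent
  const pattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)/g;
  for (const match of markup.matchAll(pattern)) {
    const parent = stack[stack.length - 1];
    if (match[3] !== undefined) {
      parent.children.push({ text: match[3] }); // plain text inside the parent
    } else if (match[1] === "/") {
      if (stack.length > 1) stack.pop(); // closing tag: go back up to the parent
    } else {
      const element = { name: match[2], children: [] };
      parent.children.push(element); // each element is added to exactly one parent
      stack.push(element);
    }
  }
  return root.children;
}

const tree = buildTree(
  "<p>I want my markup to be <strong>really</strong> well-<em>formed</em></p>"
);
console.log(JSON.stringify(tree, null, 2));
```

Here the p element ends up with four children: a run of text, the strong element, another run of text, and the em element.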
To prevent improper HTML, it helps to write your markup using indentation, as shown in Listing 9:
Listing 9. Indented markup
<div>
  <p>
    <span>
      <strong>
        Bold Text
      </strong>
    </span>
    <span>
      <em>
        Italic Text
      </em>
    </span>
  </p>
</div>
Until now, the focus of this article has been on the textual section of an HTML document. But there are some meta elements as well. Listing 10 shows a small but valid HTML document.
Listing 10. Sample HTML document
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html>
  <head>
    <title>Example page</title>
  </head>
  <body>
    <h1>Hello world</h1>
  </body>
</html>
The first line (continued to the second) in Listing 10 shows something called a doctype. A
doctype tells the browser which rendering mode to use.
The rules for how to render HTML markup are determined by a
set of standards that change and evolve over the years. The use of
doctypes helps browsers keep up with the changes
without breaking pages written with older standards.
After the doctype is the
html tag, which wraps everything in the
document. There are two child elements in the
html element: the
head and the body.
The head is where the meta information goes,
starting with the title element. Whatever text
you put in title is what you see in the title
bar of the browser and is how the page is recognized by search engines and
bookmarks. The head may also contain various
meta information, such as keywords, a description, style sheets, or
scripts. The body is where you put all your
displayed textual content.
A stylesheet is an external file that contains the style definitions for
the page. The stylesheet is associated with the page with a
link tag and targeted with the
href attribute. Because of the various ways
link tags are used, you must also include the
rel attribute set to
"stylesheet", as shown in Listing 11:
Listing 11. Example of a linked stylesheet
<link rel="stylesheet" href="stylesheets/MyStyles.css" />
You can also directly include styles in the
head within a style
tag. A style element can
contain style definitions or @import rules
that pull in other stylesheets. Listing 12 provides an example.
Listing 12. A style element with various styles and an import
<style>
  @import "stylesheets/MyStyles.css";
  #myElement{
    background:black;
    color:white;
  }
  h3{
    font-size:36px;
  }
  .bordered{
    border: 1px solid red;
  }
</style>
Listing 12 also demonstrates some of the basic ways to style HTML
elements. The first, #myElement, starts
with a hash symbol, which means it is targeting an element with
myElement as the id
attribute. The second shows that you can directly style elements by tag
name. Here, all h3 elements have a font size of
36 pixels. The third way starts with a period, which indicates that it is
a class name, and targets any element that contains
bordered in its
class attribute, such as
<div class="bordered">Stuff</div>.
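The three selector forms can be pictured with a small JavaScript sketch that decides whether a selector applies to an element. Each element here is a plain object with hypothetical tag, id, and className fields, and the matchesSelector name is made up for this example; a browser's real matching supports many more selector forms.

```javascript
// Rough sketch of how the three simple selector forms pick elements.
function matchesSelector(selector, element) {
  if (selector.startsWith("#")) {
    return element.id === selector.slice(1); // id selector
  }
  if (selector.startsWith(".")) {
    const classes = (element.className || "").split(/\s+/);
    return classes.includes(selector.slice(1)); // class selector
  }
  return element.tag === selector; // tag name selector
}

// Hypothetical elements used only for illustration.
const heading = { tag: "h3", id: "myElement", className: "" };
const box = { tag: "div", id: "", className: "bordered wide" };

console.log(matchesSelector("#myElement", heading)); // true
console.log(matchesSelector("h3", heading));         // true
console.log(matchesSelector(".bordered", box));      // true
console.log(matchesSelector(".bordered", heading));  // false
```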
Asynchronous JavaScript + XML (Ajax) is all the rage these days. Ajax is a name for a set of techniques, built on the JavaScript language, that let a page exchange data with a server without reloading. The JavaScript language is the default scripting language of browsers. You can use it to ensure forms are filled out correctly, hide or display elements, or even make them move around on the page.
As an HTML page is opened by the browser, the browser literally reads the content from the top to the bottom and renders it along the way. Pages on the Internet do not just appear; they display a few elements at a time.
Listing 13 contains a script that, when encountered
by the browser, executes and displays "Hello World" in an alert box. The
browser then stops everything and waits for you to press the
OK button. At this point, the title in the browser
is rendered because the title element has been
read. But the page text "Page Content Here" does not display because the
browser hasn't read that far yet.
Listing 13. Hello World browser rendering example
<html>
  <head>
    <title>Example page</title>
    <script type="text/javascript">alert("Hello World");</script>
  </head>
  <body>
    <h1>Page Content Here</h1>
  </body>
</html>
The alert box mentioned earlier is launched by calling the
alert function. It displays the text passed as
an argument between the parentheses immediately after the function name.
You can make and call a custom function the same way. Listing 14 shows the alert being
called from within the customFunction function.
Listing 14. Custom function example
<script type="text/javascript">
  function customFunction(){
    alert("Called via custom!");
  }
  customFunction();
</script>
Defining a function in the JavaScript language does not run it; the code
inside runs only when the function is called. If the
customFunction(); line is removed, the function
will not be called. It's more versatile to call the function during a
browser event. An event is an occurrence in the browser. One of
the most used events is window.onload. It fires
when the browser is done reading and rendering all the content. You can
make the custom function the handler for the onload event by
assigning it, without parentheses, as shown in Listing 15:
Listing 15. Custom function fired on the load of the browser
<script type="text/javascript">
  function customFunction(){
    alert("Called via custom!");
  }
  window.onload = customFunction;
</script>
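The difference between assigning a function and calling it can be tried outside a browser. In the JavaScript sketch below, fakeWindow is a plain object standing in for the browser's window object; it is an assumption made so the example runs in Node.js and is not a real API. A counter replaces the alert box so the effect is visible in the console.

```javascript
let calls = 0;

function customFunction() {
  calls += 1; // stands in for the alert box
}

// fakeWindow is a stand-in for the browser's window object.
const fakeWindow = { onload: null };

// Assigning the function (no parentheses) stores it without running it.
fakeWindow.onload = customFunction;
console.log(calls); // 0

// Later, the "browser" fires the event by calling whatever was stored.
fakeWindow.onload();
console.log(calls); // 1
```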
HTML elements have events too. Listing 16 shows how you can call the function on a mouse click.
Listing 16. Custom function fired on a mouse click
<html>
  <head>
    <title>JavaScript and Events</title>
    <script type="text/javascript">
      function customFunction(){
        alert("Called via mouse click!");
      }
    </script>
  </head>
  <body>
    <!-- The attribute value is code to run, so the parentheses are needed to call the function -->
    <div onclick="customFunction()">Click Me!</div>
  </body>
</html>
A web server is the software that returns the content of a page or other resource that a client requests. There will come a time when you reach the limitation of directly viewing HTML pages from your hard drive. To a browser, a URL such as file:///Users/Documents/test.html is a security risk because it could theoretically be something on the Internet that is trying to access your hard drive. If you start seeing security messages, it's time to install a web server.
Fortunately, it's not difficult to install a web server, and there are many tutorials about it on the Internet. Apache is easy to install, small, and popular. IBM® WebSphere® Application Server is powerful, and there are downloads available so you can test it.
A common question is "Should I learn HTML5 first or start with HTML?" It's really all just HTML, and you should start with the basics regardless of the version.
HTML5 does provide new features that developers are excited about. In terms of markup, there are many new tag names that are available to help make web pages more semantic and maintainable. The JavaScript application programming interfaces (APIs) for HTML5 have dramatically increased so that web authors can build full-fledged web applications without the help of plug-ins.
In this article, you have learned the basics of well-formed HTML and how to get started with CSS and the JavaScript language. There are many resources and tutorials on the web to help you take your skills to the next level.

Mike Wilcox is director of technology for BetterVideo, a fast-growing startup in Frisco, Texas. He is in charge of front-end engineering and online video services. Mike is a regular speaker on Ajax and other web technologies, and has spoken at the 2009 Rich Web Experience, the 2009 Dallas TechFest, and many other conferences. His open source work is on display in the Dojo Toolkit, where, as a committer, he has implemented many of the multimedia technologies, which include the Multi-file Uploader, the audio and video components, and the vector-based DojoX Drawing.




