
HTML Basics

This page deals with a number of things that will be found on almost every HTML page you are likely to encounter or create.

Organization of an HTML page

A very simple HTML page may look like this:

<html>
<head>
<!-- A comment -->
<title>This title appears in the title bar of your browser.</title>
</head>
<body>
<h1>This is a large heading; you might put the title here again.</h1>
<p>This is text. It is enclosed by tags that
indicate that it constitutes one paragraph.</p>
</body>
</html>

This very short page illustrates the basic form that all HTML pages must have. Many HTML tags are in the form <tag>...</tag>. When a pair of tags like that is used, they must be treated like parentheses: that is, they must be nested correctly in matching pairs:

RIGHT:
( [ ] )

<A>
 <B>

 </B>
</A>

      DO 7 I = 1,10   --------
...                           |
      DO 17 J = 1,8   ----    |
...                       |   |
   17 CONTINUE        ----    |
...                           |
    7 CONTINUE        --------

WRONG:
( [ ) ]


<A>
 <B>

</A>
 </B>


      DO 7 I = 1,10   --------
...                           |
      DO 17 J = 1,8   ----    |
...                       |   |
    7 CONTINUE        ----+---
...                       |
   17 CONTINUE        ----

The same rule applies to nested DO loops in FORTRAN, for that matter, as the third example in each group shows.
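
With real tags, the rule might look like the following. This is only an illustrative fragment, not taken from any actual page; it uses the <b> and <i> tags, which are described further below.

```html
<!-- Correctly nested: <i> opens inside <b>, so it closes before <b> does. -->
<p>This is <b>bold and <i>bold italic</i></b> text.</p>

<!-- Incorrectly nested: <b> is closed while <i> is still open. -->
<p>This is <b>bold and <i>bold italic</b></i> text.</p>
```

Most browsers will still display something reasonable for the second form, but the markup is invalid, and it is better not to rely on that.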

The simple page shown at the start of this section contains only the most basic and essential elements, which are found in nearly every web page.

The entire page is enclosed by <html> ... </html>, which indicates that it is indeed written in...HTML.

The page itself consists of two sections, which are marked off by the next sets of tags: <head> ... </head> and <body> ... </body>. The text, and other items that are visible on the page itself, appear in the body section; other items appear in the head section.

Only two things appear in the <head> section in this example: a comment, and the page title. Comments can appear in the body section of a page as well; a comment was shown at a very early point on the page because it is a good practice to put your copyright notice in this position on every page. A visible copyright notice should appear on your site as well, but you may choose to only have a visible notice on selected pages, such as the main (home, entry) page of the site.

Oh, and you can put a real copyright symbol on your page by using &copy; : this is explained further on the third page.
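
As a rough sketch, a page carrying both kinds of notice might look like this. The name, the year, and the page title are placeholders invented for the illustration.

```html
<html>
<head>
<!-- Copyright (c) 2024 A. N. Author. Placeholder name and year. -->
<title>My Home Page</title>
</head>
<body>
<!-- The visible notice; &copy; is displayed as the copyright symbol. -->
<p>Copyright &copy; 2024 A. N. Author</p>
</body>
</html>
```

The comment is seen only by someone reading the HTML source, while the paragraph in the body is what visitors to the page see.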

The page title, within <title> ... </title> tags, appears in the title bar of the browser, not on the page itself.

What you see on the page itself appears in the section delimited by the <body> ... </body> tags.

The first thing there is a heading, within <h1> ... </h1> tags. <h1> stands for a Level 1 heading; <h2> is a Level 2 heading, and so on, down to Level 6. In typical browsers, headings appear in bold, and they vary in size, Level 1 headings being the largest.
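
A fragment like the following, which is only an illustration with invented text, lets you compare several levels in your own browser:

```html
<h1>Level 1: the page title</h1>
<h2>Level 2: a major section</h2>
<h3>Level 3: a subsection</h3>
<p>Ordinary paragraph text, for comparison.</p>
<h6>Level 6: the smallest heading</h6>
```

The exact sizes are chosen by the browser, so the result may differ somewhat from one browser to another.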

Since you will often want the title of a page to appear in large print at the top of the page, it is repeated inside the <h1> ... </h1> tags.

Finally, you will want to put some actual text on your page! Text is a sequence of paragraphs, and each paragraph is enclosed in <p> ... </p> tags.

Note that using different kinds of headers is not the only way to control the size of text. Ordinary text in a paragraph can be made a few sizes larger or smaller using tags like <font size="+2"> ... </font> wherever desired.
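
For instance, a paragraph using such tags might be written as in the first line below; the wording is invented for the illustration. Note that the font tag was deprecated in HTML 4.01, so the second line sketches one possible style-sheet equivalent, assuming a browser that supports inline styles.

```html
<!-- Relative sizes: +2 is two steps larger than normal, -1 one step smaller. -->
<p>This word is <font size="+2">larger</font> and this one is <font size="-1">smaller</font>.</p>

<!-- A style-sheet alternative that does not use the deprecated font tag. -->
<p>This word is <span style="font-size: larger">larger</span> and this one is <span style="font-size: smaller">smaller</span>.</p>
```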

Incidentally, since the title within the <title> ... </title> tags does only appear in the title bar at the top of the browser, and is not normally obtrusive when the page is viewed, it is possible to have a little fun with the title of a page.

Thus, for example, on a page about a form of cache-internal parallel computing as explained for an imaginary computer architecture that ends up with a rather long descriptive name, and on another page where the consequences of switching between little-endian and big-endian operation on a computer with packed-decimal arithmetic and similar features are explored, I provide amusing titles which relate to the content of the pages.

On a different page, on heptagonal tilings, because of the colors I had chosen for the tiles, a structure that appeared only rarely happened to recall the title of a novel by Lin Carter, whose name I therefore note.

On yet another page, I indulge in an obscure reference to a hardcover black-and-white comic book for an adult audience from 1950 which has been advanced as the first English-language graphic novel.

Other obscure things on a web page include the filename of the page itself and the filenames of images on the page. Thus it was that when the system I used for naming various illustrative diagrams of block cipher designs led to this page having on it a diagram named qb2b2.gif, when I had that diagram serve as a link to a page with a larger diagram (constructed from smaller pieces as a table), I gave that page a name that seemed appropriate.

Additional Basic HTML Features

There are numerous other features of HTML that are still very important, and which appear on just about every page.

Emphasized Text

If a word is to appear in italics, that can be indicated by enclosing it in <i> ... </i> tags, and boldface can similarly be indicated by <b> ... </b> tags. But those tags specifically request italic and boldface printing, and nothing else. It is possible to view an HTML page from what is known as a shell account.

Normally, when someone uses a modem to connect to the Internet from home, PPP or SLIP is used to send TCP/IP packets to the home computer, which then effectively behaves as if it is directly connected to the Internet.

With a shell account, only the computer you are connecting to runs Internet software: your computer only runs a terminal emulation program, often emulating the Digital Equipment Corporation's popular VT100 display terminal. So if you look at Web pages, a browser running on the remote computer (Lynx is a popular text mode web browser for UNIX systems, and it has also been ported to other operating systems) presents the pages in text format, as can be sent to a simple display terminal.

Such terminals can display text with highlighting, or in inverse video. But they can't do italics. So, if you would like to emphasize a word in a sentence, so that it would appear in italics on a typical web browser, but you would also like it to appear highlighted or in color for someone viewing it from a shell account, or by some other means that doesn't permit display of italic characters, you can't use the <i> ... </i> tags.

But you do have another choice. <em> ... </em> indicates emphasized text, and <strong> ... </strong> indicates strongly emphasized text. These normally appear in italics and in boldface, respectively, but where italics or boldface cannot be displayed, the browser shows them using whatever other form of emphasis is available.
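
The difference can be seen in a short fragment such as this one, where the sentences are invented for the illustration:

```html
<!-- Asks specifically for italic type. -->
<p>The play <i>Hamlet</i> is quite long.</p>

<!-- Asks for emphasis, in whatever form the browser can provide. -->
<p>You <em>must</em> save your work first.</p>

<!-- Asks for strong emphasis. -->
<p><strong>Do not</strong> turn off the computer.</p>
```

In a typical graphical browser the first two paragraphs look alike; the difference shows up only where italics are unavailable.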

Images

Many web pages have pictures on them.

A picture is indicated by an HTML tag like this:

<img src="picture.jpg">

This tag doesn't have an </img> tag corresponding to it; it is complete in itself. (Also complete in themselves are the <hr> and <br> tags, which insert a horizontal rule and a line break into your document, respectively.) Note that the source of the picture looks like a filename. Technically speaking, however, it is not a filename but a type of URL: in this case, a relative URL. Here, the file picture.jpg would be in the same directory on the Web server as the HTML page itself (which is a text file).
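
The following fragment is a small sketch of these stand-alone tags in use. The verse and the images subdirectory are assumptions made for the illustration.

```html
<!-- <br> starts a new line without starting a new paragraph. -->
<p>Roses are red,<br>
Violets are blue.</p>

<!-- <hr> draws a horizontal rule across the page. -->
<hr>

<!-- A relative URL: picture.jpg is in the same directory as this page. -->
<p><img src="picture.jpg"></p>

<!-- Also a relative URL: the file is in a subdirectory named images. -->
<p><img src="images/picture.jpg"></p>
```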

On the page, the image is treated as if it is a funny-looking letter, and just appears in the middle of text, unless something else is specified.

Thus, you will often put an image in a paragraph by itself.

The paragraph can be centered on the page:

<p align=center><img src="image.gif"></p>

(An example of this appears on my page main.htm.)

Note that src="image.gif" and align=center are both examples of a syntactic element of HTML called an attribute. The attribute value can always be enclosed in quotes; when it consists only of letters and digits, without spaces or punctuation marks, it does not require quotes. However, always enclosing attribute values in quotes is a good practice for compatibility with stricter standards such as XHTML and XML, which require the quotes.

If you do want the image to appear in the middle of text, you can control whether it is aligned with the same baseline as the text (the default), or whether the text runs into the center of the image, or is aligned with the top of the image by specifying an alignment on the image itself:

<p>Diagram: <img src="figure.gif" align=middle></p>

is an example. The other two possibilities are align=top and align=bottom.

(An example of this appears on my page co0404.htm.)

Normally, however, every image you put in your document will have two additional attributes specified, making it look like this:

<p><img src="webcam.jpg" width=320 height=240></p>

The width and height attributes specify the width and height of the image. In this way, the browser can reserve space for the image even before it has finished loading, so that it is not necessary to wait for the image to load before reading any text that is below (or level with) the image.

Another strongly recommended practice is to add an attribute like alt="Picture of my house" to every image. This specifies text to appear when the image cannot be displayed, and some browsers also show that text as a popup when the cursor is over the image.
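
Putting these recommendations together, a complete image tag might look like the following sketch. The filename house.jpg and its dimensions are assumed for the illustration.

```html
<!-- width and height let the browser reserve space before the image loads;
     alt supplies text to show when the image cannot be displayed. -->
<p><img src="house.jpg" width="320" height="240" alt="Picture of my house"></p>
```

The attribute values are quoted throughout; the alt text contains spaces, so it would need the quotes in any case.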

Images can come in two standard file types. (Actually, there are two others, with the extensions .xpm and .xbm, where the image is represented in what looks like a data statement from a C program.) The extension .gif represents an image that has been compressed exactly, with no loss of information, using a technique similar to that used for ZIP archives. This type of file is most appropriate for line drawings, and diagrams composed of lines and areas of solid color. The extension .jpg represents an image subjected to a lossy compression, and is recommended for use with photographs or paintings, where colors change gradually from pixel to pixel.

Incidentally, if you accidentally give the wrong width and height for an image, many browsers will automatically stretch or shrink the image to fit. This will sometimes give results which are unattractive in appearance, but it will work well when the given dimensions are an exact multiple of the real ones, or the other way around. Some sites misuse this to allow a shrunken display of a large image to serve as a thumbnail to the image itself: this is not recommended, since the real image still has to be transmitted over the Internet, and thus the page loads as slowly as if every image thus 'thumbnailed' were present in full size on that page. But this trick can also be used to save bandwidth, where a drawing is made in a small size, large enough to show all the required detail, but is displayed in a larger size which is easier to see. I use this trick on my page ro0205.htm.
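
As a sketch of the bandwidth-saving use, suppose a drawing named diagram.gif really measures 160 by 120 pixels; both the filename and the size are assumptions made for the illustration. Giving dimensions exactly twice as large asks the browser to display it enlarged:

```html
<!-- The file is 160 by 120 pixels; the browser is asked to show it at 320 by 240. -->
<p><img src="diagram.gif" width="320" height="240" alt="Block diagram"></p>
```

Because each dimension is an exact multiple of the real one, every pixel of the drawing is simply doubled in each direction, which tends to suit line drawings.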

Links

Finally, we come to what is the most important feature of the World Wide Web: the ability to click on a word, phrase, or image to proceed to another web page. Here, the use of both absolute and relative URLs is quite common; except in special cases, one doesn't normally put an image from another web site on one's own page.

Links normally appear as underlined text. An image that is a link will have a box around it. The <a href="..."> ... </a> tags enclose the text or image that the user must click on to go to the link. Links look like this:

<p>This interesting topic is further explained in
<a href="eqn.htm">my page on differential equations</a>,
and you can find out more about it by going to
<a href="http://cthulhu.miskatonic.edu/math/diff.htm">the
differential equations page</a> at
<a href="http://www.miskatonic.edu/index.htm">Miskatonic
University</a>.</p>

Click on "my page on differential equations", and you go to the page eqn.htm on your own site; click on the phrase "the differential equations page", and you go to the page diff.htm in the math directory on the cthulhu server in the miskatonic.edu domain; click on "Miskatonic University", and you head to index.htm on their www server, which, presumably, is that university's main page.
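
An image can serve as a link in the same way, by placing the image tag between the link tags. In the following sketch, the filenames small.gif and big.htm and the dimensions are hypothetical:

```html
<!-- Clicking the small diagram leads to a page holding a larger version. -->
<p><a href="big.htm"><img src="small.gif" width="160" height="120"
alt="Small diagram; click for a larger version"></a></p>
```

Note that the <a> ... </a> tags are nested inside the <p> ... </p> tags, and the image tag inside them in turn, following the nesting rule described earlier.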

