Article updated 2025-02-05.
Written by Christofer Sandin.

This is an older article, so some references and links may be outdated. Keep that in mind while reading. Thank you!

An Introduction to HTML

HTML is a concept with many different interpretations; some use it to summarize all the technologies involved, while others refer specifically to the HTML standard itself. Here, we will look at the latter and provide an overview of the differences and similarities between the various versions of HTML and XHTML.

Below is a review of the different HTML versions we have had over the years, followed by a look at the new features of HTML5.

The History of HTML and XHTML

HTML has evolved over many years, and the W3C standard HTML 4.01 has been around since the end of 1999. As we all know, a lot has happened since then. HTML has always been backward compatible, which is a significant advantage. This means there are no issues viewing a webpage written in HTML 3 in today’s browsers.

As a further development of HTML, XHTML 1.0 was launched, last revised in 2002, with the main difference being the requirement to use XML syntax. This means that:

  • all tags must be written in lowercase,
  • all tags must be closed,
  • and all attributes must have quotation marks.

Otherwise, there are few significant differences between HTML 4.01 and XHTML 1.0.

XHTML then evolved into XHTML 1.1, which is essentially the same as XHTML 1.0 but requires that documents be served with an XML MIME type (application/xhtml+xml). This means that documents are interpreted as XML documents rather than as SGML-based documents like earlier versions of HTML.

A couple of years ago, many were confident that XHTML 1.1 was the right approach. As usual, however, the various browser versions caused issues for developers. Once again, Internet Explorer threw a wrench in the works because it did not interpret XHTML as XML but rather as HTML.

This resulted in many developers, including us, starting to develop and validate documents according to XHTML 1.1 Strict but still sending the documents to the browser as text/html instead of application/xhtml+xml, as the 1.1 specification required.

In hindsight, we might have been better off sending these as XHTML 1.0, since we were not using the XML features anyway. Aside from a slightly different DOCTYPE, the code looks the same, and the result is identical in practice.

In practice, however, text/html was a better choice

The approach above was not entirely foolish, since XML, for better or worse, is a very strict language.

While an XML-based document contributes to more strictly structured documents, it also makes webpages sensitive to errors. A small mistake in the code means that the page will not display for the visitor, who will only see the XML parser’s error message on the screen instead.

If there is a publishing system behind the scenes, or if data is being loaded from an external source, there is always a risk that the generated code is not entirely strict. Therefore, allowing minor errors to render the entire webpage inaccessible is risky. The web is built on fault tolerance, and all browsers do their best to correct minor mistakes, so the XML parser’s error handling may not be the best for visitors, publishers, or the web itself.
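The difference in error handling can be observed directly in a browser. The sketch below assumes the browser's DOMParser API, which in current browsers reports an XML syntax error by returning a document containing a <parsererror> element rather than by throwing. The function name isWellFormedXhtml is purely illustrative, and the parser is passed in as an argument so that the function can also be exercised outside a browser.

```javascript
// Returns true when `markup` parses as well-formed XML.
// `parser` is assumed to behave like a browser DOMParser instance.
function isWellFormedXhtml(markup, parser) {
  const doc = parser.parseFromString(markup, 'application/xhtml+xml');
  // On a syntax error the returned document contains a <parsererror> element.
  return doc.getElementsByTagName('parsererror').length === 0;
}

// Only runs in a browser, where DOMParser is available.
if (typeof DOMParser !== 'undefined') {
  const parser = new DOMParser();
  console.log(isWellFormedXhtml('<p>Closed paragraph</p>', parser));
  console.log(isWellFormedXhtml('<p>Unclosed paragraph', parser));
}
```

The same unclosed paragraph served as text/html would simply be repaired by the HTML parser and displayed.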

HTML5 vs. XHTML2

In recent years, two different standards have developed in parallel. One is HTML5, which W3C is developing together with WHATWG, and the other is XHTML2 – both are currently Working Drafts and are not fully established standards.

HTML5 is a further development of HTML 4.01, while XHTML2 is not related to XHTML 1.1; it is an entirely new standard that is not backward compatible with HTML.

Confusing? Yes, a little. Recently, however, the W3C decided to cease further work on XHTML2 and instead focus on HTML5. This has also led to an increase in browser support for HTML5, and it appears that HTML5 is the way forward.

So, is it time for HTML again?

Yes, it is indeed time to start coding in HTML again, and I thought it would be useful to provide a brief overview following the historical lesson above.

In HTML5, you can write the code as classic HTML code or as XHTML code (and I mean with XHTML syntax) and validate it as HTML5. So there are plenty of opportunities to continue writing good code, regardless of which syntax you prefer. In most cases, there is no significant difference between writing good HTML and good XHTML, as long as you are consistent and stick to one of the two.

HTML5 can, by the way, be served with either MIME type: text/html or application/xhtml+xml. In the latter case, it will probably be called XHTML5 instead of HTML5, which seems quite logical but adds yet another XHTML term that has nothing to do with XHTML2, despite the higher “version number.”

Write code according to HTML or XHTML syntax

Regardless of whether you choose to write HTML or XHTML, write strict code. You do this by placing a so-called DOCTYPE at the top of each document. This way, you specify what type of document you are writing and trigger what is known as standards mode in browsers. In this mode, pages are rendered according to the same rules across browsers to a much greater extent.

So, follow the syntax of HTML 4.01 Strict or XHTML 1.0 Strict.
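To verify that a DOCTYPE has actually triggered standards mode, you can inspect the document's compatMode property, which browsers set to "CSS1Compat" in standards mode and to "BackCompat" in quirks mode. A minimal sketch, where the function name isStandardsMode is only an illustration:

```javascript
// Returns true when the given document is rendered in standards mode.
// Note that "almost standards" mode also reports "CSS1Compat".
function isStandardsMode(doc) {
  return doc.compatMode === 'CSS1Compat';
}

// Only runs in a browser, where a global document exists.
if (typeof document !== 'undefined') {
  console.log(isStandardsMode(document));
}
```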

HTML5

HTML5 can thus be sent as both application/xhtml+xml and as text/html. This means there are no barriers for those who want to use HTML5 with an XML parser.

HTML5 also uses a very simple DOCTYPE; it is sufficient to use <!DOCTYPE html>.

In addition, the new standard introduces a range of new semantic tags to structure documents even better than today. Sure, <div id="navigation"> works well and is somewhat structurally and semantically correct. Still, since many use <div id="nav"> or <ul id="menu">, there is no real standard regarding how navigation should be marked up.

The new tags in HTML5 include, among others:

<nav>

The nav element represents a section of a page that links to other pages or to parts within the page: a section with navigation links.

This tag is intended to mark up the primary navigation of the page.

<section>

The section element represents a generic document or application section. A section, in this context, is a thematic grouping of content, typically with a header, possibly with a footer.

It is used to mark related parts of a page.

Within a <section>, the information can further be marked up using the tags below. If these tags are used outside a <section>, they are associated with <body> and thus apply to the entire document instead.

<header>

The header element represents the header of a section.

This does not mean it only contains H1 or H2 tags; it can include information such as authors, publication dates, and other metadata.

<footer>

The footer element represents a footer for the section it applies to. A footer typically contains information about its section, such as who wrote it, links to related documents, copyright data, and the like.

This defines information related to the <section> it is part of. For example, W3C uses links to related articles and copyright information.

It is worth noting that if a <footer> tag represents the entire page and contains contact information, that information should be marked up with the <address> tag, whose correct use has previously been the subject of many differing opinions.

<article>

The article element represents a section of a page that consists of a composition that forms an independent part of a document, page, or site.

The idea is that <article> is meant to be used to mark up parts of a document that can stand well on their own. W3C uses forum posts, articles, blog entries, and comments as examples in its description.

<aside>

The aside element represents a section of a page that consists of content that is tangentially related to the content around the aside element, and which could be considered separate from that content.

There are many opinions on how to mark up secondary information when writing HTML. With HTML5, a tag specifically intended for marking up related but not primary information has been added.

Examples include text that is often placed in a column to the left or right of the main content, as well as pull quotes and supplementary information in longer articles.

Other Interesting Tags

Other interesting tags to keep an eye on include:

  • <audio>
  • <video>
  • <canvas>

The <input> element’s type attribute also gains several new values, such as:

  • date
  • time
  • number
  • email
  • url
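Since browser support for these types varies, it can be useful to test for a type before relying on it. A common technique, sketched below, depends on the fact that a browser falls back to type "text" for any input type it does not recognize. The function name supportsInputType is hypothetical, and the document is passed in as a parameter.

```javascript
// Returns true when the browser recognizes the given <input> type.
// Unknown types are reported back as "text" by the type property.
function supportsInputType(doc, type) {
  const input = doc.createElement('input');
  input.setAttribute('type', type);
  return input.type === type;
}

// Only runs in a browser, where a global document exists.
if (typeof document !== 'undefined') {
  ['date', 'time', 'number', 'email', 'url'].forEach(function (type) {
    console.log(type, supportsInputType(document, type));
  });
}
```

A browser that does not support a type still renders a plain text field, so the new types degrade gracefully.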

For everyone using UTF-8 encoding (which practically everyone should be doing today), this can be defined by adding <meta charset="utf-8"> as the first element in the <head> tag.

Using HTML5 Already Today

It is not particularly difficult to start using HTML5 today. Firefox versions 3.5+, Safari versions 4+, and Google Chrome can already handle most of it without any issues, while Internet Explorer may need a little help along the way.

First and foremost, browsers that do not know the new tags do not treat them as block elements, so you must declare them as such in the CSS:

nav, 
section, 
header, 
footer, 
article, 
aside { 
    display: block; 
}

To apply CSS to the tags above in Internet Explorer, you must also register them in the DOM. This is most easily done by creating them with JavaScript at the beginning of the document:

<script>
document.createElement('section');
document.createElement('nav');
document.createElement('header');
document.createElement('footer');
document.createElement('article');
document.createElement('aside');
</script>
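The same registration can also be written as a loop, which keeps the list of tag names in one place. This is only a sketch: the names HTML5_ELEMENTS and registerHtml5Elements are illustrative and do not come from any library.

```javascript
// The structural tags to register, matching the CSS rule above.
const HTML5_ELEMENTS = ['section', 'nav', 'header', 'footer', 'article', 'aside'];

// Creates each element once so that Internet Explorer learns the tag name.
// The created elements are deliberately discarded.
function registerHtml5Elements(doc) {
  HTML5_ELEMENTS.forEach(function (name) {
    doc.createElement(name);
  });
}

// Only runs in a browser, where a global document exists.
if (typeof document !== 'undefined') {
  registerHtml5Elements(document);
}
```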

Alternatively, you can use html5shiv or a more robust library like Modernizr to solve both of the above problems.

As pointed out by Steve Smith in his article Structural Tags in HTML5, HTML5 interprets the <script> tag as JavaScript unless otherwise specified, so it is unnecessary to declare type="text/javascript" explicitly.

A New Way to Embed Video on the Web

It is possible to start using the new structural tags mentioned above right now, but of course, the new elements such as <video> or <audio> will not work in browsers that do not have built-in support for HTML5.

Today, Firefox, Safari, Chrome, and Opera have relatively good support for <video>, while other browsers still require the use of something like SWFObject.

A significant advantage of the <video> tag today is that as long as Safari supports the video format, it will also work on an iPhone with iOS 3 or later. Henrik Sjökvist has written a good introduction called “Using the HTML5 <video> tag with a Flash fallback” that is both concise and informative.

If you are using Firefox 3.5+ or Opera, you can check out how the new <video> tag works at the links below. There, a video in Ogg Theora format (.ogv/.ogg) is displayed, which can be played without any extra plugins and is embedded using nothing but HTML5.

Unfortunately, Safari currently cannot handle the Ogg Theora format, even though Safari supports HTML5. In this case, it is an active choice by Apple that causes the video in the example above not to play in Safari, not a lack of support for HTML5. If the film had been in a format supported by Safari, it would have worked. Currently, Safari supports the same formats as QuickTime (with MPEG-4 likely being the most common and widely used today).
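Because format support differs between browsers, a page can ask the browser which formats it is able to play before choosing a source. The sketch below uses the canPlayType method of the <video> element, which returns an empty string when the format is not supported and "maybe" or "probably" otherwise. The function name canPlay is hypothetical, and the MIME type strings are common examples rather than the formats of any particular file.

```javascript
// Returns true when the browser reports that it may be able to play
// the given MIME type in a <video> element.
function canPlay(doc, mimeType) {
  const video = doc.createElement('video');
  // Browsers without <video> support create an unknown element
  // that has no canPlayType method.
  if (typeof video.canPlayType !== 'function') {
    return false;
  }
  return video.canPlayType(mimeType) !== '';
}

// Only runs in a browser, where a global document exists.
if (typeof document !== 'undefined') {
  console.log(canPlay(document, 'video/ogg; codecs="theora, vorbis"'));
  console.log(canPlay(document, 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"'));
}
```

When neither format is playable, the page can fall back to a plugin-based player.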

Conclusion

I hope this has given you a quick overview of how HTML has developed, along with a little sneak peek at HTML5. Below are links to more interesting resources if you want to dig deeper.

References and Resources