CSIS 103 Introduction to the Internet

Lesson 8

Understanding Markup Languages

As you learned in Module 1, Hypertext Markup Language (HTML) is a nonproprietary markup language that a browser interprets and uses to display content as a webpage. The Session 8.1 Visual Overview shows how HTML is used. Markup language is a general term for a language that keeps the content of a document separate from the instructions that describe how that content is structured and formatted.

HTML grew out of work that Tim Berners-Lee began in 1989 at CERN (the European Laboratory for Particle Physics), where he and Robert Cailliau worked on a project to improve the laboratory’s document-handling procedures. Berners-Lee eventually transformed this initial work into the markup language now known as HTML. HTML quickly became the language used to create webpages because of its simplicity and portability, which made it compatible with many operating systems and different types of devices.

HTML then evolved through a series of specifications produced by the collective work of the World Wide Web Consortium, or W3C (www.w3.org). The W3C establishes specifications, or sets of standards, that define how a browser should interpret HTML code so that different apps and devices interpret it correctly and consistently. The specifications are voluntary, but because a website’s success depends on browsers following them, most organizations adhere to them as closely as possible. The current specification is HTML5, which all major browsers support.

Another popular markup language is Extensible Markup Language (XML), which became a W3C Recommendation in 1998 and is used to describe the format and structure of data. XML is widely used to share data across organizations, especially over the Internet. Many apps, including Microsoft Office, include features that convert data stored in a proprietary format into XML. Although XML is a markup language, it differs from HTML in that XML uses customizable tags to describe data and its relationships to other data. HTML uses a standardized set of tags and does not offer this flexibility when describing data.
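To illustrate the idea of customizable tags, the following sketch uses Python's standard xml.etree.ElementTree module to read a small XML document. The tag names (contacts, contact, name, email), the sample data, and the helper function list_contacts are all hypothetical, invented for this example. The point is only that each tag names the kind of data it holds, and the nesting shows how the pieces relate.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: <contacts>, <contact>, <name>, and <email> are custom
# tags invented for this sketch. Each tag describes what the data is,
# and nesting shows that a name and an email belong to the same contact.
CONTACTS_XML = """
<contacts>
  <contact>
    <name>Ana Lopez</name>
    <email>ana@example.com</email>
  </contact>
  <contact>
    <name>Sam Lee</name>
    <email>sam@example.com</email>
  </contact>
</contacts>
"""


def list_contacts(xml_text: str) -> list:
    """Return (name, email) pairs read from the custom-tagged XML."""
    root = ET.fromstring(xml_text)
    pairs = []
    # Visit each <contact> element under the root and read its nested tags.
    for contact in root.findall("contact"):
        pairs.append((contact.findtext("name"), contact.findtext("email")))
    return pairs


if __name__ == "__main__":
    for name, email in list_contacts(CONTACTS_XML):
        print(f"{name}: {email}")
```

Running the script prints one line per contact, such as "Ana Lopez: ana@example.com". An HTML page could display the same names and addresses, but its standard tags (such as p or li) say how the text is structured on the page, not what kind of data it is.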

A related W3C specification, Extensible Hypertext Markup Language (XHTML), combines the formatting features of HTML with the stricter syntax of XML so that web content can be delivered more readily to all devices connected to the Internet. The W3C first recommended XHTML (as XHTML 1.0) in 2000. The main differences between HTML5 and XHTML 1.1 lie in the syntax of the two languages. HTML is somewhat forgiving: it tolerates some missing closing tags and continues to support features of earlier HTML specifications. XHTML is not as forgiving. For this reason, many web developers follow the stricter XHTML syntax in their HTML5 documents so that applications that support only XHTML can also process those documents.
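The difference in strictness can be seen with two Python standard-library parsers. In this sketch, the xml.etree.ElementTree parser stands in for an application that accepts only well-formed, XHTML-style markup; it is an XML parser, not a full XHTML validator. Python's html.parser module is a simple HTML tokenizer rather than a browser's full HTML5 parser, but like a browser it reads markup with missing closing tags without raising an error. The sample markup, the StartTagCounter class, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample markup. LOOSE omits the closing </p> tags, which
# HTML tolerates; STRICT closes every element, as XHTML syntax requires.
LOOSE = "<div><p>First paragraph<p>Second paragraph</div>"
STRICT = "<div><p>First paragraph</p><p>Second paragraph</p></div>"


class StartTagCounter(HTMLParser):
    """Counts start tags; html.parser does not reject unclosed elements."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


def count_html_start_tags(markup: str) -> int:
    """Read markup with the forgiving HTML tokenizer and count start tags."""
    parser = StartTagCounter()
    parser.feed(markup)
    parser.close()
    return parser.count


def is_well_formed_xml(markup: str) -> bool:
    """Return True only if the strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for label, markup in (("loose", LOOSE), ("strict", STRICT)):
        print(label,
              "| HTML start tags read:", count_html_start_tags(markup),
              "| well-formed XML:", is_well_formed_xml(markup))
```

The HTML tokenizer reads all three start tags in both versions, while the XML parser rejects the loose version because its tags are not properly closed. Writing markup in the strict style keeps it acceptable to both kinds of tools, which is the reasoning behind using XHTML syntax in HTML5 documents.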

Figure 8-1 identifies some of the major syntax differences between HTML5 and XHTML 1.1. As a beginning HTML student, you should understand some of the differences between the languages used to create webpages, because they provide a basis for understanding how each language works. You will learn more about syntax as you complete this session.