Week One

Class Activity: XML and Categorizing Sections of Text

Objective: To get a taste of the future of web markup by categorizing passages from famous works of literature with terms of your own devising, written as XML tags as explained below.

The Background: HTML is no longer the sole markup scheme for the web. Newer arrivals have added ways to describe data. With XML, we can create self-describing data: a body of text can be marked with a name that describes its content, so that documents become aware of the properties of their content.

For instance, the first paragraph of Jonathan Swift's Gulliver's Travels (1726) can be marked as follows:


<publisherSalutation>THE PUBLISHER TO THE READER.</publisherSalutation>

<aNote>[As given in the original edition.]</aNote>

<GulliverIntro>The author of these Travels, <mainCharacter>Mr. Lemuel Gulliver</mainCharacter>, is my ancient and
intimate friend; there is likewise some relation between us on the
mother's side. About three years ago, <mainCharacter>Mr. Gulliver</mainCharacter> growing weary
of the concourse of curious people coming to him at his house in
Redriff, made a small purchase of land, with a convenient house,
near Newark, in Nottinghamshire, his native country; where he now
lives retired, yet in good esteem among his neighbours.</GulliverIntro>

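Almost any term may be used, provided it is a legal XML name: it must begin with a letter or underscore and contain no spaces. The purpose is a machine-readable tag, one that makes machines smarter. Now a program reading the text can tell that this paragraph introduces the main character and that the publisher has issued a statement.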
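To see what "making machines smarter" might look like in practice, here is a minimal Python sketch that uses the standard-library xml.etree.ElementTree parser to read an abridged version of the passage above. The <excerpt> wrapper is an assumption added for illustration only, since a parser expects a single outermost element; it is not part of the original markup.

```python
import xml.etree.ElementTree as ET

# Abridged version of the marked-up passage. The <excerpt> wrapper is a
# hypothetical root element: an XML document needs exactly one.
marked_up = """<excerpt>
<publisherSalutation>THE PUBLISHER TO THE READER.</publisherSalutation>
<aNote>[As given in the original edition.]</aNote>
<GulliverIntro>The author of these Travels, <mainCharacter>Mr. Lemuel Gulliver</mainCharacter>, is my ancient and
intimate friend; [...] <mainCharacter>Mr. Gulliver</mainCharacter> growing weary
of the concourse of curious people [...]</GulliverIntro>
</excerpt>"""

root = ET.fromstring(marked_up)

# Every element name that appears in the excerpt, in document order
# (the root element itself comes first).
tag_names = [element.tag for element in root.iter()]

# The text of each <mainCharacter> element, wherever it is nested.
main_characters = [element.text for element in root.iter("mainCharacter")]

print(tag_names)
print(main_characters)
```

Because the categories are ordinary element names, the program can pull out every mention of the main character without knowing anything about the novel itself.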

While we are not going to learn more about XML than putting these texts into categories with our own terms, you should know that a style sheet is used to format the text based upon these terms. A style sheet is usually a separate file, stored on the same server as your Internet site, that applies a uniform styling scheme (the same font, etc.) to multiple documents. For instance, if we wanted <GulliverIntro> to be formatted in italics, we would assign that formatting to the tag in the style sheet. That way, if several paragraphs use <GulliverIntro>, they can all be formatted in a single gesture.
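A real style sheet for XML would be written in a style language such as CSS or XSL, which this activity does not cover. The underlying idea, one rule that formats every element of a given name, can still be sketched in Python. The style_sheet dictionary, the render function, the <excerpt> wrapper, and the "First paragraph." and "Second paragraph." placeholder texts are all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical "style sheet": one rule per tag name. Here the rule says
# which HTML tag to wrap the text in ("i" for italics).
style_sheet = {"GulliverIntro": "i"}

def render(xml_text):
    """Return the text of each top-level element, wrapped per the style sheet."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root:
        # itertext() gathers the element's text, including nested elements.
        text = "".join(element.itertext())
        html_tag = style_sheet.get(element.tag)
        if html_tag is not None:
            text = f"<{html_tag}>{text}</{html_tag}>"
        lines.append(text)
    return lines

document = (
    "<excerpt>"
    "<aNote>[As given in the original edition.]</aNote>"
    "<GulliverIntro>First paragraph.</GulliverIntro>"
    "<GulliverIntro>Second paragraph.</GulliverIntro>"
    "</excerpt>"
)

rendered = render(document)
print(rendered)
```

Changing the single entry in style_sheet changes how every <GulliverIntro> paragraph is rendered, which is the "single gesture" described above.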

One other basic consideration for our activity: XML tags must come in pairs. Each opening tag needs a closing tag with the same name, differing only by the slash in </tagname>. Tag names are case-sensitive, so <aNote> must be closed by </aNote>. If you do go on to author in XML, you'll want to make sure you close all your sections with end tags.
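One way to check your own markup for missing or mismatched end tags is to hand it to an XML parser, which rejects text that is not well-formed. The sketch below assumes Python's standard-library xml.etree.ElementTree; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# A properly closed element.
closed = "<aNote>[As given in the original edition.]</aNote>"
# The end tag is missing.
unclosed = "<aNote>[As given in the original edition.]"
# XML names are case-sensitive, so </anote> does not close <aNote>.
wrong_case = "<aNote>[As given in the original edition.]</anote>"

print(is_well_formed(closed))
print(is_well_formed(unclosed))
print(is_well_formed(wrong_case))
```

Only the first string is accepted; the other two fail because the opening tag never receives a matching end tag.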

The first few paragraphs of the e-texts of these novels can be found at these links, along with downloadable versions of the e-texts made available through Project Gutenberg (http://promo.net/pg/).

Jonathan Swift's Gulliver's Travels

Francois Rabelais' Gargantua and Pantagruel

 

 
