Class Activity: XML and Categorizing Sections of Text
Objective: To get a taste of the future of web markup by categorizing passages from famous works of literature with tag names we invent ourselves. XML allows this, as explained below.
The Background: HTML is no longer the only markup scheme for the web. Newer markup languages let us describe what data is, not just how it should be displayed. With XML (Extensible Markup Language), we can create self-describing data: any passage of text can be marked with a word that describes its content, so the document itself records what each part is.
For instance, the first paragraph of Jonathan Swift's Gulliver's Travels (1726) can be marked as follows:
<publisherSalutation>THE PUBLISHER TO THE READER.</publisherSalutation>
<aNote>[As given in the original edition.]</aNote>
<GulliverIntro>The author of these Travels, <mainCharacter>Mr. Lemuel Gulliver</mainCharacter>,
is my ancient and intimate friend; there is likewise some relation between us on the
mother's side. About three years ago, <mainCharacter>Mr. Gulliver</mainCharacter> growing weary
of the concourse of curious people coming to him at his house in
Redriff, made a small purchase of land, with a convenient house,
near Newark, in Nottinghamshire, his native country; where he now
lives retired, yet in good esteem among his neighbours.</GulliverIntro>
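Almost any term may be used as a tag name, as long as it contains no spaces and starts with a letter or an underscore. The purpose is to produce tags that a machine can read, making software smarter about the text. A program reading this markup can now tell that this paragraph introduces the main character and that the publisher has issued a statement.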
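To make "the machine knows" concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module. It assumes the Gulliver excerpt above is wrapped in a hypothetical root element named <excerpt>, because a standalone XML document must have exactly one root element; the function name list_sections is likewise only illustrative. The program does not understand Swift's prose at all. It simply reads the tag names we chose.

```python
import xml.etree.ElementTree as ET

# The Gulliver excerpt, wrapped in a hypothetical <excerpt> root element,
# since a standalone XML document needs exactly one root.
GULLIVER = """<excerpt>
<publisherSalutation>THE PUBLISHER TO THE READER.</publisherSalutation>
<aNote>[As given in the original edition.]</aNote>
<GulliverIntro>The author of these Travels,
<mainCharacter>Mr. Lemuel Gulliver</mainCharacter>, is my ancient and
intimate friend; there is likewise some relation between us on the
mother's side. About three years ago,
<mainCharacter>Mr. Gulliver</mainCharacter> growing weary of the concourse
of curious people coming to him at his house in Redriff, made a small
purchase of land, with a convenient house, near Newark, in Nottinghamshire,
his native country; where he now lives retired, yet in good esteem among
his neighbours.</GulliverIntro>
</excerpt>"""


def list_sections(xml_text):
    """Return the tag names of the top-level sections and every
    name marked with <mainCharacter>, in document order."""
    root = ET.fromstring(xml_text)
    section_tags = [child.tag for child in root]
    # iter() searches the whole tree, including elements nested in paragraphs
    characters = [el.text.strip() for el in root.iter("mainCharacter")]
    return section_tags, characters


tags, characters = list_sections(GULLIVER)
print("Sections:", tags)
# Sections: ['publisherSalutation', 'aNote', 'GulliverIntro']
print("Main character mentions:", characters)
# Main character mentions: ['Mr. Lemuel Gulliver', 'Mr. Gulliver']
```

Because the categories live in the tags, the same few lines would work on any text marked up with our terms, whoever wrote it.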
While we are not going to learn more about XML than putting these texts into categories with our own terms, you should know that a style sheet is used to format the text based on these terms. A style sheet is usually a separate file, often stored on the same server as your website, that applies a uniform styling scheme (the same fonts, etc.) to many documents. For instance, if we wanted every <GulliverIntro> section to appear in italics, we would add a rule for that tag to the style sheet. Then, if several paragraphs use <GulliverIntro>, all of them are formatted in a single gesture.
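The style-sheet idea can be illustrated with another short Python sketch. This is not how real style sheets work (XML is usually styled with CSS or transformed with XSLT); the STYLE_RULES dictionary and the render function are hypothetical stand-ins showing that one rule keyed on a tag name applies to every element carrying that tag. The two sample paragraphs are placeholder text.

```python
import xml.etree.ElementTree as ET

# Hypothetical "style sheet": one formatting rule per tag name,
# expressed here as HTML start/end tags to wrap around the content.
STYLE_RULES = {
    "GulliverIntro": ("<i>", "</i>"),   # italics for every intro paragraph
    "mainCharacter": ("<b>", "</b>"),   # bold for every main-character name
}


def render(element, rules):
    """Return the element's text with the matching rule applied to it
    and, recursively, to every element nested inside it."""
    start, end = rules.get(element.tag, ("", ""))  # untagged: no formatting
    parts = [element.text or ""]
    for child in element:
        parts.append(render(child, rules))
        parts.append(child.tail or "")  # text after the child's closing tag
    return start + "".join(parts) + end


# Placeholder document with two paragraphs that share the same tag.
doc = ET.fromstring(
    "<excerpt>"
    "<GulliverIntro>Paragraph one, about <mainCharacter>Gulliver</mainCharacter>.</GulliverIntro>"
    "<GulliverIntro>Paragraph two.</GulliverIntro>"
    "</excerpt>"
)
print(render(doc, STYLE_RULES))
# <i>Paragraph one, about <b>Gulliver</b>.</i><i>Paragraph two.</i>
```

Changing the single "GulliverIntro" entry, say to underlining, would restyle both paragraphs at once without touching the document itself.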
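One other basic rule for our activity: XML tags must come in pairs. Every opening tag <tagname> needs a matching closing tag </tagname>, which repeats the same name with a slash after the opening angle bracket. Tag names are case-sensitive, so <GulliverIntro> cannot be closed with </gulliverintro>. Tags must also nest properly: a tag opened inside another, such as <mainCharacter> inside <GulliverIntro>, must be closed before its container is. If you go on to author in XML, make sure you close every section with an end tag.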
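Python's built-in XML parser enforces these rules, so it can serve as a quick checker for your own markup. In this minimal sketch, is_well_formed is a hypothetical helper name; ET.fromstring raises ET.ParseError when a tag is left unclosed, closed out of order, or closed with different capitalization.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError as err:
        print("Not well-formed:", err)
        return False


good = "<GulliverIntro>About <mainCharacter>Mr. Gulliver</mainCharacter>.</GulliverIntro>"
missing_end = "<GulliverIntro>About <mainCharacter>Mr. Gulliver.</GulliverIntro>"
misnested = "<GulliverIntro>About <mainCharacter>Mr. Gulliver</GulliverIntro></mainCharacter>"
wrong_case = "<GulliverIntro>About Mr. Gulliver.</gulliverintro>"

for sample in (good, missing_end, misnested, wrong_case):
    print(is_well_formed(sample))
```

Only the first sample passes; the other three each trigger a "mismatched tag" error, which is the parser's way of pointing at the line where a closing tag went missing or wrong.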
The first few paragraphs of each work below can be found at its link, along with downloadable versions of the full e-texts made available through Project Gutenberg (http://promo.net/pg/):
Jonathan Swift's Gulliver's Travels
Francois Rabelais' Gargantua and Pantagruel