
9.4.1 XML for the Uninitiated

The buildfiles of ant, usually named build.xml, are written in Extensible Markup Language, or XML. Some of the reasons for this are:

• XML is hierarchical.

• XML is standardized.

• XML is widely used and familiar to many programmers.

• Java has many classes for reading, parsing, and using XML.

• XML-based representations of hierarchical data structures are easy to read and parse for both humans and programs.

5. There is a horde of optional tasks. As the name suggests, they are optional. Include these if you need them. This is the only mention they will receive.

XML is a successor to SGML, Standard Generalized Markup Language, which is a language for defining markup languages. A markup document may be validated. A validated document is one that conforms to a structural specification of the markup tags in the document. Such a specification may be made using a Document Type Definition (DTD), which is a holdover from the way SGML markup languages were specified, or using one of the newer specification standards, such as W3C’s XML Schema. In either case, the DTD or schema specifies what tags may be used in the markup, where they may exist with respect to one another, what attributes tags may have, and how many times a given tag may appear in a given place. A document can thus be validated, that is, checked against the corresponding DTD or schema. Validation is not required, however; in many situations, documents can also be used without validation so long as they are well-formed, that is, so long as they conform to the basic syntax of XML.
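To make the difference between well-formed and validated concrete, here is a minimal sketch that asks the JDK's standard DOM parser whether a piece of text is well-formed, without consulting any DTD or schema. The class name WellFormedCheck and the method isWellFormed are our own inventions for illustration; only the javax.xml.parsers and org.xml.sax classes come from the Java platform.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named here only for illustration.
class WellFormedCheck {

    // Returns true if the text parses as well-formed XML.
    // No DTD or schema validation is requested.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // already the default; stated for emphasis
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the text: it breaks the basic syntax of XML.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>")); // matching tags
        System.out.println(isWellFormed("<a><b></a>"));  // <b> is never closed
    }
}
```

A validating parse would additionally require a DTD or schema to check against; the sketch above deliberately stops at well-formedness.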

HTML, with which even nonprogrammers are familiar, is an instance of a markup language defined in terms of SGML (and XHTML is its reformulation in terms of XML). This book itself was written in DocBook, which is another SGML markup language.

So, if SGML is such a wonder, why is XML all the rage? Well, SGML is one of those standards that attempt to “subsume the world.” SGML has very complex and flexible syntax, with many different ways to represent a simple markup construct. Thus, to completely implement an SGML parser is difficult. Recognizing that 90% of the complexity of SGML is needed in only about 1% of cases, the designers of XML realized that they could make a markup specification language only 10% as complicated that would cover 99% of cases (of course, like 85% of statistics, we’re making these numbers up, but you get the point).

Implementing an XML parser, while not exactly trivial, is much easier than implementing an SGML parser.

SGML/DSSSL and XML/XSLT are efforts to make the transformation and presentation of hierarchical data easier and more standardized. If what you have read here is all that you know about XML (or SGML), you should certainly consider getting yourself a book on these important standards.

For now, we can say that XML consists of tags that are set off from data content by the familiar less-than and greater-than brackets we are used to seeing in HTML:


<samplexmltag>


Just as in HTML, the tags may have start tag and end tag forms:


<samplexmltag>Sample XML tagged data</samplexmltag>


The entire construct, including the pair of matching tags and everything inside them, is called an element. As in HTML, the start tags may also carry data inside them in the form of attributes:


<samplexmltag color="blue">Sample XML tagged data</samplexmltag>


If you have an empty element, one that either does not or cannot have data between its start tag and end tag, you may “combine” the start and end tag by putting the slash at the end of the tag:


<samplexmltag color="blue"/>
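The pieces described above (element, attribute, data content, and the empty-element form) can be picked apart with the DOM classes that ship with Java. The following is a minimal sketch, not a recommended design; the class name ElementParts and the method parseRoot are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
class ElementParts {

    // Parses the text and returns its outermost (root) element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Element full = parseRoot(
            "<samplexmltag color=\"blue\">Sample XML tagged data</samplexmltag>");
        System.out.println(full.getTagName());          // prints samplexmltag
        System.out.println(full.getAttribute("color")); // prints blue
        System.out.println(full.getTextContent());      // prints Sample XML tagged data

        // The combined start/end form yields an element with no content.
        Element empty = parseRoot("<samplexmltag color=\"blue\"/>");
        System.out.println(empty.hasChildNodes());      // prints false
    }
}
```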


Obviously, there is more to it than this, but it is enough to begin with.

XML’s uses range from publishing to networked interprocess communications. Our interest here is in using it to represent a model of a piece of software and the various ways that software might be built and deployed. So from here on, we will be discussing not XML in general, but the ant document type. Actually, ant’s markup language uses unvalidated XML. In other words, there isn’t officially a schema for ant. Thus, the only formal definition for an ant XML file is what ant accepts and understands. This is more common than it should be. Any XML markup vocabulary really should have a schema, but often XML use starts with “Oh, this is just a quick thing. No one will ever read or write this markup. Just these two programs of mine.” These famous last words will one day be right up there with “I only changed one line of code!” As strongly as we feel about this, ant really can never have a DTD, at least not a complete one. The custom task feature makes this impossible.
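Because a buildfile only has to be well-formed, any generic XML parser can read one without a DTD. The sketch below embeds a small buildfile as a string and lists its targets. The project name demo, the target names, and the directory and file names are made up for illustration, as are the class name BuildfileTargets and the method targetNames; the project, target, javac, and jar elements are ordinary ant markup. This shows only that the file parses as XML, not how ant itself processes it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named here only for illustration.
class BuildfileTargets {

    // A made-up, minimal buildfile; names and paths are placeholders.
    static final String BUILD_XML = String.join("\n",
        "<project name=\"demo\" default=\"compile\" basedir=\".\">",
        "  <target name=\"compile\">",
        "    <javac srcdir=\"src\" destdir=\"build\"/>",
        "  </target>",
        "  <target name=\"jar\" depends=\"compile\">",
        "    <jar destfile=\"demo.jar\" basedir=\"build\"/>",
        "  </target>",
        "</project>");

    // Returns the name attribute of every <target> element, in document order.
    static List<String> targetNames(String buildXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(buildXml)));
            NodeList targets = doc.getElementsByTagName("target");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < targets.getLength(); i++) {
                names.add(((Element) targets.item(i)).getAttribute("name"));
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("buildfile is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(targetNames(BUILD_XML)); // prints [compile, jar]
    }
}
```

Note that nothing here checks whether the elements make sense to ant. A misspelled task name would still parse as well-formed XML, which is exactly the gap a schema would otherwise close.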