Plea for help: The detriment of using Microsoft products
Mike Bilow
mikebw at colossus.bilow.com
Mon May 15 18:02:29 EDT 2000
I've snipped lots of your message to avoid overquoting, and I apologize if
this does a disservice.
First, SGML was not developed as a government standard; it grew out of
GML, an internal IBM standard, although certainly the government became
its most enthusiastic user. Second, HTML is not a subset (nor even an
instance) of
XML, although HTML is an instance of SGML and there is now something
called XHTML which places some additional constraints on HTML so as to
make it compatible with an XML parser.
Third, and most importantly, the main purpose of SGML is the encoding of
semantic information in documents, which is completely opposite to the
encoding of presentation information. XML also is designed to encode
semantic information, and an XML document is intended to be rendered by
application of XSL which maps semantic information to some particular
presentation medium. (In fact, the main purpose of XHTML is to provide a
target for XML documents to be rendered through appropriate XSL so they
can be displayed on an HTML-compatible browser.)
The paragraph above probably leaves people's heads spinning, so it might
be helpful to give a heavily excerpted pseudo-example. If I want to
display something like a personal addressbook, I might be inclined to code
that in HTML about like this:
<TABLE>
<TR><TH>Name</TH><TD>General Motors</TD></TR>
<TR><TH>Street</TH><TD>300 Renaissance Center</TD></TR>
<TR><TH>City</TH><TD>Detroit</TD></TR>
<TR><TH>State</TH><TD>MI</TD></TR>
<TR><TH>ZIP</TH><TD>48265-3000</TD></TR>
<TR><TH>Phone</TH><TD>313-556-5000</TD></TR>
</TABLE>
This would be perfectly valid HTML, and it would display tolerably
well. However, there would be no hope of searching documents formatted
this way, let alone indexing them, without at a minimum building a fairly
sophisticated parser capable of handling regular expressions.
On the other hand, an XML document would be designed to encode semantic
meaning directly, something like this:
<ADDRESS>
<NAME>General Motors</NAME>
<STREET>300 Renaissance Center</STREET>
<CITY>Detroit</CITY>
<STATE>MI</STATE>
<ZIP>48265-3000</ZIP>
<PHONE>313-556-5000</PHONE>
</ADDRESS>
Obviously, there is no way to send a raw XML document of this kind to a
browser to be displayed unless there is some additional information. In
my case, I need a Document Type Definition ("DTD") which defines the legal
syntax for my XML document, and I need an Extensible Stylesheet Language
("XSL") instance which maps the XML elements defined in my DTD to some
sort of presentation language, such as XHTML, which can be rendered. It
is entirely conceivable that my addressbook would be rendered from XML
source through XSL into XHTML so that it came out looking exactly like the
first raw HTML example (except with the tags in lower case), but it is
equally possible that some other XSL instance could be used to render the
document suitable for some non-HTML device, such as a SQL database, a
wireless phone, or a screen reader for the blind.
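To make the "one source, several renderings" idea concrete, here is a
rough sketch in Python. It is not XSL: the two functions only stand in
for what two different XSL instances would do, and the LABELS table and
the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The address record from the XML example above.
ADDRESS_XML = """<ADDRESS>
<NAME>General Motors</NAME>
<STREET>300 Renaissance Center</STREET>
<CITY>Detroit</CITY>
<STATE>MI</STATE>
<ZIP>48265-3000</ZIP>
<PHONE>313-556-5000</PHONE>
</ADDRESS>"""

# Hypothetical display labels; a real XSL instance would carry this mapping.
LABELS = {
    "NAME": "Name", "STREET": "Street", "CITY": "City",
    "STATE": "State", "ZIP": "ZIP", "PHONE": "Phone",
}


def render_xhtml(address):
    """Render an ADDRESS element as a table with lower-case tags."""
    rows = [
        "<tr><th>%s</th><td>%s</td></tr>"
        % (LABELS.get(child.tag, child.tag), escape(child.text or ""))
        for child in address
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"


def render_spoken(address):
    """Render the same element as plain lines, e.g. for a screen reader."""
    return "\n".join(
        "%s: %s." % (LABELS.get(child.tag, child.tag), child.text or "")
        for child in address
    )


address = ET.fromstring(ADDRESS_XML)
print(render_xhtml(address))
print(render_spoken(address))
```

The source document never changes; only the function applied to it
does, which is the role an XSL instance plays.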
If I choose to author documents in XML, I might well choose to use a
standard DTD which is publicly accessible to browsers, in which case the
user could choose the particular XSL instance with which the document is
rendered. For example, a blind user might prefer to render XML documents
with an XSL instance adapted to a screen reader. This sort of client-side
rendering might also be of great value to search engines, which are now
reduced to using raw keyword searches augmented by the very occasional
semantic tag (such as "META").
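As a rough illustration of why semantic tags matter for searching, the
Python sketch below indexes two records by element name and compares
that with a raw keyword search. The second record, the function names,
and the index layout are all invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two records using the element names from the address example.
# The second record is made up for illustration.
DOCUMENTS = [
    "<ADDRESS><NAME>General Motors</NAME>"
    "<CITY>Detroit</CITY><STATE>MI</STATE></ADDRESS>",
    "<ADDRESS><NAME>Detroit Street Bakery</NAME>"
    "<CITY>Ann Arbor</CITY><STATE>MI</STATE></ADDRESS>",
]


def build_index(documents):
    """Map (element name, lower-cased text) to matching document positions."""
    index = defaultdict(set)
    for position, source in enumerate(documents):
        for element in ET.fromstring(source).iter():
            text = (element.text or "").strip()
            if text:
                index[(element.tag, text.lower())].add(position)
    return index


def keyword_search(documents, word):
    """Raw keyword search: any document whose source mentions the word."""
    return {i for i, source in enumerate(documents)
            if word.lower() in source.lower()}


index = build_index(DOCUMENTS)
# The keyword search matches both records, because "Detroit" also
# appears in the second record's NAME.
print(keyword_search(DOCUMENTS, "Detroit"))
# The field lookup matches only the record whose CITY is Detroit.
print(index[("CITY", "detroit")])
```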
In most cases, however, rendering for presentation is done by the server.
This will eventually allow clients to negotiate for their preferred XSL
instance with the server on demand. For example, a web server would be
able to serve up XML documents in either XHTML or WML, depending upon
which the client requested. In the long term, companies will define a
companywide DTD (or use a public DTD) and their employees will use word
processors that actually store documents natively in XML.
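A server that picks the output format per request might look roughly
like the Python sketch below. It assumes the client states its
preference in an HTTP Accept header, which is only one way the
negotiation could work; the renderers are deliberately minimal, q-values
are ignored, and every function name here is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOURCE = "<ADDRESS><NAME>General Motors</NAME><CITY>Detroit</CITY></ADDRESS>"


def to_xhtml(address):
    """Minimal XHTML rendering: one table row per child element."""
    rows = "".join(
        "<tr><th>%s</th><td>%s</td></tr>" % (c.tag.lower(), escape(c.text or ""))
        for c in address
    )
    return "<table>%s</table>" % rows


def to_wml(address):
    """Minimal WML rendering: one card with one line per child element."""
    lines = "".join(
        "%s: %s<br/>" % (c.tag.lower(), escape(c.text or "")) for c in address
    )
    return "<wml><card><p>%s</p></card></wml>" % lines


# Media type -> renderer. The renderers stand in for XSL instances.
RENDERERS = {
    "text/vnd.wap.wml": to_wml,
    "application/xhtml+xml": to_xhtml,
}
DEFAULT_TYPE = "application/xhtml+xml"


def negotiate(accept_header):
    """Return the first listed media type we can produce (q-values ignored)."""
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in RENDERERS:
            return media_type
    return DEFAULT_TYPE


def serve(source, accept_header):
    """Render the XML source in whichever format the client asked for."""
    media_type = negotiate(accept_header)
    return media_type, RENDERERS[media_type](ET.fromstring(source))


print(serve(SOURCE, "text/vnd.wap.wml"))
print(serve(SOURCE, "text/html"))
```

A client that asks for nothing the server can produce simply gets the
default rendering, which seemed the least surprising choice for a sketch.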
-- Mike
On 2000-05-15 at 13:35 -0400, Jeffry Smith wrote:
> On Mon, 15 May 2000, Derek Martin wrote:
> > XML
> > Admittedly, I don't know very much about XML. A lot of the OSS
> > office suites seem to be using it a lot for their data format.
> > I gather it's a lot like HTML but more extensible.
> >
> Actually, it would be more accurate to say that HTML is a select subset of XML
> (eXtensible Markup Language), which is itself a subset of SGML
> (Standard Generalized Markup Language). SGML was developed as a Gov't
> standard for marking up documents (this is a header, this is a
> chapter, this is a graphic, etc), that became an ISO standard. It
> specifically does NOT cover display, that is done via a viewer that
> takes the SGML / XML, applies the Document Type Descriptor (a
> meta-document that explains what the tags in the main document are),
> then applies transform rules for your display / item to put it in the
> correct format, before displaying. This is a long-winded way of
> saying that SGML was designed to separate:
> 1. The body of information
> 2. The structure of that info
> 3. The presentation of that info.
>
> By doing this, transforms become an easy "apply this new set of rules" to the
> document. It allows the same document to:
> 1. Be displayed on a computer screen, 1600x1200
> 2. be displayed in a Heads-Up Display
> 3. be printed.