Message-Id: <9403251803.AA06078@ulua.hal.com>
To: timbl@www0.cern.ch, uri@bunyip.com
Subject: Re: Toward an unambiguous grammar [Was: FTP syntax ]
In-Reply-To: Your message of "Fri, 25 Mar 1994 11:19:40 CST."
Date: Fri, 25 Mar 1994 12:03:06 -0600
From: "Daniel W. Connolly" <connolly@hal.com>
I forgot to add...
Have folks looked at the grammar and test suite I suggested? If the
source distribution
<http://www.hal.com/%7Econnolly/dist/url_test-19940316.tar.Z>
is too clumsy, I've made it available in exploded form at:
<http://www.hal.com/%7Econnolly/url-test/>.
[We're all spoiled by the immediate gratification of the Web, aren't we?]
The issues are enumerated in:
<http://www.hal.com/users/connolly/url-test/url_grammar.tests>
and there are some ideas and error tests in
<http://www.hal.com/users/connolly/url-test/url_ideas.tests>
<http://www.hal.com/users/connolly/url-test/url_error.tests>
Is the grammar in the spec useful to folks? It's not useful to me
as an implementor -- it's too ambiguous.
And most of all, it clouds the issues of generic URL parsing versus
scheme-specific URL parsing.
I suggest we replace it. This is assuming that the gopher advocates
are in the minority in their support of completely opaque
left-hand-sides of a URL.
Actually, there are only three reasons for making the LHS of a URL not
opaque, and they all come down to trying to cram multiple strings
into one:
* Relative URL's. The WWW client has to know how to combine
a relative URL with an absolute URL to produce a new URL.
Sometimes I wish this were done on the server side, e.g.
GETREL <baseurl> <relativeurl>
But then I consider publishing a bunch of HTML files via FTP
or local-file access, and I see why it's done the way it's done.
The feature is useful, but I think there are better ways to
go about it. I think the current strategy is a hack, but
it's too widely deployed to turn back now.
* Search seed words. If instead of:
GET /database?word+word+word
this had been expressed as:
TEXTSEARCH /database 'word word word'
or some such, then we wouldn't need to reserve ?
as a special character.
* Fragment identifiers. If these had been expressed as:
<A HREF="http://host/file.html" fragment="z10">
then we wouldn't need to reserve # as a special character.
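
To make the three cases concrete, here is a minimal sketch using Python's standard urllib.parse module as a stand-in for a generic URL parser. The host, directory, and file names are made up for illustration, and this is only one way a client might do it, not what any particular browser does:

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical URLs, chosen only to mirror the three cases above.
base = "http://host/dir/file.html"

# 1. Relative URL's: the client combines a base URL with a relative one.
resolved = urljoin(base, "other.html")

# 2. Search words follow '?'; 3. a fragment identifier follows '#'.
parts = urlsplit("http://host/database?word+word+word#z10")

print(resolved)        # the combined absolute URL
print(parts.path)      # the part the server treats as the object name
print(parts.query)     # the search words crammed in after '?'
print(parts.fragment)  # the fragment crammed in after '#'
```

In each case the parser has to look inside the left-hand side and split one string into several, which is exactly why '?' and '#' end up reserved.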
At various times, I have also suggested other information as optional parts
of a link:
Content-Type, for FTP, local-file, and other access methods that
don't define a content type for their data stream.
Bytecount, Modification Date, and MD5 signature -- heuristics to
validate links, in increasing order of confidence.
Bytecount, Title, Date, Author -- useful bits of info to
help the user decide whether to traverse the link at all.
I put them in the HTML DTD, but some folks objected, noting that
protocols should handle these issues. Some, like FTP, don't. I think
that in making these suggestions, I anticipated many of the URN and
URC issues by about two years.
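
As a rough sketch of how a client might use the bytecount and MD5 hints to validate a link: the function name and its parameters below are hypothetical, not part of any DTD or protocol, and the modification date check is left out because it depends on what the access method reports.

```python
import hashlib

def link_still_valid(data: bytes, bytecount=None, md5_hex=None) -> bool:
    """Check retrieved data against optional hints carried by the link.

    Both hints are optional; a link with no hints is accepted as-is.
    """
    # Cheapest check first: a different size means the target changed.
    if bytecount is not None and len(data) != bytecount:
        return False
    # Stronger check: the MD5 digest of the retrieved bytes must match.
    if md5_hex is not None and hashlib.md5(data).hexdigest() != md5_hex.lower():
        return False
    return True

# Example: hints recorded when the link was authored (made-up content).
original = b"some document body"
hints = {"bytecount": len(original),
         "md5_hex": hashlib.md5(original).hexdigest()}

print(link_still_valid(original, **hints))
print(link_still_valid(b"edited document body", **hints))
```

The checks run cheapest first, so a mismatch in size is caught without computing a digest.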
URL's are handy because they're widely deployed. Folks use them.
Mosaic, lynx, and www-linemode implement them. But I think we are
still far short of a workable linking architecture. Until we have a
model of computation for links, including authoring, distribution,
navigation, and querying, we will not be able to address fault
tolerance, scalability, caching, or replication in a well-defined way.
In short, we need a model of computation for linking in order to make
WWW (or the internet information architecture, whatever you want to
call it) a high-quality, scalable distributed system.
Dan