Re: <URL:...> considered harmful

Chris Weider (clw@mocha.bunyip.com)
Thu, 15 Sep 94 07:18:56 -0400

Date: Thu, 15 Sep 94 07:18:56 -0400
From: Chris Weider <clw@mocha.bunyip.com>
Message-Id: <9409151118.AA23497@mocha.bunyip.com>
To: rtor@ansa.co.uk, sdm7g@virginia.edu
Subject: Re: <URL:...> considered harmful

Hi Dan and others:
Since I am the one who proposed the wrapper in the first place, let me state
why I think we *still* need something like this, and suggest some possibilities
for a modified wrapper now that we threw out the URL: prefix at the last IETF.

We still need a way to distinguish a URL in plain text. A scheme-based
recognition technique, which looks for a valid scheme and then extracts the
rest of the line (or the rest of the line up to the next white space), has
several problems. They are:
Scheme recognition. The number of schemes will keep increasing.
Thus, without a generic wrapper, sites which have not installed the latest
set of schemes into their extraction tool will not be able to correctly
identify valid URLs embedded in text. A *human* might be able to,
if they are familiar with all the schemes,
but an automated tool will still miss many of them.
(I'm disregarding here the actual resolution of the URL.)
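
To make the scheme-recognition problem concrete, here is a minimal sketch of
such an extraction tool. The scheme list, the function name, and the "x500"
scheme and host names in the sample text are assumptions for illustration
only, not anything that has been agreed on.

```python
import re

# Hypothetical list of schemes that a site's extraction tool was installed with.
KNOWN_SCHEMES = ("http", "ftp", "gopher", "wais")

def extract_by_scheme(text, schemes=KNOWN_SCHEMES):
    """Return substrings that start with a listed scheme and run to white space."""
    alternatives = "|".join(re.escape(s) for s in schemes)
    pattern = re.compile(r"\b(?:%s):\S+" % alternatives)
    return pattern.findall(text)

# "x500" stands in for any scheme introduced after the tool was installed.
sample = "See http://host.example/doc and also x500://dir.example/c=US for details."

found = extract_by_scheme(sample)
# Only after someone updates the list does the second URL become visible.
found_after_update = extract_by_scheme(sample, KNOWN_SCHEMES + ("x500",))
```

With the original list the tool reports only the http URL; the second one is
silently skipped until every site updates its scheme table.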

Line length. The proposals I've seen for the X.500 URL will require lines
far longer than 80 characters. UNIX allows path names at least 128
characters long. A tool that simply goes to white space will not pick up
longer URLs because of the end-of-line/next space ambiguity.
In addition, that magic 80 character constraint is purely an artifact
of the fact that most mail tools only use vanilla ASCII. How can I
recognize the end of a URL if I've decided to create my masterpiece in
36 point type, which may cause an 80 character URL to extend over
three lines or more?
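
The ambiguity can be sketched as follows. The sketch assumes a mail tool that
breaks an over-long line by inserting a newline in the middle of the URL; the
host name, the path, and the break position of 60 characters are made up for
illustration.

```python
import re

def extract_to_whitespace(text):
    """Naive extraction: take everything from 'http:' up to the next white space."""
    return re.findall(r"\bhttp:\S+", text)

# A hypothetical URL well over 80 characters long.
long_url = "http://host.example/" + "/".join(["segment"] * 12)

# Assumed behaviour of the mail tool: a hard line break inside the URL.
wrapped = long_url[:60] + "\n" + long_url[60:]
message = "Please see " + wrapped + " for the details."

found = extract_to_whitespace(message)
```

The tool returns a single, truncated URL: it has no way to tell that the text
at the start of the next line is a continuation rather than ordinary prose.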

Human recognition. What's my current algorithm? Look for colons and
then scan the surrounding text hoping to recognize some URL format?
I think that we can be substantially more friendly than that.

So, having said that, let me propose a solution. I freely admit that my
suggested wrapper doesn't fit into the 'sgml'ish flavor of HTML.
So. Two suggestions that may fit better...

1: Highly recommending the anchor syntax (with surrounding <A> and </A>)
for all URLs quoted in free text. This allows the immediate display of any
text-based document (with the appropriate semantics) through Mosaic.

2: The development of a new tag, call it URI, for example,
<uri ref="http:blah/blah/blah"> and highly recommending its use. This is
perhaps less general, but is a fairly useful hack in my opinion, and allows
all types of references to be placed inside.
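
As a rough sketch of what a wrapper buys an extraction tool, the following
pulls references out of the hypothetical <uri ref="..."> form from suggestion
2. The regular expression, the function name, and the rule of dropping white
space found inside the quotes are my assumptions, not part of any agreed
syntax; "x500" again stands in for an arbitrary new scheme.

```python
import re

# Matches the proposed tag; the exact syntax here is an assumption.
URI_TAG = re.compile(r'<uri\s+ref="([^"]*)"\s*>', re.IGNORECASE)

def extract_wrapped(text):
    """Return every reference found inside a <uri ref="..."> wrapper.

    White space inside the quotes is assumed to come from line wrapping
    and is removed, so a reference broken across lines is rejoined.
    """
    return ["".join(ref.split()) for ref in URI_TAG.findall(text)]

message = (
    'Try <uri ref="x500://dir.example/c=US/\n'
    'o=Example"> or <uri ref="http:blah/blah/blah">.'
)

found = extract_wrapped(message)
```

The tool needs no table of schemes, and the closing quote marks the end of
the reference no matter where the line breaks fall.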

In either case, I hope that I've convinced you of the necessity of a wrapper.
Tools are already being developed to take (for example) e-mail and extract
the URLs: if we can make their job easier, I think that will be a major win.
Yes, it does mean that we have to make some changes now, but I
believe that this will save us a lot of trouble in several years.

Chris Weider