Date: Mon, 12 Sep 1994 19:42:41 -0500
Message-Id: <199409130042.TAA15490@boombox.micro.umn.edu>
From: "Mark P. McCahill" <mpm@boombox.micro.umn.edu>
To: connolly@hal.com, uri@bunyip.com
Subject: Re: <URL:...> considered harmful
In message <9409122013.AA02543@ulua.hal.com> "Daniel W. Connolly" writes:
..[deleted text...]
>
> Further, if we do need some way to reliably pick URLs out of plain
> text, let's use _anything_ but <>'s and URL:
>
> <>'s are already used for mail addresses (e.g. <connolly@hal.com>)
> and message ID's (e.g. in <12343@hal.com>, Dan writes:...) and sgml
> tags (e.g. for more info, see <a href="...">this</a> -- even in plain
> text, folks write this these days).
>
> URL: looks like a URL scheme, but it's not.
It MIGHT look like a scheme if you recognize schemes based only
on whether a word is followed by a colon, but that seems like a
really unreliable/lame way of recognizing a URL in text...
> There is lots of software
> that searches for URLs by using a regular expression like:
>
> [A-Za-z0-9\.-]+:[^ \t\n]+
>
> I suppose they can use:
>
> [A-Za-z0-9\.-]+://[^ \t\n]+
>
> and get away with it, at least for URLs that use the //hostname syntax.
>
> Each piece of software that basically looks for "scheme:..." will have
> to have a special case to check to see if scheme: is URL:, and skip it
> if so.
>
> In practice, I find that the most reliable way to communicate a URL in
> plain text is to put it on a line by itself, preferably with a little
> whitespace on each side, e.g.:
>
> ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-url-07.txt
>
> That it is a URL is self-evident, or given by context.
>
> I'm willing to see something like:
>
> URL: ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-url-07.txt
>
> or:
>
> (URL: ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-url-07.txt )
>
> with space before and after the URL itself. That way, folks can
> double-click on the URL and get the right thing, and all sorts of
> other happy, practical things.
>
In these examples you are using the whitespace to delimit the URL.
This limits the length of the URL to one line, which is a real problem
for any URL that is longer than a line. Having a wrapper around the
URL does not preclude you from having a program that can parse text to
find URLs and make them something the user can double-click... and the
wrapper makes it possible for even long URLs to be automatically detected
and parsed.
> "What about long URLs?" you might ask. Well, they don't work in plain
> text. They just don't.
They don't unless you have an explicit wrapper so you know when the
URL begins and ends. That is what <URL:...> provides.
> The receiver has to glue them together by hand.
The receiver doesn't have to do this if there is a program that
understands the wrapper and automates stripping the whitespace and
linebreaks... providing a standard wrapper makes it feasible to
write such a program and deploy it.
> It's a tedious, error-prone situation with no widely deployed
> solution. Emperical arguments to the contrary are welcome.
>
If I know where the URL begins and where the URL ends, then I can strip out
whitespace, linebreaks, etc. and have a URL that works. URLs for many
resources are longer than one line of text, and it is important that even
long URLs can automatically and reliably be extracted from text and
resolved. So we need a wrapper.
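
To make the point concrete, here is a minimal sketch of such a program in
Python. It is an illustration only, not a deployed tool: the function name
extract_wrapped_urls and the sample message are made up for this example,
and it assumes the wrapper is written exactly as <URL:...> with no ">"
inside the URL.

```python
import re

# Matches the <URL:...> wrapper. [^>] also matches newlines, so the
# wrapped URL may be folded across several lines of text.
WRAPPED_URL = re.compile(r"<URL:([^>]*)>")


def extract_wrapped_urls(text):
    """Return the URLs found inside <URL:...> wrappers.

    All whitespace and line breaks inside the wrapper are removed,
    so a URL that was folded across lines comes back in one piece.
    (Hypothetical helper, written for this illustration.)
    """
    return ["".join(m.group(1).split()) for m in WRAPPED_URL.finditer(text)]


message = """The draft is at <URL:ftp://cnri.reston.va.us/internet-drafts/
    draft-ietf-uri-url-07.txt> and it is worth reading."""

print(extract_wrapped_urls(message))
```

Because the wrapper marks both ends of the URL, the line break and the
indentation in the middle of it can be thrown away safely.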
> <URL:...> is invention by committee. It serves no useful purpose.
It serves a very useful purpose. It tells me in a standard way that there
is a URL inside the wrapper, and tells me where the wrapper begins and ends
so I can extract the URL from the text around it even if the URL is really
long.
> It
> is harmful in at least the above ways.
The examples you used to claim that the <URL:...> wrapper is harmful were:
1.) E-mail addresses and message IDs use <>. However, it is easy to
write a program that can differentiate an e-mail address like
<connolly@hal.com> from <URL:gopher://gopher.tc.umn.edu/11/fun>.
2.) SGML tags MIGHT appear in plain text and SGML uses constructs like
<a href="...">this</a>.
I wonder how much weight this argument should have since your first
argument seemed to be that <> shouldn't be used since it might be a
wrapper around an e-mail address. If <> is bad in a URL wrapper in
plain text, then SGML's use of <> must be equally bad. But if people
are throwing SGML into text along with e-mail addresses, you are
apparently handling both of these cases together successfully, so you
can certainly differentiate <URL:gopher://gopher.tc.umn.edu/11/fun>
from both e-mail addresses and SGML.
3.) <URL:....> looks like gopher://gopher.tc.umn.edu/11/fun
This is only true if you have a lame way of detecting URLs. Since you
must be looking at the URL scheme to determine if you can resolve it,
you aren't going to be fooled by <URL:....>.
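
As a rough illustration of all three points, here is a sketch in Python
that sorts <...> tokens into URL wrappers, address-like tokens, and
everything else. The names classify and KNOWN_SCHEMES, the category
labels, and the list of schemes are assumptions made up for this example;
the address test is a crude heuristic, not a real address parser.

```python
import re

# Illustrative list of schemes a client might know how to resolve;
# a real client would use whatever schemes it actually supports.
KNOWN_SCHEMES = {"ftp", "gopher", "http", "mailto", "news", "telnet", "wais"}

# Anything between a "<" and the next ">".
ANGLE = re.compile(r"<([^<>]*)>")


def classify(inner):
    """Classify the text found between < and > (hypothetical helper)."""
    if inner.startswith("URL:"):
        # Strip folding whitespace, then look at the real scheme.
        url = "".join(inner[4:].split())
        scheme = url.split(":", 1)[0].lower()
        if scheme in KNOWN_SCHEMES:
            return ("url", url)
        return ("unknown-scheme", url)
    if re.fullmatch(r"[^\s@]+@[^\s@]+", inner):
        # local@domain with no spaces: mail address or message ID.
        return ("address-or-message-id", inner)
    # SGML tags and anything else we do not recognize.
    return ("other", inner)


text = ('Mail <connolly@hal.com>, see <a href="...">this</a>, or '
        'try <URL:gopher://gopher.tc.umn.edu/11/fun>.')

for match in ANGLE.finditer(text):
    print(classify(match.group(1)))
```

The "URL:" prefix is never treated as a scheme here: it is removed first,
and only then is the scheme (gopher in this case) compared against the
schemes the program knows how to resolve.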

Mark P. McCahill              gopherspace engineer
mpm@boombox.micro.umn.edu     University of Minnesota
612 625 1300                  612 625 6817 (fax)