Message-Id: <9409130114.AA02972@ulua.hal.com>
To: "Mark P. McCahill" <mpm@boombox.micro.umn.edu>
Subject: Re: <URL:...> considered harmful
In-Reply-To: Your message of "Mon, 12 Sep 1994 19:42:41 CDT."
<199409130042.TAA15490@boombox.micro.umn.edu>
Date: Mon, 12 Sep 1994 20:14:05 -0500
From: "Daniel W. Connolly" <connolly@hal.com>
In message <199409130042.TAA15490@boombox.micro.umn.edu>, "Mark P. McCahill" writes:
>In message <9409122013.AA02543@ulua.hal.com> "Daniel W. Connolly" writes:
>>
>> <>'s are already used for mail addresses (e.g. <connolly@hal.com>)
>> and message ID's (e.g. in <12343@hal.com>, Dan writes:...) and sgml
>> tags (e.g. for more info, see <a href="...">this</a> -- even in plain
>> text, folks write this these days).
>>
>> URL: looks like a URL scheme, but it's not.
>
>It MIGHT look like a scheme if you recognize schemes based only
>on if there is a word followed by a colon, but that seems like a
>really unreliable/lame way of recognizing a URL in text...
Would you care to suggest an alternative? I didn't write the code that
picks URLs out by looking for scheme:... . I just observed that it
exists and is widely deployed (hypermail, Python's urlopen module,
etc.)
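For concreteness, here is a rough Python sketch of the "word followed by a colon" heuristic. It assumes that whitespace and the characters <, > and " are the only delimiters. It is a hypothetical illustration, not the actual code in hypermail or in Python's library. Note that it happily reports URL: as a scheme, and that it cannot see past a line break.

```python
import re

# Hypothetical heuristic: a scheme name (a letter, then letters, digits,
# '+', '.', or '-') followed by ':' and a run of characters that are not
# whitespace, '<', '>', or '"'. Whitespace ends the URL, so a URL
# cannot span lines.
URL_RE = re.compile(r"\b[A-Za-z][A-Za-z0-9+.-]*:[^\s<>\"]+")


def find_urls(text):
    """Return every scheme:... token found in text."""
    return URL_RE.findall(text)


print(find_urls("<URL:ftp://a.b/c>"))  # prints ['URL:ftp://a.b/c']
```

A URL broken across two lines comes back truncated at the line break; only the first line's fragment is found.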
>In these examples you are using the whitespace to delimit the URL.
>This limits the length of the URL to one line which is a real problem
>for any URLs that are longer than a line. Having a wrapper around the
>URL does not preclude you from having a program that can parse text to
>find URLs and make them something the user can double click... and the
>wrapper makes it possible for even long URLs to be automatically detected
>and parsed.
If this feature is so valuable, why has it not been implemented and
deployed by now? URLs have been around for 2 years. Nobody seems
to need anything more reliable than whitespace to delimit URLs in
actual practice. If they do need more reliability, they use some other
format besides plain text.
>> "What about long URLs?" you might ask. Well, they don't work in plain
>> text. They just don't.
>
>They don't unless you have an explicit wrapper so you know when the
>URL begins and ends. That is what <URL:...> provides.
How is this URL: <URL:ftp://cnri.reston.va.us/
internet-drafts/draft-ietf-uri-url-07.txt> better than
this one: ftp://cnri.reston.va.us/
internet-drafts/draft-ietf-uri-url-07.txt ? A human reader will
understand. Computers? We already discussed
the nightmarish performance implications of parsers looking for the
closing '>' since this requires arbitrary backtracking. All that
aside, my point is that if it were really all that valuable, we'd
have tools that exploit it by now. We don't. So let's not bloat
the spec with it.
>> The receiver has to glue them together by hand.
>
>The receiver doesn't have to do this if there is a program that
>understands the wrapper and automates stripping the whitespace and
>line breaks... providing a standard wrapper makes it feasible to
>write such a program and deploy it.
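For reference, the kind of program the quoted paragraph describes can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes every wrapper has a closing '>'.

```python
import re

# Hypothetical extractor for the proposed <URL:...> wrapper.
# Everything between "<URL:" and the next ">" is taken as the URL.
# [^>]* also matches newlines, so a wrapped URL may span lines.
WRAPPED_RE = re.compile(r"<URL:([^>]*)>")


def extract_wrapped(text):
    """Return each <URL:...> body with all whitespace and line breaks removed."""
    return ["".join(body.split()) for body in WRAPPED_RE.findall(text)]


sample = (
    "How is this url: <URL:ftp://cnri.reston.va.us/\n"
    "internet-drafts/draft-ietf-uri-url-07.txt> better than"
)
print(extract_wrapped(sample))
# prints ['ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-url-07.txt']
```

If the closing '>' is missing, each match attempt reads to the end of the input before failing. The extractor's cost therefore depends on how much text follows an unterminated wrapper.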
OK, but we agree that such programs do not exist, so we have the
freedom to choose the delimiters. Why not choose the delimiters that
cause the least grief for implementors and users? Something like
regular old RFC-822 header syntax:
URL: ftp://cnri.reston.va.us/
        internet-drafts/draft-ietf-uri-url-07.txt
or something vaguely lispish:
(URL: ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-url-07.txt )
or how about:
(URL: "ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-url-07.txt")
but I don't like that as much because there _are_ tools that allow me
to take advantage of URLs surrounded by whitespace (xterm, Motif
text widgets, emacs, etc.)
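The RFC-822 header alternative is just as mechanical to parse. Below is a minimal Python sketch. It assumes RFC 822-style folding, where a continuation line begins with a space or tab, and a header literally named URL:. The function name is hypothetical. Dropping all whitespace from the unfolded value is an extra, URL-specific step, since a URL may not contain literal spaces.

```python
def url_from_header(lines):
    """Return the value of the first 'URL:' header, unfolded, or None.

    A continuation line is one that starts with a space or tab
    (RFC 822 folding). All whitespace in the unfolded value is removed.
    """
    parts = []
    collecting = False
    for line in lines:
        if not collecting and line.startswith("URL:"):
            collecting = True
            parts.append(line[len("URL:"):])
        elif collecting and line[:1] in (" ", "\t"):
            parts.append(line)
        elif collecting:
            break  # the first non-continuation line ends the header
    value = "".join("".join(parts).split())
    return value or None


message = [
    "URL: ftp://cnri.reston.va.us/",
    "        internet-drafts/draft-ietf-uri-url-07.txt",
    "",
    "Dan",
]
print(url_from_header(message))
# prints ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-url-07.txt
```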
>> It's a tedious, error-prone situation with no widely deployed
>> solution. Empirical arguments to the contrary are welcome.
>
>If I know where the URL begins and where the URL ends, then I can strip out
>whitespace, line breaks, etc. and have a URL that works. URLs for many resources
>are longer than one line of text, and it is important that even long URLs
>can automatically and reliably be extracted from text and resolved. So
>we need a wrapper.
Empirical evidence (that there are no tools to attack this problem,
yet we get by OK) says that this is not important.
>> <URL:...> is invention by committee. It serves no useful purpose.
>
>It serves a very useful purpose. It tells me in a standard way that there
>is a URL inside the wrapper, and tells me where the wrapper begins and ends
>so I can extract the URL from the text around it even if the URL is really long.
_Today_ it does not serve that purpose. _Today_ it serves to make
selecting URLs from the surrounding text _more_ difficult. When tools
to extract URLs from plain text have been deployed, and experience
from using those tools shows that having a wrapper is better than not
having a wrapper, let's think about standardizing on a wrapper.
>> It
>> is harmful in at least the above ways.
>
>The examples you used to claim that the <URL:...> wrapper is harmful were:
>
>1.) E-mail addresses and message IDs use <>. However, it is easy to
> write a program that can differentiate an e-mail address like
> <connolly@hal.com> from <URL:gopher://gopher.tc.umn.edu/11/fun>.
Again, argument by assertion. "It is easy..." Exactly what criterion
would you use to differentiate a URL from a message id? Is the following
a URL or a message id:
<URL:my-scheme://foo.bar/abc@foo.com>
It satisfies the syntax of both, since the path syntax of my-scheme
might allow '@' characters.
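One way to see the ambiguity is with two hypothetical recognizers, both of which accept the example above. These are deliberately loose, simplified patterns of the sort a text scanner might use. They are not the full RFC 822 msg-id grammar or the code of any deployed tool.

```python
import re

# Loose message-id pattern (simplified, not full RFC 822):
# <local-part@domain>, with no '<', '>', '@', or whitespace inside either half.
MSG_ID_RE = re.compile(r"<[^<>@\s]+@[^<>@\s]+>")
# The proposed wrapper: <URL:...> with no '<', '>', or whitespace inside.
WRAPPED_URL_RE = re.compile(r"<URL:[^<>\s]+>")


def classify(token):
    """List every reading under which the whole token matches."""
    kinds = []
    if MSG_ID_RE.fullmatch(token):
        kinds.append("message-id")
    if WRAPPED_URL_RE.fullmatch(token):
        kinds.append("url")
    return kinds


print(classify("<URL:my-scheme://foo.bar/abc@foo.com>"))
# prints ['message-id', 'url']
```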
>2.) SGML tags MIGHT appear in plain text and SGML uses constructs like
> <a href="...">this</a>.
>
> I wonder how much weight this argument should have since your first
> argument seemed to be that <> shouldn't be used since it might be a
> wrapper around an e-mail address.
Not much. Agreed.
Dan