Message-Id: <9403252112.AA06283@ulua.hal.com>
To: Larry Masinter <masinter@parc.xerox.com>
Subject: Re: URL: Outstanding issues
In-Reply-To: Your message of "Thu, 24 Mar 1994 20:02:34 PST."
<94Mar24.200246pst.2732@golden.parc.xerox.com>
Date: Fri, 25 Mar 1994 15:12:49 -0600
From: "Daniel W. Connolly" <connolly@hal.com>
In message <94Mar24.200246pst.2732@golden.parc.xerox.com>, Larry Masinter writes:
>> re: PrePrefix
>
>Tim, you're not helping us by reviving it this way. We decided that
>"Telephone:" is a mandatory part of writing down a telephone number in
>any ascii encoding of telephone numbers.
[Sorry, but I wasn't around when this was decided.]
"Telephone:" is only necessary when the context doesn't specify what
the ASCII string is. In the only widely deployed usage of URLs
(HTML) it's perfectly clear which strings are URLs and which are not.
There has been some noise about being able to pick URLs out of
plaintext. This is a nifty idea, but it's orthogonal to the problem
of specifying URLs themselves.
If you want to be able to put URLs in plain text and get them back out
reliably, I suggest you need the following things:
1. A way to recognize the start of a URL. <URL: is perfectly
reasonable. But a pattern like /[a-zA-Z0-9]+:/ will probably
find more URLs (though it will find more junk too).
2. A way to recognize the end of a URL. You could do this
by writing
<URL:16:http://host/file
meaning that the next 16 characters after <URL:16: constitute
the URL. This is somewhat clumsy, so we may instead place a
restriction on URLs that they not contain the character '>'.
Then you can just write:
<URL:http://host/file>
If you intend to use the mechanism in internet mail, you'll
have to restrict URLs to 76 characters or provide a mechanism
for splitting them across lines. I'd suggest the MIME quoted-printable
encoding. But, if you don't want to drag MIME into the picture
just to mail URLs (a lazy and backward proposition, if you ask me)
you can make up your own line continuation mechanism (like
"ignore all the whitespace up to the next >").
But there's no more information in
<URL:http://host/file>
than in
<http://host/file>
or even just
http://host/file, which brings us to:
3. A way to be sure that what you found is really a URL and
not just something that looks like one by coincidence. For
example, none of the things in the preceding paragraph should
be treated as URLs -- they're text that talks about URLs.
You can't do this reliably with plain text, but there are
various heuristics: <>'s, the URL: prefix. A checksum would
be much more reliable.
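
To make the three points above concrete, here is a rough sketch of the two wrappers (length-prefixed and '>'-terminated) plus the looser scheme pattern. The function names are made up for illustration, and the whitespace rule is just the "ignore all the whitespace up to the next >" idea; none of this is an agreed syntax.

```python
import re

# Hypothetical patterns for the wrappers discussed above.
LENGTH_PREFIXED = re.compile(r"<URL:(\d+):")   # <URL:16:http://host/file
BRACKETED = re.compile(r"<URL:([^>]*)>")       # <URL:http://host/file>
SCHEME = re.compile(r"[a-zA-Z0-9]+:")          # the looser /[a-zA-Z0-9]+:/ pattern


def extract_length_prefixed(text):
    """Return URLs written as <URL:N:...>, taking exactly N characters."""
    urls = []
    for m in LENGTH_PREFIXED.finditer(text):
        n = int(m.group(1))
        url = text[m.end():m.end() + n]
        if len(url) == n:  # skip entries truncated by the end of the text
            urls.append(url)
    return urls


def extract_bracketed(text):
    """Return URLs written as <URL:...>, assuming URLs never contain '>'.

    Whitespace inside the brackets (including line breaks) is dropped,
    which is one possible line continuation rule.
    """
    urls = []
    for m in BRACKETED.finditer(text):
        url = "".join(m.group(1).split())
        if SCHEME.match(url):  # still only a heuristic
            urls.append(url)
    return urls
```

For instance, `extract_bracketed("see <URL:http://host/\n  file> now")` yields the single URL `http://host/file`, while `SCHEME.findall("Note: see http://host/file")` picks up `Note:` as well as `http:`, which is the "more junk" problem from point 1.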
If the motivation for all this URL-in-plaintext is for mailing URLs,
I'd strongly suggest some application of MIME technology. External
body messages with access-type="url" have been proposed. You're free
to use x-url for now.
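
As a rough illustration of the MIME route, the sketch below builds the header of an external-body part using Python's standard email package. The access-type value x-url is the one mentioned above; the "url" parameter name and the helper name are assumptions for the sake of the example, not part of any agreed proposal.

```python
from email.message import Message


def external_body_for(url, access_type="x-url"):
    """Build a message/external-body part that points at a URL.

    The "url" parameter name is an assumption made for this sketch.
    """
    msg = Message()
    # add_header turns the keyword access_type into the parameter access-type.
    msg.add_header("Content-Type", "message/external-body",
                   access_type=access_type, url=url)
    return msg
```

The point of this design is that the mail reader never has to guess where the URL starts or ends: the header syntax already delimits and quotes the parameter value.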
>> re: Wrapper
>
>I would like to formally have my suggestion considered, that the wrapper be
>allowed to be any ONE OF < > or " " or ' ' or { }, to allow the
>wrapper to be used effectively in multiple contexts. This allows
>maximum flexibility, some compatibility with SGML (in HTML, the
>wrapper would be the "" or '' in <A HREF="url:...."> ).
There's no reason <>, "", '', or {} can't be used to delimit URLs
right now. Even if a URL has a " character in it, this is expressible
in SGML:
<A HREF="local-file:/wierd/name/xxx&#34;yyy">
the &#34; represents a " character. So the SGML parser will pass
local-file:/wierd/name/xxx"yyy
to the application.
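
A quick way to see this is to feed an anchor whose HREF contains the character reference &#34; to a parser and look at the attribute value it hands over. The sketch below uses Python's html.parser, which is an HTML parser rather than a full SGML one, so take it as an approximation; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect HREF attribute values as the application would see them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; character references in the
        # values have already been replaced by the parser.
        for name, value in attrs:
            if name == "href":
                self.hrefs.append(value)


def href_values(markup):
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs
```

Calling `href_values` on an anchor whose HREF is written with &#34; in place of the quote returns the value with a literal " in it, so the delimiter never collides with the URL's own characters.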
On the other hand, if you're suggesting reserving the
characters <>""''{} in order to force folks to write:
<A HREF="local-file:/wierd/name/xxx%22yyy">
then I'd say that's pretty cheesy, but it's not a completely bad idea,
I guess. But it's just another heuristic. It doesn't solve anything.
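
For what it's worth, reserving those characters would amount to something like the sketch below: percent-encode only the wrapper characters and leave everything else alone. The DELIMITERS set and the function name are assumptions for illustration; decoding uses the standard urllib.parse.unquote.

```python
from urllib.parse import unquote

# The wrapper characters under discussion: < > " ' { }
DELIMITERS = "<>\"'{}"


def escape_delimiters(url):
    """Percent-encode only the reserved wrapper characters in a URL."""
    return "".join(
        "%%%02X" % ord(ch) if ch in DELIMITERS else ch
        for ch in url
    )
```

With this, `escape_delimiters('local-file:/wierd/name/xxx"yyy')` gives `local-file:/wierd/name/xxx%22yyy`, and `unquote` turns it back into the original string.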
Let's not get confused about what a URL is versus how to represent one
in context.
Dan