Re: Changes to URL document

Ronald E. Daniel (rdaniel@acl.lanl.gov)
Thu, 17 Feb 1994 13:30:02 -0700

From: "Ronald E. Daniel" <rdaniel@acl.lanl.gov>
Date: Thu, 17 Feb 1994 13:30:02 -0700
Message-Id: <199402172030.NAA21255@idaknow.acl.lanl.gov>
To: timbl@www0.cern.ch
Subject: Re: Changes to URL document

Tim Berners-Lee says:

> From: Tim Berners-Lee <timbl@ptpc00.cern.ch>
> Alan, you say:
>> From: bajan@bunyip.com (Alan Emtage)
>> Date: Mon, 7 Feb 1994 11:28:49 -0500
...
>> 5) The WG has decided that the "URL:" prefix is standard and this should be
> ^^^^^^^^^^^^^^^^^^
>> made clear in the draft. Currently the only place that this appears is in
>> the BNF. It should rightly be part of the "Scheme" section which
>> currently makes no mention of it.
>
>Sorry, it wasn't clear to me that it had. Look at Larry Masinter's
>message of 17 Dec, available as
><http://www.acl.lanl.gov/URI/archive/uri-archive.messages/900.html>,
>which summarises the problems. My personal feeling is that this shouldn't hold
>us up, as defining the URL itself is more important than its wrappers
>for plain text. But there seem to be a lot of suggestions about this.
>Do you regard the prefix as part of the URL, or part of a wrapper for
>plain text use? Have I missed a roar of consent about this one?

Wow, the first citation to my service. Thanks! As for missing a roar
of consent, I don't think so. (But I'm a newbie.) I think the consensus
is that the wrapper and tags are for picking URLs out of free text, and
are not part of the URL itself. Some background:

As Carl Hauser pointed out to me a couple of days ago, another message
that seems related to the issue of the URL prefix is Jock Williams' message:
<URL:http://www.acl.lanl.gov/URI/archive/uri-archive.messages/572.html>.
He argues that we are confusing the syntax of a URL (or URN or other
strings) with its encoding in various document formats. Jock says:

! <URN:ISSN:....> for ascii.
! <urn>URN:....</urn> for sgml
! -- for ASN.1 based applications
! urn ATTRIBUTE WITH
! ATTRIBUTE-SYNTAX caseIgnoreString -- case sensitivity is an open issue
! SINGLE VALUE
! ::= urn-att-id
!
!These examples show different encapsulations of the same string value. They
!also highlight a redundancy in that both the SGML and ASN.1 examples already
!tag the encapsulated data, and don't require an additional tag within the value.
[...]
!you can eliminate the redundancy if you view the tagging as the
!responsibility of the encapsulation rather than the URN itself:
! <URN:ISSN:....> for ascii.
! <urn>ISSN:....</urn> for sgml
!
!where the components "<URN:" and ">" are viewed as part of the encapsulation
!into ascii, not an inherent part of the URN value.
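
A minimal sketch of Jock's point, assuming the identifier is held as a bare
string and each document format supplies its own encapsulation. The function
names and the ISSN value are hypothetical placeholders, not anything defined
in the drafts:

```python
# Hypothetical helpers: names and the sample value are illustrative only.

def encapsulate_ascii(kind: str, value: str) -> str:
    """Wrap a bare identifier for free ASCII text, e.g. <URN:ISSN:....>."""
    return f"<{kind.upper()}:{value}>"


def encapsulate_sgml(kind: str, value: str) -> str:
    """Wrap the same bare identifier as an SGML element.

    The element name already tags the data, so the prefix is not
    repeated inside the value.
    """
    tag = kind.lower()
    return f"<{tag}>{value}</{tag}>"


value = "ISSN:0000-0000"  # placeholder value, not a real ISSN
print(encapsulate_ascii("urn", value))
print(encapsulate_sgml("urn", value))
```

The stored value is identical in both cases; only the encapsulation differs.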

In other messages, such as those in the thread following
<URL:http://www.acl.lanl.gov/URI/archive/uri-archive.messages/822.html>,
it is stated that the purpose of the prefix is to pick URLs out of free text.
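
As a rough illustration of picking URLs out of free text, here is a sketch
that assumes the strict form used in this message: "<URL:" opens the wrapper
and the next ">" closes it. The function name is hypothetical, and dropping
line breaks found inside the wrapper is an assumption about how wrapped mail
text should be handled, not something the draft specifies:

```python
import re

# Assumed convention: "<URL:" opens the wrapper, the next ">" closes it.
URL_WRAPPER = re.compile(r"<URL:([^<>]*)>")


def extract_urls(text: str) -> list[str]:
    """Return the bare URLs found inside <URL:...> wrappers.

    A line break inside the wrapper (plus any indentation around it)
    is assumed to come from the mailer and is removed.
    """
    urls = []
    for match in URL_WRAPPER.finditer(text):
        urls.append(re.sub(r"\s*\n\s*", "", match.group(1)))
    return urls


message = (
    "See <URL:http://www.acl.lanl.gov/URI/archive/uri-archive.messages/572.html>\n"
    "and also <URL:http://host/a\n  /b> for details."
)
for url in extract_urls(message):
    print(url)
```

The wrapper and prefix are discarded on extraction; what comes back is the
URL alone.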

I like much of what Larry Masinter suggests in the message you cite. I am
not real crazy about the prefix being able to go inside or outside of the
wrapper, but it is not that big of a deal. Larry seems to suggest it so
that it is legal to say things like: <A URL="foo://host/string"> </A> in
HTML or URL:"http://host/string" in Gopher. I would contend that these
are not free text, they are particular encoding schemes, and that encoding
schemes should define their own method for picking out URLs.

There are some things that I feel strongly about and others that I
don't really care about one way or the other. Here is a list of
my feelings (technically-based, I assure you:-), sorted into decreasing
order of intensity.

1) The prefix for URLs must start with "URL". This is so that URNs will be
able to have their own prefix once they get standardized. The delimiter
should probably be ':', but I don't have any objection to '='.

I would have to go to battle over any attempt to make the prefix
something like "URI". I think that URNs and URLs will probably be
treated differently enough that they should be easy to discriminate.

2) The purpose of the wrappers and prefix is to make it easy to pick
URLs out of free text, such as this mail message. This wrapper is NOT
part of the URL. However, the default wrapper for free text should be
defined in the standard. The wrapper, tag, or whatever for other
document formats is up to the developers of that format. If they want
to use the free text wrapper, fine. If they want to define a method
that is more in the spirit of their format, such as Jock's ASN.1 example
above, that's fine too.

I would bitch and moan about requiring the wrapper and prefix as
part of every URL, but would eventually go along because this does
need to be finalized and it is just not that big of a technical issue.
(Stylistically, I think the redundancy Jock mentions is ugly as sin.
Technically, I think it is a trivial detail).

3) The prefix goes inside the wrapper characters. As I mentioned above,
things like HTML are their own formats, so according to point 2
they can use URL="protocol://host/string" if they want. Also, it is
going to be a bit easier to parse if we don't have to deal with
cases like prefix inside or outside the wrapper, : or = as the
prefix delimiter, etc.

I would point out parsing ease as a consideration, but would
otherwise not have any objection to allowing the prefix outside
the wrapper.

4) The wrapper characters. I think that the characters are necessary in
order to identify the end of the URL in the presence of line breaks,
spaces, and bizarre strings like those WAIS uses. Larry Masinter suggests
that {}, "", and '' be allowed, as well as the common <>. That might
be nice. It would be slightly harder to parse, but would cover the
case where '>' might appear in the string part of the URL.

The particular characters are a matter of the utmost indifference to me.
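
To make the parsing cost in points 3 and 4 concrete, here is a sketch of a
more permissive scanner. It assumes the wrapper pairs <>, {}, "" and '', a
prefix of "URL" followed by ':' or '=', and the prefix either just inside or
just outside the opening wrapper character. The function name and these exact
rules are assumptions for illustration; no draft defines them this way:

```python
import re

# Wrapper pairs mentioned in the discussion: <>, {}, "" and ''.
PAIRS = {"<": ">", "{": "}", '"': '"', "'": "'"}

_OPEN = "".join(re.escape(c) for c in PAIRS)
# Prefix inside the wrapper:  <URL:...>  or  {URL=...}
_INSIDE = re.compile(rf"([{_OPEN}])URL[:=]")
# Prefix outside the wrapper: URL:"..."  or  URL=<...>
_OUTSIDE = re.compile(rf"URL[:=]([{_OPEN}])")


def find_wrapped_urls(text):
    """Return bare URLs from any of the assumed wrapper/prefix forms."""
    found = []
    pos = 0
    while pos < len(text):
        candidates = [
            m for m in (_INSIDE.search(text, pos), _OUTSIDE.search(text, pos)) if m
        ]
        if not candidates:
            break
        # Take whichever form starts first in the remaining text.
        m = min(candidates, key=lambda c: c.start())
        closer = PAIRS[m.group(1)]
        end = text.find(closer, m.end())
        if end == -1:
            # Unterminated wrapper: skip this opener and keep scanning.
            pos = m.end()
            continue
        found.append(text[m.end():end])
        pos = end + 1
    return found


sample = 'a <URL:http://host/a> b URL:"http://host/b>c" c {URL=foo://host/string}'
print(find_wrapped_urls(sample))
```

The second URL in the sample contains '>' and is still recovered because its
wrapper is "". Compared with the single fixed form, this version needs two
patterns, a table of closing characters, and a rule for choosing between
competing matches.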

>> -Alan
>
> Tim

Ron