To: uri@bunyip.com
Subject: Re: Why URN is a subset of URL
In-Reply-To: <Pine.3.05.9410060850.A8508-b100000@suna>,
<9410061425.AA10074@expresso.bunyip.com>,
Date: Fri, 07 Oct 1994 03:49:03 -0700
From: "Roy T. Fielding" <fielding@avron.ICS.UCI.EDU>
Message-Id: <9410070349.aa17724@paris.ics.uci.edu>
I see that terminology seems to be the central point of confusion,
and the rest of the comments are just going around in circles. I'll try
to address the salient issues in one message, rather than bombarding
the list with another round of replies.
I was talking ONLY about IETF URLs, as defined by the particular spec
carefully referenced in my first message. The WWW uses a different set
of definitions, wherein WWW-URI == IETF-URL. WWW-URL and WWW-URN are
subsets of the common syntax defined by WWW-URI's (RFC 1630).
This working group discarded those definitions (foolishly, IMHO) when
it decided to separate the URL and URN syntax and to ignore any notion
of a URI syntax. Nevertheless, we did NOT discard the URI syntax -- we
just changed its name to URL. Thus, all things which fall under the
WWW banner of URI also fall under the IETF banner of URL, even though
it may sound weird to talk about a URN as a Locator.
In <Pine.3.05.9410060850.A8508-b100000@suna>, Jon Knight writes:
> Hmm, I'm not so sure. I think that the fact that URLs and URNs may have
> similar syntax _is_ an implementation detail, but the semantics are
> _defined_ to be different and its the semantics that are really the
> important thing (I think its also what Peter was probably getting at).
> URLs are resource _locators_; URNs are _location_ independent names. How
> something which is location independent can be a member of a subset of a
> set of locators is a little beyond me at the moment.
Allow me to check with Mr. Webster:
lo.ca.tor \'l<o^->-,k<a^->t-<e>r, l<o^->-'\ n (1784)
:one that locates something (as a mining claim or the course of a road)
lo.cate \'l<o^->-,k<a^->t, l<o^->-'\ vb lo.cat.ed; lo.cat.ing (1652)
[L locatus, pp. of locare to place, fr. locus] vi
:to establish oneself or one's business: SETTLE locate vt
1: to determine or indicate the place, site, or limits of
2: to set or establish in a particular spot: STATION
3: to seek out and determine the location of
4: to find or fix the place of esp. in a sequence: CLASSIFY
So, at this point I'd like to ask:
Is it a requirement of any URN definition (for the purpose of IETF)
that a URN be capable of being used to seek out and determine a
location of the Resource identified by the URN?
The answer is yes, according to the "Requirements for Uniform Resource Names"
<draft-ietf-uri-urn-req-00.txt>, and thus all IETF URNs will also be Locators.
Is this a paradox? Absolutely not. It's a little late right now for a
formal proof, but just think of it in terms of mathematics:
Let URL be the set of all functions L, such that
(1) L : url_string -> resource_instance
and URN be the set of all functions N, such that
(2) N : urn_string -> { url_string }
and C1 be any function (e.g. nearest, least_expensive, random), such that
(3) C1 : { url_string } -> url_string
Now, using the rules of function composition,
(4) L o C1 o N : urn_string -> resource_instance
In other words, L(C1(N(urn_string))) -> resource_instance
Note that both "url_string" and "urn_string" are just strings, independent
of any semantic meaning (it is the functions, and only the functions, which
assign the semantics). Thus, if the allowed syntax for url_string's
encompasses that of urn_string's, both can be replaced by a single
string type which I will call "uri_string". If we assume that is the case,
we can rewrite (1) through (4) without losing any semantic meaning:
(1) L : uri_string -> resource_instance
(2) N : uri_string -> { uri_string }
(3) C1 : { uri_string } -> uri_string
(4) L o C1 o N : uri_string -> resource_instance
Recall that we said "Let URL be the set of all functions L s.t. (1)"
Since the composition function defined by (4) is such a function,
(5) (L o C1 o N) is an element of the set URL
which also means that the domain of the composition, which is the domain
of N, cannot contain any elements outside the domain of L. That is the
same as saying the domain of N must be a subset of the domain of L.
Therefore, if the allowed syntax for url_string's encompasses that of
urn_string's, then
(6) The set of all legal urn_string's (as defined by the set URN)
is a subset of the set of all legal url_string's (as defined
by the set URL).
Which is what I meant by my original statement (and the subject of this
thread) that "URN is a subset of URL".
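As an informal illustration of steps (1) through (5), here is a minimal
Python sketch. The lookup tables, the example strings, and the choice of
"first in sorted order" for C1 are all invented for illustration; none of
them comes from any spec or draft. The only point is that the composition
L(C1(N(s))) ends up with the same signature as L.

```python
from typing import Callable, Dict, Set

# Hypothetical data: which locator strings a name resolves to, and what
# each locator string retrieves. All values are invented for illustration.
NAME_TABLE: Dict[str, Set[str]] = {
    "isbn:0-000-00000-0": {
        "http://a.example/book.txt",
        "ftp://b.example/pub/book.txt",
    },
}
RESOURCE_TABLE: Dict[str, str] = {
    "http://a.example/book.txt": "contents of the book",
    "ftp://b.example/pub/book.txt": "contents of the book",
}


def L(uri_string: str) -> str:
    """(1) L : uri_string -> resource_instance"""
    return RESOURCE_TABLE[uri_string]


def N(uri_string: str) -> Set[str]:
    """(2) N : uri_string -> { uri_string }"""
    return NAME_TABLE[uri_string]


def C1(candidates: Set[str]) -> str:
    """(3) C1 : { uri_string } -> uri_string (assumed: first in sorted order)"""
    return sorted(candidates)[0]


def resolve(uri_string: str) -> str:
    """(4) the composition L(C1(N(uri_string)))"""
    return L(C1(N(uri_string)))


# (5) The composition has the same signature as L: string in, resource out.
locator_like: Callable[[str], str] = resolve
print(resolve("isbn:0-000-00000-0"))
```

Any other choice function (nearest, least_expensive, random) could stand
in for C1 without changing the signature of the composition.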
Finally, the only question that remains is whether or not the assumption
regarding "allowed syntax" is valid. But that's obvious -- just look
at the IETF URL spec and you will find that the only real requirement
for the URL syntax is
<scheme>:<scheme-specific-part>
and the set of encoding conventions described in Section 2. Since ANY
concept of URNs can be mapped into these syntactic requirements without
ANY loss of semantics, our assumption is valid.
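To make the syntactic requirement concrete, here is a small Python sketch
that checks only the <scheme>:<scheme-specific-part> shape. It assumes a
scheme is a run of letters, digits, "+", "." and "-", and it deliberately
ignores the encoding conventions of Section 2. The function name
split_identifier is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: a scheme is one or more letters, digits, "+", "." or "-".
# The encoding conventions of Section 2 are not checked here.
GENERIC = re.compile(r"^([A-Za-z0-9+.-]+):(.*)$", re.DOTALL)


def split_identifier(s: str) -> Optional[Tuple[str, str]]:
    """Return (scheme, scheme_specific_part), or None if s has no scheme."""
    m = GENERIC.match(s)
    if m is None:
        return None
    # Scheme names are compared case-insensitively, so normalize to lower case.
    return m.group(1).lower(), m.group(2)


print(split_identifier("http://www.ics.uci.edu/"))
print(split_identifier("isbn:anything"))
```

Both a locator-style string and a name-style string pass through the same
function; nothing in the generic shape needs to know which kind it is.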
The moral of this story is that it is a bloody waste of everyone's time
to define a syntax for IETF URNs which is separate from IETF URLs, and
that if you desire to place some special significance on the acronyms,
then we should resurrect the common syntax called "URI" such that the
syntax is associated with URI instead of URL and we won't get into
religious wars every time URNs and URLs are compared.
=====================================================================
In <9410061425.AA10074@expresso.bunyip.com>, Peter Deutsch writes:
> To me we can identify two classes of identifier, with two
> sets of semantics. One is a _locator_, which will by
> necessity contain all the information needed for a
> competent programmer to obtain access to an instance of a
> resource. This I call a URL. The other is an _identifier_
> for a class of resource, which I call a URN. Whether these
> have similar syntax is open to debate, but there is no
> absolute requirement for this. In fact, there are reasons
> you might want to differentiate (just as we differentiate
> between the syntax of a DNS domain name and an IP address).
While it is reasonable to want to differentiate between the two
types of identifiers, it is not reasonable to use a different syntax
for that purpose. <scheme> is a sufficient method of differentiation
for any identifier. The only things accomplished by changing the
syntax are the prevention of URLs and URNs from being used interchangeably
within a single data type (i.e. a loss of generality) and an unnecessary
doubling of the code required to parse identifiers.
Since there is no technical reason to choose a separate syntax, there
is no valid reason why this IETF working group should use a separate syntax.
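A rough Python sketch of what "differentiate by <scheme>" could look like
in a client: one parse step, then a table lookup. The handler names, the
registry, and the decision to treat "isbn" as a name scheme are all
assumptions made for illustration.

```python
from typing import Callable, Dict


def handle_locator(rest: str) -> str:
    # Placeholder for "access the resource directly".
    return "retrieve directly: " + rest


def handle_name(rest: str) -> str:
    # Placeholder for "resolve the name to locators, then access one".
    return "resolve to locators first: " + rest


# Hypothetical registry; which schemes count as names is an assumption.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "http": handle_locator,
    "ftp": handle_locator,
    "isbn": handle_name,
}


def dispatch(identifier: str) -> str:
    """Parse once on the first ':' and choose behavior by scheme alone."""
    scheme, sep, rest = identifier.partition(":")
    if not sep:
        raise ValueError("no scheme in " + repr(identifier))
    try:
        handler = HANDLERS[scheme.lower()]
    except KeyError:
        raise ValueError("unknown scheme " + repr(scheme)) from None
    return handler(rest)


print(dispatch("isbn:anything"))
print(dispatch("http://www.ics.uci.edu/"))
```

The same data type and the same parsing code serve both classes of
identifier; only the table entry differs.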
> The legal actions I can perform on these two classes of
> item are different. If I give what I call a URN to Mosaic
> today it will fail to access that resource. I know I can
> build a client which can attempt to obtain a URL, given a
> URN, and then perform the access (presuming the syntax
> allows me to determine when I have an item of either
> class) but just as there is a real difference between
> domain names and IP addresses (and what you can do with
> them), there is a real difference between a URN and a URL.
First off, not even the WWW community makes technical decisions
based on what Mosaic can or cannot do. Second, that statement is
false in any case. If a URN is defined such that the syntax does
not conflict with the IETF URL syntax of <scheme>:<scheme-specific-part>,
then the current version of XMosaic (and many other WWW clients)
is perfectly capable of accessing that resource, providing that some
proxy gateway exists which can do the URN->URL resolution and return
an HTTP redirect message. All the user has to do is define an environment
variable called <scheme>_proxy which points to the gateway, e.g.
setenv isbn_proxy http://gateway.ics.uci.edu:8080/
and then all references of the form "isbn:anything" will result in the
request
GET isbn:anything HTTP/1.0
being sent to port 8080 of gateway.ics.uci.edu. What the gateway returns
is entirely up to the particular piece of software acting as the server
on that machine and port. It may very well be an HTTP -> whois++ gateway.
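The client-side half of that mechanism can be modeled in a few lines of
Python. This is a sketch of the behavior described above, not the actual
code of XMosaic or any other client; the function name proxy_request and
the default of port 80 when the proxy URL names no port are assumptions.

```python
from typing import Mapping, Optional, Tuple
from urllib.parse import urlsplit


def proxy_request(
    identifier: str, env: Mapping[str, str]
) -> Optional[Tuple[str, int, str]]:
    """Return (host, port, request_line) if <scheme>_proxy is set, else None."""
    scheme, sep, _ = identifier.partition(":")
    if not sep:
        return None  # not of the form <scheme>:<scheme-specific-part>
    proxy = env.get(scheme.lower() + "_proxy")
    if proxy is None:
        return None  # no gateway configured for this scheme
    parts = urlsplit(proxy)
    if parts.hostname is None:
        return None  # proxy setting is not a usable URL
    # Assumption: fall back to port 80 when the proxy URL gives no port.
    port = parts.port if parts.port is not None else 80
    # The full identifier is sent unchanged as the request target.
    request_line = "GET " + identifier + " HTTP/1.0"
    return parts.hostname, port, request_line


env = {"isbn_proxy": "http://gateway.ics.uci.edu:8080/"}
print(proxy_request("isbn:anything", env))
```

Passing os.environ as env would read the real environment; a plain dict
is used here so the example does not depend on the caller's settings.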
The point to take from this is that the above will ONLY work
as long as the URN syntax does not conflict with the URL syntax.
This is yet another example of why making the URN syntax different
"just because we feel it may someday be useful to be different" is a
bad idea and should be rejected as such by this working group.
=====================================================================
In <9410061314.AA04832@plato.ansa.co.uk> Owen Rees writes:
> My informal distinction is that a URN is the name of a resource, but a URL is
> the name of a location at which the resource exists. (Replace "location" with
> "means to retrieve a resource from a specified location" if you want to go
> into that aspect of URLs.) The "news:" scheme falls into the grey area
> between URNs and URLs, but that just shows that there are useful things
> that do not fit neatly into this taxonomy.
Or that the taxonomy is not useful for distinguishing between URLs and URNs.
> There are times when the distinction between the resource and the location is
> important, and times when it is unhelpful. For example, consider the
> difference between:
> Fetch the resource that is at URL
> Fetch the resource from URL
>
> In the first case, any instance of the resource will do, I will be satisfied
> with the copy that is in the local cache. This is a case where current
> implementations of caches use URLs as if they were URNs. The second case
> emphasizes the location, and I have to run a browser that does not use the
> cache to get this effect.
This is, I think, the best argument I've seen for such a distinction.
However, the distinction is already provided in the form of <scheme>,
just as
news:comp.infosystems.announce
is distinct from
nntp://paris.ics.uci.edu/comp.infosystems.announce
They both locate the same resource when referenced from my system, but
different ones when referenced from outside ICS. There is no need for
a separate syntax just because we want to be able to identify such
distinctions, nor would a separate syntax be any more expressive than
the one we already have.
=====================================================================
In <9410070022.AA10974@expresso.bunyip.com> Peter Deutsch writes:
> Next, I would reply that forcing the world to adopt all of
> the WWW URL syntax without modification at this point is
> more that a waste of valuable resources. It's plain wrong.
> I can enumerate areas where this would cause problems for
> others and thus is not necessarily the right thing to do.
Then by all means do so. To date, there has not been a single example
wherein adopting the WWW URL syntax would be any worse than adopting
some other syntax, given the stated requirements for URL and URN.
> As but one example, I see problems if the WWW community
> were to insist on the need for partial references in the
> core standard, as to me they only make sense in the
> context of a single homogenous system, such as WWW. I have
> no trouble with WWW using them internally, I just don't
> think they have a place in the current URL standard, given
> where the rest of the community is right now and what we
> expect people to be doing with these things. Either the
> WWW community should document an extension (which I gather
> is the current plan in moving them to an appendix or
> whatever) or simply build their own URLs on IETF URLs, as
> I understand you can build extensions to a DTD provided
> you don't change the core DTD.
What? Good grief! Read <draft-ietf-uri-relative-url-00.txt>.
=====================================================================
*phew*
......Roy Fielding ICS Grad Student, University of California, Irvine USA
<fielding@ics.uci.edu>
<URL:http://www.ics.uci.edu/dir/grad/Software/fielding>