Re: Why URN is a subset of URL

Peter Deutsch (peterd@bunyip.com)
Thu, 6 Oct 1994 10:25:27 -0400

Message-Id: <9410061425.AA10074@expresso.bunyip.com>
From: Peter Deutsch <peterd@bunyip.com>
Date: Thu, 6 Oct 1994 10:25:27 -0400
In-Reply-To: "Roy T. Fielding"'s message as of Oct 5, 21:17
To: "Roy T. Fielding" <fielding@avron.ICS.UCI.EDU>, uri@bunyip.com
Subject: Re: Why URN is a subset of URL

[ Roy Fielding wrote: ]

> Because of the non-uniform syntax for URLs (as defined by
> draft-ietf-uri-url-07.txt, section 2), there is no difference
> whatsoever between what we choose to refer to as "URL" and what
> we choose to refer to as "URN". Although we may wish to "imply"
> some philosophical difference, there is no real difference in the
> specification and URNs (of whatever sort) can easily be defined as URLs.

But as someone else has pointed out, the semantics are
completely different. You cannot compare two URLs and
determine anything about their associated content, any more
than you can compare two RAM addresses and determine
anything about what's in them (and please let's not
confuse the issue with discussions about hardware mediated
code and data spaces! :-)

. . .
> > When I get lots of archie hits I cannot simply compare
> > their URLs for equality to see if they're the same
> > document (because a URL identifies a resource's location
> > but not its content), nor can I be sure that I found all
> > copies of a document (because someone is free to rename a
> > document and its URL would change).
>
> You are referring to a quality of the http, ftp, wais, file,
> (and several other) schemes, not to the abstract concept of a URL.
> If a particular locator scheme defined a one-to-one correspondence
> between document content and URL, the above statement becomes false.

I think the issue here is that unfortunately different
people are using the same words for different concepts and
then arguments break out when others don't reach the same
conclusions after reasoning about their different
concepts. _Your_ abstract concept of the thing you call a
URL has certain properties and semantics which I identify
with an abstract concept I call a URN. Now, I certainly
don't want to reopen the infamous acronym wars in which
the meaning of these were all thrashed out, but I do want
to make clear what I mean when I use the terms URN and URL.

To me we can identify two classes of identifier, with two
sets of semantics. One is a _locator_, which will by
necessity contain all the information needed for a
competent programmer to obtain access to an instance of a
resource. This I call a URL. The other is an _identifier_
for a class of resource, which I call a URN. Whether these
have similar syntax is open to debate, but there is no
absolute requirement for this. In fact, there are reasons
you might want to differentiate (just as we differentiate
between the syntax of a DNS domain name and an IP address).

The legal actions I can perform on these two classes of
item are different. If I give what I call a URN to Mosaic
today it will fail to access that resource. I know I can
build a client which can attempt to obtain a URL, given a
URN, and then perform the access (presuming the syntax
allows me to determine when I have an item of either
class) but just as there is a real difference between
domain names and IP addresses (and what you can do with
them), there is a real difference between a URN and a URL.
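
To make that sequence concrete, here is a minimal sketch in Python of
a client that resolves a name before attempting access. Everything in
it is an assumption for illustration: the RESOLVER table, the function
names, and the use of a "urn:" prefix as the syntactic clue that tells
the two classes apart. It is not a proposed syntax or implementation.

```python
# Toy resolver table (an assumption for illustration): URN -> known URLs.
RESOLVER = {
    "urn:12345": ["ftp://site.com/pub/fred/", "ftp://bozo.com/pub/fred/"],
}


def is_urn(identifier):
    """Treat anything carrying a 'urn:' prefix as a name, not a locator."""
    return identifier.lower().startswith("urn:")


def locations_for(identifier):
    """Return the URLs a client could try to access for this identifier."""
    if is_urn(identifier):
        # A URN must be resolved first; unknown names yield no locations.
        return RESOLVER.get(identifier.lower(), [])
    # A URL already carries everything needed to attempt access.
    return [identifier]
```

The branch in locations_for() is the point: the client has to know
which class of item it holds before it can pick the next step.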

We may or may not choose to use similar syntax for both,
but there is of course no burning need to do so. In fact,
we might choose to use different syntax because the
functional requirements are different, or simply because
we want to take the opportunity to remove the silly
multiple redundancy of "://". In any case, we should
remember that for users who only have Mosaic there should
be some clue that if they are given an ISBN number, Mosaic
will not fetch it for them. That matters and we should
keep it in mind here.

> > . . . A URL gives you the information you need to
> > access a copy of a resource.
>
> This is true.
>
> > It does _not_ allow me to
> > perform the operation I need to perform, which is to compare
> > multiple instantiations of resources for equality of
> > content without examining the content itself.
>
> Not true -- that is a property of the scheme, not of the URL syntax.

Sorry, I'm still not convinced. There are actions I can
perform on a URN which make no sense on a URL, and vice
versa. You can build a system which performs sequences of
actions, but that doesn't mean that the individual actions
are all the same. You've just automated a sequential
process. You still need to determine when it is necessary
to perform the various steps in the process. For that, you
need to distinguish between URNs and URLs.

. . .
> No problem,
>
> urn:<unique-content-identifier>
>
> is a URL with (in theory) one-to-one correspondence with the content.

Again, I claim we are simply using the same word for
different concepts. Your example does not have the
property that I can access the resource named. I may be
able to use it as an index into a database to look up the
needed info, but in that case it is an index, not a URL
(in my vocabulary). To date, I have used the term URN for
such an index item.

. . .
> > In the archie context, we plan to serve to our users both
> > a location pointer and a content identifier at the same
> > time. Thus, a search for the string fred might return:
> >
> > URN:12345 URL:ftp://site.com/pub/fred/
> > URN:45666 URL:gopher://site.com/usr/fred/
> > URN:12345 URL:ftp://bozo.com/pub/fred/
> > URN:59555 URL:ftp://mysite.edu/pub/fred/
>
> Why? Why not return
>
> (urn:12345, ftp://site.com/pub/fred/, ftp://bozo.com/pub/fred/),
> (urn:45666, gopher://site.com/usr/fred/),
> (urn:59555, ftp://mysite.edu/pub/fred/)

What I showed is not our final syntax, I just spelled it
out that way to make clear that you would have four
different hits from the collections. Grouping URLs with
corresponding URNs makes sense, but it does seem to
reinforce my belief that there is a difference between
URNs and URLs. Otherwise, why not simply return:

(12345, ftp://site.com/pub/fred/, ftp://bozo.com/pub/fred/)

Answer - because you couldn't distinguish in this list
which are valid URLs and which need to be processed before
attempting access. It seems to me that if you need to
determine something about them to determine what you can
do with them, then they are different.
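
As a rough sketch of the comparison I want to be able to make, the
four example hits above can be grouped by URN without fetching
anything. The Python below is illustrative only: the (URN, URL) pair
layout and the function names are assumptions, not archie's actual
output format or API.

```python
from collections import defaultdict

# The four example hits from the search for "fred", as (URN, URL) pairs.
HITS = [
    ("URN:12345", "ftp://site.com/pub/fred/"),
    ("URN:45666", "gopher://site.com/usr/fred/"),
    ("URN:12345", "ftp://bozo.com/pub/fred/"),
    ("URN:59555", "ftp://mysite.edu/pub/fred/"),
]


def group_by_urn(hits):
    """Collect the URLs of every hit sharing a URN, in first-seen order."""
    groups = defaultdict(list)
    for urn, url in hits:
        groups[urn].append(url)
    return dict(groups)


def same_content(hit_a, hit_b):
    """Two hits name the same content when their URNs match.

    The URLs are deliberately not compared: they say where a copy
    lives, not what it contains.
    """
    return hit_a[0] == hit_b[0]
```

Here group_by_urn(HITS) puts the site.com and bozo.com ftp hits under
the single key URN:12345, which is exactly the equality test that the
URLs alone cannot provide.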

- peterd

[* begin gratuitous archie support question aside *]
. . .
> Well, thanks very much for that -- it's an unbelievable pain right
> now to do a search and not be able to restrict it by country-code.

Actually, you can limit an archie search geographically
today with the telnet client. The syntax is something like:

find <search-string> *.edu

I don't know if any of the client authors picked up on
this functionality; I use telnet....

[* end of gratuitous archie support aside *]

-- 
==============================================================================
...
"It's a -. Shall I tell him?" he asked, looking at Bill. Bill nodded, and
 the Penguin leaned across to Bunyip Bluegum and said in a low voice,
 "It's a Magic Puddin'."
...
"that's where the Magic comes in," explained Bill. "The more you eat the more
 you gets. Cut-an'-come-again is his name, an' cut, an' come again, is his
 nature. Me and Sam has been eating away at this Puddin' for years, and
 there's not a mark on him."
                               "The Magic Pudding", by Norman Lindsay

Sounds like a pretty good analogy for the Internet to me (and yes, that's
where we got the name "Bunyip"...)
==============================================================================