Message-Id: <9410070155.AA11084@expresso.bunyip.com>
From: Peter Deutsch <peterd@bunyip.com>
Date: Thu, 6 Oct 1994 21:55:31 -0400
In-Reply-To: "Daniel W. Connolly"'s message as of Oct 6, 10:19
To: "Daniel W. Connolly" <connolly@hal.com>
Subject: Re: Why URNs are a subset of URIs [Was: No "TOP" of the docuverse]
[ Daniel W. Connolly wrote: ]
. . .
> I do hope that URNs and URLs don't have to be incompatible just
> because one came from WWW and the other came from archie. They're both
> wonderfully useful applications, and the sooner they interoperate, the
> better.
Bunyip plans to adopt the IETF URL standard. If this were
to end up disagreeing with the WWW URL standard in some
way (say over the need for partials or whatever) we might
attempt to support both formats, but this would not be a
strong requirement in our books. Assuming the WWW
community works with the IETF, we should interoperate just
fine for the foreseeable future.
. . .
> In practice, sometimes you compare URLs, and sometimes you dereference
> them. . . .
Yes, but the result of comparing two URLs is not the same
as comparing two URNs. Two identical ISBNs mean you have
the same information, but they could be at different
locations. Two identical URLs mean that you have the same
information and that they are at the same location. This
appears to me to be a significant difference...
. . .
> And I bet that sometimes folks will compare URNs, and sometimes they
> will dereference them using big databases.
Again, the actions are not the same. If I have a URL and
want a copy of the resource, I need to parse the URL,
identify the protocol and call a routine to grok that
protocol, with appropriate arguments. I will get back a
copy of the resource. If I have a URN, I need to parse the
URN and from that determine if dereferencing to a URL is
possible (for some classes of URN this may not be so). If
I think I can do this, I need a URL for a suitable server
(there may be lots of ways to do this, using multiple
protocols, so please don't hardcode in a URL. If you do,
you simply don't have a URN and are being silly).
Note that the above sequences of actions are not the same
and of course the results of those actions are not the
same.
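Here is a rough Python sketch of the two sequences above. The protocol routines, the `PROTOCOL_HANDLERS` table and the in-memory `RESOLVER` are all hypothetical stand-ins. Nothing here touches the network or a real resolver service. What matters is the shape of the results: dereferencing a URL hands back a copy of the resource, while resolving a URN hands back zero or more URLs that still have to be dereferenced.

```python
from urllib.parse import urlsplit


# Hypothetical per-protocol retrieval routines. Real ones would speak
# FTP, Gopher, etc.; these just return placeholder bytes.
def fetch_ftp(host, path):
    return f"ftp copy of {path} from {host}".encode()


def fetch_gopher(host, path):
    return f"gopher copy of {path} from {host}".encode()


PROTOCOL_HANDLERS = {"ftp": fetch_ftp, "gopher": fetch_gopher}


def dereference_url(url):
    """Parse the URL, pick the protocol routine, return a copy of the resource."""
    parts = urlsplit(url)
    handler = PROTOCOL_HANDLERS.get(parts.scheme.lower())
    if handler is None:
        raise ValueError(f"no handler for protocol {parts.scheme!r}")
    return handler(parts.hostname, parts.path)


# Hypothetical URN->URL resolver table. A real resolver would be a
# service reachable over one or more protocols, not a hardcoded dict.
RESOLVER = {
    "URN:12345": ["ftp://site.com/pub/fred/", "ftp://bozo.com/pub/fred/"],
}


def resolve_urn(urn):
    """Return zero or more URLs for the URN, never the resource itself."""
    return list(RESOLVER.get(urn, []))
```

With these stand-ins, `resolve_urn("URN:12345")` returns the two ftp URLs. Calling `resolve_urn` on an unknown URN returns an empty list rather than failing, since for some classes of URN no URL may exist.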
> >> The idea that URNs are somehow fundamentally different from URLs is
> >> odd, and the proposals of deploying a namespace disjoint with the WWW
> >> address syntax is just plain silly.
>
> >I respectfully disagree with the above paragraph. The WWW
> >address space is just that, an address space, along with
> >accompanying protocol (and where appropriate, host)
> >information. A URL gives you the information you need to
> >access a copy of a resource.
>
> Well... except that you have to go through DNS to find out the "real"
> location of the resource. Have you read the cited notes by TimBL about
> the distinction between names and addresses, and how they blur? I find
> it quite convincing.
I'm not sure I've read everything Tim's written on this,
but I've stated my beliefs here a couple of times now.
I'll spare you a repeat for now...
> > It does _not_ allow me to
> >perform the operation I need to perform, which is to compare
> >multiple instantiations of resources for equality of
> >content without examining the content itself. On the other
> >hand, a suitable URN _will_ allow me to perform that
> >operation. Ergo, URLs and URNs are not the same thing.
>
> There are no URI schemes that allow this _yet_. So let's get busy and
> deploy them! That doesn't mean we have to reinvent the syntax.
It's not clear if we even have a disagreement here. We've
proposed schemes which have been prototyped. Michael
Mealling has a modified Mosaic which handles URNs
identified with the "URN:" prefix, which I believe goes to
a hardwired URN->URL server for now. Some people are
working on proposals for encoding DNS-based resolver server
names into URNs (which makes me nervous, but hey I drink a
lot of coffee). Work is starting on experimental
deployment. I don't see the need for artificial attachment
to a particular syntax at this point, but if you're happy
with "urn:<something>" then we're in agreement. If you have
an alternative proposal, I missed it. If you claim that
everything is in fact a protocol, so "md5:1232..123" is a
URL, I disagree. Otherwise, please post something more
concrete so I can study it.
> >(BTW, I certainly don't require URNs to have high
> >availability nor authentication. I merely require that
> >they identify content, not location.)
>
> We must be using different definitions for the same terms -- otherwise
> the above is pure doublespeak. . .
I already made the same observation (about overloading of
terms, not that I speak doublespeak! ;-) in a previous
posting. I also admitted that I become confused by your
terms so please let's not start name calling just yet.
It's still only 10:00pm my time. Name calling should begin
only after midnight!
> . . . Let's take for a working definition of
> high availability:
>
> A resource is _highly available_ if there is no single
> point of failure between the producer and the consumer
> of the resource. An optimal high availability strategy
> will also result in consumers accessing the "nearest"
> replica of a resource most of the time.
And I therefore repeat that I don't require URNs to have
"high availability". From your definition that seems to be
a property of resource servers, not URNs. The use of
URN->URL resolvers would seem to help us build highly
available services, but "highly available URNs" would seem
to be nothing more than URNs that don't periodically
disappear from the text of a file... ;-)
Now, I know that's not what you meant, but that's why I
say we're arguing at cross-purposes. Life's too short for
this.
> And for authentication:
>
> A data entity E is an _authentic_ representation of
> a name N at time t iff the owner of N has certified
> that it is so.
And I don't require this of all URNs. I'd like it, and can
use it at times, but I have applications coming up where
I'm going to be able to live in some cases without it.
Thus, I don't "require it".
You may, which means your life's going to be harder than
mine. For me there are times where if a server responds
"URN:MD5:12312..3" or "URN:ISBN:123-312-2333" I'll just
take their word for it. Certification can be done in other
ways.
> For example, for resources that consist of a sequence of bytes, you
> can use md5://... . The "owner" of all md5:// names is the special md5
> principal, and E is an authentic representation of md5://sum iff
> md5(E) = sum, independent of time.
I'm not sure I agree that you must always require the
"independent of time". For some applications this is more
important than others.
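Dan's md5 rule can be checked mechanically. The sketch below is minimal and assumes the name is the string `md5://` followed by a hex digest, as in his example. `is_authentic_md5` is a hypothetical helper name. The check takes no time parameter at all.

```python
import hashlib

# Prefix taken from the example in the quoted text; the hex digest is
# assumed to follow it directly.
MD5_PREFIX = "md5://"


def is_authentic_md5(entity: bytes, name: str) -> bool:
    """E is authentic for md5://sum iff md5(E) == sum, independent of time."""
    if not name.startswith(MD5_PREFIX):
        raise ValueError("not an md5:// name")
    claimed = name[len(MD5_PREFIX):].lower()
    return hashlib.md5(entity).hexdigest() == claimed
```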
> For resources that change (like weathermaps), we need to deploy
> identifiers like urn://principal/name, where to check that E is an
> authentic representation of urn://principal/name at t, we need a
> signed certificate that says so, for example
> (C, S)
> where C = (principal, name, cksum, t0, t1)
> and S is the RSA signature of C with principal's key. Once we have
> obtained principal's key and verified C w.r.t. S, we check that
> md5(e) = cksum and that t0 <= t <= t1. (Although time is a slippery
> thing in distributed systems... we need to think about it some more).
This is a classic illustration of why the proposal in the
discussion paper Chris and I put out calls for
"URN:<naming-scheme>:whatever". You have proposed your own
URN scheme. It serves your purpose. You should be able to
register the naming scheme name and deploy this without
needing to reconvene the URI working group, which is
destined to retire to dribbling obscurity in the Kamchatka
peninsula once we finish our last document (because we
think there's no email access to the Kamchatka peninsula.
We're probably wrong...)
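The certificate check Dan describes in the quoted text might look like the sketch below. Two loud assumptions apply. First, the byte encoding of C is invented here. Second, HMAC-SHA256 with a shared key stands in for the RSA signature, because Python's standard library has no RSA. So this shows only the order of the checks, not a real public-key scheme. The caller supplies the time `t`, which sidesteps the distributed-clock problem he mentions rather than solving it.

```python
import hashlib
import hmac
from typing import NamedTuple


class Certificate(NamedTuple):
    """C = (principal, name, cksum, t0, t1), as in the quoted proposal."""
    principal: str
    name: str
    cksum: str  # hex md5 digest of the entity
    t0: float
    t1: float


def encode(c: Certificate) -> bytes:
    # Hypothetical canonical byte encoding of C, assumed for illustration.
    return "\n".join(str(field) for field in c).encode()


def sign(c: Certificate, key: bytes) -> bytes:
    # STAND-IN: HMAC-SHA256 with a shared key replaces the RSA signature
    # of the original proposal; the stdlib has no RSA.
    return hmac.new(key, encode(c), hashlib.sha256).digest()


def is_authentic(entity: bytes, principal: str, name: str, t: float,
                 c: Certificate, s: bytes, key: bytes) -> bool:
    """Is entity an authentic representation of urn://principal/name at t?"""
    if not hmac.compare_digest(sign(c, key), s):  # verify C w.r.t. S
        return False
    if (c.principal, c.name) != (principal, name):  # C is about this name
        return False
    if hashlib.md5(entity).hexdigest() != c.cksum:  # md5(e) = cksum
        return False
    return c.t0 <= t <= c.t1  # validity window; t comes from the caller
```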
Implementation of your scheme is up to you. Thus, whether
there exists a URN->URL mechanism is up to you to
organize. Whether your scheme is "authentic" is up to you.
Those who are happy with "PGU" (Pretty Good URNs) can get
started without talking to you at all and you can go off
and deploy now while I wipe my chin. The one thing the
past couple of years on this topic has taught me is that I
was silly to think we could build a critical mass
consensus before deploying something concrete. I apologize
for being so silly...
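Under the "URN:<naming-scheme>:whatever" proposal, a client only needs to split off the naming scheme and hand the rest to whatever code the scheme's owner provides. The sketch below is minimal and assumes an in-process registry. `SCHEMES`, `register`, `parse_urn` and `resolve` are hypothetical names. A real registry would be an administrative process, not a Python dict.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: naming-scheme name -> resolver for that scheme.
# Each scheme's owner implements (and deploys) its own resolver.
SCHEMES: Dict[str, Callable[[str], List[str]]] = {}


def register(scheme: str, resolver: Callable[[str], List[str]]) -> None:
    key = scheme.lower()
    if key in SCHEMES:
        raise ValueError(f"naming scheme {scheme!r} already registered")
    SCHEMES[key] = resolver


def parse_urn(urn: str) -> Tuple[str, str]:
    """Split 'URN:<naming-scheme>:<rest>' into (scheme, rest)."""
    prefix, sep, remainder = urn.partition(":")
    if prefix.lower() != "urn" or not sep:
        raise ValueError("not a URN")
    scheme, sep, rest = remainder.partition(":")
    if not scheme or not sep:
        raise ValueError("missing naming scheme")
    return scheme.lower(), rest


def resolve(urn: str) -> List[str]:
    scheme, rest = parse_urn(urn)
    resolver = SCHEMES.get(scheme)
    if resolver is None:
        return []  # no resolver known: no URLs, but still a well-formed name
    return resolver(rest)
```

A scheme owner would call something like `register("isbn", ...)` with their own resolver. Nobody else's code has to change.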
> >With that as background, let's consider a couple of
> >scenarios.
> >
> >In the archie context, we plan to serve to our users both
> >a location pointer and a content identifier at the same
> >time. Thus, a search for the string fred might return:
> >
> > URN:12345 URL:ftp://site.com/pub/fred/
> > URN:45666 URL:gopher://site.com/usr/fred/
> > URN:12345 URL:ftp://bozo.com/pub/fred/
> > URN:59555 URL:ftp://mysite.edu/pub/fred/
> >
> >This allows me to see that the first and third entries are
> >the same item, so I don't need to examine both
>
> First we'll notice that modulo capitalization, the above URNs
> are perfectly good WWW addresses (URIs, if you will). Writing
> URL: in front of the gopher: and http: addresses is redundant.
Boy am I confused here. In fact, I don't think we're
talking about the same thing at all at this point. Modulo
what capitalization? Which URNs are valid URLs? Do you
assume that each entire line above is a URN? Help me out
here.
I meant to imply above that the URN consists only of the
string "URN:12345", which identifies a particular
information item. It has associated URLs, but those are
not part of the URN in my book. I didn't bother with
dreaming up a particular detailed syntax; it's just an
abstract identifier here.
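Read that way, each archie result line is simply a (URN, URL) pair. Spotting that the first and third entries are the same item then amounts to grouping URLs by URN. The sketch below is minimal and runs over the example results. The whitespace-separated line format and the `group_by_urn` helper are assumptions made only for illustration.

```python
from collections import defaultdict

# The four archie results from the example, one (URN, URL) pair per line.
RESULTS = """\
URN:12345 URL:ftp://site.com/pub/fred/
URN:45666 URL:gopher://site.com/usr/fred/
URN:12345 URL:ftp://bozo.com/pub/fred/
URN:59555 URL:ftp://mysite.edu/pub/fred/
"""


def group_by_urn(text):
    """Map each URN to the URLs that carry a copy of that item."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        urn, url = line.split()
        # Drop the illustrative "URL:" label; keep the URN whole.
        groups[urn].append(url[4:] if url.startswith("URL:") else url)
    return dict(groups)
```

Filtering the result for URNs with more than one URL leaves only `URN:12345`, with its two ftp locations. Only one of those needs to be examined.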
Now, the URNs and URLs above may have a similar syntax
(<something> ':'<something>) but that doesn't make
"URN:12345" a valid URL, since I don't think there exists
in the world today a WWW client that can be used to access
a resource given nothing but that. We _can_ build a tool
which, given a suitable syntax for URNs, can identify them,
dereference them and then access a copy of the
corresponding resource, but that doesn't make the URN a
URL.
And of course, I put "URL:" in front of the URLs so we can
distinguish between those and the URN in the above
discussion. I was not pushing a particular syntax, I was
attempting to illustrate with a real world example where
you might need to distinguish between the two classes of
URIs.
> Second, what sort of reliability/fault detection mechanisms accompany
> the deployment of these URNs? That is, what's to stop rogue servers
> from saying X is a copy of Y when it's really not?
Such checks would be nice, and if we allow multiple URN schemes (as
we have multiple access protocols for URLs) then we can
rely on "Darwinian selection" to provide us with URNs
having the properties people really need. I simply don't see us
getting it perfect in the first pass, and I even expect
different applications to need different levels of
functionality. Our scheme had better be extensible from
day one.
. . .
> I think this is an essential feature of the system. Not that strong
> authentication must be used in every case -- I agree that there are
> applications that don't require it -- but the overall system must
> allow for authentication if it is to encompass valuable information.
As I said, I suspect that what we'll end up with is a set
of URN schemes, some of which do such things better than
others. I can live with this.
> >Alternatively, I might want to do a search for a
> >particular URN, say number 12345
>
> Ah! So now we're dereferencing URNs! I thought so...
Of course, dereferencing them is one of the key
requirements. But of course when we do we don't end up with
a copy of the resource in our hands, we end up with zero
or more pointers to the resource that can be accessed if
so desired. That's not the same thing at all. Yeesh...
> >Conceptually and practically there are still two different
> >classes of identifier being used and of course getting to
> >this ideal state will still require working with the
> >installed base of URLs. There is a difference here and
> >even if you don't need both, some of us most definitely
> >do...
>
> I agree we need more WWW addressing schemes (URI schemes, if you
> like). I don't agree that URNs should be incompatible with URLs.
Actually, I don't like. Again, I'd prefer to speak of
"IETF URLs", which are the locators I feel comfortable
writing code for. Much as I admire WWW, I don't think
that, say, the Adobe people would appreciate you speaking
of these things as if they're only intended for WWW. One
day I hope to see these in gopher, wais, archie and lots
of other places. Let's keep our terminology
implementation-independent, please.
Now, is the syntax of URNs and URLs the same or
different? I don't know and frankly don't care all that
much because I don't think it really matters. After all,
the code that processes them will by definition be
different. I do care deeply that there are still people
who don't seem to understand why they're different, but
hey, I'm going to start serving these things at some point
and if you don't see a difference upon receipt then you
were right. As long as I can identify when I have an
instance of either class, so I'll know what subroutine to
call when I want to perform an action on it, I'll be happy.
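All a client strictly needs, then, is a way to tell which class an identifier belongs to, so it can call the right subroutine. The sketch below is minimal and rests on two assumptions: URNs carry a leading "URN:" label, and the client knows a fixed list of locator schemes (`URL_SCHEMES` below is a hypothetical list, not a registry). Under these assumptions "md5:1232..123" is classified as neither class, matching the view expressed earlier that it is not a URL.

```python
# Hypothetical set of locator schemes this client knows how to access.
URL_SCHEMES = {"ftp", "gopher", "http", "wais", "telnet", "news"}


def classify(identifier: str) -> str:
    """Return 'urn', 'url' or 'unknown' based on the leading label."""
    scheme, sep, _ = identifier.strip().partition(":")
    if not sep:
        return "unknown"
    scheme = scheme.lower()
    if scheme == "urn":
        return "urn"
    if scheme in URL_SCHEMES:
        return "url"
    return "unknown"


def act_on(identifier, on_url, on_urn):
    """Call the subroutine appropriate to the identifier's class."""
    kind = classify(identifier)
    if kind == "url":
        return on_url(identifier)
    if kind == "urn":
        return on_urn(identifier)
    raise ValueError(f"cannot tell what kind of identifier {identifier!r} is")
```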
- peterd
--
==============================================================================
...
"It's a -. Shall I tell him?" he asked, looking at Bill. Bill nodded, and
the Penguin leaned across to Bunyip Bluegum and said in a low voice,
"It's a Magic Puddin'."
...
"that's where the Magic comes in," explained Bill. "The more you eat the more
you gets. Cut-an'-come-again is his name, an' cut, an' come again, is his
nature. Me and Sam has been eating away at this Puddin' for years, and
there's not a mark on him."
"The Magic Pudding", by Norman Lindsay
Sounds like a pretty good analogy for the Internet to me
(and yes, that's where we got the name "Bunyip"...)
==============================================================================