"uniquely identifies content" -- only for URNs?

Daniel W. Connolly (connolly@hal.com)
Fri, 07 Oct 1994 10:11:20 -0500

Message-Id: <9410071511.AA07469@austin2.hal.com>
To: uri@bunyip.com
Subject: "uniquely identifies content" -- only for URNs?
Date: Fri, 07 Oct 1994 10:11:20 -0500
From: "Daniel W. Connolly" <connolly@hal.com>

This phrase has come up several times recently, and I'd
like to zero in on a precise definition.

Do we mean to say that there is a functional mapping

URN -> sequence-of-bytes

So that once I have fetched the resource corresponding to a URN,
I can validly cache it for all time?

If we take the case of an md5:... scheme, this is clearly the
case (well... except for the very small chance that two resources
would have the same MD5 digest), and any party on the network
can check the authenticity of a (name, data) pair.
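
As a rough illustration of that check, here is a minimal sketch in Python. The name syntax "md5:<hex digest>" and the function verify_md5_name are assumptions made for this example only; no such scheme has been specified on this list.

```python
import hashlib


def verify_md5_name(name: str, data: bytes) -> bool:
    """Return True if `data` hashes to the digest carried in `name`.

    Assumes a hypothetical name syntax "md5:<32 hex digits>".
    """
    scheme, sep, digest = name.partition(":")
    if not sep or scheme.lower() != "md5":
        raise ValueError("not an md5: name: %r" % name)
    # Anyone holding the (name, data) pair can recompute the digest locally.
    return hashlib.md5(data).hexdigest() == digest.lower()


if __name__ == "__main__":
    payload = b"hello"
    name = "md5:" + hashlib.md5(payload).hexdigest()
    print(verify_md5_name(name, payload))      # matching pair
    print(verify_md5_name(name, b"tampered"))  # mismatching pair
```

Because the name is derived from the bytes, a copy that passes this check can be cached indefinitely, subject to the collision caveat above.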

But suppose I want to make a lasting reference to "NCSA's weathermap."
If nothing else, NCSA can register their own URN scheme, so I might
use the name

NCSA:weather-map

Now that weather map changes day to day, so clearly the functional
mapping above is insufficient. But we could model the system as
a mapping

URN, time -> sequence-of-bytes

That means you can't cache indefinitely just because the names match.
You have to use leases, callbacks, time-to-live, etc. (and it starts
to look like HTTP...)
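
To make the time-to-live option concrete, here is a minimal sketch of a cache whose entries expire. The class TTLCache and its methods are hypothetical names for this example, and a single fixed lifetime per cache is a simplifying assumption; leases and callbacks are not modeled.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Cache keyed by name whose entries are only valid for `ttl_seconds`."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, name: str, data: bytes) -> None:
        # Remember when this copy stops being trustworthy.
        self._entries[name] = (self.clock() + self.ttl, data)

    def get(self, name: str) -> Optional[bytes]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        expires_at, data = entry
        if self.clock() >= expires_at:
            # A matching name is not enough: the copy has aged out.
            del self._entries[name]
            return None
        return data
```

A caller that gets None back has to go to the origin (or a mirror) again, which is the cost of the time-dependent mapping.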

So suppose this map is mirrored at FTP sites around the globe, and
there's a service that will give me a list of the sites in exchange
for the above name.

Are we building a service that assumes consistency between the
URN->URL mapping service and the mirror sites themselves, or is it
possible that the mapping service will give me a URL that points to an
out of date copy?

Also, is it legal to cache the map without registering with the
URN->URL service?

In the end, it's possible that the client will get an out of date
copy.

Do we intend to have some model behind this so that at least in
the abstract, we can identify the party at fault?

And for applications where the information is very valuable, do we
intend to have services with fault detection methods so that we
can identify the party at fault concretely?

So... I'd like to suggest a model for the way these URIs work:

There's a functional mapping

URI x time -> set of type x sequence-of-bytes

If we call it Rep, then for any URI r and time t, Rep(r,t) is a set of
typed byte sequences. This allows for the case where there are
multiple representations of some resource.

So if you resolve a reference r at time t and you get a data
entity (type, bytes), then the computation succeeded iff
(type, bytes) is in Rep(r,t)

This allows us to model concepts like time-to-live, last-modified-date,
caching, etc.

[The tricky part is time. Do we assume time is just a scalar number,
and the whole internet shares a global notion of time (and hence time
is totally ordered), or do we use the more complicated but more
accurate model where a point in time represents the sending or
receiving of some message (and hence time is only partially ordered).]
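
The model can be sketched in a few lines of Python. This takes the simpler of the two options above (time as a totally ordered scalar), which is an assumption of the sketch and not a claim about which option is right. RepTable, publish, rep and resolution_succeeded are hypothetical names.

```python
from bisect import bisect_right
from typing import Dict, FrozenSet, Iterable, List, Tuple

Entity = Tuple[str, bytes]  # (type, sequence-of-bytes)


class RepTable:
    """Records, per URI, which set of typed byte sequences holds from when."""

    def __init__(self) -> None:
        self._times: Dict[str, List[float]] = {}
        self._sets: Dict[str, List[FrozenSet[Entity]]] = {}

    def publish(self, uri: str, at: float, entities: Iterable[Entity]) -> None:
        """From time `at` onward, `uri` has exactly these representations."""
        times = self._times.setdefault(uri, [])
        if times and at <= times[-1]:
            raise ValueError("publish times must be strictly increasing")
        times.append(at)
        self._sets.setdefault(uri, []).append(frozenset(entities))

    def rep(self, uri: str, t: float) -> FrozenSet[Entity]:
        """Rep(r, t): the set in force at time t (empty if none yet)."""
        times = self._times.get(uri, [])
        i = bisect_right(times, t)  # count of publications at or before t
        return self._sets[uri][i - 1] if i else frozenset()


def resolution_succeeded(table: RepTable, uri: str, t: float,
                         entity: Entity) -> bool:
    # The resolution is correct iff the entity received is a member of Rep(r, t).
    return entity in table.rep(uri, t)
```

With something like this in hand, "out of date copy" has a definite meaning: an entity that was in Rep(r, t') for some earlier t' but is not in Rep(r, t).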

One corollary of this model is that a name like NCSA:weather-map
uniquely identifies content no more or less than a name like
http://ncsa.uiuc.edu/weather-map . Either one can refer to a resource
available from multiple hosts. Either one supports resources that
migrate. DNS shuffling and http's redirection feature are somewhat
crude, but as far as a "real and measurable difference" goes, there is none.

Even though the ftp protocol has no redirection or time-to-live
features, folks regularly send ftp: URIs to http gateways that
implement the functionality.

Someone mentioned that "the TCL archive" is something we want to be
able to name. I contend that until there is a service which provides
authentication, the old ftp://sprite.berkeley.edu/pub/tcl (or whatever
it was) is as good as anything.

In fact, this brings up an argument for a common hierarchical syntax:
it would be valuable if, given a URI of the form:
scheme:/word1/word2/word3/word4
a service could operate as follows:

Do I have scheme:/word1/word2/word3/word4 cached?
if so, return it...
Do I know a nearby mirror for scheme:/word1/word2/word3/?
if so, redirect to that mirror
Do I know a nearby mirror for scheme:/word1/word2/?
if so, redirect to that mirror
Do I know a nearby mirror for scheme:/word1/?
if so, redirect to that mirror
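
The steps above can be sketched as a longest-prefix lookup. This is only an illustration: the function names, the dictionary-based cache and mirror tables, and the convention that a mirror entry maps a prefix ending in "/" to a base URI ending in "/" are all assumptions of the example.

```python
from typing import Dict, Iterator, Optional, Tuple, Union


def prefixes(uri: str) -> Iterator[str]:
    """Yield the hierarchical prefixes of `uri`, longest first."""
    scheme, sep, path = uri.partition(":/")
    if not sep:
        raise ValueError("expected scheme:/word1/word2/... form: %r" % uri)
    words = [w for w in path.split("/") if w]
    # Drop one trailing word at a time: /w1/w2/w3/, then /w1/w2/, then /w1/
    for n in range(len(words) - 1, 0, -1):
        yield scheme + ":/" + "/".join(words[:n]) + "/"


def resolve(uri: str, cache: Dict[str, bytes],
            mirrors: Dict[str, str]) -> Tuple[str, Optional[Union[bytes, str]]]:
    """Return ("cached", data), ("redirect", new_uri) or ("miss", None)."""
    if uri in cache:
        return ("cached", cache[uri])
    for prefix in prefixes(uri):
        mirror = mirrors.get(prefix)
        if mirror is not None and uri.startswith(prefix):
            # Keep the part of the name below the mirrored subtree.
            return ("redirect", mirror + uri[len(prefix):])
    return ("miss", None)
```

The point is that the service needs no scheme-specific knowledge: it only relies on "/" separating levels of the hierarchy.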

So is there some part of the notion of "uniquely identifies content"
that disqualifies http:, gopher:, etc. URIs? If so, I have missed it.

Dan