Re: URN and citations

Keith Moore (moore@cs.utk.edu)
Thu, 14 Apr 1994 21:49:09 -0400

Message-Id: <199404150149.VAA20425@wilma.cs.utk.edu>
From: Keith Moore <moore@cs.utk.edu>
To: "Karen R. Sollins" <sollins@lcs.mit.edu>
Subject: Re: URN and citations
In-Reply-To: Your message of "Thu, 14 Apr 1994 18:29:45 EDT."
<9404142229.AA11614@zippy.lcs.mit.edu>
Date: Thu, 14 Apr 1994 21:49:09 -0400

Karen writes...

> I'm not going to include your whole message here but rather just
> address the point you are making. The naming authority (for example
> publisher of a book) will assign URNs. What algorithm that naming
> authority uses for determining whether two resources are "the same"
> and therefore should have the same URN or different and therefore
> should have different URNs is purely a decision of that particular
> naming authority.

The problem with this argument is that whether or not two resources
are "the same" differs depending on your purpose.

It would be reasonable to assign a URN to a resource that changed over time
(like the weather map). While the meaning of the URN doesn't itself change,
the object named by the URN does. We wouldn't normally call today's weather
map and yesterday's weather map "the same".

However, I might need to reference the weather map at 14 April 1994 at
3:00pm EDT. This might require a different URN, and a different notion of
"sameness" would be used.

Now, as it turns out, I might really need to reference a particular instance
of that weather map, say one rendered in a particular format, in order to
illustrate a subtle detail that is not visible in (say) GIF format.
(This may seem like a stretch for a weather map, but make it a Hubble Space
Telescope photo and it really does matter.)

The point here is that any particular idea of "sameness" used by the
publisher may not be appropriate for all purposes. And the publisher might
not really be sufficiently aware of my needs in order to assign the right
variety of names, unless the publisher actually assigns a separate name to
every instance of the object.
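
To make the competing notions of "sameness" concrete, here is a minimal
sketch. The record layout, the field names, and the four notions listed are
assumptions chosen for illustration, not anything a naming authority is
known to use. It counts how many distinct names a publisher would have to
assign to the same three instances under each notion.

```python
import hashlib
from collections import namedtuple

# Hypothetical description of one stored instance of a resource.
Instance = namedtuple("Instance", "subject timestamp fmt data")

INSTANCES = [
    Instance("weather-map", "1994-04-14T15:00", "gif", b"gif bytes A"),
    Instance("weather-map", "1994-04-14T15:00", "ps", b"ps bytes A"),
    Instance("weather-map", "1994-04-13T15:00", "gif", b"gif bytes B"),
]

# Each notion of sameness is a key function: two instances are "the same"
# under that notion exactly when their keys are equal.
SAMENESS = {
    "subject": lambda i: i.subject,                            # "the weather map"
    "snapshot": lambda i: (i.subject, i.timestamp),            # the map at a given time
    "rendering": lambda i: (i.subject, i.timestamp, i.fmt),    # one format of one snapshot
    "bytes": lambda i: hashlib.md5(i.data).hexdigest(),        # identical byte sequence
}

def count_names(instances, notion):
    """Number of distinct names needed under the given notion of sameness."""
    return len({SAMENESS[notion](i) for i in instances})
```

With these three instances, the "subject" notion needs one name, "snapshot"
needs two, and "rendering" and "bytes" each need three.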

Yet, there's another notion of "sameness" that will be important for nearly
every object to be named -- whether two instances of a "file" resource contain
exactly the same sequence of bytes. To ensure that a file is not corrupted,
and especially to provide any kind of authenticity check, there needs to be
a mechanism by which one can ask a naming authority (or a trusted third
party) if a particular sequence of bytes is a valid copy of an object.

One means of doing this is to have a distinguished "location independent
file name" (LIFN) for each valid instance of an object, with a description
of that object (containing e.g. the MD5 signature) available from the
naming authority. (I would think that the Trojan horse introduced in the
wuarchive ftpd source code would convince people that such a function is
necessary.)
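
A minimal sketch of such a check might look like the following. The
in-memory registry, the LIFN syntax, and the function name are all
hypothetical; a real naming authority would answer this query over some
protocol that has not been defined here. MD5 is used only because it is the
digest named above.

```python
import hashlib

# Hypothetical registry kept by a naming authority: LIFN -> MD5 hex digest.
# The LIFN syntax is invented for illustration only.
REGISTRY = {
    "lifn:example.org:rfc1521.ps": hashlib.md5(b"%!PS-Adobe-3.0 ...").hexdigest(),
}

def is_valid_copy(lifn, data, registry=REGISTRY):
    """Return True if `data` matches the digest registered for `lifn`."""
    expected = registry.get(lifn)
    if expected is None:
        return False  # unknown LIFN: nothing to verify against
    return hashlib.md5(data).hexdigest() == expected
```

A copy that differs by even one byte from the registered instance fails the
check, as does any sequence of bytes offered under a LIFN the authority does
not know.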

Ideally, this LIFN would always be used as the actual handle from which a
location of the resource is derived. In that way it would be possible to
verify the authenticity and/or integrity of a resource. Furthermore, a
LIFN->URL mapping obtained from the naming authority for the LIFN would
provide reasonable assurance that the URL pointed to the correct (and
current) version of that resource.
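
As a rough sketch of how retrieval through a LIFN could work, assume the
naming authority holds, for each LIFN, a digest and a list of URLs. The
record layout, the example URLs, and the `fetch` callable are assumptions
made for illustration; `fake_fetch` stands in for a real network transfer.

```python
import hashlib

# Hypothetical records held by the naming authority, one per LIFN.
RECORDS = {
    "lifn:example.org:map-19940414T1500.gif": {
        "md5": hashlib.md5(b"GIF89a weather map bytes").hexdigest(),
        "urls": [
            "ftp://mirror-a.example.org/maps/map.gif",
            "ftp://mirror-b.example.org/maps/map.gif",
        ],
    },
}

def retrieve(lifn, fetch, records=RECORDS):
    """Try each URL registered for `lifn`; return the first verified copy."""
    record = records.get(lifn)
    if record is None:
        return None  # the authority has no mapping for this LIFN
    for url in record["urls"]:
        try:
            data = fetch(url)
        except OSError:
            continue  # location unreachable, try the next one
        if hashlib.md5(data).hexdigest() == record["md5"]:
            return data  # digest matches the authority's description
    return None  # no location yielded a valid copy

def fake_fetch(url):
    """Stand-in for a network fetch: mirror-a serves a corrupted copy."""
    store = {
        "ftp://mirror-a.example.org/maps/map.gif": b"corrupted",
        "ftp://mirror-b.example.org/maps/map.gif": b"GIF89a weather map bytes",
    }
    if url not in store:
        raise OSError("unreachable: " + url)
    return store[url]
```

Here `retrieve("lifn:example.org:map-19940414T1500.gif", fake_fetch)` skips
the corrupted copy at the first mirror and returns the bytes from the second.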

URNs intended to identify fuzzier concepts (like "RFC 1521") could be
defined in terms of several LIFNs associated with specific instances (like
"the PostScript version of RFC 1521"). There could be hierarchies of
resource descriptions, but the LIFNs would be at the leaves.
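
One way to picture such a hierarchy is as a tree of descriptions whose
leaves are LIFNs. The names and the intermediate grouping below are
hypothetical, and the rule that "any name without a description is a leaf"
is a simplifying assumption of this sketch.

```python
# Hypothetical description tree: each URN maps to the names it is defined
# in terms of. A name with no entry here is treated as a leaf, i.e. a LIFN.
DESCRIPTIONS = {
    "urn:example:rfc1521": ["urn:example:rfc1521:formats"],
    "urn:example:rfc1521:formats": [
        "lifn:example.org:rfc1521.txt",
        "lifn:example.org:rfc1521.ps",
    ],
}

def leaf_lifns(name, descriptions=DESCRIPTIONS):
    """Return every LIFN reachable from `name`, in description order."""
    children = descriptions.get(name)
    if children is None:
        return [name]  # no further description: this is a leaf LIFN
    result = []
    for child in children:
        result.extend(leaf_lifns(child, descriptions))
    return result
```

Resolving the fuzzy name "urn:example:rfc1521" this way yields the two
specific instances, each of which can then be verified byte for byte.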

> In fact, there may be several naming authorities assigning names to
> the same resources and one may choose to make different printings
> distinct and another may not.

While a publisher might choose to use the same name for any of several
different representations of a resource, I believe there will always
be a need for each representation of the resource to have its own specific
name.

> It is not the business of this
> architecture to make policy choices like that but rather allow
> flexibility and heterogeneity in how these decisions are made. It is
> for exactly this reason that version management, for example, is NOT
> in the list of requirements.

I'll note that the requirements of the "architecture" have not yet been
agreed to. Furthermore, the requirements so far developed for the naming
scheme should be dictated by the requirements for the architecture, not the
other way around.

We should carefully consider the limitations imposed by whatever
architecture we develop. While I do not want the architecture to prevent
similar representations of a resource from being recognized as such (or
identified in a search for that object), neither do I want it to be so
limited that it cannot provide for authentication and integrity guarantees.

Keith