URI position statement

browne@cs.utk.edu
Thu, 24 Mar 1994 12:23:42 -0500

Date: Thu, 24 Mar 1994 12:23:42 -0500
From: browne@cs.utk.edu
Message-Id: <199403241723.MAA23296@pebbles.cs.utk.edu>
To: bajan@bunyip.com
Subject: URI position statement

Alan,

Please consider the following position statement for inclusion
in pub/Network/uri/postition on archives.cc.mcgill.ca
(I purposely misspelled position, as that is how it is on your server)
and for discussion at the IETF URI meeting next week.

************************************************************************
Shirley Browne Research Associate 107 Ayres Hall
browne@cs.utk.edu Computer Science Dept. University of Tennessee
(615) 974-5886 Fax (615) 974-8296 Knoxville, TN 37996-1301
*************************************************************************

Issue1: Should a URN name a single unchangeable object, or should a URN
be allowed to name a collection of such objects (e.g., different
versions -- Postscript, ASCII, etc.) or even an object with
content that changes over time (e.g., current weather map)?

Note: For purposes of the following discussion, I am assuming
a URN syntax along the lines of

URN:<URN-type>:<naming_authority_id>:<opaque_string>
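
As an illustration only, a client might split a URN of the assumed
form as in the Python sketch below. The names URN and parse_urn, the
case-insensitive "URN" prefix, and the choice to let the opaque string
contain further colons are my assumptions for the example, not part of
any agreed syntax.

```python
from typing import NamedTuple


class URN(NamedTuple):
    urn_type: str
    naming_authority: str
    opaque: str


def parse_urn(text: str) -> URN:
    """Split 'URN:<URN-type>:<naming_authority_id>:<opaque_string>'."""
    # maxsplit=3 keeps any further colons inside the opaque string
    # (an assumption of this sketch).
    parts = text.split(":", 3)
    if len(parts) != 4 or parts[0].upper() != "URN":
        raise ValueError(f"not a URN of the assumed form: {text!r}")
    _prefix, urn_type, authority, opaque = parts
    if not urn_type or not authority or not opaque:
        raise ValueError(f"empty field in URN: {text!r}")
    return URN(urn_type, authority, opaque)
```

Because the type is a field of the URN itself, a client can branch on
it without consulting any other service.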

Position: The URN syntax should contain a field that indicates what
type of URN it is. The allowed types should be an extensible
set. The initial type should be that for a single unchangeable
object (e.g., atomic-immutable). Furthermore, for the
atomic-immutable URN-type, the URN itself should allow
verification that a retrieved object is indeed the object
named by that URN (e.g., have the initial type be
atomic-immutable-md5 and the opaque string part of the
URN be the MD5 signature of the file).

Arguments:
1) There is a clear need for the type of URN that names
a single, immutable object. For example, a user retrieving
a piece of software needs to know that she has retrieved
the correct version and not some modified version with the
same name. There are also needs for other types of URNs,
for example to name a library of software modules, the
type here being a URN which names a composite object.

2) The issues concerning URNs which name single, unchangeable
objects are the best understood at this time. Thus,
implementation of this type of URN could proceed while
issues for the other types continue to be worked out.
There have been so many possibilities proposed for other
uses of URNs and so many questions raised that considerable
time will be needed to sort through all the issues and
decide exactly what types of URNs are needed.
For example, for naming composite objects, what will be
the relationships between the components and how will they
be specified? For naming a replicated
network service, what will be the consistency requirements
between different instances of an object with the same URN?
For naming a collection of objects that vary by format, or
for naming temporally changing versions of an object,
how will the different versions be described in a standard
way so that client programs know how to deal with them?
I think it's an open question how many different types
of URNs will be needed. I am not saying that all the
above information (e.g., consistency requirements) should
be part of the URN. I just mean that we know too little
about how such complex URNs will be used to
start deploying them now.

3) It should be easy for a client to determine a URN's URN-type
from the URN itself, without having to retrieve its URC.
If the URN/URC databases are separate from the URN->URL
lookup services (as I argue they should be in Issue2 below),
then making the client retrieve the URC every time it
resolves a URN will be too much overhead. URN->URL lookup
needs to be fast, especially for the
URN-type used most frequently in hypertext links,
which I expect will be atomic-immutable. Otherwise,
HTML authors will continue to use URLs.

4) A URN->URL mapping for an atomic-immutable object may be
inaccurate, or the file containing the object may have
been corrupted or have been edited by an unknowing or unauthorized
person. A client program needs a way to validate that
a retrieved object is indeed the one named by the URN,
working just from the URN and the retrieved object.
Think of, for example, scientific data sets or software routines,
where correctness is paramount.
The client should not have to access the URC database
to obtain a certificate or signature or have to rely on
such being in the URC database. Nor does it seem
workable to me to try to enforce that a signature should
always be included in the URN->URL database.
Registering atomic-immutable-md5 as a standard URN-type
does not mandate the use of md5 or
preclude the use of other signatures -- who knows,
perhaps md5 will be broken.
Another type might be atomic-immutable-ISBN, if the
ISBN can be assumed to be reliably included
in the object.
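
The check described in argument 4 could be sketched in Python as
follows, working from nothing but the URN and the retrieved bytes.
This assumes the opaque string is the hexadecimal digest of the file.
The table DIGEST_FOR_TYPE, the function name verify, and the
atomic-immutable-sha256 type are hypothetical; only
atomic-immutable-md5 is proposed above.

```python
import hashlib

# URN-type -> hashlib algorithm name. Only the md5 entry comes from the
# position statement; the sha256 entry is a hypothetical later addition
# showing how the set of types could be extended.
DIGEST_FOR_TYPE = {
    "atomic-immutable-md5": "md5",
    "atomic-immutable-sha256": "sha256",
}


def verify(urn: str, data: bytes) -> bool:
    """Return True if data matches the digest carried in the URN."""
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0].upper() != "URN":
        raise ValueError(f"not a URN of the assumed form: {urn!r}")
    _prefix, urn_type, _authority, opaque = parts
    algorithm = DIGEST_FOR_TYPE.get(urn_type)
    if algorithm is None:
        raise ValueError(f"URN-type {urn_type!r} is not self-verifying")
    # Assumption: the opaque string is the hex digest of the object.
    return hashlib.new(algorithm, data).hexdigest() == opaque.lower()
```

A client would call verify(urn, retrieved_bytes) after following any
URL returned for the URN, and try another URL if the result is False.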

Issue2: Should URNs, URCs, and URLs be packaged together as a URT,
as proposed in http://www.gatech.edu/urm.paper, or should
the URC database be separate from the URN->URL lookup service?

Position: The URC database should be separate from the URN->URL
lookup service.

Arguments:
1) These databases will have different purposes and different
characteristics. The URC database will be the basis for search
engines that handle queries for objects that satisfy particular
characteristics. A URN->URL lookup service will be consulted
after an object has been identified in order to retrieve it.
(Admittedly, there may be some cases when a client program will
need to query the URC database when resolving a URN, for
example to determine the content-type of an object if the
access method doesn't provide it. Perhaps some thought
should be given to including such information with the URL
in the URN->URL database).

The URC for an object should seldom change, perhaps only when
additional descriptive information is added. URLs corresponding
to a given URN will change more frequently, as objects are copied and
mirrored.

2) Insisting that the URC-based search services and
URN->URL lookup services reside on the same server or
even under the same authority imposes unreasonable and
unnecessary constraints. URC-based search services
may cover URNs from multiple authorities and may not cover
all the URNs from a given naming authority. URC-based search
services will be best divided up by subject area, which may
or may not correspond to naming authority domains.
If the electronic publishing world
develops as the print publishing world has, most
naming authorities/publishers will
not be subject-area-specific. There will also be a very
large number of publishers, and if each publisher ran
its own URC-based search service, it would be expensive for
the user to search them all.
The analogy in the library on-line database
world is a search service provider such as Dialog, which provides
a number of subject area databases, each of which indexes documents
on its subject from a large number of publishers.
As an example of what might happen, consider the mathematical
software world. There will likely be a large number of publishers
of math software. Someone
could provide a search service by harvesting URCs from all the math
software publishers they know about, indexing those URCs, and running
a search server.

On the other hand, a URN->URL
lookup service will most likely be run by a naming authority,
or by someone else on behalf of the naming authority,
since the naming authority identifier part of the URN will be
used to determine which URN->URL lookup service to consult.
File servers would be responsible for notifying the
naming authority's URN->URL service of what files they
were making available. (I am not referring here to cached
copies of files, but to copies that a file server provides
on a long-term basis).
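
The division of labor argued for in Issue2 might look roughly like
the Python sketch below: a subject-area search service harvests URCs
from several publishers, while each naming authority's lookup service
is selected by the authority field of the URN. All class and function
names, the "subject" and "title" URC attributes, and the example
authorities and URL are hypothetical.

```python
from collections import defaultdict


def authority_of(urn: str) -> str:
    """Return the naming-authority field of an assumed-form URN."""
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0].upper() != "URN":
        raise ValueError(f"not a URN of the assumed form: {urn!r}")
    return parts[2]


class LookupService:
    """URN -> URL service run by (or for) one naming authority."""

    def __init__(self):
        self._urls = defaultdict(set)

    def notify(self, urn: str, url: str) -> None:
        # Called by a file server that offers a long-term copy.
        self._urls[urn].add(url)

    def resolve(self, urn: str) -> list:
        return sorted(self._urls.get(urn, ()))


class SubjectSearchService:
    """URC index for one subject area, harvested from many publishers."""

    def __init__(self, subject: str):
        self.subject = subject
        self._index = {}  # URN -> URC (a plain dict of attributes)

    def harvest(self, urcs: dict) -> None:
        # Keep only the records that belong to this subject area.
        for urn, urc in urcs.items():
            if urc.get("subject") == self.subject:
                self._index[urn] = urc

    def search(self, keyword: str) -> list:
        keyword = keyword.lower()
        return sorted(urn for urn, urc in self._index.items()
                      if keyword in urc.get("title", "").lower())


def retrieve_locations(urn: str, lookup_services: dict) -> list:
    # The authority field alone selects which lookup service to consult.
    return lookup_services[authority_of(urn)].resolve(urn)
```

Note that the search service never needs to know where copies live,
and the lookup services never need to hold descriptive information.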
