Re-write of (formerly) URM paper!

Michael Mealling (michael@fuzzl.oit.gatech.edu)
Wed, 3 Nov 1993 22:38:59 -0500 (EST)

From: michael@fuzzl.oit.gatech.edu (Michael Mealling)
Message-Id: <9311040339.AA02103@fuzzl.oit.gatech.edu.noname>
Subject: Re-write of (formerly) URM paper!
To: uri@bunyip.com
Date: Wed, 3 Nov 1993 22:38:59 -0500 (EST)

This is a rewrite of my URI paper. Most of the changes involve deleting URMs
entirely and discussing URCs (formerly URTs) more thoroughly. Anyway,
have a read and let me know what you think:

Michael Mealling

Michael.Mealling@OIT.gatech.edu

Georgia Tech

October, 1993

Uniform Resource Characteristics

(NOTE: This paper makes the assumption that the intended audience has working
knowledge of URIs and the past work of the URI-WG of the IETF. It also uses
names for things that as yet have not been agreed upon within the working group.
If you don't agree with what a specific entity is called then please insert
your favorite TLA where needed.)

1: Introduction

Currently, there are two issues facing the URI working groups: encoding of
meta-information and Uniform Resource Name (URN) to Uniform Resource
Locator (URL) resolution. The first is causing considerable trouble within the
working groups because meta-information is among the most important
information to the user. The second, while not as volatile as meta-information,
will soon be very important as many people start using the new URN
specifications in real applications. (NOTE: For the rest of this paper the act
of resolution will be depicted with the "->" notation, i.e.: URN->URL means
URN to URL resolution.)

Presented here is a set of items that should offer an acceptable solution to
both problems. For meta-information the author proposes the creation of an
additional URI entity called a Uniform Resource Characteristic (URC). This
entity will be used to encode meta-information such as filesize, type, title,
author and version. The URC does not completely solve the above problems
though. How do you associate meta-information with a given URL or URN? The
simplest way to do this is to treat a URN and a URL, abstractly, as
meta-information as well. This allows you to put URNs and URLs into a URC. The
URC then solves two problems: URN/URL/URC encapsulation and
URN/URL/meta-information resolution and transport. It does this by exploiting
the fact that all URIs start with the common identifier "UR*:". This makes a
URC a template that is usable as a whois++ template. The only additional thing
a URC needs to be a usable structure is a specification of how it expresses
internal relationships between specific URIs and meta-information.

3: The Uniform Resource Characteristic(URC)

3.1: Functionality

A URC is a method for showing relationships between URNs, URLs and meta-
information in an encapsulated entity that can be passed around as one token. It
utilizes simple parsing rules based on URNs having precedence over URLs and URLs
having precedence over various pieces of meta-information. This allows a URC to
contain most of the information needed about a given network resource in one
cacheable chunk of data.

The format of a URC exploits the format of each individual component. URNs and
URCs both start with an identifier ending with a colon. By allowing for a "URL:"
wrapper for URLs we end up with a list of components that naturally fall into
template format. Currently the IETF URI Working Group has specified that URLs
must carry the "URL:" prefix.
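
As a rough illustration of why the common "UR*:" prefix makes a URC fall into
template format, the sketch below splits each line at its first colon and
labels it URN, URL or meta-information. The three-way labelling and the
function names are assumptions made for this example; the paper does not
define a formal grammar.

```python
def classify_line(line):
    """Split one URC line into (kind, tag, value).

    kind is "URN", "URL" or "META". This split is an assumption drawn
    from the paper's description, not a published grammar.
    """
    tag, sep, value = line.partition(":")
    if not sep:
        raise ValueError("not a template line: %r" % line)
    tag = tag.strip()
    value = value.strip()
    kind = tag.upper() if tag.upper() in ("URN", "URL") else "META"
    return kind, tag, value


def tokenize_urc(text):
    """Return (kind, tag, value) for every non-blank line, in file order."""
    return [classify_line(ln) for ln in text.splitlines() if ln.strip()]


sample = """URN:1:IANA:626::Dir:6345
Author: Michael Mealling
URL:http://www.gatech.edu/Computing.Resources
"""
tokens = tokenize_urc(sample)
```

Only the first colon is treated as the delimiter, so colons inside a URN or
URL body are left alone.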

We can use the template format to our advantage by using the whois++ server
protocol as a way to resolve URNs to URLs and to cache meta-information with
URLs to make network access more efficient. Also, with the use of centroids, the
URCs can be searched globally (this depends on whether the IIIR group decides
that centroids scale).[1-Fullton] The use of whois++ is only a recommendation;
it is an implementation issue. The URC can be used by different resources
and directory lookup services, but what gives it value is that it is
standardized and structured.

3.2: URC Contents

A URC can contain any number of URNs, URLs and specific meta-information
delimited by the associated wrapper for each entity. The order of each of these
in the file is important since that is how a client would determine which entity
corresponds with which other entity.

What is not apparent is why multiple URNs could be allowed in the same URC. This
is useful for caching information about related resources. For example, the URC
for The Declaration of Independence could also include a URN (and associated
URLs and URCs) for the Federalist Papers. This saves the user from going back
to the network to retrieve meta-information that is closely related to what
they have already received.

Handling of multiple URNs is a deliberately flexible part of the implementation
rules of a URC. Some clients may wish to ignore any other occurrences of URNs while
others may wish to parse a very large URC with large numbers of related URNs.
This is left up to client implementations. The only requirement is that they
must at least be able to handle the file. There is no requirement that they
keep or use any of the supplied information.
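
A client that chooses to ignore additional URNs might do something like the
following sketch: it accepts the whole URC but keeps only the lines up to the
second URN. The function name and the URNs and URL in the sample are made up
for illustration.

```python
def first_resource_only(lines):
    """Keep the lines up to, but not including, the second URN line."""
    kept = []
    seen_urn = False
    for line in lines:
        if line.upper().startswith("URN:"):
            if seen_urn:
                break          # a second URN opens a related resource; stop
            seen_urn = True
        kept.append(line)
    return kept


# Hypothetical URC holding two related resources.
urc_lines = [
    "URN:example:declaration",
    "Title: The Declaration of Independence",
    "URL:ftp://example.org/declaration.txt",
    "URN:example:federalist",
    "Title: The Federalist Papers",
]
kept_lines = first_resource_only(urc_lines)
```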

NOTE: There has been some violent disagreement with the above statement
concerning multiple URNs in the same URC. It is only offered for comment.
Nothing is lost by disallowing multiple URNs and only a small gain is
accomplished. Take it or leave it...

3.3: Ordering Rules

3.3.1 URI Rules of Precedence

In order for the numerous UR* in a URC to make sense, there must be order to the
sequence of items. The order that makes the most sense is based partly on
expected time to live and partly on an arbitrary precedence scheme. A URN is
meant to be unique over all time and eternity; therefore, the first occurrence
of a URN must have precedence over all other UR* in that URC. A URL is meant
to be unique to the location of the document. The document itself may change,
which would cause its meta-information to change, but not its URL. Thus, a
URL has precedence over a URC. Note there are situations where a URL can
change but the meta-information may not. This case doesn't really matter since
it would just mean updating the URC anyway. It is useful to think of a URC
being even more transient than a URL since a URL can stay the same but the URC
changes. Peter Deutsch said it nicely when he considered it to have a zero
time-to-live. Finally, another occurrence of a URN denotes a new resource that
has precedence over subsequent UR* in the URC.
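
One way to read these precedence rules is as a grouping procedure: a URN opens
a new resource, a URL opens a new location within the current resource, and
any other line attaches to the most recent URL, or to the URN if no URL has
appeared yet. The sketch below is an interpretation under that assumption; the
dictionary layout and names are invented for the example.

```python
def group_urc(lines):
    """Group URC lines into resources using URN > URL > meta precedence.

    Returns a list of dicts shaped like:
      {"urn": str or None, "meta": {...},
       "locations": [{"url": str, "meta": {...}}, ...]}
    """
    resources = []
    current = None    # resource opened by the most recent URN
    location = None   # location opened by the most recent URL

    def open_resource(urn):
        res = {"urn": urn, "meta": {}, "locations": []}
        resources.append(res)
        return res

    for line in lines:
        tag, sep, value = line.partition(":")
        if not sep:
            continue                      # skip lines that are not "tag: value"
        tag, value = tag.strip(), value.strip()
        if tag.upper() == "URN":
            current = open_resource(value)
            location = None               # a new URN resets the current URL
        elif tag.upper() == "URL":
            if current is None:
                current = open_resource(None)   # URL without a known URN
            location = {"url": value, "meta": {}}
            current["locations"].append(location)
        else:
            if current is None:
                current = open_resource(None)   # meta-information only
            target = location if location is not None else current
            target["meta"][tag] = value
    return resources


grouped = group_urc([
    "URN:1:IANA:626::Dir:6345",
    "Author: Michael Mealling",
    "URL:http://www.gatech.edu/Computing.Resources",
    "Cost: US$0.50",
])
```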

Also, a URC does not need to contain at least one URN, URL or URC to be a URC. A
URC can be made up of just URLs and meta-information without a corresponding
URN. Conversely, a URC can have only URNs and URLs, or just URNs, or just
URLs, or even just a collection of meta-information. You can even have a null
URC which contains nothing.

3.3.2 URI Combination Rules

With a URC having some internal structure, certain scenarios become apparent
when certain combinations of UR*s occur. Listed below are several different
combinations of URNs, URLs and URCs that denote different resource
relationships:

URN, URL and meta-info denote one specific instantiation of a network
resource at a specific location on the net. This is useful for pointing a
client at the closest source for a resource.

URN and meta-information denote meta-information about a URN that is
global to all occurrences of that URN. If a URL comes after that
meta-information then any meta-information after that URL modifies the
global meta-information only for that URL. This DOES NOT mean that the
meta-information associated with the URN has the unchangeability attribute
of a URN.

URN and URLs denote multiple resource locations with no
meta-information.

URL and meta-information specifies a location with its associated
meta-information but with no URN. This is useful for resources that are to
too transient to deserve a URN or for users who know of a specific resource but
don't know the URN for it.

URN, URL, meta-information, URN and other UR* denotes a URN that
has "related" URNs to show relationship between wholly different resources.
This is used to cache closely related objects to reduce calls to the
network for related meta-information.

URNs only denotes a set of related resources.

URLs only denotes a set of resources that describe some URN that we don't
know about and that hopefully we can find.

Meta-information only denotes a template that is used by a user to find a
resource when they don't know a URL or URN. This is useful for users who
are comfortable with the library based method of finding something by title
and author. They simply build a template with those two pieces of
information and pass it to some resolver. There is no guarantee that it will
work but it's a good first try.
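
The "URN and meta-information" rule above, where meta-information after a URL
modifies the global meta-information only for that URL, can be pictured as a
dictionary overlay. This is a minimal sketch under that reading; the function
name is hypothetical and the field values are only illustrative.

```python
def effective_meta(global_meta, location_meta):
    """Meta-information that applies to one URL.

    Values given after the URL override the URN-wide values for that URL
    only; the URN-wide dictionary itself is left untouched.
    """
    merged = dict(global_meta)      # copy, so the global view is not modified
    merged.update(location_meta)    # per-URL values win on conflict
    return merged


urn_meta = {"Author": "Michael Mealling", "MIME_Format": "text/plain"}
gopher_meta = {"MIME_Format": "application/postscript"}
view = effective_meta(urn_meta, gopher_meta)
```

Other locations under the same URN still see the unmodified global values.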

3.5 Example URCs

The following example URCs are not exhaustive. They are only used to give a
concrete example of a URC in order to show the structure:

URN:1:IANA:626::Dir:6345
Author: Michael Mealling
Subject: OIT Computing Resources
URL:gopher://gopher.gatech.edu:2048/11/Computing.Resources
Cost: US$0.00
MIME_Format: application/postscript
URL:http://www.gatech.edu/Computing.Resources
Cost: US$0.50
MIME_Format: text/plain

4: whois++ servers as URN->URC servers

Since a URC is simply a template and whois++ was specifically built to handle
anything in template form it seems logical to use whois++ as a resolution
scheme. It allows the resolver to handle update records within the protocol
instead of as a separate function. It also has the added function of allowing
for centroids that make global searches of meta-information easier and faster
(NOTE: This assumes that centroids will scale!). For more information this
paper defers to the whois++ specification [1-Fullton], the IETF WNILS Working
Group and to a new paper by Mitra concerning URN->URC resolution [Mitra].
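
The resolver role described here can be pictured with a toy in-memory
stand-in. This is NOT the whois++ protocol and does not model centroids; it
only illustrates the two lookups the paper has in mind, URN->URC resolution
and searching by a meta-information template. The class and method names are
invented for this sketch.

```python
class ToyResolver:
    """In-memory stand-in for a URN->URC server (not the whois++ protocol)."""

    def __init__(self):
        self._by_urn = {}

    def register(self, urn, urc_text):
        """Store (or update) the URC text held for a URN."""
        self._by_urn[urn] = urc_text

    def resolve(self, urn):
        """Return the stored URC text for a URN, or None if unknown."""
        return self._by_urn.get(urn)

    def search(self, template):
        """Return URNs whose URC contains every tag/value pair in template.

        Matching is exact apart from letter case, an assumption made to
        keep the sketch short.
        """
        hits = []
        for urn, text in self._by_urn.items():
            fields = set()
            for line in text.splitlines():
                tag, sep, value = line.partition(":")
                if sep:
                    fields.add((tag.strip().lower(), value.strip().lower()))
            if all((t.lower(), v.lower()) in fields
                   for t, v in template.items()):
                hits.append(urn)
        return hits


resolver = ToyResolver()
resolver.register(
    "1:IANA:626::Dir:6345",
    "URN:1:IANA:626::Dir:6345\n"
    "Author: Michael Mealling\n"
    "Subject: OIT Computing Resources\n"
    "URL:http://www.gatech.edu/Computing.Resources\n",
)
found = resolver.search({"Author": "Michael Mealling"})
```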

5: The problem of standardized meta-information

The problem of standardizing meta-information tags in the template format is a
VERY thorny problem. Currently the author believes that the IAFA templates with
the Non-Existent Data Elements Working Group modifications are some of the best
work in this area, with the caveat that this subject is far from being
settled. This should be punted to the IETF IIIR Working Group.

6: References as example URCs

[1-Weider 93]
URL:ftp://cnri.reston.va.us/internet-drafts/draft-ietf-iiir-transponders-00.txt
Author: Weider, Chris
Title: Resource Transponders
Date: March, 1993
[2-Weider 93]
URL:ftp://cnri.reston.va.us/internet-drafts/draft-ietf-iiir-vision-00.txt
Author: Weider, Chris and Deutsch, Peter
Title: A Vision of an Integrated Internet Information Service
Date: March, 1993
[3-Weider 93]
URL:ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-resource-names-01.txt
Author: Weider, Chris and Deutsch, Peter
Title: Uniform Resource Names
Date: October, 1993
[Berners-Lee 1993]
URL:ftp://cnri.reston.va.us/internet-drafts/draft-ietf-uri-url-01.txt
Author: Berners-Lee, Tim
Title: Uniform Resource Locators
Date: March, 1993
[1-Fullton]
URL:ftp://cnri.reston.va.us/internet-drafts/draft-ietf-wnils-whois-01.txt
Author: Fullton, Jim, Weider, Chris and Spero, Simon
Title: Architecture of the Whois++ Index Service
Date: March, 1993
[Mitra]
URL:ftp://ftp.path.net/pub/ietf/urn2urc-02.txt
Author: Mitra
Title: URN to URC resolution scenario
Date: November, 1993