Date: Mon, 7 Jun 1993 09:05:34 -0500
Message-Id: <199306071405.AA09785@ux1.cso.uiuc.edu>
To: timbl@nxoc01.cern.ch, Peter Deutsch <peterd@bunyip.com>
From: e-krol@uiuc.edu (Ed Krol)
Subject: Re: Suggest meaning for URN
At 14:47 6/7/93 +0100, Tim Berners-Lee wrote:
>The only proposals I have heard of which don't need this structure
>are Peter's, whereby however many documents everyone can write, he
>can still buy a bigger disk to index them ;-), and a suggestion of
>Robert Acskyn that one could have a large distributed and replicated
>set of servers into which one would hash the URN. The last is the
>only attempt I have heard of to scale up the dereferencing of opaque
>names.
>
>So, Ed, can you dereference a URN?
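Robert Acskyn's idea, as quoted above, is to hash the URN into a large, distributed, replicated set of servers. The following is a minimal sketch of that idea, and every detail in it is a hypothetical choice: the server names, the replication factor of 3, and simple SHA-1-modulo placement. The point is that any client can compute which servers should hold the entry for an opaque name without consulting a directory first.

```python
import hashlib

# Hypothetical pool of resolver servers; the names are invented for illustration.
SERVERS = ["urn1.example.org", "urn2.example.org", "urn3.example.org",
           "urn4.example.org", "urn5.example.org"]

def servers_for(urn: str, servers: list[str] = SERVERS, replicas: int = 3) -> list[str]:
    """Hash an opaque URN to choose the servers that should hold its entry.

    Assumption: plain hash-modulo placement with `replicas` consecutive
    copies. A real design would also have to cope with servers joining
    and leaving, which this sketch ignores.
    """
    digest = hashlib.sha1(urn.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    count = min(replicas, len(servers))
    return [servers[(start + i) % len(servers)] for i in range(count)]

if __name__ == "__main__":
    # The same URN always maps to the same replicas, on any client.
    print(servers_for("some-opaque-name"))
```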
I made the statement because I think the discussion gets caught up in
a lot of detail, much of which does not seem relevant to the task at
hand. Now it seems that 1. yes, what we've got is a symbolic reference,
and 2. a table lookup based on that symbol is conceptually correct but
doesn't scale well.
I have two thoughts and no real conclusions. First, consider OPACs. Every
OPAC stores this much data (and more) for its holdings and does lookups
for patrons all the time (granted, they are sometimes slow). So in this
model, once you find a server it is relatively easy to look something up
symbolically and find its location. Note that libraries have not solved
the problem of "Where in the world do I find this book?" You have to
pick a server and keep trying servers until you find one that has what
you want. That is the problem we are trying to solve.
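The OPAC model described above, where each catalog is cheap to search once you have picked it and you simply try catalogs until one has what you want, can be sketched like this. The catalog names and shelf locations are invented, and in-memory dictionaries stand in for what would really be queries to separate servers.

```python
from typing import Optional

# Hypothetical catalogs: each maps a symbolic name to the locations that
# hold it, much as an OPAC maps a catalog entry to shelf locations.
CATALOGS: dict[str, dict[str, list[str]]] = {
    "library-a": {"moby-dick": ["shelf-3"]},
    "library-b": {"walden": ["shelf-12"]},
}

def find_anywhere(name: str,
                  catalogs: dict[str, dict[str, list[str]]] = CATALOGS
                  ) -> Optional[tuple[str, list[str]]]:
    """Try each catalog in turn until one knows the name.

    Each individual lookup is a cheap table lookup, but without a
    "librarian" to say where to look, the cost grows with the number
    of catalogs that have to be tried.
    """
    for catalog_name, table in catalogs.items():
        if name in table:
            return catalog_name, table[name]
    return None  # nobody we asked has it
```

For example, `find_anywhere("walden")` has to ask `library-a` before it reaches `library-b`.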
Using this as a model, consider that the common solution is to have
a librarian who knows which servers are good for what things. Then you
go to those servers and proceed symbolically as before. So what I think is
happening is that there are two parts to a URN. One is strictly symbolic
and can be whatever its creator damn well pleases, as I said in
my previous message. The other part is not a reference to a resource but
rather a description of how to find the cataloging authority. So the
structure of the URN could be <server part><resource part>. The server
part can be as structured as the community wants (if we want to make
it X.500-ish or domain-ish, it works either way).
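The two-part structure above can be sketched as follows. The message does not specify any syntax, so the "/" separator is purely an assumption. Splitting at the first separator leaves the resource part free to contain anything its creator likes, which matches the idea that it is strictly symbolic.

```python
from typing import NamedTuple

class URN(NamedTuple):
    server: str    # how to find the cataloging authority
    resource: str  # opaque symbol; its meaning is up to that authority

def parse_urn(text: str, sep: str = "/") -> URN:
    """Split '<server part><sep><resource part>' at the first separator.

    Assumption: `sep` is a hypothetical delimiter chosen for this sketch;
    the server part may not contain it, but the resource part may.
    """
    server, found, resource = text.partition(sep)
    if not found or not server or not resource:
        raise ValueError(f"not a two-part URN: {text!r}")
    return URN(server, resource)
```

For example, `parse_urn("uiuc.edu/catalog/123")` yields the server part `"uiuc.edu"` and the resource part `"catalog/123"`.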
Now let's apply one of the two cardinal rules of computer science:
when in doubt, recurse. Why is the catalog of servers any different from
any other catalog of resources? The answer is probably "it isn't." So what we
could have is two symbolic references:
URN(sub combined) ::= URN(sub server) URN(sub resource)
So you break the sucker apart and look up the first part to find out where
to look up the second. Now we have a workable and scalable solution. What we
don't necessarily have is a fast solution. I could probably guarantee resolution
of a URN to a list of URLs within a minute. Great for finding online books,
not so great for hypertext links.
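The two-step resolution above, where the catalog of servers is treated as just another catalog, might look like this minimal sketch. The host names, URLs, and "/" separator are hypothetical, and dictionaries stand in for what would really be a query to a remote server at each step; that network round trip is where the "minute" goes.

```python
# Hypothetical data: the top-level catalog maps server parts to catalogs,
# and each of those maps resource parts to lists of URLs.
SERVER_CATALOG: dict[str, dict[str, list[str]]] = {
    "uiuc.edu": {
        "some-document": ["ftp://ftp.example.edu/pub/some-document.txt"],
    },
}

def lookup(catalog: dict, symbol: str):
    """One symbolic lookup; the same operation serves both levels."""
    try:
        return catalog[symbol]
    except KeyError:
        raise LookupError(f"{symbol!r} not found") from None

def resolve(urn: str,
            root: dict[str, dict[str, list[str]]] = SERVER_CATALOG,
            sep: str = "/") -> list[str]:
    """Resolve URN(sub server) URN(sub resource) to candidate URLs.

    Assumption: `sep` is a hypothetical delimiter between the two parts.
    """
    server, _, resource = urn.partition(sep)
    authority = lookup(root, server)    # first lookup: find the catalog
    return lookup(authority, resource)  # second lookup: find the resource
```

The same `lookup` serves both levels, which is the point of the "it isn't any different" argument.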
Now what we have is an optimization problem, and here it gets stickier.
There are two obvious ways to help this out, and one which I think is useless:
1. Caching. I'm not convinced that the number of repeat hits on a resource
will ever be high enough, regardless of the size of the cache, to make
this viable.
2. Adding a "possible URL" to the URN. So the above becomes
URN (to the max) ::= URN(sub S) URN(sub R) URL?
where the URL is tried first, and if it succeeds, the whole
lookup process is bypassed. Fast, but the problem is that over
time the URL may come to point to a reasonable resource that is
not the resource the URN refers to (file contents get changed).
3. I don't think adding additional structure to URN(sub S) helps
much. I think the number of servers would be quite manageable in
a flat space, and the added structure would only speed things up by
a small delta.
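Option 2 above, a URL hint tried before the full lookup, can be sketched as follows. The `fetch` and `full_lookup` callables are hypothetical stand-ins: `fetch` returns the contents at a URL or `None` on failure, and `full_lookup` performs the two-step symbolic resolution. As the message warns, the fast path cannot tell whether the hinted URL still holds the resource the URN names.

```python
from typing import Callable, Optional

def resolve_with_hint(
    server: str,
    resource: str,
    hint: Optional[str],
    fetch: Callable[[str], Optional[bytes]],
    full_lookup: Callable[[str, str], list[str]],
) -> Optional[bytes]:
    """Try the URN's hinted URL first; fall back to the full lookup."""
    # Fast path: if the hint answers, the whole lookup process is bypassed.
    if hint is not None:
        data = fetch(hint)
        if data is not None:
            # Caveat: success here does not prove this is still the
            # resource the URN refers to; the contents may have changed.
            return data
    # Slow path: resolve URN(sub S) URN(sub R) and try each URL returned.
    for url in full_lookup(server, resource):
        data = fetch(url)
        if data is not None:
            return data
    return None
```

A dead hint simply costs one failed fetch before the slow path runs. A stale hint is the worse case, because it returns the wrong contents with no error.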
In the spirit of brainstorming, I hope that all this will spur some
great thoughts in someone else.