A modest suggestion for the URN->URL service

Ronald E. Daniel (rdaniel@acl.lanl.gov)
Thu, 31 Mar 1994 23:57:44 -0700

Message-Id: <199404010657.XAA19291@collie.acl.lanl.gov>
To: uri@bunyip.com

A MODEST SUGGESTION FOR THE URN->URL SERVICE

Ron Daniel Jr.
Advanced Computing Lab
Los Alamos National Laboratory

INTRODUCTION

Up to now, the discussion of the URN->URL mapping service has concentrated,
naturally enough, on that one operation. However, my belief is that there
are several other operations that need to be considered. When we consider
them, they imply certain things about the syntax of URNs, and also about
the contents of URCs. My hope is to generate some discussion on what
services, in addition to the URN->URL lookup, are needed for the
URN service. This will let us make better choices on representations,
etc. for URNs and URCs.

The reader is assumed to be familiar with URLs, and the concepts of
URNs and URCs.

SO WHY ARE WE DOING THIS URN THING ANYWAY?

Before we talk about a URN->URL mapping service, it seems we should
briefly revisit why URLs are not sufficient for all the jobs in the
URI pantheon. The example I always use when introducing URNs to
someone who knows about URLs is the "roaming URL problem" - moving a
file or changing a host name can invalidate thousands of
documents across the world. While I don't think a URN server should
snoop on the directories containing various resources, I do think that
there needs to be a way of telling it that things have moved.

So what are we expecting of URNs? Karen Sollins and Larry Masinter
have, at the request of the working group, established a set of
functional and encoding requirements on URNs. The functional
requirements include global scope, global uniqueness, persistence,
scalability, extensibility, etc. The encoding requirements include
human transcribability, simple comparison, single encoding (modulo
upper vs. lower case), and transport friendliness.

I think Karen and Larry did a very good job with the requirements
document and deserve praise (and a few brews should they be so
inclined). However, up to now the discussions I have seen concentrate
solely on the URN->URL lookup (by way of URCs) and have not addressed
some other operations that are going to be necessary for the service. I
think that a few small modifications to the requirements may be needed to
account for these other operations. What operations? Well, for
starters, how is a resource published and registered with the system?
How does a publishing authority grant permissions to a sub-authority?
Since a major motivation for URNs is to cure the "roaming URL" problem,
how do we register a resource's new location with the service? If we
have multiple URLs for a URN, how do we register these additional
locations and verify that their content is correct? All of these
operations seem implied by the current model of URN->URL via URCs.

Of course, I'm not happy stopping there, no, I've got to go and
talk about extending the content in URCs :-).

One thing I think we all would agree on is that we should cite our
sources. I think that the 99% of us who have used the Science Citation
Index would agree that it would be very nice to have links from the
original resource to annotations, critiques, extensions, etc. of that
resource. To extend this slightly, some of those annotations might be
SOAPs (Seals of APproval), which seems to be the current consensus on
how peer review will be handled in the future world of on-line
publishing. So now we need operations to register SOAPs and other URNs
that annotate the original URN. The URC seems, to me at least, to be
the natural place to store all this extra junk.

THE MODEST SUGGESTION

Of course, now the URC of a URN is far too big to download to a browser
every time we just want to get the URL for the next resource. What do
we do to cope with this problem? Well, we need a way to pick out the
appropriate fields of the URC. Here is my modest suggestion. We
extend the structure of a URN from
<URN:opaque_string>
to
<URN:method[(arg_list)]:opaque_string>

and define a set of methods.
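To make the proposed syntax concrete, here is a minimal Python sketch of a
parser for <URN:method[(arg_list)]:opaque_string>. It assumes, beyond what
this note says, that method names are word characters, that argument lists
contain no nested parentheses, and that the opaque string contains no ','
or '>' (the terminators discussed under character sets). The names URN_RE
and parse_urn are hypothetical.

```python
import re

# Hypothetical grammar for <URN:method[(arg_list)]:opaque_string>.
# Assumptions: method is word characters only, the argument list has no
# nested parentheses, and the opaque string contains no ',' or '>'.
URN_RE = re.compile(r"^<URN:(\w+)(?:\(([^()]*)\))?:([^,>]+)>$")


def parse_urn(text: str) -> tuple[str, list[str], str]:
    """Split a method-style URN into (method, arguments, opaque string)."""
    m = URN_RE.match(text)
    if m is None:
        raise ValueError(f"not a method-style URN: {text!r}")
    method, arg_list, opaque = m.groups()
    args = [a.strip() for a in arg_list.split(",")] if arg_list else []
    return method, args, opaque


print(parse_urn("<URN:fetch:foo>"))
# ('fetch', [], 'foo')
print(parse_urn("<URN:fetch_annotations(N=100, by_date):foo>"))
# ('fetch_annotations', ['N=100', 'by_date'], 'foo')
```

A bare <URN:foo> in the old form is rejected by this sketch because the
method is mandatory here; a real grammar would have to decide whether fetch
is implied when the method is missing.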

Some examples:
"gimme the cheapest and closest URL for URN foo"
<URN:fetch:foo>
(foo is the opaque string. Closest and cheapest are the default
restrictions on fetches. It is possible to imagine wanting the
"best" URL of a URN, but it is not possible to rank qualities of
presentation in a simple way. PostScript is usually better than
text, but is it better than HTML? Depends. We may want a query like
"find all the different versions of a URN, and gimme the URLs (and
the type info) for closest and cheapest versions of them")

"gimme the URNs of the 100 most recent annotations to URN foo"
<URN:fetch_annotations(N=100, by_date):foo>
(Comments on operation complexity are given below)

"add URN bar as an annotation to URN foo"
<URN:annotate(bar, authority, authorization):foo>
(Obviously, Prentice-Hall does not want me to publish scurrilous
annotations to revered documents using a URN that is under their
authority. Therefore, some authorization must be provided to
actually register a URN. Digital signatures seem to be the right
approach; more on this in a bit.)

"register URL baz as a location for URN foo"
<URN:register_URL(baz, authority, authorization):foo>

How do we relocate a URL? Delete the old location, bar, and register
the new one, baz:
<URN:delete_URL(bar, authority, authorization):foo>
<URN:register_URL(baz, authority, authorization):foo>

"publish a new resource (i.e. establish a new URN foo)"
<URN:register_URN(authority, authorization):foo>

"create a new sub-authority"
<URN:register_authority(parent, key1, key2):child>
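The default <URN:fetch:foo> lookup only needs a small slice of the URC. A
minimal sketch, assuming each registered URL carries a cost, a distance, and
a MIME type (none of these fields are defined in this note), and assuming
"cheapest and closest" means cheapest first with distance breaking ties.
The Location class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    """One URL registered for a URN (all fields are hypothetical URC content)."""
    url: str
    cost: float       # e.g. a retrieval charge; 0.0 for free copies
    distance: float   # e.g. network hops or round-trip time to the client
    mime_type: str    # the "version" of the resource this URL serves


def rank(loc: Location) -> tuple[float, float]:
    # Assumed ordering: cheapest first, then closest.
    return (loc.cost, loc.distance)


def default_fetch(locations: list[Location]) -> Optional[Location]:
    """Answer <URN:fetch:foo>: the single cheapest and closest URL."""
    return min(locations, key=rank, default=None)


def best_per_version(locations: list[Location]) -> dict[str, Location]:
    """Cheapest and closest URL for each distinct version (MIME type)."""
    best: dict[str, Location] = {}
    for loc in locations:
        current = best.get(loc.mime_type)
        if current is None or rank(loc) < rank(current):
            best[loc.mime_type] = loc
    return best
```

Either way, only the chosen Location records go back to the client, not the
whole URC. best_per_version corresponds to the "all the different versions"
query; it sidesteps the question of whether PostScript beats HTML by
returning one answer per type.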

It is possible to imagine other operations, such as "gimme the
annotations to URN foo that have SOAPs from organization bar and are
disapproved of by organizations bletch and baz". Really, this can
become a query language, but complex queries like that may be far more
than a URN->URL server wants to provide, and there is no reason to
require them to do so. Some basic level of service should be
established as the minimum. That minimum might just be handling the
necessary registrations for administrative purposes and doing the
default lookup, ignoring annotations entirely. Other people might make
money by providing very complex query capabilities for a fee.
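Even the "N most recent annotations" query is cheap compared with a general
query language. A minimal sketch, assuming foo's URC stores each annotation
as a pair of the annotating URN and the date it was registered (a field this
note does not specify); fetch_annotations is a hypothetical name. Using a
heap, picking the N newest of m annotations takes O(m log N) time.

```python
import heapq
from datetime import date


def fetch_annotations(annotations: list[tuple[str, date]], n: int = 100) -> list[str]:
    """Answer <URN:fetch_annotations(N=n, by_date):foo>.

    `annotations` is a hypothetical list of (annotating URN, registration
    date) pairs taken from foo's URC. Returns the n newest URNs, newest first.
    """
    newest = heapq.nlargest(n, annotations, key=lambda pair: pair[1])
    return [urn for urn, _ in newest]
```

A minimal server could still refuse this method and answer only fetch and
the registration methods.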

AUTHORIZATIONS

Earlier I alluded to the use of digital signatures (which imply
public/private key pairs) for authorizations. Let me begin with
a disclaimer - despite what some people might try to infer from
where I work, I don't know much about cryptography or security.
(Actually, I work outside the fence and do not have a
clearance). But based on what little I know, here is a possible
approach.

Let's start with the process of registering a new URN:

<URN:register_URN(authority, authorization):foo>

foo is the opaque string. While it will probably be of the form
".../publisher/sub-publisher/sub-publisher/.../ugly_string"
we can't guarantee that. So, the argument list specifies the
organization authorizing publication. When that organization
got authority to create URNs, they had to register a public
key. The authorization field is a secure hash of "foo",
encrypted using the organization's private key. The URN servers
receiving this message fetch the public key, unmangle the
authorization to make sure that it matches "foo", and if so
they start up a URC for the URN.
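Hashing a string and encrypting the hash with a private key is what is now
usually called an RSA signature. The Python sketch below uses textbook RSA
with a deliberately tiny, hard-coded key built from the Mersenne primes
2**31 - 1 and 2**61 - 1, so it is insecure and only illustrates the message
flow; SHA-256 stands in for the unspecified secure hash. The authority name
"lanl", the dictionaries public_keys and urcs, and all function names are
hypothetical.

```python
import hashlib

# TOY key: textbook RSA from two small Mersenne primes. Insecure; for
# illustration of the message flow only.
P, Q, E = 2**31 - 1, 2**61 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def secure_hash(msg: str, n: int) -> int:
    """SHA-256 of msg, reduced modulo n so it can be signed with this key."""
    return int.from_bytes(hashlib.sha256(msg.encode()).digest(), "big") % n


def sign(msg: str, d: int, n: int) -> int:
    """'Encrypt' the hash with the private key: this is the authorization."""
    return pow(secure_hash(msg, n), d, n)


def verify(msg: str, sig: int, public_key: tuple[int, int]) -> bool:
    """'Unmangle' the authorization with the public key and compare."""
    n, e = public_key
    return pow(sig, e, n) == secure_hash(msg, n)


# Hypothetical server state.
public_keys = {"lanl": (N, E)}   # authority -> public key registered earlier
urcs: dict[str, dict] = {}       # opaque string -> URC


def register_urn(opaque: str, authority: str, authorization: int) -> bool:
    """Handle <URN:register_URN(authority, authorization):opaque>."""
    key = public_keys.get(authority)
    if key is None or not verify(opaque, authorization, key):
        return False
    urcs.setdefault(opaque, {"authority": authority, "urls": [], "annotations": []})
    return True


# The publisher signs the opaque string; any server can check it.
accepted = register_urn("lanl/acl/xyz123", "lanl", sign("lanl/acl/xyz123", D, N))
```

Only the public key is needed on the server side, which is what lets every
URN server check the registration independently.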

So how did that public key get registered?

<URN:register_authority(parent, key1, key2):child>

The new authority's name is child. The key1 argument is the hash of "child",
encrypted with the private key of parent. When a URN server gets
this message it fetches the public key of "parent", de-mangles
"key1", and compares it with "child". If it matches, it
accepts this as a valid registration and establishes "key2"
as the public key for "child". (There are some issues of key
distribution that need to be addressed, some comments on that
are in the bugs section).
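A sketch of the register_authority check, again with insecure textbook RSA
keys made from small Mersenne primes purely for illustration, and SHA-256 as
an assumed secure hash. The authority names "lanl" and "lanl/acl", the
make_toy_key helper, and the other names are hypothetical.

```python
import hashlib


# TOY textbook RSA keys (insecure, illustration only) from Mersenne primes.
def make_toy_key(p: int, q: int, e: int = 65537) -> tuple[tuple[int, int], int]:
    """Return ((n, e), d): a public key and its private exponent."""
    return (p * q, e), pow(e, -1, (p - 1) * (q - 1))


def secure_hash(msg: str, n: int) -> int:
    return int.from_bytes(hashlib.sha256(msg.encode()).digest(), "big") % n


def sign(msg: str, d: int, n: int) -> int:
    return pow(secure_hash(msg, n), d, n)


def verify(msg: str, sig: int, public_key: tuple[int, int]) -> bool:
    n, e = public_key
    return pow(sig, e, n) == secure_hash(msg, n)


PARENT_PUB, PARENT_D = make_toy_key(2**31 - 1, 2**61 - 1)
CHILD_PUB, CHILD_D = make_toy_key(2**61 - 1, 2**89 - 1)

public_keys = {"lanl": PARENT_PUB}   # hypothetical: parent already registered


def register_authority(parent: str, key1: int, key2: tuple[int, int], child: str) -> bool:
    """Handle <URN:register_authority(parent, key1, key2):child>."""
    parent_key = public_keys.get(parent)
    # key1 must be the hash of the child's name signed with the parent's private key.
    if parent_key is None or not verify(child, key1, parent_key):
        return False
    # key2 is accepted as the child's public key as-is (see bug 1).
    public_keys[child] = key2
    return True
```

Once "lanl/acl" is registered, anything signed with CHILD_D verifies against
public_keys["lanl/acl"], so the parent delegates authority once and then
stays out of the loop. The server trusts key2 exactly as far as it trusts
the parent key it fetched, which is the exposure discussed in bug 1.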

A similar process is carried out for registering URNs as annotations
to other URNs. First, we establish the second URN (bar) as a valid URN
using the

<URN:register_URN(authority, authorization):bar>

method. Then we state that it is an annotation to URN foo with

<URN:annotate(bar, authority, authorization):foo>

Here, authorization is the hash of the concatenation of bar and foo,
encrypted with the private key of bar's authority. (Foo's publisher
can't prevent other people from commenting on foo.) The URN server gets
the public key of authority, demangles the authorization, and checks
that it matches the concatenation of bar and foo. See the section on character sets
for how we delimit all this stuff.
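A sketch of the annotate check. It uses an insecure textbook RSA key built
from the Mersenne primes 2**31 - 1 and 2**61 - 1 and SHA-256 as the hash,
both purely illustrative. The signed payload is bar and foo joined by ',',
following the character-set discussion. The server also checks that the
claimed authority is the one that registered bar; that check is implied
rather than spelled out in this note. "prentice-hall", "critic", and all
function names are hypothetical.

```python
import hashlib

# TOY textbook RSA key (insecure, illustration only): Mersenne-prime factors.
P, Q, E = 2**31 - 1, 2**61 - 1, 65537
N, D = P * Q, pow(E, -1, (P - 1) * (Q - 1))


def secure_hash(msg: str, n: int) -> int:
    return int.from_bytes(hashlib.sha256(msg.encode()).digest(), "big") % n


def sign(msg: str, d: int, n: int) -> int:
    return pow(secure_hash(msg, n), d, n)


def verify(msg: str, sig: int, public_key: tuple[int, int]) -> bool:
    n, e = public_key
    return pow(sig, e, n) == secure_hash(msg, n)


# Hypothetical state: both URNs already registered, bar by "critic".
public_keys = {"critic": (N, E)}
urcs = {
    "foo": {"authority": "prentice-hall", "annotations": []},
    "bar": {"authority": "critic", "annotations": []},
}


def annotate(bar: str, authority: str, authorization: int, foo: str) -> bool:
    """Handle <URN:annotate(bar, authority, authorization):foo>."""
    if foo not in urcs or bar not in urcs or urcs[bar]["authority"] != authority:
        return False
    key = public_keys.get(authority)
    # Signed by bar's authority; foo's publisher has no say.
    if key is None or not verify(f"{bar},{foo}", authorization, key):
        return False
    urcs[foo]["annotations"].append(bar)
    return True
```

Because the order of the pair is part of what is signed, an authorization for
"bar annotates foo" cannot be reused to claim that foo annotates bar.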

Registering locations for resources requires the approval of the
publisher of the URN. Let's say I want to establish a copy of
some resource. In the case of a static file, I fetch the original,
store my copy, and give it a URL, baz. I then tell the publisher of
the original about this. They sign my new URL by hashing it and
encrypting the hash with their private key, and send the
<URN:register_URL(baz, authority, authorization):foo>
message to the URN service. The service grabs their public key,
demangles the authorization, and sees if it matches "baz". If so, baz
is stored in the URC of foo. This message does not address the
meta-information that should go along with the new URL; it is
getting too big as it is! What about when publishers go out
of business? See bug 2.
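A sketch of the register_URL check under illustrative assumptions: an
insecure textbook RSA key from the Mersenne primes 2**31 - 1 and 2**61 - 1,
SHA-256 as the hash, and hypothetical names throughout (including the
publisher "prentice-hall" and the URLs used in testing).

```python
import hashlib

# TOY textbook RSA key (insecure, illustration only): Mersenne-prime factors.
P, Q, E = 2**31 - 1, 2**61 - 1, 65537
N, D = P * Q, pow(E, -1, (P - 1) * (Q - 1))


def secure_hash(msg: str, n: int) -> int:
    return int.from_bytes(hashlib.sha256(msg.encode()).digest(), "big") % n


def sign(msg: str, d: int, n: int) -> int:
    return pow(secure_hash(msg, n), d, n)


def verify(msg: str, sig: int, public_key: tuple[int, int]) -> bool:
    n, e = public_key
    return pow(sig, e, n) == secure_hash(msg, n)


# Hypothetical state: foo was registered by "prentice-hall".
public_keys = {"prentice-hall": (N, E)}
urcs = {"foo": {"authority": "prentice-hall", "urls": []}}


def register_url(baz: str, authority: str, authorization: int, foo: str) -> bool:
    """Handle <URN:register_URL(baz, authority, authorization):foo>."""
    urc = urcs.get(foo)
    key = public_keys.get(authority)
    # Only foo's own publisher may approve a new location for foo.
    if urc is None or key is None or urc["authority"] != authority:
        return False
    if not verify(baz, authorization, key):
        return False
    if baz not in urc["urls"]:
        urc["urls"].append(baz)
    return True
```

As described, only baz is signed, so the same authorization could be
replayed to attach baz to a different URN from the same publisher. Signing
the concatenation of baz and foo, as annotate and delete_URL do, would close
that gap.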

Deleting a URL is the only remaining method with an authorization.
<URN:delete_URL(bar, authority, authorization):foo>
The authority is that for URN foo. The authorization is the hash of
the concatenation of bar and foo, encrypted with the private key of
foo's authority. If it checks out, the URL is deleted from the URC of
the URN, but the resource pointed to by the URL is not deleted. (Can I
get more TLAs into that sentence?) So, publishers retain control over
who can provide their material.
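A sketch of delete_URL under the same kind of illustrative assumptions: an
insecure textbook RSA key from the Mersenne primes 2**31 - 1 and 2**61 - 1,
SHA-256 as the hash, the payload "bar,foo", and hypothetical names and URLs.

```python
import hashlib

# TOY textbook RSA key (insecure, illustration only): Mersenne-prime factors.
P, Q, E = 2**31 - 1, 2**61 - 1, 65537
N, D = P * Q, pow(E, -1, (P - 1) * (Q - 1))


def secure_hash(msg: str, n: int) -> int:
    return int.from_bytes(hashlib.sha256(msg.encode()).digest(), "big") % n


def sign(msg: str, d: int, n: int) -> int:
    return pow(secure_hash(msg, n), d, n)


def verify(msg: str, sig: int, public_key: tuple[int, int]) -> bool:
    n, e = public_key
    return pow(sig, e, n) == secure_hash(msg, n)


OLD_URL = "http://old.example.org/foo.ps"        # hypothetical locations
MIRROR_URL = "http://mirror.example.org/foo.ps"
public_keys = {"prentice-hall": (N, E)}
urcs = {"foo": {"authority": "prentice-hall", "urls": [OLD_URL, MIRROR_URL]}}


def delete_url(bar: str, authority: str, authorization: int, foo: str) -> bool:
    """Handle <URN:delete_URL(bar, authority, authorization):foo>."""
    urc = urcs.get(foo)
    key = public_keys.get(authority)
    if urc is None or key is None or urc["authority"] != authority:
        return False
    if bar not in urc["urls"] or not verify(f"{bar},{foo}", authorization, key):
        return False
    # Only the pointer goes; whatever lives at bar is untouched.
    urc["urls"].remove(bar)
    return True
```

For a relocation, issuing register_URL for the new location before
delete_URL for the old one avoids a moment in which foo has no location at
all.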

CHARACTER SETS AND OTHER ISSUES

For the scheme outlined above to work, it has to be possible to
find the end of the method section, and the end of the opaque
string. There are (at least) three terminators used in my examples
above. First, ':' terminates the method portion of the URN. Both
',' and '>' terminate the opaque string. The ',' is used when
we specify an opaque string in an argument list, the '>' is used
when the opaque string appears at the end of the URN.
When we concatenate two URNs for the purpose of authorizations, we
could use either a '>' or a ',' but the comma seems more natural.

So how does this break when compared to the URN requirements document?
I think the only conflict is with the "simple comparison" requirement
in the encoding section. That says that it is the goal to make URN
comparison as simple as possible, with no optional parts. It seems
that we can only do comparisons on the opaque strings, and that
those opaque strings cannot contain ',' or '>'. This does not seem
a major problem to me (but then I would say that, wouldn't I?).
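The reason for banning ',' inside opaque strings can be checked directly. If
an opaque string could contain ',', the pairs ("a,b", "c") and ("a", "b,c")
would both produce the payload a,b,c, and one signature would authorize
both. A minimal sketch, with hypothetical helper names; treating comparison
as case-insensitive is an assumption based on the "modulo upper vs. lower
case" encoding requirement.

```python
DELIMITERS = (",", ">")   # terminators of the opaque string in the proposed syntax


def is_valid_opaque(s: str) -> bool:
    """An opaque string must be non-empty and free of ',' and '>'."""
    return bool(s) and not any(d in s for d in DELIMITERS)


def auth_payload(*opaque_strings: str) -> str:
    """Join opaque strings with ',' for signing; unambiguous only because
    ',' is banned inside them."""
    for s in opaque_strings:
        if not is_valid_opaque(s):
            raise ValueError(f"opaque string may not be empty or contain ',' or '>': {s!r}")
    return ",".join(opaque_strings)


def same_urn(a: str, b: str) -> bool:
    """Compare opaque strings only, ignoring case (an assumption)."""
    return a.lower() == b.lower()
```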

CONCLUSION

Thanks for reading all this. This note has dragged on long enough, so just
a quick wrap-up. It may be premature to specify the representation of
URCs within a URN->URL server. It matters a bit when a query is made
that returns a complex result, but we never want to be sending the
whole damn URC down the wire to a poor hapless client unless they are
begging for it. We should define the basic operations to support, and
the formats for complex queries and complex results, but this can be
independent of the internal representation of the URCs.

BUGS

1) Public-key distribution is not a solved problem. It is possible
to provide a different public key than the real one through
denial-of-service attacks (and presumably many other styles of attack).
In most of the operations above we fetch the public key of an
authority using an unspecified infrastructure. This is so we
don't have to store bazillions of keys. On the other hand, the
register_authority() method specifies the public key for the newly
established child. What gives? We may want to store a random sample
of keys to check for attacks. Also, the register_authority operation
is a prime place for an attack, because if you can spoof a URN server
into taking your fake key instead of the parent authority's real key,
you can create your own publishing authority and the system takes
your new public key as gospel. The distributed nature of the URN
servers may prevent this simple attack since each server that gets
a register_authority message will query some public key server and it
is probably not possible to easily spoof all of them. Anyway, this needs
examination by someone who does know about cryptography and security.

2) Requiring the publisher to approve new URLs for resources is a way
to guarantee (at least initial) fidelity of the thing the URL points
to as being an accurate copy of the URN's object. Unfortunately,
if a publisher goes under, there is no way to register new copies of
their work. The administrative side of the IIIA (is this the correct
organization?) will need to address this problem - perhaps by requiring
a defunct publisher to surrender their private key to their parent
authority, or to the IIIA, or to another authority of their choice.

3) A somewhat related problem is guaranteeing continued fidelity of
the various copies to the original intent of the publisher. It may
be that the URC should store the secure hash of a document when a URL
is first registered. Later retrievals can use this to check fidelity.

4) Growth without limits - as it is now, a lot of drek accumulates
in the URC. I would favor a means for expiring annotations from
URCs after some period of time, with the clock being reset if
there is an annotation to the annotation.

5) Undocumented assumptions - I have assumed that there are many URN->URL
servers across the world. When someone registers a URN, new URL,
whatever, they do it once. The new information percolates across the
world. There needs to be a way of restricting who knows what so that
the servers don't need 10 TB of storage to keep all the URCs. This has
to be balanced with the need for multiple places that can map from a
particular URN->URL. I don't know enough about DNS to critique the
recent discussions about TXT records. Nor do I know if the methods I
have suggested in this paper can be mapped onto DNS operations.
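Bugs 3 and 4 both concern what a URC keeps over time. A minimal sketch of
both ideas, under assumptions this note leaves open: SHA-256 as the secure
hash, one fingerprint recorded per URL when it is first registered, a
365-day expiry period counted in whole days, and a clock reset that applies
only to the annotation being directly annotated. The URC class and its
method names are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Secure hash of a document (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


class URC:
    """Hypothetical URC fragment covering bugs 3 and 4 only."""

    def __init__(self, ttl_days: int = 365):   # expiry period is a made-up default
        self.ttl_days = ttl_days
        self.url_hashes: dict[str, str] = {}   # URL -> fingerprint at first registration
        self.clock: dict[str, int] = {}        # annotation URN -> day its clock last started

    # Bug 3: record a fingerprint at registration, check later retrievals.
    def register_url(self, url: str, content: bytes) -> None:
        self.url_hashes.setdefault(url, fingerprint(content))

    def is_faithful(self, url: str, retrieved: bytes) -> bool:
        return self.url_hashes.get(url) == fingerprint(retrieved)

    # Bug 4: annotations expire unless someone annotates the annotation.
    def annotate(self, annotation: str, target: str, today: int) -> None:
        self.clock[annotation] = today
        if target in self.clock:   # target is itself an annotation: reset its clock
            self.clock[target] = today

    def expire(self, today: int) -> list[str]:
        stale = [u for u, t in self.clock.items() if today - t >= self.ttl_days]
        for u in stale:
            del self.clock[u]
        return stale
```

Using setdefault keeps the first fingerprint, so a copy that changes after
registration is caught rather than silently re-fingerprinted.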

Ron Daniel Jr. email: rdaniel@acl.lanl.gov
Advanced Computing Lab voice: (505) 665-7453
MS B-287 TA-3 Bldg. 2011 fax: (505) 665-4939
Los Alamos National Lab http://www.acl.lanl.gov/~rdaniel/Home.html
Los Alamos, NM, 87545 tautology: "Conformity is very popular"