Re: MD5 and LIFNs (was: Misc Comments)

Alexander Dupuy (dupuy@smarts.com)
Tue, 26 Apr 1994 12:35:47 +0500

Date: Tue, 26 Apr 1994 12:35:47 +0500
From: dupuy@smarts.com (Alexander Dupuy)
Message-Id: <9404261635.AA13370@brainy.smarts.com>
To: uri@bunyip.com, peterd@bunyip.com
Subject: Re: MD5 and LIFNs (was: Misc Comments)

> > > o LIFN's for "byte-stream" identification is very important. Shouldn't
> > > it be possible to now define an "MD5" namespace authority via an
> > > informational RFC which specifies how to calculate the defacto name
> > > of any byte-stream?
> >
> > While this seems like an interesting proposal, I see two problems with it.
> > The MD5 namespace is non-hierarchical, so a single namespace authority would
> > have to administer the MD5 names for every published resource in the world;
> > this is unlikely to scale well.
>
> Actually, proposals for using MD5 as a URN have been
> circulating for some time now (certainly since the early
> threads on URSNs, etc). Their principal attraction (to me,
> there may be others) is that they in fact do not need any
> namespace authority support to be deployed and the scaling
> issues are not nearly as bad as they might appear.

I am in violent agreement with you that MD5-based URNs can be useful. My
criticism of MD5 was in response to the original suggestion that a *single*
"MD5" namespace authority could be defined to provide a URN namespace. I
believe you have not disputed the points I made, but rather suggested that MD5
is a useful tool for generating URNs, and this is certainly true. To be
concrete about it, the original message was suggesting that anyone could
generate URNs which looked something like:

[1] URN:MD5:<hexadecimal md5 digest here>

while I believe that you are suggesting that a coordinated archie-like URN
provider could use md5 to generate URNs for electronic documents available via
FTP; these would look something like:

[2] URN:bunyip.urn.int:md5:<hexadecimal md5 digest here>

or better (in view of the possibility that some files may share md5 digests)

[3] URN:bunyip.urn.int:md5:<hexadecimal md5 digest here>:<size>:<serial>

where the document size is used to differentiate between files that have
matching md5 digests, and a serial number is appended to deal with the remote
possibility that two documents may have the same size and md5 digest. Note
that this serial number implies that some entity is actively indexing the URNs
it issues and can determine how many URNs with the same md5 digest and size
have already been issued.
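
As a rough sketch of how an indexing entity might issue names of form [3],
assuming a hypothetical in-memory index (the class name Md5UrnRegistry, its
method urn_for, and the choice to number serials from 0 are illustrative
assumptions, not part of any proposal; a real index would not keep whole
documents in memory):

```python
import hashlib


class Md5UrnRegistry:
    """Hypothetical in-memory index for a single URN namespace authority."""

    def __init__(self, authority: str) -> None:
        self.authority = authority
        # (digest, size) -> distinct documents already seen with that pair
        self._seen: dict[tuple[str, int], list[bytes]] = {}

    def urn_for(self, data: bytes) -> str:
        digest = hashlib.md5(data).hexdigest()
        size = len(data)
        docs = self._seen.setdefault((digest, size), [])
        if data not in docs:
            # New document; a non-empty list here would mean a true
            # collision on both digest and size.
            docs.append(data)
        # Assumption: the serial is the position in order of first issue.
        serial = docs.index(data)
        return f"URN:{self.authority}:md5:{digest}:{size}:{serial}"


registry = Md5UrnRegistry("bunyip.urn.int")
print(registry.urn_for(b"hello"))
```

The serial can only be assigned because the registry remembers what it has
already issued for a given digest and size, which is the indexing requirement
noted above.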

The difference between [1] and [2,3] is that I don't think it is feasible in
the long term for a *single* URN namespace authority to manage a
non-hierarchical namespace which attempts to provide a URN for *every* digital
document in the world. I do think it is feasible for any URN namespace
authority that wants to use md5 to generate its URNs to do so (although it
should be aware that md5 digests are not necessarily unique).

Part of the problem with [1] is that there is no reason to believe that a URN
generated in that way is actually resolvable (i.e. that anyone can get a URL
from it). It simply assumes that some entity out there indexes anything and
everything. While this may be true for files in well-known FTP archives, it is
less likely to be true for, say, CompuServe uploads.

A more realistic scenario is that when I make something available via an FTP
site, I might know that Bunyip will be indexing it. I can calculate the md5
digest and size, check to see if Bunyip already has a URN with those values,
and thereby determine what Bunyip's URN will be. If I were making it
available via other means which aren't indexed by Bunyip, but are indexed by
somebody else, then I would check that other indexer's URN servers instead.
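
The publisher's side of this could look roughly like the sketch below. The
function predict_urn and the issued_count callback are hypothetical stand-ins
for whatever query a provider's URN server would actually offer; the sketch
also assumes that serials are handed out sequentially from 0 and that the
document is new to that index.

```python
import hashlib
from typing import Callable


def predict_urn(
    data: bytes,
    authority: str,
    issued_count: Callable[[str, int], int],
) -> str:
    """Guess the URN an indexer will assign to a not-yet-indexed document."""
    digest = hashlib.md5(data).hexdigest()
    size = len(data)
    # Hypothetical query: how many URNs with this digest and size has the
    # authority already issued? Under the sequential assumption, that
    # count is the next free serial.
    serial = issued_count(digest, size)
    return f"URN:{authority}:md5:{digest}:{size}:{serial}"


# Stand-in for a real lookup: pretend nothing has been issued yet.
print(predict_urn(b"hello", "bunyip.urn.int", lambda digest, size: 0))
```

For material indexed by somebody else, the same call would be made with that
party's authority name and lookup.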

@alex