Message-Id: <9404170651.AA02242@tipper.oit.unc.edu>
To: uri@bunyip.com
Subject: URN allocation and resolution
Date: Sun, 17 Apr 94 02:51:43 -0400
From: Simon E Spero <ses@tipper.oit.unc.edu>
I haven't had time to write this up properly, as I've been working on
implementation; this is an outline of a proposed pilot scheme.
A Pilot Service for URN allocation and resolution
-------------------------------------------------
1) URN Format
-------------
Because there is no fixed format for URNs, we shall use the name
format specified in RFC1485 [1]. This format represents names as a
sequence of comma-separated attribute-value pairs, in little-endian
order (most specific component first).
Example:
CN=Christian Huitema, O=Inria, C=FR
Note that this format does *not* imply any particular resolution
service (see section XXX below on URN resolution).
UFRNs (User Friendly Resource Names) may be formed by omitting
attribute names; however, these names may not be used as URNs, and are
not permitted in this pilot scheme.
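
The name format above can be illustrated with a minimal sketch. It
assumes values contain no commas, quotes or escapes (RFC 1485 allows
more than this), and the function names parse_urn and format_urn are
hypothetical, not part of any existing tool. Names that omit attribute
names are rejected, as the pilot requires.

```python
def parse_urn(name):
    """Split a name such as 'CN=..., O=..., C=..' into (attribute, value) pairs.

    Simplification (assumption): values contain no commas, quotes or escapes.
    """
    pairs = []
    for component in name.split(","):
        attribute, separator, value = component.partition("=")
        attribute, value = attribute.strip(), value.strip()
        if not separator or not attribute or not value:
            # A component without "attribute=" is a user-friendly form,
            # which the pilot does not accept as a URN.
            raise ValueError(f"not an attribute=value pair: {component.strip()!r}")
        pairs.append((attribute, value))
    return pairs


def format_urn(pairs):
    """Join (attribute, value) pairs back into the comma-separated form."""
    return ", ".join(f"{attribute}={value}" for attribute, value in pairs)


pairs = parse_urn("CN=Christian Huitema, O=Inria, C=FR")
print(pairs)
print(format_urn(pairs))
```

Keeping the pairs in a list preserves the little-endian order, so the
same structure can be handed to whichever resolution service is used.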
2) Organisational Name allocation
---------------------------------
If an organisation already has a name allocated within the X.500 name
space, then that name should be used. Otherwise, the organisation name
shall be determined by taking the longest match of the organisation's
domain name in the Internic whois database. If any components of the
domain name are not matched, they may be used to form an Organisational
Unit at the organisation's option.
Examples
Sunsite.unc.edu =>
OU=Sunsite, O=University of North Carolina at Chapel Hill, C=US
Whitehouse.gov =>
O=Executive Office of the President, C=US
bunyip.com =>
O=Bunyip Information Services, C=CA
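
The longest-match rule can be sketched as follows. The REGISTRY table
is a hypothetical stand-in for the Internic whois database, filled
only with the three examples above, and organisational_name is an
illustrative name. The X.500 check that comes first in the rule is
left out.

```python
# Hypothetical stand-in for the Internic whois database:
# registered domain -> (organisation, country code).
REGISTRY = {
    "unc.edu": ("University of North Carolina at Chapel Hill", "US"),
    "whitehouse.gov": ("Executive Office of the President", "US"),
    "bunyip.com": ("Bunyip Information Services", "CA"),
}


def organisational_name(domain, registry=REGISTRY, use_units=True):
    """Return the organisational name for a domain by longest match."""
    labels = domain.split(".")
    # Try the whole name first, then drop leading labels one at a time,
    # so the first hit is the longest registered suffix.
    for start in range(len(labels)):
        suffix = ".".join(labels[start:]).lower()
        if suffix in registry:
            organisation, country = registry[suffix]
            # Unmatched leading labels become Organisational Units,
            # at the organisation's option.
            units = labels[:start] if use_units else []
            parts = [f"OU={unit}" for unit in units]
            parts += [f"O={organisation}", f"C={country}"]
            return ", ".join(parts)
    raise LookupError(f"no registered match for {domain!r}")


print(organisational_name("Sunsite.unc.edu"))
print(organisational_name("bunyip.com"))
```

Matching is done case-insensitively here, while the unmatched labels
keep the spelling they were given; both choices are assumptions.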
If a name has aliases or abbreviations, then these names may be used
for resolving purposes; however, when comparing two names, the
canonical form should be used.
3) Allocating URNS to files
---------------------------
Most information available on today's Internet is stored in files, and
nearly all replication is done using FTP and mirror. We therefore
need a way to allocate URNS to files, and to keep track of this
information as those files are mirrored.
The solution below combines the experience gained through the IAFA [2]
project with Chris Weider's "Resource Transponders" [3].
All directories containing resources which have been allocated URNS
should contain a file named either "IAFA.URN" or ".urn". Only one such
file should appear in each directory.
The contents of the file are described below; an HTML/SGML [4] style
format is used, as the information has more than one structural level,
and users are more familiar with this layout than with other structured
text formats such as STIF [5]. To make parsing easier, tags may not be
omitted. The file does not contain an SGML declaration or DTD.
--
<urns>
<root> The root for all relative names in this file </root>
<authentication> <!-- OPTIONAL, REPEATABLE -->
  <scheme> authentication scheme </scheme>
  <name> name to be used for authentication </name>
  <issuer-certificate> certificate of issuer </issuer-certificate>
  <certificate> a certificate </certificate>
</authentication>
<file> <!-- REPEATABLE -->
  <filename> Filename </filename>
  <urn> URN for this file. If this field does not end in @,
        the root should be appended to give the full urn </urn>
  <url> <!-- OPTIONAL -->
        URL for this file's original source </url>
  <signature> <!-- OPTIONAL -->
    <name> name of signer </name>
    <scheme> signature scheme </scheme>
    <file-checksum> checksum for file </file-checksum>
    <urn-checksum> checksum for the entry in this file </urn-checksum>
  </signature>
</file>
</urns>
--
One way of automatically generating URNs is via an extension module for MUPET (index2cap on steroids), or a similar directory tree walker. The basic algorithm involves walking through the directory tree, building a default URN root by prepending a component for each directory entered.
Initially the default URN root should be the organisational name from section 2. If a directory contains a URN file, then the default root should be set to the urn root in that file.
If a directory contains files originating at this site, and no URN file exists, a new URN file should be created, with the root being set to the current default root.
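
The default-root rules above can be sketched as a small tree walker.
This is only an illustration under assumptions the outline does not
settle: each directory contributes an OU component (the attribute type
is a guess), the top directory itself carries the organisational name,
and the URN file is read with a simple regular expression rather than
an SGML parser. The names read_root and default_roots are hypothetical.

```python
import os
import re

URN_FILENAMES = ("IAFA.URN", ".urn")
ROOT_RE = re.compile(r"<root>(.*?)</root>", re.DOTALL | re.IGNORECASE)


def read_root(directory):
    """Return the <root> value from the directory's URN file, or None."""
    for name in URN_FILENAMES:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as handle:
                match = ROOT_RE.search(handle.read())
            if match:
                return match.group(1).strip()
    return None


def default_roots(top, org_name, dir_attr="OU"):
    """Map every directory under `top` to its default URN root."""
    roots = {}

    def visit(directory, inherited):
        # A URN file in the directory overrides the inherited default.
        root = read_root(directory) or inherited
        roots[directory] = root
        for entry in sorted(os.listdir(directory)):
            child = os.path.join(directory, entry)
            if os.path.isdir(child) and not os.path.islink(child):
                # Entering a directory prepends one component (assumed OU).
                visit(child, f"{dir_attr}={entry}, {root}")

    visit(top, org_name)
    return roots
```

Calling default_roots on an archive top directory with the name from
section 2 gives, for each directory, the root that a newly created URN
file there would carry.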
Each file in the directory is then checked against the URN file. If no entry exists, then a check should be made in any system-wide urn listings to determine whether this file has been moved from a different location on the system (using checksum, filename, etc). If a previous name can be ascertained, then an entry should be added to the URN file using that name; otherwise a new urn should be generated using the filename.
Although this algorithm attempts to detect files that have been moved, utilities used to move files within the archive should also update the urn file (as well as other IAFA-related files).
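
The per-file check can be sketched as below. Several details are
assumptions made for illustration: the system-wide listing is modelled
as a dictionary keyed by checksum only, SHA-256 stands in for whatever
checksum the archive uses, and a new urn is formed by prepending a
CN component built from the filename. The name urn_for_file is
hypothetical.

```python
import hashlib


def checksum(data):
    """Checksum of the file contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def urn_for_file(filename, data, urn_entries, system_listing, root,
                 file_attr="CN"):
    """Return the URN for one file, adding an entry if none exists.

    urn_entries    -- filename -> URN, as held in this directory's URN file
    system_listing -- checksum -> URN, a stand-in for system-wide listings
    root           -- default URN root of the directory
    """
    if filename in urn_entries:
        return urn_entries[filename]          # already allocated here
    digest = checksum(data)
    previous = system_listing.get(digest)
    if previous is not None:
        urn = previous                        # file moved: keep its old name
    else:
        urn = f"{file_attr}={filename}, {root}"   # new name from the filename
    urn_entries[filename] = urn
    system_listing[digest] = urn
    return urn
```

Reusing the previous name for a moved file is what keeps a URN stable
while the file's location changes within the archive.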
4) Resolving URNS
-----------------
The names generated by this scheme are suitable for resolving by several directory services. For a production service, performance constraints mandate a lightweight, connectionless directory service; however, for a pilot project, a heavier-weight, connection-oriented service can be used. Schema information is common to all resolution services and is theoretically derived from URCs/URMs. [7]
4.1) Schema
Since no URM spec exists yet, the following basic schema is proposed (the format used is the whois++ template format). All fields should be treated as octet strings. If more than one URL corresponds to a given URN, multiple records should be returned.
--
URN: The URN
URL: URL for this URN
Content-Type: MIME content type of this file (optional)
Byte-Count: size of this file, in ASCII (optional)
Title-English: 7-bit ASCII English Language title (optional)
--
NOTE:
----
For this pilot, only titles in English are available; titles in other languages are an open issue to be solved in the full URC/M specification. Experience with multi-lingual cataloging systems such as ALEPH has shown the usefulness of this field even for non-English resources.
For resources with English titles, this corresponds to UNIMARC [8] field 200, subfield a; for non-English titles, this corresponds to UNIMARC field 541, subfield a (where subfield z = eng).
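
The schema can be illustrated with a short sketch that builds one
record per URL and prints it as "Field: value" lines. The field names
come from the schema above; the function names, the example URN and
the ftp.example.org URLs are hypothetical, and the exact wire format
of a whois++ response is not modelled.

```python
def make_records(urn, urls, content_type=None, byte_count=None,
                 title_english=None):
    """Return one record (a list of (field, value) pairs) per URL."""
    if title_english is not None:
        # Title-English is 7-bit ASCII; this raises UnicodeEncodeError otherwise.
        title_english.encode("ascii")
    records = []
    for url in urls:
        record = [("URN", urn), ("URL", url)]
        if content_type is not None:
            record.append(("Content-Type", content_type))
        if byte_count is not None:
            record.append(("Byte-Count", str(byte_count)))
        if title_english is not None:
            record.append(("Title-English", title_english))
        records.append(record)
    return records


def render(record):
    """Render a record as one 'Field: value' line per field."""
    return "\n".join(f"{name}: {value}" for name, value in record)


records = make_records(
    "CN=readme.txt, OU=docs, O=Example Org, C=US",      # hypothetical URN
    ["ftp://ftp.example.org/docs/readme.txt",            # hypothetical URLs
     "ftp://mirror.example.org/docs/readme.txt"],
    content_type="text/plain",
    byte_count=1024,
)
for record in records:
    print(render(record))
    print()
```

Optional fields are simply left out of a record when no value is
known, and a URN with two locations yields two records.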
4.2) Directory profiles
Whois++: [9]
  Searching: Perform a search using the attribute/value pairs in the URN. If a URN is not found on a server, then a referral to a parent server should be issued, corresponding to the requested hierarchy.
  Centroids: Servers _must_ provide a centroid to pollers that at minimum contains entries for the attributes and values corresponding to the organisational root of their URN tree.
SID (The Simple Internet Directory - aka CLDAP redux): [10]
A profile will be provided along with the draft SID spec and PDU parser tomorrow.
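
The Whois++ search and referral behaviour described above can be
modelled with a toy in-memory sketch. It does not speak the Whois++
protocol: the Server class, its methods and the resolve helper are all
hypothetical, the centroid is reduced to whole attribute values for
the organisational root, and the example URN and URL are invented.

```python
from dataclasses import dataclass, field
from typing import Optional


def pairs_of(urn):
    """Split a URN into a tuple of (attribute, value) pairs (no quoting)."""
    result = []
    for component in urn.split(","):
        attribute, _, value = component.partition("=")
        result.append((attribute.strip(), value.strip()))
    return tuple(result)


@dataclass
class Server:
    name: str
    root: str                                   # organisational root of its URN tree
    parent: Optional["Server"] = None
    records: dict = field(default_factory=dict)  # pairs -> list of URLs

    def add(self, urn, url):
        self.records.setdefault(pairs_of(urn), []).append(url)

    def centroid(self):
        """Minimum centroid: attribute -> values of the organisational root."""
        summary = {}
        for attribute, value in pairs_of(self.root):
            summary.setdefault(attribute, set()).add(value)
        return summary

    def search(self, urn):
        """Answer from local records, else refer the client to the parent."""
        key = pairs_of(urn)
        if key in self.records:
            return ("records", self.records[key])
        if self.parent is not None:
            return ("referral", self.parent)
        return ("not-found", None)


def resolve(server, urn):
    """Follow referrals upwards until some server holds the URN."""
    while True:
        kind, payload = server.search(urn)
        if kind == "records":
            return payload
        if kind == "not-found":
            return []
        server = payload
```

A client holding only a local server would call resolve(local, urn)
and receive every URL recorded for that URN, or an empty list once the
top of the hierarchy has been reached without a match.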
References:
[1] RFC 1485 - A String Representation of Distinguished Names S. Hardcastle-Kille
[2] XXX IAFA RFC Deutsch and Emtage
[3] XXX Resource transponder I-D Chris Weider
[4] XXX The SGML Handbook, Charles Goldfarb
[5] XXX STIF I-D Dave Crocker
[6] XXX URL I-D Tim Berners-Lee
[7] XXX URC/URM I-Ds John Kunze? Michael Mealling?
[8] UNIMARC Handbook IFLA
[9] XXX Whois++ I-Ds Deutsch, Emtage, Fullton, Gargano, Schoultz, Spero, Weider
[10] XXX SID To appear
Respect:
Tim "Rimmer" Berners-Lee for starting the whole mess
Jon "c.*y" Magid for index2cap, MUPET, and assorted Elvis paraphernalia
Michael "Yellow Packet" Mealling for advice on URMs
Chris "Piano man" Weider for the transponder thing
The archie twins for IAFA