INTERNET DRAFT                                  Uniform Resource Names
draft-ietf-uri-resource-names-03.txt
Expires April 20, 1995
                                                                 Mitra
                                                          Chris Weider
                                                         Mike Mealling

                     Uniform Resource Names (URN)

Status of this memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months. Internet-Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use
   Internet-Drafts as reference material or to cite them other than as
   a "working draft" or "work in progress."

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net, nic.nordu.net, ftp.isi.edu or
   munnari.oz.au.

   This Internet Draft expires April 20, 1995.

Abstract

   This document defines a syntax for URNs, and operational rules for
   their assignment and usage. The intent is to provide enough
   information for implementors of IIIR systems to use these in their
   work.

Operational test bed

   This document also specifies the procedures that will be used for
   an operational testbed. Please contact any of the authors to
   confirm you have the latest version before implementing, or look in

Syntax

   A URN consists of four parts: its header "URN", a namespace
   identifier, a naming authority ID, and an opaque string. The naming
   authority ID is assigned by a distributed process defined below;
   the opaque string is assigned by the owner of the naming authority
   ID, subject to the rules defined below.

   A typical URN might look like this:

   Case is not significant. White space is not significant within a
   URN; this includes the characters Space, Tab, CR, LF and hyphen.
   Since a URN may contain carriage returns (for example, inserted by
   naming authorities), it must always be delimited. How it is
   delimited is defined by the context: for example, it might be part
   of another data structure; in free text it is delimited by "<" and
   ">".

   The BNF for a canonical URN (i.e. white space stripped, and
   converted to lower case) is:

      URNinText         ::= "<" URN ">"
      URN               ::= Urntag ":" NameScheme ":" NamingAuthorityId
                            ":" OpaqueString
      NameScheme        ::= "dns" | "isbn" | xasciis
      Urntag            ::= "urn"
      NamingAuthorityId ::= NameScheme PubName
      NameScheme        ::= "dns:" FQDN
      # Note there is only one name scheme defined for the testbed,
      # it is expected to be extended.
      NamingAuthorityId ::= "dns:" FQDN | RegisteredString
      FQDN              ::= Dnsasciis
      Dnsasciis         ::= "a..z0..9" | "."
      # Open question - are dashes allowed in DNS names?
      RegisteredString  ::= xasciis
      OpaqueString      ::= xasciis
      xasciis           ::= xascii [ xasciis ]
      xascii            ::= "a".."z" | "0".."9" | "." | "/" | percentcoded
      percentcoded      ::= "%" 0..9a..f 0..9a..f

Equality of URNs

   The rules under Syntax imply that to compare two URNs for equality,
   the following procedure, or an equivalent, should be followed.

      Canonicalize each URN by stripping all space, tab, CR, LF and
      hyphen characters, and converting all A..Z to a..z.

      Do a string compare on the results.

   Two important things to note about this:

   a) This compares the URNs, not the bytes they might point to.
      Because a publisher gets to decide what changes require the
      allocation of a new URN, two documents may have the same URN
      but a different series of bytes.

   b) To compare two URNs, there is no need for a net access.

Rules for allocating naming authority IDs

   It is crucial to the scalability of this scheme that naming
   authority IDs are allocated in a distributed fashion. The best
   model we have for this is the existing DNS, and using it gives us
   some other benefits.
   Therefore, it is proposed (for the test-bed) that the owner of a
   FQDN automatically has the right to use that string (and
   descendants of it) as a publisher ID, subject to the following
   constraints.

   1) This FQDN has not previously been used as a naming authority
      ID. This only constrains those cases where a FQDN is being
      reused (see below).

   2) A resolution service is made available conforming to the rules
      below.

   For example, in order to use the naming authority ID "path.net" I
   would need to ensure that there was a resolution service at
   "uri.path.net", or a CNAME at uri.path.net pointing to a resolution
   service.

   Note that by this we are expecting that some higher level FQDNs,
   for example mit.edu, might be split up differently for naming
   authority IDs than for machines, e.g. physics.mit.edu.

   Also note that the naming authority ID is essentially an opaque
   string; structure is for allocation purposes ONLY. Pulling them
   apart and inferring structure is explicitly not allowed.

Rules for allocating URNs

   URNs, or rather the opaque string portion, are allocated by the
   owner of the naming authority ID, in any form they wish, subject to
   the character set constraints defined above.

   The publisher decides what changes require allocation of a new
   URN. Valid choices may range from requiring a new URN for any
   change in the bytes, to assigning a single URN for all versions of
   a work in all languages.

URN to URL or URC resolution

   For the test-bed, the following resolution scheme will be used:

   Client sends the URN to its proxy server in a protocol the client
   speaks (e.g. Gopher or HTTP)

   If the server hasn't cached the URC already {

      If the server (or its DNS) has not cached the resolution
      server {
         it asks the DNS for "uri.foo.bar", and caches the resulting
         IP address(es).
      }

      The server sends the URN to the resolution server, using a
      minimal whois++ syntax, specifically:
         TEMPLATE=urc;URN=urn:dns:foo.bar:12345

      The server receives a URC in whois++ format,
      {help I need an example here}

         %220 Command Ok
         # FULL URC 1 # URC :
         URN:URN:dns:foo.bar:12345
         Title:Moby Dick
         Abstract:
         -Moby Dick
         -This is the story of a guy, a boat, and a very big fish.
         +It concerns this guy called Ishmael who is kind of hung
         +up on killing this very big fish.
         Author:Melville, Herman
         URL:http://www.fiction.com/books/fish/whales/moby.dick.html
         Content-Type: text/html
         URL:ftp://ftp.bunyip.com/sillybooks/moby.dick.ps
         # END
         Connections closed.....
   }

   The server converts this URC into a format the client can
   understand, e.g. a Gopher menu or an HTML document, and transmits
   it to the client.

   The client either automatically chooses the desired URL, or
   presents the results to the user to make that decision.

   At a later date, or even earlier, some resolvers might provide more
   services. In particular it is expected to add:

      Redirection, so that a resolver can refer a query

      Writing, so that a cache/replicator can register a new location

      Adding Template=URL.

   An important note is that this doesn't define that every URN will
   always be resolvable - in fact, there may not be a net-accessible
   version of the URN. What it defines is that a reasonable client can
   take a single deterministic action involving a single DNS access
   and a single net transaction, and know that for the vast majority
   of cases, if the URN is resolvable it will have been resolved.

   This also doesn't say that every publisher must run a URN->URL
   resolution service. It is perfectly acceptable that uri.foo.bar is
   a CNAME for a resolution service run at a different address.

   This also doesn't require the client to always use the proxy
   server; it is anticipated that some clients may wish to access the
   resolution service directly.

   It should also be noted that, in order to deal with scaling
   problems, final deployment may require resolution servers on
   foo.bar.uri rather than uri.foo.bar.
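
   The client-side steps of the test-bed resolution scheme above can
   be sketched as follows. This is a minimal illustration under stated
   assumptions, not part of the specification: the function and class
   names (canonicalize, resolver_host, whois_query, ResolverCache) are
   hypothetical, only the "dns" name scheme is handled, and no port or
   wire exchange is shown because no port has been assigned.

```python
import socket
import string

# Delete space, tab, CR, LF and hyphen; map A..Z to a..z (the draft's
# canonical form). Names in this sketch are illustrative assumptions.
_CANON = str.maketrans(string.ascii_uppercase,
                       string.ascii_lowercase,
                       " \t\r\n-")


def canonicalize(urn: str) -> str:
    """Return the canonical form used for comparison and resolution."""
    return urn.translate(_CANON)


def same_urn(a: str, b: str) -> bool:
    """Equality test: string compare on the canonical forms."""
    return canonicalize(a) == canonicalize(b)


def resolver_host(urn: str) -> str:
    """Map 'urn:dns:<fqdn>:<opaque>' to the host 'uri.<fqdn>'."""
    tag, scheme, fqdn, _opaque = canonicalize(urn).split(":", 3)
    if tag != "urn" or scheme != "dns":
        raise ValueError("this sketch only handles urn:dns: names")
    return "uri." + fqdn


def whois_query(urn: str) -> str:
    """Build the minimal whois++ query line shown in the draft."""
    return "TEMPLATE=urc;URN=" + canonicalize(urn)


class ResolverCache:
    """Cache resolver addresses so DNS is asked once per authority."""

    def __init__(self, lookup=socket.gethostbyname_ex):
        # lookup(host) must return (name, aliases, ip_list), as
        # socket.gethostbyname_ex does; injectable for testing.
        self._lookup = lookup
        self._addresses = {}

    def addresses(self, urn: str):
        host = resolver_host(urn)
        if host not in self._addresses:
            _name, _aliases, ips = self._lookup(host)
            self._addresses[host] = ips
        return self._addresses[host]
```

   A proxy built along these lines would call ResolverCache.addresses()
   for the URN, connect to one of the returned addresses, and send the
   line produced by whois_query().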
Grandfathering in of existing numbering schemes

   It is intended to grandfather in existing schemes to the extent
   possible, without constraining an optimal solution for the future.

   ISBN

      ISBNs consist of a string of digits which look opaque to the
      user, but actually contain two distinct parts, the first of
      which is the publisher. So assuming an ISBN of 1234567890 where
      12345 is the naming authority ID, the URN would be . With "isbn"
      being a new PubId naming scheme. The syntax will not be frozen
      until there is an intent to run a resolution service for this.

   ISSN

      ISSNs will be handled similarly at such time that someone who
      understands them can fill in the details and run a resolver.

   Anonymous FTP archives

      There is a desire to be able to retrofit the existing FTP
      archives so that resource location tools such as archie could
      better de-duplicate things. However, since there is no obvious
      publisher ID in this case, it is probably better to do
      de-duplication via checksums. Alternatively, archie could
      retroactively publish all of the files with something like
      where the opaque string would be an MD5 checksum; however, it is
      hard to see what this gains. This area will need to be given
      more work.

Integration with existing IIIR schemes

   Gopher

      It is anticipated that Gopher will handle URNs by taking the
      following actions.

         Add "URN=" as a valid attribute in .Links files, and in the
         protocol

         Support multiple "URL=" attributes in .Links files and in
         the protocol

         Run a gateway at a ToBeDecided location, e.g. "urn.umn.edu"

         Possibly integrate the URN proxy server into the server (as
         for ftp)

      For backwards compatibility (and rapid deployment) URNs will be
      able to be passed in Gopher0, in the form "Host=urn.umn.edu" (or
      whatever address is decided), "Port=70" (or whatever is
      decided), "Path=urn:dns:foo.bar:12345"

   WWW

      The most obvious way to handle this in WWW is to:

         replace the anchor with the URN, i.e.

         the clients need to be able to redirect the URN scheme to a
         specific address; currently this (we believe) is easy only
         with those clients that redirect all schemes to a proxy
         server.

      When a client sends "GET urn:mitra.path.net:12345" to the proxy
      server, it will return an HTML document with one or more URCs
      to choose from.

      A useful extension to HTML would be to be able to handle
      multiple href= in a link, enabling both a URN and a cached URL
      to be in a single link.

   WAIS

      It is unclear how best to integrate URNs into WAIS. The first
      step would be to replace the URL returned in the QueryResponse
      with a URN. Since this is just an opaque string which is always
      returned to the same WAIS server, that WAIS server could then
      return the appropriate document.

      Unfortunately it doesn't appear that there is any way within
      the current WAIS protocol for the server to offer the client a
      choice of variants. However, a smart (i.e. new) client could
      take the URN and either do its own URN->URC lookup, or
      potentially pass the URN as the search term of a WAIS query to
      a gateway that would return a QueryResponse consisting of the
      URLs for this URN.

Problems and their solutions

   LIFNs

      Keith's requirement for an identifier independent of location,
      but identifying only one particular combination of bytes, is
      met by a simple URC consisting of the URN as defined above and
      a checksum.

   What happens when a FQDN gets reused?

      If a FQDN has never been used as the naming authority ID of
      URNs then reuse poses no problem. However, if the FQDN has been
      used as a naming authority ID then there are two possible
      solutions.

      a) The new owner agrees to continue to resolve, or arrange for
         resolution of, the old URNs, or to redirect queries for
         those URNs to the previous owner's resolution service.

      b) The new owner applies for a new naming authority ID, either
         a totally new FQDN, or a sub-ID of the previous FQDN that
         has not been used by the old owner.
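
   As a hedged illustration of the LIFNs item earlier in this section,
   a URN paired with a checksum can be checked against the bytes that
   a URL actually returns. The sketch below assumes the checksum is an
   MD5 digest carried as a hexadecimal string; the names Lifn and
   matches are hypothetical, and how the checksum would appear in a
   URC is not defined by this draft.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Lifn:
    """A URN plus a checksum identifying one exact series of bytes.

    The hex MD5 encoding is an assumption made for this sketch.
    """
    urn: str
    md5_hex: str

    def matches(self, document: bytes) -> bool:
        """True if the retrieved bytes have the recorded checksum."""
        digest = hashlib.md5(document).hexdigest()
        return digest == self.md5_hex.strip().lower()


def make_lifn(urn: str, document: bytes) -> Lifn:
    """Record the checksum of the bytes currently named by the URN."""
    return Lifn(urn, hashlib.md5(document).hexdigest())
```

   A client could compute matches() on each candidate retrieved from
   the URLs in a URC and discard any copy whose bytes differ.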
   Reliance on DNS

      This scheme specifies DNS as the method for obtaining the
      location of a resolution service. However, there are concerns
      that this scheme should be able to outlast DNS. There are two
      areas at issue: existing URNs and new URNs.

      Firstly, the absence of DNS does not invalidate the syntax as a
      scheme for uniquely identifying documents. However, at such a
      time, a new RFC would be required specifying an alternative
      means of resolution. Since DNS going away would require
      rewriting almost every Internet application, defining a new
      resolution scheme and then recoding for it is going to be small
      in comparison.

      Secondly, the absence of DNS does not mean that FQDNs will go
      away, so these (or any successor specifying a unique name)
      still remain a valid way of assigning naming authority IDs.

   How it addresses the requirements document

      This section still needs writing, I don't have the requirements
      in front of me at the moment.

   Server side code library

      Mike Mealling will coordinate pulling together a simple library
      for the proxy server to fetch and cache results. This library
      will enable Gopher and HTTP gateways to be built on common
      underlying code, so that lessons learned in the test-bed can be
      rapidly deployed.

   Every object has a name

      It is NOT assumed in this scheme, and certainly not in the
      test-bed, that every object on the net has a name. Objects
      which do not have a name simply cannot be resolved using this
      scheme, but can be handled using existing schemes.

   How to register a name (and authentication)

      It is intended that there be a simple, fast way to register a
      new URL for a URN. This will probably be done by adding writing
      back into this minimal whois++; it is also possible that manual
      methods, e.g. HTML forms and Gopher Ask blocks, can be used for
      this.
      {detailed specification to be found by Mike}

      There is an important issue of authentication of information
      returned from the resolver. Initially, for the testbed, we
      assume out-of-band authentication between the naming authority
      and the resolver, but nowhere else, so the client should be
      able to trust the URN-specific information in the URC, but NOT
      the URL-related information. So, for example, a client can
      trust an MD5 checksum at the top of the URC, and then verify it
      against the document returned.

      In the long term, the current (and to be discussed) thinking is
      that this same URN-specific information will be signed by the
      naming authority, but this relies on a currently non-existent
      security infrastructure.

   URL -> URN

      We believe that URL -> URN resolution drops out easily in
      existing protocols, e.g. Gopher+ can tell you the attributes
      (including URN) of a URL; HTTP can be used similarly.

   Port assignment

      We need an IANA-defined port for the resolution service,
      preferably NOT that of whois++ since it might handle different
      functions.

   Propagation of meta-information

      There are some specific requirements for propagation of
      meta-information that need addressing, especially when it comes
      to propagating charging and copyright information and ensuring
      that this gets handled.

=======================================================================
Mitra                                                    mitra@path.net
Internet Consulting                 (415)488-0944    fax (415)488-0988