Message-Id: <9405182053.AA10577@ulua.hal.com>
To: uri@bunyip.com
Subject: A Formalism [Was: URN various ]
In-Reply-To: Your message of "Wed, 18 May 1994 08:27:07 PDT."
	<199405181527.IAA19109@rock>
Date: Wed, 18 May 1994 15:53:46 -0500
From: "Daniel W. Connolly" <connolly@hal.com>
In message <199405181527.IAA19109@rock>, Terry Allen writes:
>Larry Masinter writes:
>| I'm willing to change it to remove 'uniquely' from 'uniquely namable
>| entity' if you think it will reduce confusion.
>| I'm less willing to remove 'unique' from '... function of a URN is to
>| provide a globally unique, persistent identifier', since 'unique'
>| modifies 'identifier' and not the resource.
>
>Yes, that's all I want made clear, that URNs are unique in that
>no two URNs are (encoded the) same.
The URN requirements specification is foggy because there is
no definition of the term "resource."
The novel feature of the URN abstraction is that it maps
strings (names) to resources, no? So then we have the questions:
(1) Is the mapping from names to resources unique?
(2) Is the mapping from resources to names unique?
(unique meaning 1-1, i.e. map(name1) = map(name2) implies name1=name2)
I suggest that the answer to question (1) is "Yes, by definition."
In other words, we define the set of resources as "those things
which can be named with URNs." It follows that the answer to (2)
is yes also. In a situation where, for example,
urn:ora.com/docbook.dtd/1.1
and
urn:hal.com/docbook.dtd/1.1
both resolve to the same sequence of bytes, we say that this is
a coincidence that is not guaranteed to hold
over time. O'Reilly may serve up a different sequence of bytes
for that name tomorrow if they like. If, on the other hand,
O'Reilly and HaL want to express in the URN formalism that
their FTP servers serve the same document, they should use
a name like:
urn:davenport/docbook.dtd/1.1
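To make the 1-1 property concrete, here is a minimal sketch that checks a finite table for resources named by more than one URN. The function name find_collisions, the resource labels, and the byte strings are all made up for illustration; nothing here is part of any URN specification.

```python
from collections import defaultdict

def find_collisions(mapping):
    """Return the values of `mapping` that are reached from more than one key.

    An empty result means the mapping is 1-1 in the sense used above:
    map(name1) = map(name2) implies name1 = name2.
    """
    names_by_value = defaultdict(list)
    for name, value in mapping.items():
        names_by_value[value].append(name)
    return {v: sorted(ns) for v, ns in names_by_value.items() if len(ns) > 1}

# Hypothetical Identify table: each name denotes its own resource.
identify = {
    "urn:ora.com/docbook.dtd/1.1": "ora.com's docbook.dtd 1.1",
    "urn:hal.com/docbook.dtd/1.1": "hal.com's docbook.dtd 1.1",
    "urn:davenport/docbook.dtd/1.1": "Davenport docbook.dtd 1.1",
}

# Hypothetical snapshot of the bytes served today (placeholder content).
bytes_today = {
    "urn:ora.com/docbook.dtd/1.1": b"<!ELEMENT book ...>",
    "urn:hal.com/docbook.dtd/1.1": b"<!ELEMENT book ...>",
}

print(find_collisions(identify))     # no resource has two names
print(find_collisions(bytes_today))  # the bytes coincide, for now
```

The second table collides and the first does not, which is the distinction drawn above: equal bytes today say nothing about whether two names identify the same resource.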
>BTW, is "plain text" defined by MIME?
Yes. The text/plain content type is defined as any sequence
of lines of characters in a registered character set, US-ASCII by default.
The (internet-mail-safe) 7bit content-transfer-encoding further
specifies that lines are delimited by CRLF pairs and limited
to 1000 characters in length (the SMTP limit, counting the CRLF).
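As a rough illustration of those constraints, the sketch below tests whether a byte string is 7bit text: every octet below 128, CR and LF only as CRLF pairs, and no line over a given length. The function name is_7bit_text is my own, and the length limit is a parameter (counted without the CRLF) rather than a hard-coded value.

```python
def is_7bit_text(data: bytes, max_line_len: int) -> bool:
    """Check that `data` is 7-bit text made of CRLF-separated lines.

    Every octet must be below 128, CR and LF may appear only as a CRLF
    pair, and each line (not counting its CRLF) must be at most
    max_line_len octets long.
    """
    if any(b > 127 for b in data):
        return False
    lines = data.split(b"\r\n")
    # Text that ends with CRLF leaves one empty trailing piece; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    for line in lines:
        if b"\r" in line or b"\n" in line:
            return False  # bare CR or LF outside a CRLF pair
        if len(line) > max_line_len:
            return False
    return True

# The limit of 10 below is arbitrary, chosen only to keep the example short.
print(is_7bit_text(b"plain\r\ntext\r\n", 10))
print(is_7bit_text(b"bare\nnewline\r\n", 10))
```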
>Dan Connolly writes:
>
>| For the record: I think that the "URN->URL mapping problem" is
>| not the right problem. Once you've got a URL, you still have to go get
>| the "resource." And you've got security, scalability, and reliability
>| issues to deal with there.
>
>I think the discussion has shown that URNs are going to be useful,
>but they're a smaller piece of the overall architecture than was
>anticipated at first.
URNs are a big piece of the puzzle; they are an answer to the
"how do we express references?" question. It's the URN->URL
mapping problem that I think lacks merit. Any URN requirements
document must address the URN->resource mapping to have merit.
>| The right problem to attack is "How do we express references, and
>| how do we resolve them?" Until I can compose documents that reference
>| other documents in such a way that (1) the reference remains meaningful
>| despite various inevitable changes in the world, (2) my reader
>| can follow the references reliably, we have made no progress.
>
>I think even (1) is a worthy goal. I'd invert point 2: my reader
>will not get the wrong source when he follows a reference, although
>he may get nothing at all.
But this is too limited. The "right" resource may be unavailable
because it was revised, and the server may offer the revised
edition instead. The point is that the client and
the server should _detect_ this situation, so that the user can decide
whether this heuristic solution is ultimately acceptable.
>One item from the minutes has been bothering me a bit:
>
>| o URNs must be built with a limited character set in order to be
>| transportable
>
>This seems like a short-term desideratum compared with the eternal
>life of URNs. All is okay for now, but eventually won't there be a
>way to use any character one desires in a URN?
It is always possible to represent any sequence over one finite set as
a sequence over any other finite set (of at least two symbols). Think
of it this way: you can represent integers as numerals in any base.
There are other mappings, e.g. it's possible to represent strings of
8bit chars as strings of 7bit chars through escape mechanisms.
There is apparently a requirement to represent URNs uniquely
in 7bit text/plain. This is a well-defined requirement, trivial
to implement once the set of URNs is known. (The problem
of deploying the resulting conventions, however, looms large.)
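Here is one way such an escape mechanism could look, as a sketch only. The %XX convention, the set of characters left unescaped, and the names SAFE, escape, and unescape are assumptions chosen for illustration, not a proposed URN syntax. The point is that the encoding is a function with an inverse, so each 8bit string gets exactly one 7bit representation.

```python
# Octets that are passed through unchanged. "%" is deliberately excluded
# so that it can serve as the escape character.
SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._/:"
)

def escape(data: bytes) -> str:
    """Encode arbitrary octets as a 7-bit string using %XX escapes."""
    out = []
    for b in data:
        if b in SAFE:
            out.append(chr(b))
        else:
            out.append("%%%02X" % b)
    return "".join(out)

def unescape(text: str) -> bytes:
    """Invert escape(): turn %XX back into the octet it stands for."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

name = b"urn:hal.com/caf\xe9/1.1"   # hypothetical name with one 8bit octet
encoded = escape(name)
print(encoded)
print(unescape(encoded) == name)
```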
I think there are some formal properties of URNs that we can speak
of. We say that a URN identifies a resource, i.e. there is a
functional mapping from the set of URNs to the set of resources:
Identify: URN -> Resource