Re: Last Call: URL to Proposed and URN- and IRL-Reqs to Informational

Norbert Leser - OSF DCE: (nl@osf.org)
Fri, 23 Sep 1994 14:14:57 -0400

Message-Id: <199409231814.OAA13582@postman.osf.org>
To: uri@bunyip.com
Subject: Re: Last Call: URL to Proposed and URN- and IRL-Reqs to Informational
In-Reply-To: Your multiple messages
Date: Fri, 23 Sep 1994 14:14:57 -0400
From: "Norbert Leser - OSF DCE: (617)621-8715" <nl@osf.org>

I heard you - don't need more flames on this subject! I'm fully sympathetic
to your desire to get free access to the XFN specification. Without being
able to promise anything, we will try to get X/Open to get the document to you.

It's very disappointing that this issue distracts from the real points that
I was trying to make in my request to the IESG.
What I'm really concerned with in this case and what is the subject of the
discussion is not the XFN specification, it is your "Requirements for Uniform
Resource Names" document!

I assume that there might still be interest in hearing comments
about the contents. Thus, here are some more details on things that I've already
raised in my initial mail to the IESG:

1. Section 2 of the URN spec requires names "with global scope" that are
"globally unique forever", and that "the same URN will never be assigned to
two different resources". I agree that this should be the typical and
recommended way to express a URN. In fact, this is the only safe
way to guarantee that a given name always references the same resource.

However, it can become very inflexible if we prohibit context relative naming
per se. I don't see a good reason why this should not be allowed if the
underlying naming system (and there are many of them) can deal with this.

In fact, if you look at the efforts in the definition of "Relative URLs",
there are quite a lot of similarities to URNs. The reasoning for relative
URLs applies even more to resource names. One might want to reference
documents (a set or hierarchy thereof) relative to a context. Even if the
name is permanent, the context might change over time (not just the addresses
identified by URLs).
[I don't propose to use the same mechanism as specified for relative URLs.
I simply don't want to preclude the use of relative names.]

Let's just consider an example where an organization changes its underlying
naming server technology and wants to maintain the old documents on the
new servers. In this case, not only do the access protocols and the
associated addresses (URLs) change, but the global context might change too.
It's obviously not a good idea to require an update of every instance of
a URN reference within a document.

2. The scalability requirement for URNs and the requirement that names are
persistent "forever" seem to be superficial. How do you want to enforce
this? Isn't this clearly the property of the underlying naming systems?
If a naming system has a flat namespace, it might not be as scalable
as a hierarchical system, etc. But if we require certain properties, how
does that go along with the other requirement of "legacy support"?

3. In section 3 (URN encoding), a "single" and "well specified" algorithm
is required for "simple comparison". Yes, given the qualifier "simple",
it might work, and it is good to have such algorithms (actually,
we have similar comparison operations for equality of names in XFN).

However, it needs to be pointed out that failing a comparison doesn't
necessarily mean that names are not equal (i.e., naming the same object).
Things like different character encodings (I18N issue), multiple AVAs
(X.500 property), different case matching rules, and support of
approximate matches (also X.500) cannot easily be supported by a
"single" and "well specified" algorithm.

4. Also in section 3, the requirement that comparison "is case insensitive"
and "probably (insensitive to) white space and some punctuation marks"
appears to conflict with the previous requirement of legacy support.
There are a number of naming systems that assume case exactness, use
white spaces, etc.
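
Points 3 and 4 can be made concrete with a minimal Python sketch. The
`normalize` and `simple_equal` functions are hypothetical; neither draft
specifies such an algorithm, and the sample names are made up. The second
pair assumes, for illustration only, that the two assertions form an
unordered set in their native naming system.

```python
def normalize(name: str) -> str:
    # Hypothetical "simple" normalization: drop all white space, fold case.
    return "".join(name.split()).casefold()


def simple_equal(a: str, b: str) -> bool:
    # Plain string equality on the normalized forms.
    return normalize(a) == normalize(b)


# False positive: a case-exact naming system treats these as two objects.
print(simple_equal("/docs/Readme", "/docs/README"))

# False negative: the same two assertions listed in a different order
# (assumed here to form an unordered set, as with multiple AVAs).
print(simple_equal("O=SUN; L=Palo Alto", "L=Palo Alto; O=SUN"))
```

The first call prints True and the second prints False, so a single
string-level rule can err in both directions depending on the native
system's own matching rules.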

5. If I understand the URI specification (RFC 1630) and the current URN
proposal right, you want to support any number of naming systems with
your format ("legacy support" and "grandfathering"). As I described above,
a conceivable way of mapping XFN as well as any specific naming system
onto URNs might be:

<scheme-id>:<native-name>

where <scheme-id> could either be something like "xfn" or the proposed (in
URI spec) "urn", and <native-name> would be the name that is understood and
processed by the name server that supports the scheme. Now, if I interpret
it correctly, in the URN proposal the URN-knowledgeable server has
to have a means for dispatching the requested name to an appropriate naming
service (representing the right naming authority). I'm fully in the dark
as to how this can sensibly be done without tweaking the representation of
the <native-name> somehow. The example in the URI spec

urn:/iana/dns/ch/cern/cn/techdoc/94/1642-3

seems to indicate that the "name authority" (iana ?) as well as the
supporting naming system (dns ?) is embedded in the name. This is
certainly a way of doing it if you're willing to accept the burden
of having naming authorities centrally registered, having well known
name strings, and implying resolution semantics in string representations
of names.
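
The two approaches can be contrasted in a minimal Python sketch. All
function names and the resolver table are hypothetical, and the split of
the urn example into authority, naming system, and remainder follows only
my tentative reading above (hence the question marks), not anything the
specs state.

```python
from typing import Callable, Dict, Tuple


def split_name(uri: str) -> Tuple[str, str]:
    # Split "<scheme-id>:<native-name>" at the first colon only, so that
    # colons inside the native name are left untouched.
    scheme, sep, native = uri.partition(":")
    if not sep or not scheme:
        raise ValueError(f"no scheme-id in {uri!r}")
    return scheme.lower(), native


# Hypothetical table mapping a scheme-id to the service that handles it.
RESOLVERS: Dict[str, Callable[[str], str]] = {
    "xfn": lambda native: f"XFN service receives {native!r} unchanged",
}


def dispatch(uri: str) -> str:
    # Hand the opaque native name to whichever service owns the scheme.
    scheme, native = split_name(uri)
    if scheme not in RESOLVERS:
        raise LookupError(f"no naming service registered for {scheme!r}")
    return RESOLVERS[scheme](native)


def embedded_parts(native: str) -> Tuple[str, str, str]:
    # Assumed reading of the urn example: the first two path components
    # name the authority and the naming system; the rest is the name.
    authority, system, rest = native.lstrip("/").split("/", 2)
    return authority, system, rest


print(dispatch("xfn:.../C=US/O=SUN"))
print(embedded_parts(split_name("urn:/iana/dns/ch/cern/cn/techdoc/94/1642-3")[1]))
```

In the first case the dispatcher needs only the scheme-id; in the second
it must also understand well-known strings inside the name itself.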

XFN has another way of dealing with this by dynamically
resolving names of multiple and possibly federated naming systems without
exposing this in the name itself. Besides the internationalization
problems that one has with requiring (potentially a large number of) well
known names, embedding these in the name would make it hard if not
impossible to migrate a named object from one naming authority to
another (one would have to modify all occurrences of the name).

6. I never understood (and unfortunately cared less in the past) why
the URI specification (supposedly the all-encompassing or generic UR*
specification?) goes into such great detail on things that should be left
to specs such as URL and URN. What the URI spec should define is the
generic format such as

<scheme> : <opaque_name_or_address>

But that is pretty much it; maybe also including some general requirements
such as that URIs are text representations, need to be transferred on the
wire etc.
But why restrict any derived UR* format by specifying URL specifics
such as reserved characters, hierarchical ordering (of the supposedly
opaque address part), fragment ids, etc.?
As you know, the URI and URL specs are completely intertangled
(as far as I can see, they also contain conflicting definitions)
but if you want to make any sensible progress on URNs, this has
to be fixed first (a clarification of the dependencies on URI).

7. Assuming that URNs are derivatives or forms of URIs (at least, both
documents contain cross references), the previously stated restrictions
for URIs cause significant problems if you want to meet your goal
of grandfathering existing naming systems. It might be theoretically
feasible to convert (i.e., squeeze) any given naming system's native name
into the required format, but I doubt that one can seriously consider
such names "transcribable" (one of the properties of _names_ in
contrast to _identifiers_). Let's take one example. The fully
legitimate composite XFN name:

.../C=US/O=SUN; L=Palo Alto/arch.dev/_fs/D:\games\bonk.exe

would translate to:

//.../C%3DUS/O%3DSUN%3B%20L%3DPalo%20Alto/dev/arch/_fs/D%3A%5cgames%5cbonk.exe

I cannot imagine that anybody would defend the proposed syntax
as being practical.
Even though such names can very well be computed, how would you ever
support 'cut and paste' through GUIs, and how do you
think a human being can parse such cryptic sequences of characters?
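
That such names "can very well be computed" is easy to show with a minimal
Python sketch using the standard `urllib.parse.quote`. The helper name
`to_uri_path` is hypothetical, and the sketch starts from the name with
its fourth component already written as dev/arch, as in the translated
form above; the reordering itself is not modeled.

```python
from urllib.parse import quote, unquote


def to_uri_path(native_name: str) -> str:
    # Percent-encode everything except unreserved characters and "/",
    # then add the leading "//" used in the translated form.
    return "//" + quote(native_name, safe="/")


name = r".../C=US/O=SUN; L=Palo Alto/dev/arch/_fs/D:\games\bonk.exe"
encoded = to_uri_path(name)
print(encoded)

# Decoding recovers the native name, so nothing is lost to the machine.
print(unquote(encoded[2:]) == name)
```

The library emits uppercase hex digits (%5C rather than %5c), which is
equivalent. The round trip succeeds, which is exactly the point: the cost
of the encoding falls on the human reader, not on the software.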

One could argue that such names are not exposed to humans (which one could
easily counter with examples of Web clients such as Mosaic that
constantly expose URLs to the user); but accepting this argument, why
introduce names at all, rather than adopt unique identifiers?

I couldn't find any good reasoning why you didn't leave the encodings of
names (and addresses) alone (they're supposedly opaque, aren't they?).
Simple mechanisms such as quoting or wrappers could take care of
potential problems. Also, the everlasting discussion on why white
spaces are prohibited and always need to be encoded doesn't give a
sound reason. Yes, there are the problems with some (screwed) protocols
that do nasty things with multiple spaces and tabs (compression or
conversion under the covers); but if we look at names, I'd claim that
in 99.9% of the uses of white space (in those naming systems that permit
it), multiple consecutive occurrences are either insignificant or
don't occur at all. So why not just require, in the remaining 0.1% of
cases, that _multiple_ white spaces be encoded accordingly, rather than
making this a general requirement (for single spaces)?
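
A minimal Python sketch of this relaxed rule follows. The function name
`encode_space_runs` is hypothetical and no draft defines such a rule. The
sketch encodes only runs of two or more spaces or tabs and leaves a single
space or tab as is; a complete scheme would also have to escape a literal
"%", which is not handled here.

```python
import re


def encode_space_runs(name: str) -> str:
    # Percent-encode each character of a run of 2+ spaces/tabs;
    # isolated spaces and tabs are left readable.
    def encode_run(match: "re.Match[str]") -> str:
        return "".join(f"%{ord(ch):02X}" for ch in match.group(0))

    return re.sub(r"[ \t]{2,}", encode_run, name)


print(encode_space_runs("O=SUN; L=Palo Alto"))   # single spaces untouched
print(encode_space_runs("O=SUN; L=Palo  Alto"))  # the double space is encoded
```

With this rule the common case stays transcribable, and only the rare
name with significant consecutive white space pays for the encoding.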

Similarly, there is no good reason for forbidding so many
reserved, national, escape, and punctuation characters (I've counted
at least 17 (!) from the already limited ASCII character set). A small
number (let's say 3) might be acceptable to define things like quotes,
escapes, and possibly component separators.

Another issue is the imposed left-to-right hierarchical ordering.
Besides the fact that other scalable naming systems aren't necessarily
purely hierarchical (such as the attribute based X.500 with multiple
attribute-value assertions), ironically enough, the Internet's own DNS uses
a right-to-left delineation of names.
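
To illustrate what the imposed ordering means for DNS, here is a minimal
Python sketch. The function name and the sample domain are hypothetical
and chosen only to echo the ch/cern components of the urn example in
point 5.

```python
def dns_to_left_to_right(domain: str) -> str:
    # DNS puts the most significant label last; reverse the labels to
    # obtain a left-to-right hierarchy with "/" as the separator.
    labels = domain.rstrip(".").split(".")
    return "/".join(reversed(labels))


print(dns_to_left_to_right("techdoc.cern.ch"))
```

This prints ch/cern/techdoc, a form no DNS user would recognize as the
native name.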

In summary, what I want to point out here is that we (the XFN and DCE folks)
are interested in providing some means of expressing resource names
through URIs in a uniform way. However, it is inherent in our models that
existing naming systems and conceivable new naming systems can utilize
such "uniform" or "universal" data formats in a sensible manner, as
transparent as possible. As I have tried to outline above, there are
a number of issues to be resolved in the current drafts in order to
fulfill this goal. It would be contrary to the declared purpose of
"uniformity" if you went ahead and promoted the current drafts
to Internet RFCs.

I'm fully aware that for a number of issues, philosophical and religious
arguments can be brought up on either side; the things that we're dealing
with are too complex to come up with a single optimal solution. But without
even trying to resolve at least the interferences, our community would be
really badly served.

Norbert