Re: URN single or multiple variants

Marc Andreessen (marca@ncsa.uiuc.edu)
Fri, 17 Sep 93 01:51:56 -0500

Date: Fri, 17 Sep 93 01:51:56 -0500
From: marca@ncsa.uiuc.edu (Marc Andreessen)
Message-Id: <9309170651.AA04344@wintermute.ncsa.uiuc.edu>
To: Keith Moore <moore@cs.utk.edu>
Subject: Re: URN single or multiple variants
In-Reply-To: <9309161821.AA19826@thud.cs.utk.edu>

Keith Moore writes:
> Marc Andreessen writes:
>
> > My argument would be that since an image in GIF format and an
> > image in JPEG format are the exact same piece of *intellectual
> > property*, they deserve the same URN -- and by extension, likewise
> > for all pieces of intellectual property that exist in multiple
> > formats containing the same intellectual content. I want to look
> > for intellectual property on the network, not "GIF images" or
> > "JPEG images"; I want the mechanics of which file format the
> > intellectual property is in to be largely irrelevant to operations
> > on and with pointers to that piece of intellectual property; and I
> > generally want my client to automatically choose the most
> > appropriate format for me after it has located the intellectual
> > property that I want. All this seems to point to (a).
>
> I guess I draw a distinction between *conversion* from one
> representation to another and a document that is available in
> different representations.
>
> There are two "official" representations of RFC 1341, of which the
> PostScript version is more faithful to what the authors intended
> (notwithstanding the rules that say the ASCII RFC should be
> authoritative). If I'm looking for RFC 1341, I want to get the
> Postscript version if my viewer can handle it (and I can spare the
> bandwidth), and the plain text version only if it cannot. I would
> treat these as two separate documents, with different URNs, and tie
> them together at the citation level.
>
> On the other hand, if I'm looking for a document that was
> "published" as image/gif, I don't want a server to claim that
> there's an image/jpeg representation of the document available just
> because it's capable of converting the document to jpeg (or has a
> "cached" copy of the converted document lying around).
>
> (I don't mind if it tells me it's willing to do such a conversion,
> but I need to know which representation is the original form.) For
> this case, I would have one URN, corresponding to the "published"
> document. If I were caching that document, I would want it in its
> original format. Otherwise it might be subject to more conversions
> later, causing lossage.
>
> Keith
>
> P.S. My example is somewhat strained...because of course the
> original of rfc 1341 is in Andrew format (right, Nathaniel?)
> ...which produced the other two representations...but in this case
> the *published* versions are in PostScript and ASCII.

An idea for coping with this: attributes associated with content-types
as part of the determination of multiple formats associated with a
given (single) URN. The attributes could be {original, published,
convertible}.

So accessing the URN 'urn://internic.net/rfc1341' (say) would
hypothetically return a set like this:

file://internic.net/RFC/1341.andrew  application/andrew      original
file://internic.net/RFC/1341.ps      application/postscript  published
file://internic.net/RFC/1341.txt     text/plain              published
file://internic.net/RFC/1341.dvi     application/x-dvi       convertible
file://internic.net/RFC/1341.mpeg    video/mpeg              convertible

A client (Mosaic) might then say, "I prefer the 'original'
content-type in all cases, but I can't deal with application/andrew so
toss that out. Now, I prefer 'published' over 'convertible' and my
user prefers PostScript to plaintext so I'll take
'application/postscript'."

A different client might say, "Wow, I can't take any of the 'original'
or 'published' content-types, so I guess I'll settle for MPEG. Since
the MPEG version is tagged 'convertible' and therefore will be created
on the fly for me by the server, I'll pop up a dialog box to warn my
user that it's not one of the standard (original or published)
distribution formats."
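
A minimal sketch of the selection logic the two clients above describe,
assuming the resolver hands back (location, content-type, attribute)
triples as in the hypothetical set. The names Variant, STATUS_RANK,
choose_variant and needs_warning are invented for illustration and are
not part of any existing client or proposal.

```python
from typing import List, NamedTuple, Optional


class Variant(NamedTuple):
    url: str
    content_type: str
    status: str  # one of 'original', 'published', 'convertible'


# The hypothetical set returned for urn://internic.net/rfc1341.
VARIANTS = [
    Variant("file://internic.net/RFC/1341.andrew", "application/andrew", "original"),
    Variant("file://internic.net/RFC/1341.ps", "application/postscript", "published"),
    Variant("file://internic.net/RFC/1341.txt", "text/plain", "published"),
    Variant("file://internic.net/RFC/1341.dvi", "application/x-dvi", "convertible"),
    Variant("file://internic.net/RFC/1341.mpeg", "video/mpeg", "convertible"),
]

# Assumed preference order: original, then published, then convertible.
STATUS_RANK = {"original": 0, "published": 1, "convertible": 2}


def choose_variant(variants: List[Variant],
                   accepted_types: List[str]) -> Optional[Variant]:
    """Pick a variant given the content-types the client can handle,
    listed most-preferred first. Returns None if nothing is usable."""
    # Toss out anything the client cannot deal with.
    usable = [v for v in variants if v.content_type in accepted_types]
    if not usable:
        return None
    # Prefer by attribute first, then by the user's content-type order.
    return min(
        usable,
        key=lambda v: (STATUS_RANK[v.status], accepted_types.index(v.content_type)),
    )


def needs_warning(variant: Variant) -> bool:
    """True when the variant would be created on the fly by the server."""
    return variant.status == "convertible"


# First client: cannot handle Andrew, user prefers PostScript to plaintext.
first = choose_variant(VARIANTS, ["application/postscript", "text/plain"])
# Second client: can only take MPEG.
second = choose_variant(VARIANTS, ["video/mpeg"])
```

Here the attribute always outranks the user's content-type preference;
a client could just as well weigh the two differently.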

This seems to satisfy Keith's requirements above... ? All of the
information that would be in separate single-format URNs is present
(and possibly more), as are the advantages of a single (canonical)
multiple-format URN.

Re Rich's comments on image quality, I see two solutions:

(1) In the case of high-quality GIFs and low-quality converted JPEGs,
image/gif would be 'original' and image/jpeg would be 'published'.
A known quality of 'original' would be that this content-type is
what the producer of the data intended end-users to download and
view, and as such can be expected to be of the highest quality and
fidelity. Similarly, 'published' can be expected to be "good
enough" for the producer of the data to judge that making it
available is a good idea, and 'convertible' can mean "quality may
vary; caveat emptor".

(An alternative attribute set might be {published, converted,
convertible}, which may make the differentiation in expectable
quality more apparent.)

(2) A quality value could be associated with content-types as well as
an {original, published, convertible} attribute -- then, you could
know what you're losing or gaining on a per-case basis.

I think (1) would be sufficient, given a well-chosen attribute set.
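
For comparison, a rough sketch of what (2) might look like, assuming
each content-type carries a numeric quality value alongside its
attribute. The 0.0 to 1.0 scale, the example values, and the names
QVariant, best_by_quality and quality_loss are all made up for
illustration.

```python
from typing import List, NamedTuple, Optional


class QVariant(NamedTuple):
    content_type: str
    status: str     # 'original', 'published', or 'convertible'
    quality: float  # hypothetical scale: 1.0 = full fidelity


# Made-up values for the high-quality GIF / low-quality JPEG case.
IMAGES = [
    QVariant("image/gif", "original", 1.0),
    QVariant("image/jpeg", "published", 0.7),
]


def best_by_quality(variants: List[QVariant],
                    accepted_types: List[str]) -> Optional[QVariant]:
    """Highest-quality variant the client can handle, or None."""
    usable = [v for v in variants if v.content_type in accepted_types]
    return max(usable, key=lambda v: v.quality, default=None)


def quality_loss(variants: List[QVariant], chosen: QVariant) -> float:
    """How far the chosen variant falls short of the best one on offer."""
    return max(v.quality for v in variants) - chosen.quality


# A client that can only take JPEG learns what it is giving up.
jpeg_only = best_by_quality(IMAGES, ["image/jpeg"])
loss = quality_loss(IMAGES, jpeg_only)
```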

Cheers,
Marc