Re: (long) sketch of proposed imap: URL syntax and semantics [TRY AGAIN!]

Steven D. Majewski (sdm7g@elvis.med.virginia.edu)
Fri, 1 Jul 1994 18:25:18 -0400

Date: Fri, 1 Jul 1994 18:25:18 -0400
From: "Steven D. Majewski" <sdm7g@elvis.med.virginia.edu>
Message-Id: <199407012225.AA15850@elvis.med.Virginia.EDU>
To: uri@bunyip.com, imap@cac.washington.edu
Subject: Re: (long) sketch of proposed imap: URL syntax and semantics [TRY AGAIN!]

For some reason the previous message went out blank.
I'll try it again...
--------------------

John Gardiner Myers had some comments on my first tentative proposal
for a definition of an imap URL. I have since modified parts of
it, filled in other portions, and tried to provide a justification
for some of my choices. This is being sent to both the imap and uri
mailing lists for comment.

( I think the problem of how to represent hierarchy in mailbox
names has some relevance to current discussion in the uri mailing
list on the latest draft, and the question of how much syntax
should be reserved for global semantics. )

-- Steve Majewski (804-982-0831) <sdm7g@Virginia.EDU> --
-- UVA Department of Molecular Physiology and Biological Physics --
-- Box 449 Health Science Center Charlottesville,VA 22908 --

Let's start with the uncontroversial part :-)

1.0 Scheme.

<scheme> :== imap

The precedent ( e.g. gopher|gopher+ ) is to NOT distinguish
different versions of the same protocol (IMAP2|IMAP4) but to either
use the compatible subset, or to have the client+server negotiate.

1.1 Host and access part.

Some of the published URL drafts|descriptions state unambiguously
that everything after the scheme is opaque: it only has meaning
to that scheme and can only be parsed by that scheme. However,
in practice, there is a lot of library code that expects a
certain global syntax. I don't think that the existence of
libraries implementing that syntax is a definitive argument,
but I do think that there are other reasons to enforce a
global syntax at least up to and including the host/access
part ( if it exists for that scheme ). The latest draft seems
to assert this principle:

From: Larry Masinter <masinter@parc.xerox.com>
Subject: current status of URL document...
Message-Id: <94Jun29.124801pdt.2760@golden.parc.xerox.com>
Date: Wed, 29 Jun 1994 12:47:56 PDT

| 2.1.4. Internet Scheme Syntax
|
| While the syntax for the rest of the URL may vary depending on the
| particular scheme selected, URL schemes that involve the direct use
| of an IP-based protocol to a specified host on the Internet use a
| common syntax for the initial part of the scheme-specific data:
|
| //<user>:<password>@<host>:<port>
| //<user>:<password>@<host>:<port>/<url-path>
|
| This initial part starts with a double slash "//" to indicate its
| presence, and continues until the following slash "/", if any.
| Within this section are:
|
| user
| An optional user name. Some schemes (e.g., ftp) allow the
| specification of a user name.
|
| password
| An optional password. If present, it follows the user
| name separated from it by a colon.
|
| The user name (and password), if present, are followed by a
| commercial at sign "@".
|
| host
| The fully qualified domain name of a network host in RFC1035
| format or its IP address, as a set of four decimal digits
| separated by periods.
|
| port
| The (optional) port number to connect to. Most schemes designate
| protocols that have a default port number. Another port number
| may optionally be supplied, in decimal, separated from the
| host by a colon.
|
| url-path
| The rest of the locator consists of data specific to the
| scheme, and is known as the "url-path". It supplies the
| details of how the specified resource can be accessed.
|
| The url-path is interpreted in a manner dependent on the scheme
| being used. Generally, the slash "/" denotes a level in a
| hierarchical structure, the higher level part to the left of the
| slash. In addition, the characters "=", ";", "#", "?" and ":" have
| special syntactic common to many schemes.
|

And recent discussion seems to show a desire to more strongly reserve
a global meaning for "#", "?", ";" , etc.

So, if there is a host or host+access part, it should follow
existing practice, which gets us to:

imap:[//[user[:pwd]@]host[:port]] <mbox> [<selector>]

( What to do to indicate that a different AUTHENTICATE method is used
rather than LOGIN is a problem. This is perhaps another example of the
lack of extensibility of the URL syntax ( compared, for example, to
MIME external body parts ) that, I think, Daniel Connolly was
discussing in his comments on the latest URL draft. )
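
As a rough illustration of the syntax line above, here is a minimal
Python sketch that splits an imap: URL into its host/access part and
the remainder. The names ImapAccess and split_access are hypothetical,
and percent-decoding of the user and password fields is an assumption
of mine, not something the proposal spells out. The mailbox and
selector are deliberately left unparsed.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class ImapAccess(NamedTuple):
    user: Optional[str]
    password: Optional[str]
    host: Optional[str]
    port: Optional[int]
    rest: str  # <mbox> [<selector>], left unparsed here


def split_access(url: str) -> ImapAccess:
    """Split imap:[//[user[:pwd]@]host[:port]] from the rest of the URL."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "imap":
        raise ValueError("not an imap: URL")
    if not rest.startswith("//"):
        # No host/access part at all.
        return ImapAccess(None, None, None, None, rest)
    # The host/access part runs up to the next "/", if any.
    authority, slash, path = rest[2:].partition("/")
    userinfo, at, hostport = authority.rpartition("@")
    user = password = None
    if at:
        user, colon, pwd = userinfo.partition(":")
        user = unquote(user)
        password = unquote(pwd) if colon else None
    host, colon, port = hostport.partition(":")
    port_number = int(port) if colon and port else None
    return ImapAccess(user, password, host, port_number, slash + path)
```

For example, split_access("imap://loghost/mail/mailbox#473#2.1.4")
yields host "loghost", no user, password, or port, and the remainder
"/mail/mailbox#473#2.1.4".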

1.2 Mailbox Name.

I have reviewed some of the past discussion on the imap list
about hierarchy and mailbox names, and have read the changes
and additions in the IMAP4 draft. ( But please tell me if I
have got any of this severely wrong! )

In IMAP2 (RFC1176), the protocol itself had no hierarchy and mailbox
names were intended to be a flat space ( actually, two independent
namespaces for MAILBOXES and BBOARDS ) even though, in practice,
mailbox names containing "/" were mapped into the unix file-space in
the expected manner. ( However, I used the word "intended" above
because I don't think that servers maintained that view consistently.
The U-Wash imap2bis imapd server, for example, returns the contents of
the single top level home directory for "FIND ALL.MAILBOXES *" - it
does NOT flatten the filesystem namespace into a flat imap namespace.)

IMAP4 drops the dual MAILBOX/BBOARD space but (re-)introduces hierarchy:
"/" MAY be a hierarchy delimiter, but is not necessarily one - LIST will
return the host hierarchy delimiter. Still, to access a mailbox by URL
does not require a complete implementation of the IMAP protocol, and
we could probably get by with the ambiguous state of IMAP2 - that
"/xxx/yyy/zzz" is a mailbox name and it doesn't matter to the *client*
whether or not it represents a hierarchical name ( doesn't matter to
the client using an imap URL to access a message archive, I mean. Reviewing
that discussion, I DO think IMAP4 made the correct change, but many of
the complications of a full mail MUA client aren't a concern from a limited
URL access client point of view. ). Except that IMAP4 adds
"LIST reference mbox" where reference and mbox explicitly have a hierarchical
relationship.

Trying to make something compatible with IMAP2/IMAP2BIS/IMAP4, and not
wanting to produce URLs that are harder to read and write than necessary
by requiring escaping "/" to "%2f" in typical unix mailbox names, I
propose that "/" be normally considered a part of the mailbox name
and NOT a hierarchy delimiter. ( Although, as in IMAP2, it just MAY
also be a delimiter on the server host machine. ). In the cases
where a delimiter MUST be explicitly expressed ( it must be split
in two parts for a proper LIST request - although, in many cases a
null reference part will work properly ) then I propose a double
slash as the explicit hierarchy delimiter:

imap://host/reference//mail/box
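
A minimal Python sketch of that rule, under two assumptions of mine:
the single "/" that separates the host from the mailbox is not part
of the name, and the split happens before percent-decoding so that an
escaped "%2f" never acts as a delimiter. The function name
split_mailbox is hypothetical.

```python
from urllib.parse import unquote


def split_mailbox(path: str) -> tuple[str, str]:
    """Split the <mbox> part of the URL into (reference, mailbox).

    A single "/" is treated as part of the mailbox name; only "//"
    marks an explicit hierarchy delimiter. With no "//" present the
    reference is the empty (null) string.
    """
    if path.startswith("/"):
        path = path[1:]  # assumed: drop the slash that follows the host
    reference, sep, mailbox = path.partition("//")
    if not sep:
        reference, mailbox = "", path
    # Decode only after splitting, so escaped slashes stay literal.
    return unquote(reference), unquote(mailbox)
```

With this, "/reference//mail/box" gives the pair ("reference",
"mail/box"), the two arguments of a LIST request, while
"/mail/mailbox" gives a null reference and the whole name as mailbox.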

I would also allow "*" as a wildcard mailbox selector, but IMAP
"%" and "?" wildcards would have to be escaped.

1.3 Message Selectors and Query/Search Strings.

<selector> := ?<query string> | #<message-selector>

<query string> would be the arguments to an IMAP SEARCH
command with spaces ( and other non-safe chars ) escaped.

/mailbox?to%20uri%20subject%20imap

would be the results of the IMAP request:

SEARCH mailbox to uri subject imap
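
The escaping step can be sketched in Python with the standard
urllib.parse functions. The helper names encode_query and
decode_query are hypothetical, and escaping every reserved character
(safe="") is my assumption about what "non-safe chars" should cover.

```python
from urllib.parse import quote, unquote


def encode_query(search_args: str) -> str:
    """Escape IMAP SEARCH arguments for use after "?" in the URL."""
    # safe="" escapes everything except letters, digits and "_.-~".
    return quote(search_args, safe="")


def decode_query(query: str) -> str:
    """Recover the SEARCH arguments from the URL query string."""
    return unquote(query)
```

Here encode_query("to uri subject imap") produces
"to%20uri%20subject%20imap", and decode_query reverses it.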

On Jun 25, 10:27, John Gardiner Myers wrote:
>
> IMAP URLs should most certainly use UID's, not message-id's or
> sequence numbers.
>

I agree that UID's should probably be the canonical way of
constructing URL's. But I don't think that should be the only
way. The server I am currently running does not seem to support
either FETCH UID or any of the UID commands. Also, a more
typical way for a human to construct a reference is by
Message-ID: header fields. The IMAP UID's may be exchanged
by client and server, but I expect that most clients don't
display them to the user. Also, URL's do not have any
guaranteed lifetime. Sequence numbers in an active INBOX
would, I'm sure, be too volatile to be of *ANY* use, but
then, I don't anticipate giving someone a URL pointing
to my INBOX. Publicly exchanged URL's would, I expect,
typically point to mailing-list archives, and would typically
be read-only access. Public archives might typically grow
but not shrink, so sequence numbers, while not the preferred
method, would work as well as most other URL's.

message selection should be possible by:
IMAP Unique ID (the canonical way for URL's to be returned)
Message-ID (the typical way for hand constructed URL's)
mailbox-sequence-number

Optionally followed by an additional #<selector> specifying
either:
MIME Content ID (preferred)
MIME Body Part number.

And I would propose that those be indicated by:

<selector> := <message-selector> [<part-selector> [<fragment>] ]
<message-selector> :=
#mid:<message-id> |
#uid:<imap-uid> |
#<number>

<part-selector> :=
#cid:<MIME-content-id> |
#<body-part>

So without the literals "mid:", "uid:", or "cid:", numbers
are interpreted as mailbox sequence numbers, and for a
second selector, as MIME body parts. For example:

imap://loghost/mail/mailbox#473#2.1.4

If the MIME part content type is HTML, then an additional
optional #<selector> field is defined to be an HTML fragment.

For other cases, additional #<selector>'s are permitted,
but are not defined.
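
To make the selector grammar concrete, here is a minimal Python
sketch of a parser for it. The names Selector and parse_selector are
hypothetical. I am assuming that a literal "#" inside a message-id or
content-id would be escaped as "%23", and the sketch simply carries
any further selectors along as uninterpreted strings.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class Selector(NamedTuple):
    message_kind: str                 # "mid", "uid" or "seq"
    message: str
    part_kind: Optional[str] = None   # "cid" or "part"
    part: Optional[str] = None
    extra: tuple = ()                 # HTML fragment or undefined selectors


def _classify(field: str, prefixes: tuple, default: str) -> tuple[str, str]:
    """Strip a known "xxx:" literal, else fall back to the default kind."""
    for prefix in prefixes:
        if field.lower().startswith(prefix + ":"):
            return prefix, field[len(prefix) + 1:]
    return default, field


def parse_selector(text: str) -> Selector:
    if not text.startswith("#"):
        raise ValueError("selector must start with '#'")
    # Split on "#" first, then decode each field (assumed escaping rule).
    fields = [unquote(f) for f in text[1:].split("#")]
    msg_kind, msg = _classify(fields[0], ("mid", "uid"), "seq")
    if msg_kind == "seq" and not msg.isdigit():
        raise ValueError("sequence number must be decimal digits")
    part_kind = part = None
    if len(fields) > 1:
        part_kind, part = _classify(fields[1], ("cid",), "part")
    return Selector(msg_kind, msg, part_kind, part, tuple(fields[2:]))
```

For "#473#2.1.4" this gives sequence number 473 and body part 2.1.4;
for "#uid:1234" it gives a UID selector with no part selector.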

2.0 Semantics:

URL's that exactly specify a single message will return that message.

For URLs that do not exactly specify a single message ( for example,
URL's that specify a mailbox, a mailbox wildcard, or a mailbox and a
query/search string ) a list of either mailboxes or messages in a
mailbox should be displayed. ( The result of a "FIND ALL.MAILBOXES",
"FIND MAILBOXES" , "LIST" (mailbox) or "SEARCH" (message) command )

A search string that results in a single message is treated as a list
containing a single message. Although access by message-id is
implemented by an imap SEARCH command, the different syntax indicates
that it is to be treated as exactly specifying one message.
( i.e. the semantics of mailbox#mid:<message-id> and ( the escaped
version of) mailbox?text message-id: <message-id> are different. )
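
One possible reading of these rules, as a small Python sketch. The
function name classify and the three result labels are hypothetical.
I am also assuming that a bare mailbox name lists its messages while
a "*" wildcard lists mailboxes, and that "?" and "#" inside names are
always escaped.

```python
def classify(rest: str) -> str:
    """Decide what kind of result the part of the URL after the host names.

    Returns "message", "message-list" or "mailbox-list".
    """
    if "#" in rest:
        # Any message selector (#mid:, #uid:, #<number>) names one message.
        return "message"
    mailbox, sep, _query = rest.partition("?")
    if sep:
        # A search string always yields a list, even with a single hit.
        return "message-list"
    if "*" in mailbox:
        # A wildcard yields a list of mailboxes (LIST / FIND).
        return "mailbox-list"
    return "message-list"
```

So "/mailbox#mid:..." is a single message, while the equivalent
search "/mailbox?text%20..." is a list that happens to have one entry.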

It is allowed for Multipart MIME messages to return the parts in some
symbolic form that requires further dereferencing. ( But it should be
something less raw than just the S-expr returned by FETCH BODY or
BODY.STRUCTURE. A list of body parts, their MIME content type, and any
other informative MIME headers would be more like it - something
humanly readable.)

2.1 Applications

My plan is to make a prototype IMAP/HTTP gateway, implemented as a CGI
script, and using proxies to redirect imap: URL's to the gateway. The
gateway will accept IMAP URL requests and return HTML objects with the
above semantics. Lists of messages will be expanded into hypertext
links to those messages. ( Other HREFS may also be added, for
example, the header lines referencing other messages may be turned
into hypertext anchors to those messages. )
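
As a sketch of what the gateway's list expansion might look like, the
Python function below turns a list of messages into an HTML list of
links. The function name, the (uid, subject) input shape, and the
choice of #uid: links are all assumptions for illustration; the
prototype itself is not written yet.

```python
from html import escape
from urllib.parse import quote


def message_list_html(mailbox_url: str, messages: list) -> str:
    """Render (uid, subject) pairs as an HTML list of imap: links."""
    items = []
    for uid, subject in messages:
        # Link each entry to the single message via its UID selector.
        href = f"{mailbox_url}#uid:{quote(str(uid), safe='')}"
        items.append(
            f'<li><a href="{escape(href, quote=True)}">{escape(subject)}</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Subjects are HTML-escaped so that header text containing "<" or "&"
cannot break the generated page.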

Accessing an imap URL thru the proxy gateway from an HTTP
client will present to the user either an active list of messages
or a message.

IMAP URL's should be a "logical address" and not be in any
way limited or linked to this particular gateway. It would
be possible ( and encouraged ) either to add imap: support
to a WWW browser like Mosaic, or alternatively, to add support
for interpreting imap: URL's to an IMAP client like Pine, which
could then be called from a browser as a "client-side helper" ,
in the same manner that a telnet window is started up to handle
that scheme.

2.2 Security.

There are obvious problems with sending user/password login strings in
clear text. However, I anticipate that typical use of an imap URL will
be to access a publicly available Read-Only mailing list archive,
either thru a preauthenticated gateway, or via anonymous user-id.

Processing imap url's thru a third party gateway ( i.e. when the
proxy gateway is on a different machine from both the client and the imap
server referenced in the URL. ) adds another level of insecurity.
( Another reason to wish for client side support for imap url's! )

The ability to embed in the URL a public-key "access-ticket" or capability,
allowing only the authorized person to read only the referenced message,
would be necessary to allow secure references to personal mailboxes.

___________________________________________________________________

-- Steve Majewski (804-982-0831) <sdm7g@Virginia.EDU> --
-- UVA Department of Molecular Physiology and Biological Physics --
-- Box 449 Health Science Center Charlottesville,VA 22908 --