Unresolved URL issues

Alan Emtage (bajan@bunyip.com)
Mon, 7 Mar 1994 00:07:12 -0500

Message-Id: <9403070507.AA26431@mocha.bunyip.com>
From: bajan@bunyip.com (Alan Emtage)
Date: Mon, 7 Mar 1994 00:07:12 -0500
To: uri@bunyip.com
Subject: Unresolved URL issues

Hello All,
As Larry enumerated last week, there are a couple of minor issues
still to be resolved before we can put the bow on the URL box.

In decreasing order of importance (by my reckoning), here they are. For
each, I have included where I believe the current majority opinion lies
(with pro/con arguments where appropriate), along with my own suggestions
for resolving the issue. If you have strong objections to these proposals,
please raise your hand. My plan is to formalize the mailing-list position
in short order in Seattle and move on.

1) The syntax and semantics for certain URLs.

a) the FTP URL.

Majority(?): The syntax should be "URL:ftp://host/a/b/c/d", meaning that
CWD commands for "a", "b" and "c" are performed in turn, followed by a
RETR of "d". The "/" is a directory boundary; if embedded "/" characters
are to be allowed, they must be quoted via the same mechanism as
whitespace (i.e., %<number>).

Pro: Will work in most cases

Con: Will fail in (a minority of) cases
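A minimal Python sketch of the majority reading, for illustration only: the function name ftp_commands is hypothetical, and it ignores login, port and any type information. It assumes %<number> means two hex digits (as in %2F for "/"), and it splits the path on literal "/" before decoding, so an encoded "/" stays inside a single name instead of creating another directory level.

```python
from urllib.parse import unquote


def ftp_commands(url: str) -> tuple[str, list[str]]:
    """Translate URL:ftp://host/a/b/c/d into (host, FTP commands).

    Hypothetical sketch of the majority reading only: no login,
    port or transfer-type fields are recognised.
    """
    prefix = "URL:ftp://"
    if not url.startswith(prefix):
        raise ValueError("not an ftp URL: " + url)
    host, _, path = url[len(prefix):].partition("/")
    # Split on literal "/" first and decode afterwards, so an encoded
    # "/" (%2F) stays inside its segment instead of acting as a boundary.
    *dirs, filename = [unquote(seg) for seg in path.split("/")]
    commands = ["CWD " + d for d in dirs]  # one CWD per directory level
    commands.append("RETR " + filename)    # fetch the final component
    return host, commands
```

For example, "URL:ftp://host/a/x%2Fy" yields a single "CWD a" followed by "RETR x/y".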

Personal comments: I've had some experience with automated ftp retrieval
through archie, and the technique we use is the one proposed above. We
also issue a "PWD" as the first command after login and keep track of
what it returns, presumably the root of the ~ftp tree, and we strip that
prefix from any paths we are given. However, this fails in some cases:
for example, if ~ftp is /pub/ftp and a file called /pub/ftp/a is being
retrieved.
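The PWD heuristic described above might look like the following Python sketch (a hypothetical helper, not archie's actual code). Because it always strips the prefix, it cannot tell an absolute host path /pub/ftp/a from a path that was already relative to ~ftp and merely happens to begin with pub/ftp; in the second case it fetches the wrong file.

```python
def strip_login_root(path: str, pwd: str) -> str:
    """Hypothetical helper: remove the PWD-reported root from a path."""
    root = pwd.rstrip("/")
    if not root:
        return path  # PWD reported "/": nothing to strip
    if path == root:
        return "/"
    if path.startswith(root + "/"):
        # Always stripped, whether or not the path was already
        # relative to ~ftp: this is the ambiguity described above.
        return path[len(root):]
    return path  # outside the ~ftp tree: leave untouched
```

With PWD reporting /pub/ftp, strip_login_root("/pub/ftp/a", "/pub/ftp") returns "/a" regardless of which file was meant.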

I propose that the path in the URL be given relative to the login
directory of the user (uid/password) named in the URL, as opposed to an
absolute path. The URL still contains enough information that it is not
"relative" (or "partial"), and its context can be fully resolved on the
host in question. It does, however, prevent the URL from being converted
to another access method, which I believe is not a requirement in any
case.

I would further suggest (as I believe has been done in the past) that the
"login" term in the BNF be rewritten to put the username/password
combination at the end of the hostname/port rather than at the beginning,
so that it conforms to the other access methods. In addition, an
"account" term will have to be added (RFC 959 has a triplet:
username/password/account, sent with the USER, PASS and ACCT commands).

Also, as Larry notes, there is no current provision for typing the object
being referenced or for the transfer mode that has to be used. Since both
are required for access to the object, and since the draft requirements
allow such typing in cases where the information is necessary for access,
I propose that we allow the terms "binary", "ascii" and "tenex" to be used
as transmission specifiers (again, see RFC 959). Since the only two kinds
of object that can (potentially) be obtained from ftp sessions are
directories (that is, the contents of a directory) and files, I also
propose that we specify "directory" and "file" as object types. There is
currently no (standard) mechanism in FTP to determine whether an object
is a directory or a file, so this is needed. [You can do what archie does
and parse the ls(1) listing, but that is so ugly as to be rejected out of
hand... in any case, there is no standard for responses to the LIST
command.]
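One way to read the proposed typing, sketched in Python: each transmission specifier selects an RFC 959 TYPE command (ascii is TYPE A, binary is TYPE I, tenex is TYPE L 8), and the object type chooses between retrieving and listing. The table and function names, and the choice of LIST rather than NLST for directories, are assumptions for illustration.

```python
# Transmission specifiers proposed above, mapped to RFC 959 TYPE commands.
TYPE_COMMANDS = {"ascii": "TYPE A", "binary": "TYPE I", "tenex": "TYPE L 8"}
# Object types proposed above: a file is retrieved, a directory is listed.
# (Assumption: LIST rather than NLST for directories.)
OBJECT_COMMANDS = {"file": "RETR", "directory": "LIST"}


def access_commands(transfer: str, obj_type: str, name: str) -> list[str]:
    """Hypothetical helper: FTP commands needed to fetch one typed object."""
    if transfer not in TYPE_COMMANDS:
        raise ValueError("unknown transmission specifier: " + transfer)
    if obj_type not in OBJECT_COMMANDS:
        raise ValueError("unknown object type: " + obj_type)
    return [TYPE_COMMANDS[transfer], OBJECT_COMMANDS[obj_type] + " " + name]
```

Because the URL carries both terms, a client never has to guess from an ls(1) listing whether "c" is a directory.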

The question of what to do about multiple types of objects in a directory
may need to be addressed.

IMHO, John Curran is right. With something like FTP we can't worry about
every possible implementation under the sun... it's been around too long,
and in many cases is too unstandardized, for us to try to cover 100% of
implementations.

This would lead to an ftp URL like:

URL:ftp://hostport@[[[username]:password]:account]/binary/file/a/b/c

[Some people may prefer other delimiters for the "binary" and "file"
fields.]
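A Python sketch of a parser for the URL form above, assuming the login fields follow a single "@" and are separated by ":", and that the first two path segments are always the transmission specifier and object type. The class and function names, and the host in the usage example, are hypothetical; this is one reading of the proposed BNF, not a settled grammar.

```python
from typing import List, NamedTuple, Optional
from urllib.parse import unquote


class ProposedFtpURL(NamedTuple):
    hostport: str
    username: Optional[str]
    password: Optional[str]
    account: Optional[str]
    transfer: str     # "binary", "ascii" or "tenex"
    obj_type: str     # "file" or "directory"
    path: List[str]   # decoded path segments


def parse_proposed_ftp_url(url: str) -> ProposedFtpURL:
    prefix = "URL:ftp://"
    if not url.startswith(prefix):
        raise ValueError("not an ftp URL: " + url)
    authority, _, rest = url[len(prefix):].partition("/")
    # Login fields come after the host/port, as suggested above.
    hostport, _, login = authority.partition("@")
    fields: List[Optional[str]] = (
        [unquote(f) for f in login.split(":")] if login else [])
    if len(fields) > 3:
        raise ValueError("at most username:password:account is allowed")
    fields += [None] * (3 - len(fields))  # pad missing fields with None
    username, password, account = fields
    segments = rest.split("/")
    if len(segments) < 3:
        raise ValueError("expected /transfer/type/path")
    transfer, obj_type, *path = segments
    if transfer not in ("binary", "ascii", "tenex"):
        raise ValueError("unknown transmission specifier: " + transfer)
    if obj_type not in ("file", "directory"):
        raise ValueError("unknown object type: " + obj_type)
    return ProposedFtpURL(hostport, username, password, account,
                          transfer, obj_type, [unquote(s) for s in path])
```

For example, "URL:ftp://ftp.example.com:21@anonymous:guest/binary/file/pub/README" gives hostport "ftp.example.com:21", username "anonymous", password "guest", no account, and the path ["pub", "README"].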

b) Telnet URLs

Majority: ?

There hasn't been significant discussion on this, if memory serves
correctly.

Larry asks if rlogin, tn3270 (and telnet-with-local-echo) are the same.

My comments: I think we should let telnet be pure telnet. While rlogin
and tn3270 are very similar, they should probably remain separate:
rlogin has a different default port, and tn3270 may require parameters
other than login and password. I suggest that we define similar, yet
distinct, URLs for these beasties. Telnet-with-local-echo is really a
matter of the attributes of the telnet session (if I understand the term
correctly). Can we provide a syntax for specifying, in the URL, the telnet
parameters WHICH ARE NECESSARY FOR ACCESS, ideally the same as, say, the
Prospero attributes?
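To illustrate why separate schemes are convenient, here is a Python sketch in which each access method carries its own default port. The scheme names, the port table (telnet 23, rlogin 513, and tn3270 assumed to use the ordinary telnet port) and split_session_url are illustrative assumptions; login fields and session parameters are not handled.

```python
# Hypothetical scheme table: one scheme per access method, each with its
# own default port. Assumption: tn3270 runs over the telnet port.
DEFAULT_PORTS = {"telnet": 23, "rlogin": 513, "tn3270": 23}


def split_session_url(url: str) -> tuple[str, str, int]:
    """Return (scheme, host, port) for URL:<scheme>://host[:port]/."""
    if url.startswith("URL:"):
        url = url[len("URL:"):]
    scheme, sep, rest = url.partition("://")
    if not sep or scheme not in DEFAULT_PORTS:
        raise ValueError("unsupported session URL: " + url)
    host, _, port = rest.split("/", 1)[0].partition(":")
    # An explicit port wins; otherwise use the scheme's default.
    return scheme, host, int(port) if port else DEFAULT_PORTS[scheme]
```

The same host string then connects to port 513 under rlogin and port 23 under telnet, without any extra parameter in the URL.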

c) News and NNTP URLs

Majority: unclear

There was a discussion on this some time back. I believe that Tim and
Mitra were on opposite sides of this fence. Would it be possible for the
two of them to each send a message of 20 lines or less briefly
summarizing their respective positions? Whatever is finally decided, the
WG has already grandfathered the news URL.

2) Wrappers

Majority: Wrappers will exist in contexts which need them and are context
specific. However the spec should define them for plain text at least.
Currently <>, {}, '' and "" are possible candidates. Each context
(protocol, system) is free to define its own wrapper. [We should probably
look at documenting these when they come into common use].

URLs need to be identified in plain, running text. The "URL:" prefix
allows a URL to be explicitly identified (as opposed to a process of
elimination: "if it's not a URN and it's not a URC, then it's a URL"),
and whitespace within a URL must be encoded. Even so, a wrapper is
sometimes necessary, for example where no whitespace is present to mark
the end of the URL.

The remaining debate centres on the characters for the wrapper. Many
people currently use <> (including TimBL, if I'm not mistaken). Whatever
characters are chosen would be disallowed in the URL proper and would
have to be quoted via the normal mechanism. Wrappers are not part of the
URL proper.
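A small Python sketch of how <> wrappers help a scanner working on running text, assuming the "<URL:...>" form and %3C/%3E as the quoted forms of "<" and ">". The function names are hypothetical. Without a wrapper, the scanner can only stop at whitespace, so trailing punctuation is swallowed into the URL.

```python
import re

# "<URL:...>" with no "<", ">" or whitespace inside the wrapper.
WRAPPED = re.compile(r"<(URL:[^<>\s]+)>")
# Without a wrapper, the best a scanner can do is stop at whitespace.
BARE = re.compile(r"URL:\S+")


def wrap(url: str) -> str:
    """Wrap a URL in <>, quoting any wrapper characters it contains."""
    return "<" + url.replace("<", "%3C").replace(">", "%3E") + ">"


def find_wrapped(text: str) -> list[str]:
    """Return the URLs found inside <> wrappers, without the wrappers."""
    return WRAPPED.findall(text)


def find_bare(text: str) -> list[str]:
    """Return unwrapped URLs, delimited only by whitespace."""
    return BARE.findall(text)
```

For example, find_wrapped("Fetch <URL:ftp://host/a/b/c/d>.") returns ["URL:ftp://host/a/b/c/d"], while find_bare("Fetch URL:ftp://host/a/b/c/d.") returns ["URL:ftp://host/a/b/c/d."], trailing period included.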

Pro: <> are available in most character sets and are already widely used.
I also perceive some leaning towards their "intuitive" nature. By
contrast, "" and '' are used pervasively in text, and {} are not
available on certain systems (IBM comes to mind, if I remember
correctly... it's been a long time).

Con: <> is already used in SGML and would cause additional parsing
headaches.

Con-con: We can't rule out particular characters just because particular
systems use them... that restriction would disallow most
(non-alphanumeric) characters.

Personal comments: I fully understand the position of the SGML people,
although I don't see many alternatives here. Perhaps we could look at []?

3) Other (yet-to-be-defined) URLs.

Majority: Not a lot of debate. People seem to be comfortable with the
current crop of defined URLs... we may want to wait before defining others?

That's it as far as I know... anything else that I've forgotten?

-- 
-Alan

------------------------------------------------------------------------------
Alan Emtage,                       "The Left in Canada is more gauche
Bunyip Information Systems,         than sinister"
Montreal, CANADA                               -The Economist

bajan@bunyip.com Voice: +1 (514) 875-8611 Fax: +1 (514) 875-8134