Re: Another snapshot of the URL document.

Guido.van.Rossum@cwi.nl
Mon, 04 Jul 1994 15:29:10 +0200

Message-Id: <9407041329.AA04045=guido@voorn.cwi.nl>
To: timbl@www0.cern.ch
Subject: Re: Another snapshot of the URL document.
In-Reply-To: Your message of "Mon, 04 Jul 1994 14:13:13 MDT."
<9407041313.AA01795@ptpc00.cern.ch>
From: Guido.van.Rossum@cwi.nl
Date: Mon, 04 Jul 1994 15:29:10 +0200

> Do not confuse unsafe and reserved characters!
>
> Reserved characters have *different* meanings when encoded
> Unsafe characters have *the same* meaning when encoded.

Oops -- maybe the document should stress this more then.
Unfortunately it seems that there are now (for some protocols) four
different categories: characters that are safe and reserved
(e.g. user@host is different from user%40host), characters that are
safe and not reserved (e.g. alphanumerics), characters that are unsafe
and reserved (e.g. "/") and finally characters that are unsafe and not
reserved (e.g. "~"). I think I'd feel more comfortable if all
characters that were reserved (in some protocols or some parts of
protocols) were also safe. This would require adding ":", ";", "/",
and "?" to the list of safe characters. The rule would then simply be
to encode all unsafe characters plus all reserved characters used in a
non-reserved role. (I just noticed that the list of safe chars in the
text differs from that in the BNF -- the BNF does not list @ as safe.
Maybe that solves it. I would just hope that the text is self-contained
without the BNF.)
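
The proposed rule (encode every unsafe character, plus every reserved
character that is not acting in its reserved role) can be sketched as
follows. This is only an illustration using the modern Python standard
library, not the draft's own algorithm; the helper name encode_part and
its reserved_in_role parameter are hypothetical, and quote()'s built-in
notion of "always safe" characters is not the draft's list.

```python
from urllib.parse import quote

def encode_part(text, reserved_in_role=""):
    """Percent-encode `text`, keeping only the listed reserved characters bare.

    `reserved_in_role` (a hypothetical parameter) names the reserved
    characters that act as delimiters in this part of the URL; any other
    reserved or unsafe character is encoded.
    """
    # quote() never encodes ASCII letters, digits, or "_", ".", "-"
    # (plus "~" on Python 3.7 and later); everything else is encoded
    # unless it appears in `safe`.
    return quote(text, safe=reserved_in_role)

print(encode_part("user@host", reserved_in_role="@"))       # "@" as delimiter
print(encode_part("user@host"))                             # "@" as plain data
print(encode_part("cwi/people/a b", reserved_in_role="/"))  # "/" as hierarchy
```

With this rule the caller only has to say which reserved characters are
structural in the part being encoded; everything else is handled the
same way.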

> As any gateway, proxy, etc, is allowed to encode or decode
> any unsafe characters within a context whose safety is
> understood (eg HTTP), you cannot say that %27 and & in a
> form-generated URL mean different things.

Unfortunately this breaks Mosaic forms with "METHOD=GET". I don't
know if that's a problem.
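
To illustrate the concern: in a form submitted with METHOD=GET, a bare
"&" separates fields, while %26 (the percent-encoding of "&") stands for
a literal ampersand inside a value. The sketch below uses Python's
urllib.parse as a stand-in for a form-handling server; it is not a
description of Mosaic itself, and the field names and values are made up.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields; the value of "q" contains a literal "&".
fields = {"q": "AT&T", "lang": "en"}

# urlencode() encodes the "&" inside the value as %26.
query = urlencode(fields)
print(query)

# Parsed as sent, the literal "&" survives inside the value.
as_sent = parse_qs(query)
print(as_sent)

# If an intermediary decoded %26 back to "&", the value would be split
# at that point and the data would no longer round-trip.
after_decoding = parse_qs(query.replace("%26", "&"))
print(after_decoding)
```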

> You can't talk about a top level directory of an HTTP server,
> as it does not have to have any directory structure at all.

This contradicts the document, which says:

| The "/" character within HTTP is used to designate a
| hierarchical structure.

It also contradicts the practice of converting a partial URL to a full
URL in the context of another full URL (I understand this is outside
the scope of the URL document but I expect it will be standardized
separately nevertheless).
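
As an illustration of why resolving a partial URL relies on "/" being
hierarchical, here is a small sketch using urljoin from the modern
Python standard library. It follows later specifications rather than
anything fixed in 1994, and the relative references used here
("index.html", "../projects/") are made-up examples.

```python
from urllib.parse import urljoin

base = "http://www.cwi.nl/cwi/people/Guido.van.Rossum.html"

# A bare name replaces the last path segment of the base.
sibling = urljoin(base, "index.html")
print(sibling)

# ".." climbs one level in the "/"-separated hierarchy.
parent_relative = urljoin(base, "../projects/")
print(parent_relative)

# A lone "/" resolves to the root of the same server.
root = urljoin(base, "/")
print(root)
```

None of these results would be well defined if "/" carried no
hierarchical meaning in HTTP URLs.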

Well, maybe I should have said "welcome page" instead of "top level".

> There is a convention that
>
>
> http://<host>:<port>/
>
> is the WWW URI of a welcome page and suitable default entry point to
> the server. If the last / is omitted, the parsing software
> puts it back on, as the /<path> sent to an HTTP server cannot be void.

This practice should be standardized: ideally both the convention that
http://<host>:<port>/ is a welcome page and the convention to tack the
/ back on when omitted, but at least the latter.
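
The "tack the / back on" convention is small enough to sketch. The
helper name with_root_path is hypothetical, and the code uses the modern
Python standard library purely as an illustration of what parsing
software might do, assuming the URL has a host part.

```python
from urllib.parse import urlsplit, urlunsplit

def with_root_path(url):
    """Return `url` with an empty path replaced by "/"."""
    parts = urlsplit(url)
    # Only URLs with a host part and no path at all need fixing;
    # the path sent to an HTTP server cannot be void.
    if parts.netloc and not parts.path:
        parts = parts._replace(path="/")
    return urlunsplit(parts)

print(with_root_path("http://www.cwi.nl"))
print(with_root_path("http://www.cwi.nl:80"))
print(with_root_path("http://www.cwi.nl/cwi/"))  # already has a path
```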

--Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>
URL: <http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>