Re: Another snapshot of the URL document.

Tim Berners-Lee (timbl@www3.cern.ch)
Tue, 5 Jul 94 12:09:45 +0200

Date: Tue, 5 Jul 94 12:09:45 +0200
From: Tim Berners-Lee <timbl@www3.cern.ch>
Message-Id: <9407051009.AA04147@www3.cern.ch>
To: Guido.van.Rossum@cwi.nl
Subject: Re: Another snapshot of the URL document.

> > Reserved characters have *different* meanings when encoded
> > Unsafe characters have *the same* meaning when encoded.
>
> Oops -- maybe the document should stress this more then.
> Unfortunately it seems that there are now (for some protocols) four
> different categories: characters that are safe and reserved
> (e.g. user@host is different from user%40host),

No -- @ is the same as %40 in all cases. The user and host strings
may not contain either.
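
A minimal sketch of the equivalence claimed here, assuming Python's standard urllib.parse module (no tool is named in this thread). It only shows that %40 decodes to "@"; it does not settle how any given protocol treats the character.

```python
from urllib.parse import unquote

# %40 is the percent-encoding of "@" (ASCII 0x40).
decoded = unquote("user%40host")

# Under the reading above, the encoded and literal forms are the same string.
same = decoded == "user@host"
```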

> characters that are
> safe and not reserved (e.g. alphanumerics),

yes,

> characters that are unsafe and reserved (e.g. "/")

what is unsafe about "/"? It is reserved.

> and finally characters that are unsafe and not
> reserved (e.g. "~").

yes.

Let's get this straight. "unsafe" and "reserved" are mutually
exclusive and should match the terms in the BNF.

> I think I'd feel more comfortable if all
> characters that were reserved (in some protocols or some parts of
> protocols) were also safe.

Yes -- the list in the text seems to be at variance with the BNF.

> This would require adding ":", ";", "/",
> and "?" to the list of safe characters.

Yes (though I take your word for the exact list)

> The rule would then simply be
> to encode all unsafe characters plus all reserved characters used in a
> non-reserved role. (I just noticed that the list of safe chars in the
> text differs from that in the BNF -- the BNF does not list @ as safe.
> Maybe that solves it. I would just hope that the text is self-contained
> without the BNF.)

Agreed.
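
The rule quoted above (encode every unsafe character, plus every reserved character used in a non-reserved role) might be sketched as follows. The SAFE and RESERVED sets and the encode() helper are assumptions made for illustration; the authoritative lists are whatever the URL document's BNF ends up saying.

```python
import string

# Assumed, illustrative sets; the authoritative lists belong to the BNF.
SAFE = set(string.ascii_letters + string.digits + "$-_.+")
RESERVED = set(":;/?@")  # hypothetical: the characters named in this thread


def encode(text: str, keep_reserved: bool = False) -> str:
    """Percent-encode text for use in a URL.

    keep_reserved=False: the text is plain data, so any reserved
    character in it is in a non-reserved role and is encoded.
    keep_reserved=True: reserved characters act as delimiters and stay.
    Anything outside SAFE and RESERVED is treated as unsafe and encoded.
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE or (keep_reserved and ch in RESERVED):
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


# Each segment is data; the "/" joining them is the real delimiter.
path = "/".join(encode(segment) for segment in ["docs", "a/b", "~guido"])
```

Here the "/" inside the second segment is encoded as %2F because it is data, while the "/" characters that join the segments stay literal.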

> > As any gateway, proxy, etc, is allowed to encode or decode
> > any unsafe characters within a context whose safety is
> > understood (eg HTTP), you cannot say that %27 and & in a
> > form-generated URL mean different things.
>
> Unfortunately this breaks Mosaic forms with "METHOD=GET". I don't
> know if that's a problem.

Sounds like it is a problem. The form encoding (which, incidentally, I
don't like anyway when it is used for input of data rather than
finding data, as HTTP's GET should be a read-only function, idempotent
and not affecting the web) does in effect rule out the inclusion of &
in the values of fields.
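
A small sketch of why this happens, assuming Python's standard urllib.parse and a made-up query string (the field names and values are hypothetical). If an intermediary decodes %26 to & before the server splits the fields, a value containing & is cut in two.

```python
from urllib.parse import parse_qsl, unquote

# Hypothetical form submission: the value "AT&T" has its "&" sent as %26.
original = "name=AT%26T&city=Geneva"

# A gateway that treats %26 and & as interchangeable might decode early.
rewritten = unquote(original)

# Splitting on "&" before decoding keeps the value intact...
fields_before = parse_qsl(original)

# ...but after early decoding the "&" inside the value looks like a separator.
fields_after = parse_qsl(rewritten, keep_blank_values=True)
```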

> > You can't talk about a top level directory of an HTTP server,
> > as it does not have to have any directory structure at all.
>
> This contradicts the document, which says:
>
> | The "/" character within HTTP is used to designate a
> | hierarchical structure.

It doesn't *have* to have a hierarchical structure. If it does,
then / is used. I guess I'm quibbling over words, but one has to, to
prevent later misunderstanding that http paths are filenames.
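
To make the distinction concrete, here is a sketch of a server-side lookup that never touches a filesystem. The RESOURCES table and resolve() are hypothetical names invented for this illustration, not part of any HTTP server.

```python
# Hypothetical resource table: keys are whole path strings, not filenames.
RESOURCES = {
    "/": "welcome page",
    "/weather/today": "generated on request from a database",
    "/a/b/c": "a single record; no directories a or b exist",
}


def resolve(path: str) -> str:
    """Look the whole path up as one key; "/" implies no directory."""
    return RESOURCES.get(path, "not found")
```

Note that resolve("/a/b") finds nothing even though "/a/b/c" exists, since nothing obliges the server to have a directory behind each "/".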

...
> This practice should be standardized (both the convention that
> http://<host>:<port>/ is a welcome page and the convention to tack the
> / back on when omitted, but at least the latter one).

I agree. The place for the first is not obvious -- in the HTTP spec I
guess. But it is only a convention, as the path is opaque and
so one can map anything into it. Which is important.

The tacking on of the / should go into the URL spec, I agree.
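
One way a client might tack the / back on, sketched with Python's standard urllib.parse. The helper name add_root_slash is an assumption for illustration; the URL spec may word the rule differently.

```python
from urllib.parse import urlsplit, urlunsplit


def add_root_slash(url: str) -> str:
    """Return url with "/" as its path when a host is given but no path."""
    parts = urlsplit(url)
    if parts.netloc and not parts.path:
        parts = parts._replace(path="/")
    return urlunsplit(parts)


normalized = add_root_slash("http://www.cwi.nl")
```

URLs that already carry a path are returned unchanged.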

> --Guido van Rossum, CWI, Amsterdam <Guido.van.Rossum@cwi.nl>
> URL: <http://www.cwi.nl/cwi/people/Guido.van.Rossum.html>

Tim