Re: Propose draft-ietf-uri-relative-url-05.txt for Proposed Standard

drtr1@cam.ac.uk
Mon, 6 Feb 95 16:41 GMT

Message-Id: <m0rbWUn-0007atC@grus.cus.cam.ac.uk>
Date: Mon, 6 Feb 95 16:41 GMT
To: uri@bunyip.com
Subject: Re: Propose draft-ietf-uri-relative-url-05.txt for Proposed Standard
From: drtr1@cam.ac.uk

The URL definitions in the draft and in RFC 1738 seem to differ in places,
which is certainly confusing on a first reading of the two documents (with a
view to writing a URL parser). Perhaps this is because they define subtly
different objects, although both BNFs define a 'url'.

1. Are national characters allowed in a URL?
This seems to be the most significant difference. RFC 1738 has

    unreserved = alpha | digit | safe | extra

whereas the draft (draft-ietf-uri-relative-url-05.txt) has

    unreserved = alpha | digit | safe | extra | national

Hence the draft allows national characters in most parts of most URLs, whereas
the RFC does not.
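
To make the difference concrete, here is a minimal Python sketch of the two
'unreserved' sets. The members of safe, extra and national are as I read them
in the RFC 1738 BNF and should be checked against both documents; the names
UNRESERVED_RFC1738, UNRESERVED_DRAFT and disallowed() are purely illustrative
and come from neither document.

```python
import string

# Character classes as I read them in the RFC 1738 BNF (an assumption to verify).
ALPHA = set(string.ascii_letters)
DIGIT = set(string.digits)
SAFE = set("$-_.+")
EXTRA = set("!*'(),")
NATIONAL = set("{}|\\^~[]`")

# RFC 1738: unreserved = alpha | digit | safe | extra
UNRESERVED_RFC1738 = ALPHA | DIGIT | SAFE | EXTRA
# Draft:    unreserved = alpha | digit | safe | extra | national
UNRESERVED_DRAFT = UNRESERVED_RFC1738 | NATIONAL


def disallowed(text, allowed):
    """Return the characters of text that are not in the allowed set."""
    return [c for c in text if c not in allowed]
```

For example, disallowed("~drtr1", UNRESERVED_RFC1738) reports the "~", whereas
the same call with UNRESERVED_DRAFT reports nothing.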

2. file, ftp and http cannot _always_ be parsed using the generic-RL syntax.

In section 2.3, the draft states:
> Finally, the following schemes can always be parsed using the
> generic-RL syntax.
>
>    file      Host-specific Files
>    ftp       File Transfer Protocol
>    http      Hypertext Transfer Protocol
>    nntp      USENET news using NNTP access

The generic-RL syntax defines the segments of its path element as

    segment = *pchar
    pchar   = uchar | ":" | "@" | "&" | "="

with ";" and "?" reserved for delimiting the params and query.
However, RFC 1738 allows ";" in an http path segment, and "?" in an ftp or
file path segment.
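
A rough Python sketch of the consequence, assuming a generic-RL parser simply
cuts the text after the scheme and net_loc at the first "?" and then at the
first ";". The function name split_generic() is hypothetical, and the sketch
ignores fragments and does not distinguish an absent component from an empty
one.

```python
def split_generic(rest):
    """Split the part of a URL after scheme and net_loc into (path, params, query).

    Assumes the generic-RL rule that "?" begins the query and ";" begins
    the params; any component that is not present comes back as "".
    """
    path, _, query = rest.partition("?")    # query: after the first "?"
    path, _, params = path.partition(";")   # params: after the first ";"
    return path, params, query
```

With this, split_generic("/pub/what?.txt") gives ("/pub/what", "", ".txt"), so
an ftp or file path segment containing "?" loses its tail to the query, and
split_generic("/a;b/c") gives ("/a", "b/c", ""), so an http path segment
containing ";" is cut short in the same way.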

In practice this is not much of a problem, provided the draft does not assert
that these schemes can _always_ be parsed using the generic-RL syntax.

David Robinson. (drtr@ast.cam.ac.uk)