Date: Tue, 8 Mar 94 14:32:03 CST
Message-Id: <9403082032.AA07380@boombox.micro.umn.edu>
From: "Mark P. McCahill" <mpm@boombox.micro.umn.edu>
To: hoymand@joe.uwex.edu, timbl@www3.cern.ch, uri@bunyip.com
Subject: Re: gopher+ support in the URL draft.
In message <9403081452.AA26689@joe.uwex.edu> Dirk Herr-Hoyman writes:
> At 7:17 PM 3/7/94 -0600, Mark P. McCahill wrote:
> >With a minor change to the syntax of the current gopher URL, we can have
> >both plain old gopher and gopher+ in the same URL and have a nice regular
> >syntax.
>
> >The improved syntax is basically the current gopher URL syntax, but uses an
> >encoded <tab> character as a separator rather than the <?> character. Using
> ><tab> as a separator is convenient because <tab> cannot occur in gopher
> >selector strings. The only time the <?> is used in the current gopher
> >selector is when the URL points to a gopher search engine and is passing
> >the search engine a search string (the words for which to search)... the
> >virtue of having a regular syntax and using a separator character that
> >cannot occur in gopher selector strings should be obvious.
> >The format of a gopher URL is:
> >
>
> Mark, I'm not following why you are changing from using ? as the separator
> for searches to tab. It seems that if you let %09 be the gopher+ "flag"
> you could still use ? for searches. Perhaps I'm just being dense here and
> you can educate me. I've listed some counter examples that appear to work
> for me.
>
If you use "?" as a separator, then you have to encode all occurrences of "?"
in all gopher selector strings (yuck). Gopher selector strings are guaranteed
NOT to have <tab> in them, which makes <tab> an excellent choice as a separator.
Gopher selector strings can (and do) contain "?" characters. So, "?" was a poor
choice for a separator in gopher URLs.
Using only one separator within the gopher URL would make the parsing of the
URL easier, because the syntax would be more consistent, and you get to skip
encoding a character that should never have been used as a separator in the
first place.
Since clients will have to be reworked to deal with gopher+ URLs anyway,
this is an excellent time to make a better character the separator.
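To make the parsing argument concrete, here is a minimal sketch in Python. The
function name and the three-field return value are my own choices for
illustration, not part of any existing client. It assumes the input is the part
of the URL path that follows the single gopher type character.

```python
def split_fields(path_after_type):
    """Split 'selector%09search%09gopher+' into its three fields.

    Hypothetical helper for illustration. A raw selector can never
    contain a tab, so every encoded tab seen here is a separator.
    """
    # At most two separators matter; anything after the second one
    # stays in the gopher+ field.
    parts = path_after_type.split("%09", 2)
    # Missing trailing fields are treated as empty strings.
    parts += [""] * (3 - len(parts))
    return tuple(parts)  # (selector, search_string, gopher_plus_string)


# A selector that itself contains "?" needs no special treatment.
print(split_fields("what?is?this%09foobar"))
# ('what?is?this', 'foobar', '')
print(split_fields("a_gopher_selector%09%09!"))
# ('a_gopher_selector', '', '!')
```

With "?" as the separator the first example would be ambiguous unless every "?"
inside the selector had been encoded beforehand.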
> >An example of a URL pointing to a gopher type 7 item (a search engine)
> >where the string foobar is to be submitted to the search engine is:
> >
> > gopher://host [port]/7a_gopher_selector%09foobar
> >
> gopher://host [port]/7a_gopher_selector?foobar
>
> If this were a 7+ search, then
>
> gopher://host [port]/7a_gopher_selector%09?foobar
>
> Which looks to both preserve the existing search syntax and allow for
> gopher+.
>
The issue here is how important it is to preserve the current syntax for
referring to a search for a specific set of words.
Is it important enough that all future consumers of gopher URLs have to know
about "?" as a separator for search words, AND require "?" to be encoded in all
future gopher selector strings inside URLs? Given that there is a chance to get
it right and use a separator that cannot occur in the selector strings
(a <tab>) this seems like the time to fix the gopher URL.
The only case that this breaks for the existing installed base is where the
URL refers to a gopher search engine AND needs to pass specific search terms to
the search engine... for all other cases it is backward-compatible. URLs
referring to the search case are also not widespread in the installed base.
> I would prefer to see an orthogonal treatment of searches (and anything
> else) within the not-so-opaque selector string. Are we going to allow for
> a separate evolution of these "added" parts of the selector strings by each
> service
We probably have to allow for this or live with URLs that cripple some
services. Crippled access to services is too high a price to pay for
uniformity.
> or are we going to try and do something uniform? I am seeing
> selectors proposed that will have to be parsed,
If we consistently used <tab> as a separator there would be less parsing of
the gopher URL to get to something that you send to a server than if "?" is
used.
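As a rough illustration of "less parsing", here is a sketch in Python for the
plain (non-gopher+) case only. It assumes the classic gopher request is the
selector, optionally followed by a tab and the search string, terminated by
CR LF. The function name is hypothetical.

```python
from urllib.parse import unquote_to_bytes


def request_line(url_path):
    """Turn the URL path after 'host/' into the bytes sent to the server.

    Hypothetical helper; covers plain gopher only, not gopher+.
    """
    # The first character is the gopher type. It tells the client how
    # to treat the reply and is not sent to the server.
    rest = url_path[1:]
    # With <tab> as the separator, decoding the %XX escapes already
    # yields "selector<TAB>search"; no separator has to be located
    # and rewritten first.
    return unquote_to_bytes(rest) + b"\r\n"


print(request_line("7a_gopher_selector%09foobar"))
# b'a_gopher_selector\tfoobar\r\n'
```

With "?" as the separator the client would first have to find the separating
"?", replace it with a tab, and only then decode the rest.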
> rather than merely passed
> directly to the server, which leads me to wonder whether we shouldn't think
> about (* oh, no *) a STRUCTURE.
>
> Mark, I'd also like to see something about how attributes are to be
> handled, even if it's brief. For example,
>
> gopher://host/0selector%09!
>
> Returns the attributes. This is very different than
>
> gopher://host/0selector
>
> which actually fetches the item. The implication here (and for ASK+
> blocks) is that what happens cannot be determined merely by looking at the
> single character gtype, you must know whether it's gopher+ or not.
>
OK, how about this:
===================================================================
GOPHER
Gopher selector strings may contain any characters other than tab, return, or
linefeed, so it is important to encode all disallowed characters and encode any
space characters so these characters are not altered during transport of the
URL. Note that since gopher selector strings are opaque and map to native file
systems of the gopher server, encoding of disallowed characters in the selector
string is done to map to binary codes rather than ISO character sets. In other
words, the "%" character followed by two hexadecimal digits is used to encode
binary data. Do not interpret gopher selector strings.
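A minimal sketch of this byte-level encoding, in Python. The helper names and
the choice of which characters to leave unencoded (letters, digits, a few
punctuation marks and "/") are assumptions made for the example, not part of
the text above.

```python
from urllib.parse import quote, unquote_to_bytes


def encode_selector(raw):
    """Percent-encode an opaque selector given as bytes (hypothetical helper)."""
    # Tab, return and linefeed cannot occur in a selector at all.
    if any(ch in raw for ch in (b"\t", b"\r", b"\n")):
        raise ValueError("tab, CR and LF cannot occur in a gopher selector")
    # Each byte outside the safe set becomes %XX. No character set is
    # assumed; the bytes are treated as binary data.
    return quote(raw, safe="/")


def decode_selector(encoded):
    """Recover the original bytes without interpreting them."""
    return unquote_to_bytes(encoded)


raw = b"1/pub/My Files/r\xe9sum\xe9"
encoded = encode_selector(raw)
print(encoded)                          # 1/pub/My%20Files/r%E9sum%E9
print(decode_selector(encoded) == raw)  # True
```

Working on bytes rather than text keeps the selector opaque: whatever the
server handed out comes back unchanged after a round trip.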
The format of a gopher URL is:
1.) A single-character field to denote the Gopher type of the resource to
which the URL refers.
2.) The gopher selector string.
Note that some gopher selector strings begin with a copy of the gopher type
character, in which case that character will occur twice consecutively. Also
note that the gopher selector string may be an empty string since this is how
gopher clients refer to the top-level directory on a gopher server.
3.) An encoded tab character (%09) to separate the gopher selector string from
the optional search string (see 4 below).
If the URL does not refer to a Gopher+ item and if there is no gopher search
string then parts 3, 4, 5, and 6 of the URL are optional.
4.) The gopher search string.
If the URL refers to a search to be submitted to a gopher search engine the
search string is required. Otherwise this is an empty string.
5.) An encoded tab character (%09) to separate the gopher search string from
the optional gopher+ string (see 6 below). Note that if the URL refers to a
gopher+ item and does not have a gopher search string, there will be two
encoded tab characters in a row.
6.) The Gopher+ string. Gopher+ strings may consist of one or more
characters. For instance, the "!" character is used in the Gopher+ string
to refer to all of a gopher+ item's attributes... while
"+VIEWS:application/postscript" might be used to refer to the postscript
alternate view of a gopher+ item.
So, the format of a Gopher URL path referring to a gopher type "T" item is:
gopher://host [port]/T[gopher_selector]%09[search_string]%09[gopher+_string]
Examples:
An example of a URL pointing to a gopher type 0 item (a document) is:
gopher://host [port]/0a_gopher_selector
An example of a URL pointing to a gopher type 7 item (a search engine)
where the string foobar is to be submitted to the search engine is:
gopher://host [port]/7a_gopher_selector%09foobar
An example of a URL pointing to a Gopher+ type 0 item (a document) is:
gopher://host [port]/0a_gopher_selector%09%09some_gplus_stuff
An example of a URL pointing to a Gopher+ type 0 (document) item's attribute
information is:
gopher://host [port]/0a_gopher_selector%09%09!
An example of a URL pointing to a Gopher+ document's postscript
representation is:
gopher://host [port]/0a_gopher_selector%09%09+VIEWS:application/postscript
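The six parts above can be assembled mechanically. The following Python sketch
reproduces the example URLs. The function name, its parameters, the
"host:port" spelling of the optional port, and the set of characters left
unencoded are all assumptions made for illustration.

```python
from urllib.parse import quote


def gopher_url(host, gtype, selector, search="", gplus="", port=None):
    """Build a gopher URL from its parts (hypothetical helper).

    selector is opaque bytes; search and gplus are text.
    """
    hostport = host if port is None else "%s:%d" % (host, port)
    # Parts 1 and 2: the type character, then the encoded selector.
    path = gtype + quote(selector, safe="/")
    # Parts 3 and 4: needed if there is a search string OR a gopher+
    # string (the search string is then empty).
    if search or gplus:
        path += "%09" + quote(search, safe="")
    # Parts 5 and 6: only for gopher+ items.
    if gplus:
        path += "%09" + quote(gplus, safe="+:/!")
    return "gopher://%s/%s" % (hostport, path)


print(gopher_url("host", "0", b"a_gopher_selector"))
# gopher://host/0a_gopher_selector
print(gopher_url("host", "7", b"a_gopher_selector", search="foobar"))
# gopher://host/7a_gopher_selector%09foobar
print(gopher_url("host", "0", b"a_gopher_selector", gplus="!"))
# gopher://host/0a_gopher_selector%09%09!
```

Note how a gopher+ item without a search string produces the two encoded tabs
in a row described in part 5.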
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
===================================================================
Mark P. McCahill
gopherspace engineer/University of Minnesota
mpm@boombox.micro.umn.edu
612 625 1300 (voice) 612 625 6817 (fax)