Date: Fri, 29 Apr 1994 18:44:51 -700 (PST)
From: Mitra <mitra@pandora.sf.ca.us>
Subject: Re: Seperating URC format and URN->URC resolution
To: Peter Deutsch <peterd@bunyip.com>
In-Reply-To: <9404300045.AA13678@expresso.bunyip.com>
Message-Id: <Pine.3.89.9404291837.B17051-0100000@pandora.sf.ca.us>
On Fri, 29 Apr 1994, Peter Deutsch wrote:
> > On Fri, 29 Apr 1994, Peter Deutsch wrote:
> > > Now, if there _is_ a TXT record, it might indicate several
> > > optional servers, it might indicate an optional port
> > > number for special handling, it might indicate the address
> > > of a specific publisher agent for this particular user at
> > > this site and so on, but even without any of this we can make a
> > > first stab at resolution.
> > Peter - this sounds very vague - could you give an example of the
> > connections a client might make, and the choices it would make? I think
> > this is getting very complicated - aka slow - but I don't really understand
> > what you are suggesting. . . .
>
> I agree I wasn't all that clear. I should have chosen a
> better example. Let's try this again.
>
> If the URN looks like:
> URN:myurnsrvr.bunyip.com.srv.int:"ID=WAR_and_Peace"
>
> Then, this tells you what server to ask for the
> dereference, plus the string to send it to perform the
> dereference. Note that this is _not_ a URL and thus we
> don't necessarily know what protocol to speak when you get
> there at this point. I think you will either have agreed upon
> this out of band, or have a mechanism for looking it up on
> the fly (e.g. a subroutine that understands the TXT record
> technique, or whatever).
Peter - that's the problem: there is no out-of-band unless we define a
protocol for it. The client receives a URN and needs a deterministic set
of steps to go through to find a set of URLs. Neither of your alternatives,
"agreed upon this out of band" or "looking it up on the fly", is a set of
actions a client can take. I think my message gave an example of
what a client would have to go through to do this.
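
As a rough sketch of the one step that is already deterministic, splitting
the example URN into the server to ask and the string to send, something
like the following would do. The struct urn_parts and split_urn() names
are made up for illustration, and the "URN:<server>:<string>" shape is
assumed from the single example in this thread, not from any agreed syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the two parts of the example URN. */
struct urn_parts {
    char server[256];   /* e.g. "myurnsrvr.bunyip.com.srv.int" */
    char query[256];    /* opaque string, surrounding quotes removed */
};

/* Split "URN:<server>:<opaque string>" into its parts (assumed shape).
   Returns 0 on success, -1 if the text does not have that shape. */
int split_urn(const char *urn, struct urn_parts *out)
{
    const char *server, *colon, *query;
    size_t server_len, query_len;

    assert(urn != NULL && out != NULL);

    if (strncmp(urn, "URN:", 4) != 0)
        return -1;
    server = urn + 4;

    colon = strchr(server, ':');
    if (colon == NULL)
        return -1;
    server_len = (size_t)(colon - server);
    if (server_len == 0 || server_len >= sizeof out->server)
        return -1;

    query = colon + 1;
    query_len = strlen(query);
    /* Drop one pair of surrounding double quotes, if present. */
    if (query_len >= 2 && query[0] == '"' && query[query_len - 1] == '"') {
        query++;
        query_len -= 2;
    }
    if (query_len == 0 || query_len >= sizeof out->query)
        return -1;

    memcpy(out->server, server, server_len);
    out->server[server_len] = '\0';
    memcpy(out->query, query, query_len);
    out->query[query_len] = '\0';
    return 0;
}
```

Everything after this split (which protocol, which port) is exactly the
part that still needs a defined set of steps.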
> Now, if you happen to know that any URN served by
> bunyip.com requires WHOIS++, you don't need to query for
> the TXT record, you just go to "myurnsrvr.bunyip.com" and
> perform a WHOIS++ query, get back the result and you
> should have the URL you wanted. You could presumably also
> ask this server other questions, such as author name, etc
> for a particular URN and you thus have URN->URC
> resolution, as well.
>
> If you _don't_ know what protocol to use, then you need to
> have a mechanism to find out and call the routine that
> understands this before performing the dereference itself.
>
> So, assuming the client has something like:
>
> URN:myurnsrvr.bunyip.com.srv.int:"ID=WAR_and_Peace"
>
> and you want to dereference this to a URL, then the code
> could look something like this:
>
>
> /* ------------------------------------------------------------*\
>
> -- subroutine to dereference a URN --
>
> Assumes the following routines:
>
> resolver_protocol_to_use() - returns ID of protocol to use
> for this URN server, or ERROR
How? This involves network connections unless the URN contains the
protocol (which breaks other rules, like URNs having a longevity greater
than the systems that deliver them). Also, how do you decide whether to use
TXT records or gethostbyname?
> connect_to_address() - because I was too lazy to
> show a full connect...
> process_query() - send opaque string on connection
> using appropriate protocol
> proxy_srvr() - send URN to well-known proxy
> for processing
>
> sample call:
>
> servername = "myurnsrvr.bunyip.com.srv.int";
> deref_URN(servername);
>
> All this assumes the processing routine properly constructs
> the URN structure from the query by magic and of course, any
> comments on my C indenting style should be directed to
> /dev/null... :-)
>
> \* ------------------------------------------------------------*/
>
>
>
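> struct URN *deref_URN (char *servername) {
>
>     /* Declarations added for completeness; the types are assumed. */
>     int protocol;             /* protocol ID, 0 (ERROR) if unknown */
>     struct hostent *address;  /* result of the DNS lookup */
>     void *ptr;                /* opaque connection handle */
>
>     if ((protocol = resolver_protocol_to_use(servername))) {
>         /* Parenthesised so the pointer, not the comparison, is stored. */
>         if ((address = gethostbyname(servername)) == NULL) {
>             fprintf (stderr, "Sorry, not a legal servername\n");
>             return(NULL);
>         }
>         if ((ptr = connect_to_address(address, protocol)) == NULL) {
>             fprintf (stderr, "Sorry, can't dereference URN!\n");
>             return(NULL);
>         }
>         else {
>             return(process_query(ptr, protocol));
>         }
>     }
>     else {
>         /* Protocol unknown: hand the URN to a proxy instead. */
>         return(proxy_srvr(servername, protocol));
>     }
>
> }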
>
> Of course, real code would need a lot more hand-holding...
>
> Note that in the case where you _don't_ already have the
> address of Bunyip's server, you'd need a DNS lookup
> (perhaps half a dozen UDP packets, depending upon caching,
> depth of FQDN, etc) plus one dereference query. Where
> you've already cached this result, or are using other
> techniques to find the server, there's only the
> dereference itself to worry about.
Huh? Where is the "dereference query" in the above stuff?
>
> Now, if you _don't_ know the protocol, you could try for a
> TXT record (and thus have to return to DNS again) and see
> if one is registered. If not, you're out of luck but if
> so, you can then either resolve it or send it to a proxy
> server for resolution on your behalf. This would require
> an additional lookup from DNS but otherwise the scenario
> remains unchanged and you then don't need to have a client
> that speaks every resolution protocol. Heck, there may
> even be money to be made here for a subscription URN
> dereferencing service...
Unfortunately, I don't believe proxy servers are adequate for this task
at all. The extra time delays involved in going through an intermediate
server, which has to go to the real server, are unacceptable in a network
where these can take of the order of seconds - and this isn't even going
to return the document, only figure out where to get it and what
choices you have.
> > I have two problems with this - but both could be worked around if this
> > is really what is wanted.
> > a) it takes at least one more connection to DNS for the default case,
> > which is an unnecessary extra delay.
> Actually, only where the server information is needed and
> I don't see how you can avoid that at least once. With
> caching, alternative resource discovery, etc it may not
> need more than a single dereferencing query once you've
> established the location of a server for a particular URN
> server, modulo caching timeouts and such.
Agreed - if you want to build server caching into your client. But since
clients are usually short-lived and run by users, this means either saving
info in shared files (security issues), or each user's client having to
learn lots of things for itself. It is also going to lead to a substantial
run-time memory overhead in clients as they read caching information for
a very substantial number of servers.
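
To make that overhead concrete, here is a minimal sketch of the per-server
state a caching client would have to carry. Everything in it is an
assumption for illustration: the struct server_entry layout, the table
size, the integer protocol ID, and the cache_get()/cache_put() names are
not taken from any existing client.

```c
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 64        /* assumed table size */
#define NAME_MAX_LEN 256      /* assumed limit on a server name */

/* One remembered URN server: what the client keeps per server. */
struct server_entry {
    char name[NAME_MAX_LEN];  /* server name taken from the URN */
    unsigned char addr[4];    /* IPv4 address from an earlier lookup */
    int protocol;             /* hypothetical protocol ID */
    time_t expires;           /* entry is stale at or after this time */
    int used;                 /* nonzero if the slot holds an entry */
};

static struct server_entry cache[CACHE_SLOTS];

/* Return a live entry for name, or NULL if absent or timed out. */
struct server_entry *cache_get(const char *name, time_t now)
{
    int i;
    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && strcmp(cache[i].name, name) == 0) {
            if (now >= cache[i].expires) {
                cache[i].used = 0;     /* timed out: forget it */
                return NULL;
            }
            return &cache[i];
        }
    }
    return NULL;
}

/* Remember a server for ttl seconds. Returns 0, or -1 if no room. */
int cache_put(const char *name, const unsigned char addr[4],
              int protocol, time_t now, long ttl)
{
    int i, slot = -1;
    if (strlen(name) >= NAME_MAX_LEN)
        return -1;
    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && strcmp(cache[i].name, name) == 0) {
            slot = i;                  /* refresh an existing entry */
            break;
        }
        if (!cache[i].used && slot < 0)
            slot = i;                  /* remember the first free slot */
    }
    if (slot < 0)
        return -1;
    strcpy(cache[slot].name, name);
    memcpy(cache[slot].addr, addr, 4);
    cache[slot].protocol = protocol;
    cache[slot].expires = now + ttl;
    cache[slot].used = 1;
    return 0;
}
```

Even this toy table lives only as long as the process, so a short-lived
client would still have to write it out somewhere between runs, which is
where the shared-file and per-user problems come in.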
>
> > b) a client writer has to implement *ALL* the protocols above, in order
> > to know that they can resolve any URN they are presented with.
>
> Or have a pointer to a proxy server which will know them
> on your behalf. I think this would be a suitable
> "departmental service", or something operated by
> connectivity providers for their users, as they operate
> archies now. Of course, if Darwinian selection chooses a
> winner fairly quickly, this might not even be needed.
See above problems on speed.
>
> > I could live with this, but only if we put together gateways
> > between a default protocol and whatever protocol the resolver uses
> > natively. I don't know about other client writers, but sticking all these
> > in makes the job harder, and the clients bigger, and I'm only going to be
> > able to use the common subset of the functionality anyway. Any client
> > writers figuring that they'd implement all of these, please speak up :-)
>
> I assume that if we offer you a frontend onto a generic
> service as part of "Son of Archie(tm)" presumably we can
> address this particular concern. It may speak Gopher, HTTP
> or WHOIS++ (or all three... :-) In any event, I don't see
> this as something that every client needs to be worried
> about in the long run. Either one protocol will be chosen,
> or proxies will exist. I suspect we'll see if I'm being
> too optimistic about this soon enough...
If servers have to implement a bunch of protocols, then that is slightly
better, but it's a complete waste of time in my opinion.
I really believe that we are making a simple task unnecessarily complex
at the protocol level. The smarts should be going into the backend
resolution, not into building clients with multiple protocols, caching,
proxy servers and all that.
This is really a very simple process. In my SIMPLE scenario a client only
needs to be able to call gethostbyname, and send a simple one-packet
query to a server (which can be as smart/complex as it likes). I don't
care which protocol we use, and I'm committed to implementing it in
everything I write, but not unless we can settle on ONE. Otherwise I'd
rather stick with URLs and gain in speed and client size.
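
A minimal sketch of that client side, under stated assumptions: the query
goes out as a single UDP datagram, the packet is just the opaque string
followed by CRLF, and the port is supplied by the caller. None of those
choices has been agreed anywhere; build_query() and send_query() are
illustrative names only, and reading the reply is left out.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_QUERY 512   /* assumed upper bound for a one-packet query */

/* Build the packet: the opaque string followed by CRLF (assumed format).
   Returns the packet length, or -1 if it would not fit in one packet. */
int build_query(const char *query, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s\r\n", query);
    if (n < 0 || (size_t)n >= buflen || n > MAX_QUERY)
        return -1;
    return n;
}

/* Resolve the server with gethostbyname() and send one UDP datagram.
   Returns 0 on success, -1 on any failure. The port is a placeholder. */
int send_query(const char *server, unsigned short port, const char *query)
{
    char packet[MAX_QUERY + 1];
    struct hostent *host;
    struct sockaddr_in addr;
    int len, fd, rc;

    if ((len = build_query(query, packet, sizeof packet)) < 0)
        return -1;
    if ((host = gethostbyname(server)) == NULL || host->h_addrtype != AF_INET)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    memcpy(&addr.sin_addr, host->h_addr_list[0], sizeof addr.sin_addr);

    if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
        return -1;
    rc = (sendto(fd, packet, (size_t)len, 0,
                 (struct sockaddr *)&addr, sizeof addr) == len) ? 0 : -1;
    close(fd);
    return rc;
}
```

The point is the size of the client: one name lookup, one send, and no
per-protocol code at all.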
- Mitra