Message-Id: <9404280048.AA11247@expresso.bunyip.com>
From: Peter Deutsch <peterd@bunyip.com>
Date: Wed, 27 Apr 1994 20:48:51 -0400
In-Reply-To: Mitra's message as of Apr 27, 15:58
To: mitra@pandora.sf.ca.us (Mitra), <uri@bunyip.com>
Subject: Re: Selecting URL from "equal" sites
Hi all,
[ Mitra wrote: ]
> Dirk Herr-Hoyman (hoymand@gate.net) wrote:
> : We've been chatting a bit about what might go in a URC to allow for URL
> : selection. There's one case we haven't dealt with and it's one that's in
> : front of us right way. How can you choose between "equal" sites? For
> : example, how does one pick an archie server or GNN server?
. . .
> : Anyone have a good idea here for the archie case?
>
> The archie case, actually falls most likely into a load-sharing case,
> not a "closeness" case, since the volume of material shifted is fairly
> small, but the servers are frequently overloaded. There have been a
> number of ideas for dealing with this, one of which is to pick an
> address like "archie.svc.int" and then put a smart DNS resolver on
> "svc.int" that monitors the load on all the archies and returns the IP
> addresses sorted by load, or if loads are equivalent, sorted by
> "closeness".
The archie case is actually a little more complex than it
might seem at first glance. First, although I've advocated
polling the server for load in the past, I'm told by Alan
(who knows a lot more about the operational specifics of
the archie/Prospero server than I do at this point) that
the load varies so quickly that the numbers returned
aren't necessarily a good discriminant. At the same time,
the archie server actually prioritizes queries based upon
type, with exact match being given higher priority than substring,
etc. Thus, a server with a queue full of regex queries
might still give better response to an incoming exact
match query than one with the same number of queries in
the queue which are all exact match.
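To make that concrete, here is a minimal sketch of a
type-prioritized queue. The priority values, the class and
the helper names are assumptions made up for illustration;
this is not how the archie server is actually implemented.

```python
import heapq
from itertools import count

# Hypothetical priorities: a lower number is served first.
PRIORITY = {"exact": 0, "substring": 1, "regex": 2}


class QueryQueue:
    """Toy queue that serves queries by type priority, FIFO within a type."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def submit(self, kind, text):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, text))

    def __len__(self):
        return len(self._heap)

    def queries_ahead_of(self, kind):
        """Count queued queries that would be served before a new one of this type."""
        return sum(1 for prio, _, _, _ in self._heap if prio <= PRIORITY[kind])


def make_queue(kind, n):
    """Build a queue holding n queries of a single type."""
    q = QueryQueue()
    for i in range(n):
        q.submit(kind, f"query-{i}")
    return q


# Two servers with the same queue length but different query mixes.
busy_regex = make_queue("regex", 10)
busy_exact = make_queue("exact", 10)
```

Both queues report the same length, yet a new exact match
query jumps ahead of everything on the regex-laden server
and waits behind every entry on the other one. Queue length
alone says little about the response a given query will see.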
FWIW, Prospero actually does allow the client to ask for
server load. I don't know of anyone who makes use of this
feature in the archie context.
I guess the moral is that tuning is a global exercise and
you need to make sure you understand what you're trying to
optimize to make sure you get it right. Load sharing is a
worthy goal on heavily loaded services. Selecting
authoritative vs. non-authoritative servers is a worthy
goal. Selecting among potential info servers based upon
"probability of it having what you want" is useful when
browsing multiple services, and so on.
I mentioned in a previous posting that I agree with Mitra
that selecting the "best" ftp server from a list of archie
hits seems to be best done in the client, since you may
choose to sort the returned list on such things as domain
(which makes some sense in Europe, but is less useful in
the U.S.), response time to the various servers,
calculated bandwidth as reported by network management
software, etc.
On the other hand, one thing the "svc.int" technique buys
us is better resource discovery, as the user would no
longer need to keep a list of servers around, as that
would be a task for the svc.int server. I'm just not sure
that the DNS server is necessarily in the best position to
order a list that is returned to a client, as often only
the client is in a position to judge which would be the
best server for it to use.
I think it is perhaps better for a helpful resource server
to return a list of candidates (and maybe even "last
reported load factor" or other hints) and let the client
decide the most appropriate place to go, based upon its own
selection criteria. Who knows, maybe the selection task
could even be divided between the server and client, with
the server dropping out certain potential candidates from
a list based upon its particular policies. The client would
then choose from the supplied list that server which seems
like the best candidate in the circumstances.
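A rough sketch of that division of labour, under stated
assumptions: the Candidate record, the function names, the
host names and the policy (a simple blocked-host set) are
all hypothetical, chosen only to show the server filtering
and the client choosing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    host: str
    last_reported_load: Optional[float] = None  # a hint only; may be stale


def server_candidates(all_servers, blocked_hosts):
    """Server side: drop candidates excluded by local policy; do not rank the rest."""
    return [c for c in all_servers if c.host not in blocked_hosts]


def client_choose(candidates, cost):
    """Client side: pick the lowest-cost candidate under the client's own criteria."""
    return min(candidates, key=cost) if candidates else None


def prefer_domain_then_load(suffix):
    """Example client criterion: hosts under `suffix` first, then the load hint."""
    def cost(c):
        outside = not c.host.endswith(suffix)
        load = c.last_reported_load if c.last_reported_load is not None else float("inf")
        return (outside, load)
    return cost


# Hypothetical server list; the names are placeholders, not real archie hosts.
servers = [
    Candidate("archie.a.example", 0.9),
    Candidate("archie.b.example", 0.2),
    Candidate("archie.c.example.ca", 0.5),
    Candidate("archie.d.example.ca"),
]
offered = server_candidates(servers, blocked_hosts={"archie.b.example"})
choice = client_choose(offered, prefer_domain_then_load(".ca"))
```

The server never imposes an order here. It only trims the
list, and a different client could pass a different cost
function over the same candidates.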
As I've told a few people now, to allow the client to make
such choices I'd love to have a "cost daemon" running on
my local network that would take a list of IP addresses or
domain names and a metric and return the list sorted on
the supplied metric (e.g. sort on domain, response time,
etc). Such a program might do nothing more than consult a
static file for a preconfigured list of valid domains, or
it might be smart enough to start pinging the list and
checking response times. If it caches these pings and is
not so aggressive that it adds significantly to the network
load it could be quite a useful tool.
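A minimal sketch of what I have in mind, assuming a
caller-supplied probe function (host in, seconds out) in
place of real pings. The class name, the metric names and
the cache lifetime are illustrative assumptions; no such
program exists.

```python
import time


class CostDaemon:
    """Sorts a list of hosts on a named metric, caching measured response times."""

    def __init__(self, probe, preferred_domains=(), cache_ttl=300.0,
                 clock=time.monotonic):
        self._probe = probe                       # hypothetical: host -> seconds
        self._preferred = list(preferred_domains) # static, preconfigured ordering
        self._ttl = cache_ttl                     # seconds a measurement stays valid
        self._clock = clock
        self._cache = {}                          # host -> (timestamp, response time)

    def _response_time(self, host):
        now = self._clock()
        hit = self._cache.get(host)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                         # reuse it; no new network traffic
        rtt = self._probe(host)
        self._cache[host] = (now, rtt)
        return rtt

    def _domain_rank(self, host):
        for rank, suffix in enumerate(self._preferred):
            if host.endswith(suffix):
                return rank
        return len(self._preferred)               # unlisted domains sort last

    def sort(self, hosts, metric):
        """Return hosts sorted on the supplied metric."""
        if metric == "domain":
            return sorted(hosts, key=self._domain_rank)
        if metric == "response_time":
            return sorted(hosts, key=self._response_time)
        raise ValueError(f"unknown metric: {metric}")
```

A client would call something like
daemon.sort(hosts, "response_time") and try the first entry.
The cache is what keeps repeated requests from turning into
repeated probes of the same servers.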
> In other words .... I'm suggesting the solutions to these problems belong in
> something other than the URC or the URN->URC resolver.
I agree. We _can_ keep an eye out to make sure that we don't
do things at this level to hinder the process, but not all
problems are going to be solved with URIs.
- peterd
--
-----------------------------------------------------------------------------
"What do thay got, a whole lot of sand? We got a hot crustacean band!
Each little clam here, know how to jam here! Under the Sea!"
-----------------------------------------------------------------------------