Message-Id: <9405031515.AA07674@tipper.oit.unc.edu>
To: Christian Huitema <Christian.Huitema@sophia.inria.fr>
Subject: Re: Navigation with centroids
In-Reply-To: Your message of "03 May 94 17:03:11 +0200."
<199405031503.AA00963@mitsou.inria.fr>
Date: Tue, 03 May 94 11:15:27 -0400
From: Simon E Spero <ses@tipper.oit.unc.edu>
>>>>> "Christian" == Christian Huitema <Christian.Huitema@sophia.inria.fr> writes:
Christian> Problem: kicking upstairs is a normal thing to do if
Christian> the query came from downstairs, a stupid idea if it
Christian> already came from upstair. One could indeed add a mark
Christian> in the command, eg a + or a - after the "loop index",
Christian> to express that idea. But I am not sure of the
Christian> semantic... Any idea? How does WHOIS++ solve the
Christian> problem?
Here's something I posted to one of the lists a while ago, proposing one
possible search strategy using centroids and named hierarchies.
------
First cut at best-first searching.
Definitions:
Softpages problem The task of locating the most appropriate copy
of an object (images, software, etc).
Centroid A centroid is an object that represents a collection of
other objects. In this case we shall use the term to
describe a record containing a list of the
attribute-value pairs present on a directory server.
Topological hierachy In the modified [C]LDAP a server can be a member of
serveral different hiearchies. The topological
hierachy is contructed by placing links between
servers which are at opposite ends of networked
connections. This may or may not be precisely the
same as the underlying physical network, but is
intenged to be a fairly close approximation.
Sites with multiple connections may have multiple
parents.
In the topological hiearchy, parent-child relation-
ships are determined by which server polls the other.
Since connectivity is symetric, two servers may poll
each other without Oedipal complexities.
Cost(h1,h2,k) The time taken to initiate a transfer and recieve 'k'
bytes from 'h1' to 'h2'
---------------
Alogrithm:
part 1 - centroid weightings.
Each term in a centroid is given a weight. For the topological hiearchy, we
shall define this weight to be a measure of the Cost of accessing items
referenced by that term from this server. If there are several different
costs associated with a term, the weighting used should be the lowest
value. The pollee should also identify the topological parents which it
does not poll [This isn't quite right - ses]
When a poller retrieves a topological centroid and adds it to its database
of forwarding information it should increment the weights by the Cost of
accessing the polled server. Intuitively this value is valid because we are
measuring the cost of accessing the document via the network link
corresponding to this topological link.
The polled server should maintain an upwards link weighted by the cost of
accessing the poller.
part 2 - search strategy
We shall assume that the server returns a list of either records or
referrals, each labelled with its cost. The client should increment the
costs returned by the cost of accessing the server.
Client-Side
-----------
let Q be a queue of records and servers, ordered in ascending order of cost.
Initialise R, the set of already queried servers, to NIL
Initialise Q to the default server
While Q is not empty
let item = head(Q)
if item is a record,
return the record
else
remove item from Q
add item to R
add the results of querying 'item' to Q, removing
all items already in R.
fi
endwhile
Return not-found
--------
server side.
Return all matches to the query, together with the server's topological
parents.
-----------
Although the algorithm is fairly simple to understand when a match exists,
the case where the resource does not exist is slightly more complicated.
In this case, the queue will only contain servers topological parents.
It might seem that this would lead to an explosive fan-out; however,
centroids come to the rescue here, restricting the search to only
upwards links. These will eventually converge upon a backbone, and die off
pretty rapidly (we can consider a backbone to poll all its links - much
of the internal topology could be collapsed into a large cloud).
Simon