Re: caching

hallam@alws.cern.ch
Sat, 19 Feb 1994 16:58:18 +0100

Date: Sat, 19 Feb 1994 16:58:18 +0100
Message-Id: <9402191558.AA07340@dxmint.cern.ch>
From: hallam@alws.cern.ch
Subject: Re: caching

Re the cache discussion. I am having a little difficulty following this. It
appears to me that a number of separate concerns are being treated at the same
time by different people. amongst these are:

1) The syntax of a resource identifier.
2) The logical structure (semantics) of a resource identifier.
3) The implementation of the semantics.

It seems to me that we should start with 3 and work backwards, the logical
structure required to implement each possible implementation should then become
clear.

On the dicussion about time to live etc it seems to me that this sort of scheme
is only a `poor mans' version of a proper cache mechainism. It has the advantage
that the document master need keep no records of the places where the document
is cached. This important when several thousand clients are caching the same
page, the cache hit rate is relatively low and the cost involved in obtaining an
out of date document is low.

A full cache system requires that the master has knowledge of each of the
locations where a document is cached. The change protocol becomes a two phase
process:
1a) Notify all caches that the document has become invalid
1b) Wait for replies
2a) Update document
2b) Acknowledge the updater

A more complex situation occurs where the master can be updated at more than one
location. This requires the use of a lock. Before attempting the update the lock
must be obtained.

Scaling etc may be ensured by the distribution of locks throughout the cache
structure. Fragmentation on node failure may be prevented by the usual
mechanisms of quorum counts, votes etc. Fallback locking systems may also be
applied.

Such a maximal scheme would clearly be most appropriate as a backbone system.
More lightweight cache schems could then be hung off it.

This is not mirroring by the way. In mirroring you have a single update point
for the database and the changes are copied across to the mirror sites. Under
this scheme we have the option of updating a document at an arbitrary server
node. This means that we do not have a single point of failure or a bottleneck.

Note that to resolve such an address it is merely necessary to have:-

1) A unique address per document
2) A universal method for resolving an address.

If we follow the model of ISBN's or barcodes, each publisher has a unique
prefix. The publisher then assigns numbers within the namespace granted.

To resolve an address we first find resolve the prefix to discover the
address of either the publisher or a local agent. To do this we look up the
address in a catalogue. The catalogue itself is merely a document cached
via the same cluster-cache scheme as used for the documents pointed to
by the addresses themselves.

Phill Hallam-Baker