Date: Mon, 7 Jun 93 22:01:40 MET DST
From: timbl@nxoc01.cern.ch (Tim Berners-Lee)
Message-Id: <9306072001.AA18673@nxoc01.cern.ch>
To: uri@bunyip.com, weibel@oclc.org
Subject: Re: scalability of table lookup
Date: Mon, 7 Jun 93 13:42:26 EDT
From: weibel@oclc.org (Stu Weibel)
Subject: scalability of table lookup
Stu says:
In fact, libraries have solved this problem, at least the part of the
problem relevant to this discussion. The OCLC Online Union Catalog
contains roughly 28 million records (including who holds the items)
representing some 500 or 600 million items; this system supports Inter
Library Loan (ILL) for any member library. The aspects of ILL that
don't work so well have to do with physical delivery and lending
agreements which have little to do with our problem.
I don't believe this is really a scaling problem.
If I understand you correctly, you are saying that, given an
arbitrary unstructured key, this system can find 1 out of
500 million items, in around 5 milliseconds? That is a useful
service.
The sort of figures we have been talking about assume
(say) that each person in the US generates on average 100
identifiers a day, that is 2.5e10 identifiers a day,
for 10 years... that's about 1e14 identifiers left around if no one
throws their mail away. In fact the rate may be less, but other countries
may also be doing it. That sort of figure is really manageable
if you take the square root (one punctuation mark in the URN).
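As a rough check of the arithmetic above, here is a minimal Python sketch. The US population of 250 million is an assumption (it is what the 2.5e10 per-day figure implies), and the variable names are illustrative only. It also assumes that "one punctuation mark" means splitting the name into two parts of roughly equal size, so that each lookup level holds about the square root of the total.

```python
import math

# Assumption: roughly 250 million people in the US, implied by 2.5e10 ids/day.
US_POPULATION = 2.5e8
IDS_PER_PERSON_PER_DAY = 100   # figure from the message
YEARS = 10                     # figure from the message
DAYS_PER_YEAR = 365

ids_per_day = US_POPULATION * IDS_PER_PERSON_PER_DAY
total_ids = ids_per_day * DAYS_PER_YEAR * YEARS

# Assumption: a two-part name splits the space evenly, so each level
# of lookup deals with about sqrt(total) entries.
per_level = math.sqrt(total_ids)

print(f"identifiers per day:       {ids_per_day:.1e}")
print(f"identifiers after {YEARS} years: {total_ids:.1e}")
print(f"entries per lookup level:  {per_level:.1e}")
```

The total comes out a little under 1e14, and each level is a little under 1e7 entries, which is smaller than the catalog of 28 million records quoted above.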
> What we don't necessarily have is a fast solution.
How fast is fast? Is there a consensus here as to what constitutes
acceptable performance? 100 or 200 transactions a second is a very
reasonable target for servers of this kind (we routinely achieve 200
transactions per second on our cataloging system). The apparent speed
to the user/application would, of course, be dependent on load.
The ZOG experiments (I think) provided the figure of 100ms after which
a person's ability to solve problems with the system degrades. Not
much point getting it way below the ping time across the net.
Your figures definitely hold up the Peter Deutsch method.
stu
Tim