Re: Z39.50 and URI

Dirk Herr-Hoyman (hoymand@gate.net)
Sat, 3 Sep 1994 10:29:46 -0400


> In a way, Z39.50 might not be a good choice for implementing a URN/URC
> server, as the response needs to be FAST.

I don't know whois++ servers, but the http servers I have seen (CERN
and NCSA) have not impressed me compared to my Z39.50 server doing
bland things, e.g. fetching a record by a single entry key, which is the
closest I can come to fetching a file in htdocs. And put a perl script in
cgi-bin and http servers are then slow nearly by definition.
Plus getting a record/page from a several-million-record DB is
no different (for me) than from a 20-record DB.

Guess I am saying that the Z39.50 protocol does nothing to make it
slow, EXCEPT having to do an init, so a minimum of 2 round trips.
Plus our environments (Z39.50 IR type) are by definition ones where
individual record fetches from large corpuses had better be fast.
bob waldstein

From: wald@library.mt.att.com
Date: Sat, 3 Sep 94 09:59:13 EDT
To: hoymand@gate.net (Dirk Herr-Hoyman)
Subject: Re: Z39.50 and URI (fwd)

> Bob, could I copy your response back to the uri@bunyip.com list for the URI
> working group?
Sure - sorry, just don't have time now to be in another group.

> Ok, how fast is fast? We want this to be fast like DNS. If I have a hypertext
> page with 10 URNs, all from different sites, could I get the variant sets
> for all within 2 sec. off your server?

Heh - you know this is hard (timing estimates, that is). "fast like DNS"! Want
to bet it takes me seconds to ping gate.net? Let's see, what you are asking is:

I've got 10 URNs to resolve. With Z39.50 this can be done 2 ways:
- init
- for 1 to 10
    - search URN (client to server)
    - piggybacked search response (server to client)
OR (better turn-around, more client work)
- init
- search of (URN1 or URN2 or URN3 ...)
- piggybacked search response
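
To make the difference between the two approaches concrete, here is a
minimal Python sketch that only counts request/response exchanges. It does
not speak Z39.50; the function names and the query syntax are made up for
illustration, and it assumes every search response comes back piggybacked
with the records, as in the two lists above.

```python
# Counting model only: no real Z39.50 traffic is generated.
# All function names here are hypothetical, chosen for illustration.

def exchanges_one_search_per_urn(n_urns):
    """One init exchange, then one search + piggybacked response per URN."""
    return 1 + n_urns


def exchanges_single_or_search(n_urns):
    """One init exchange, then a single search of (URN1 or URN2 or ...)."""
    return 1 + (1 if n_urns else 0)


def build_or_query(urns):
    """Join URNs into one boolean OR expression (illustrative syntax only)."""
    return "(" + " or ".join(urns) + ")"


if __name__ == "__main__":
    print(exchanges_one_search_per_urn(10))   # 11
    print(exchanges_single_or_search(10))     # 2
    print(build_or_query(["URN1", "URN2", "URN3"]))
```

For 10 URNs the first approach costs 11 exchanges and the second costs 2,
which is the "minimum of 2 round trips" mentioned earlier; the price is that
the client has to build the OR query and split the combined result.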

So the time cost:
1. getting to server (connect)
2. server startup if not running process
3. init response to client
4. search to server
5. server do the lookup
6. server build search response (fetch records)
7. server response to client
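
One way to reason about the seven steps above is to separate the time spent
on the wire (steps 1, 3, 4 and 7) from the time spent inside the server.
The Python sketch below is only a bookkeeping aid under that assumption;
the class and field names are hypothetical, and it contains no measurements.

```python
from dataclasses import dataclass


@dataclass
class StepTimes:
    """Seconds spent in each of the seven steps (field names are hypothetical)."""
    connect: float          # 1. getting to server
    server_startup: float   # 2. server startup if no running process
    init_response: float    # 3. init response to client
    search_request: float   # 4. search to server
    lookup: float           # 5. server does the lookup
    build_response: float   # 6. server builds search response
    search_response: float  # 7. server response to client

    def total(self):
        """Sum of all seven steps."""
        return (self.connect + self.server_startup + self.init_response
                + self.search_request + self.lookup + self.build_response
                + self.search_response)

    def network(self):
        """Steps 1, 3, 4 and 7: the parts that cross the network."""
        return (self.connect + self.init_response
                + self.search_request + self.search_response)

    def network_share(self):
        """Fraction of the total spent on the network (0.0 if total is zero)."""
        total = self.total()
        return self.network() / total if total else 0.0
```

Filling in `StepTimes` with timings from a real client and server would show
directly whether the network legs dominate the total for a given setup.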

Two answers:
- philosophically the above should be totally dominated by the network
parts (steps 1,3,4,7) and startup (step 2) if done that way. I believe
for me this is true (at least for DBs under 5 million or so records).

- I used my client, minimizing overheads outside the above (can't eliminate
them totally, e.g. response formatting), against my personnel DB, searching
on SS numbers. That is really a lot like a URN lookup, though the retrieved
record is still somewhat larger (in the realm of 500 bytes each).
Oh, my personnel DB is ~300,000 records.
I will not rewrite my client for this test, so the mode is:
init
loop 1 to N
    search on 1 SS#
    search response
    present request
    present response

10 SS#s:
real 0m3.00s
user 0m1.15s
sys 0m0.28s

100 SS#s:
real 0m6.95s
user 0m1.30s
sys 0m0.51s

1000 SS#s:
real 0m20.68s
user 0m3.70s
sys 0m3.61s

Okay - you know the troubles with time(1), I presume; and of course the above
is one definitely not optimized implementation. But it gives you one
implementation benchmark point.
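
As a rough reading of the "real" times reported above, the following Python
sketch works out the average time per lookup and the marginal time per extra
lookup between runs. The input numbers are the ones from the three runs; the
function names are made up for illustration, and the usual time(1) caveats
apply to whatever comes out.

```python
# (number of SS# lookups, "real" seconds) taken from the three runs above
RUNS = [(10, 3.00), (100, 6.95), (1000, 20.68)]


def average_per_lookup(runs):
    """Total real time divided by the number of lookups, per run."""
    return [(n, real / n) for n, real in runs]


def marginal_per_lookup(runs):
    """Extra real time per extra lookup between consecutive runs."""
    result = []
    for (n0, t0), (n1, t1) in zip(runs, runs[1:]):
        result.append((n0, n1, (t1 - t0) / (n1 - n0)))
    return result


if __name__ == "__main__":
    for n, avg in average_per_lookup(RUNS):
        print(f"{n:5d} lookups: {avg * 1000:7.1f} ms average")
    for n0, n1, extra in marginal_per_lookup(RUNS):
        print(f"{n0} -> {n1}: {extra * 1000:.1f} ms per extra lookup")
```

The averages come out at roughly 300 ms, 70 ms and 21 ms per lookup, and the
marginal cost at roughly 44 ms and then 15 ms per extra lookup, so the fixed
costs (connect, init, startup) weigh heavily on the small run. Note that each
lookup in this test mode is two exchanges (search, then present).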
bob
p.s. feel free to repost the above as you please.

--
Dirk Herr-Hoyman <hoymand@gate.net> |                      Practice
CyberBeach Publishing               |       random acts of kindness
   * Internet publishing services   |          and senseless beauty
Lake Worth, Florida, USA            |                          
Home Page: <URL:http://www.gate.net/~hoymand/cyberbeach.html>
Phone:     +1.407.540.8309