Proposed URLs that robots should search

Andrew Daviel (andrew@andrew.triumf.ca)
Mon, 23 Oct 1995 21:51:17 +0100

Date: Mon, 23 Oct 1995 21:51:17 +0100
From: Andrew Daviel <andrew@andrew.triumf.ca>
To: /CN=robots/@nexor.co.uk
Subject: Proposed URLs that robots should search

With my other hat on (admin@vancouver-webpages.com), I'm
trying to build a database of URLs and other information for businesses
on the Net.

One approach is to invite people to submit their URLs using an HTML form
which asks for all the things you think might be useful. Too many fields
and the users bug out; too few and you miss things. Also, if you change
the form, earlier subscribers will have information missing.

Some database registration robots (I believe) search submitted URLs for
keywords, doing some natural language processing to discard modifiers and
prepositions. However, the trend toward graphics-dominated homepages makes
such efforts of dubious utility.

In the spirit of /robots.txt, I would like to propose a set of files that
robots would be encouraged to visit:

/robots.htm   - an HTML list of links that robots are encouraged to traverse
/descript.txt - a text file describing what the site (or directory) is
                all about
/keywords.txt - a text file with comma-delimited keywords relevant to the
                site (or directory)
/linecard.txt - for commercial sites, a text file with comma-delimited
                line items (brands) manufactured or stocked
/sitedata.txt - a text file similar to the InterNIC submission forms,
                with publicly-available site data such as

Organization: organisation name
Type: commercial/non-profit/educational etc.
Admin: email of administration
Webmaster: email of Web administration
Postal: postal address
ZIP: ZIP/postcode
Country:
Position: Lat/Long
etc.

In fact, this would be an open-ended list of keys and data, one pair per
line, with a defined separator such as ":". As with /robots.txt, the file
names are all 8+3 (at most eight characters plus a three-character
extension) for compatibility with obsolete (we hope!) file systems.
robots.htm should be served as type text/html and the others as text/plain.
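
As a minimal sketch of how a robot might read these files: the code below
assumes that the first ":" on a line separates key from data, that lines
without a separator are skipped, and that line breaks in a comma-delimited
file count as commas. None of this is fixed by the proposal, and the function
names and sample values are made up for illustration.

```python
def parse_sitedata(text):
    """Parse 'Key: data' lines into a dict, one pair per line (assumed format)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ":" only, so data containing colons survives.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip blank lines and lines with no separator
        fields[key.strip()] = value.strip()
    return fields


def parse_comma_list(text):
    """Split a comma-delimited file (keywords.txt, linecard.txt) into items."""
    # Assumption: a line break is treated the same as a comma.
    items = text.replace("\n", ",").split(",")
    return [item.strip() for item in items if item.strip()]


# Invented sample data, not taken from any real site.
sample = """Organization: Example Widgets Ltd
Type: commercial
Webmaster: webmaster@example.com
Country:
"""
print(parse_sitedata(sample))
print(parse_comma_list("widgets, gadgets,\nsprockets"))
```

Because no field is required, an empty value (as with "Country:" in the
sample) is kept as an empty string rather than treated as an error.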

None of the fields, or files, would be mandatory.
Additional text files could be listed in /robots.htm.
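
To illustrate how a robot could locate the files, here is a rough sketch. It
assumes the files sit directly under the site (or directory) URL and that
additional files appear as ordinary links in /robots.htm. The names
candidate_urls and LinkCollector are hypothetical, www.example.com is a
placeholder, and no fetching is done; a real robot would have to tolerate
any of the files being absent.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Proposed file names and the content type each would be served with.
PROPOSED_FILES = {
    "robots.htm": "text/html",
    "descript.txt": "text/plain",
    "keywords.txt": "text/plain",
    "linecard.txt": "text/plain",
    "sitedata.txt": "text/plain",
}


def candidate_urls(base_url):
    """Return the URLs a robot might try under a site or directory URL."""
    if not base_url.endswith("/"):
        base_url += "/"
    return [urljoin(base_url, name) for name in PROPOSED_FILES]


class LinkCollector(HTMLParser):
    """Collect the href targets of anchors found in a robots.htm page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


# Invented page content standing in for a fetched /robots.htm.
page = '<a href="products/">Products</a> <a href="/extra.txt">More</a>'
collector = LinkCollector("http://www.example.com/robots.htm")
collector.feed(page)
print(candidate_urls("http://www.example.com"))
print(collector.links)
```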

Has this already been done?

Comments, improvements?

Andrew Daviel           email: advax@triumf.ca
TRIUMF                  voice: 604-222-7376
4004 Wesbrook Mall      fax:   604-222-7307
Vancouver BC            http://andrew.triumf.ca/~andrew
Canada V6T 2A3          49D14.7N 123D13.6W