robots.txt : a standard for robot exclusion
Friday, July 3rd, 2009WWW Robots (also called wanderers or spiders) are programs that traverse many pages in the World Wide Web by recursively retrieving linked pages. For various reasons robots are not always welcome to access certain web pages.
A simple method used to exclude robots from a server is to create a file on the server which specifies an access policy for robots. This file must be accessible via HTTP on the local URL “/robots.txt”. The contents of this file uses two records: user-agent and disallow.
It is not an official standard backed by a standards body, or owned by any commercial organisation. It is not enforced by anybody, and there no guarantee that all current and future robots will use it. Consider it a common facility the majority of robot authors offer the WWW community to protect WWW server against unwanted accesses by their robots.
The latest version of this document can be found on http://www.robotstxt.org/orig.html.