The format and semantics of the "/robots.txt" file are as follows:
The file consists of one or more records separated by one or more blank lines (terminated by CR, CR/NL, or NL). Each record contains lines of the form "<field>:<optionalspace><value><optionalspace>".
The field name is case insensitive.
Comments can be included in the file using UNIX Bourne shell conventions: the '#' character indicates that any preceding space and the remainder of the line up to the line termination are discarded.
Lines containing only a comment are discarded completely,
and therefore do not indicate a record boundary.
The record starts with one or more User-agent lines, followed by one or more Disallow lines, as detailed below. Unrecognised headers are ignored.
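The record format described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `parse_records` and the `KNOWN_FIELDS` set are assumptions made for this example. It strips comments, treats blank lines as record boundaries, skips comment-only lines without ending a record, lowercases field names, and ignores unrecognised headers.

```python
# Hypothetical helper, written for illustration only.
KNOWN_FIELDS = {"user-agent", "disallow"}

def parse_records(text):
    """Split robots.txt text into records, each a list of (field, value) pairs."""
    records = []
    current = []
    for raw in text.splitlines():  # splitlines() accepts CR, CR/NL and NL endings
        # Discard the comment and any space before it.
        line = raw.split("#", 1)[0].strip()
        if "#" in raw and not line:
            continue  # comment-only line: dropped, does not end the record
        if not line:
            # Blank line: closes the current record, if there is one.
            if current:
                records.append(current)
                current = []
            continue
        if ":" not in line:
            continue  # not of the form <field>:<value>
        field, value = line.split(":", 1)
        field = field.strip().lower()  # field names are case insensitive
        if field in KNOWN_FIELDS:      # unrecognised headers are ignored
            current.append((field, value.strip()))
    if current:
        records.append(current)
    return records
```

Calling `parse_records` on a file with two records separated by a blank line should return a list of two lists, with comment text removed from every value.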
If more than one User-agent field is present, the record describes an identical access policy for more than one robot. At least one User-agent field needs to be present per record.
The robot should be liberal in interpreting the User-agent field. A case-insensitive substring match of the name without version information is recommended.
If the value is '*', the record describes the default access policy for any robot that has not matched any of the other records. It is not allowed to have multiple such records in the "/robots.txt" file.
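One possible reading of the User-agent matching rules, sketched in Python. The function name `select_record` and the `(agent_names, disallow_values)` tuple layout are assumptions for this example. It also assumes that "substring match" means the name in the record must occur inside the robot's own name, and that version information follows a "/" in that name; the text does not spell out either detail.

```python
def select_record(records, robot_name):
    """Return the Disallow values that apply to robot_name, or None.

    records: list of (agent_names, disallow_values) tuples (hypothetical layout).
    """
    # Drop version information such as "/2.1" and compare case-insensitively.
    name = robot_name.split("/", 1)[0].lower()
    default = None
    for agent_names, disallow_values in records:
        for agent in agent_names:
            if agent == "*":
                default = disallow_values  # used only if nothing else matches
            elif agent and agent.lower() in name:
                return disallow_values     # a specific record wins over '*'
    return default
```

With a '*' record and a "cybermapper" record, a robot calling itself "CyberMapper/2.1" would get the cybermapper rules, and any other robot would fall back to the '*' rules.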
Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.
An empty value indicates that all URLs can be retrieved. At least one Disallow field needs to be present in a record.
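The /help examples above amount to a plain prefix test on the URL path. A minimal Python sketch follows; the function name `is_allowed` is an assumption for this example, and the sketch covers only the Disallow behaviour described here.

```python
def is_allowed(path, disallow_values):
    """Return True if path may be retrieved under the given Disallow values."""
    for value in disallow_values:
        # An empty value disallows nothing; otherwise any path that
        # starts with the value is off limits.
        if value and path.startswith(value):
            return False
    return True

print(is_allowed("/help.html", ["/help"]))         # False
print(is_allowed("/help/index.html", ["/help/"]))  # False
print(is_allowed("/help.html", ["/help/"]))        # True
print(is_allowed("/anything", [""]))               # True
```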
An empty "/robots.txt" file has no explicit associated semantics; it will be treated as if it were not present, i.e. all robots will consider themselves welcome.
The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/", or /foo.html:
# robots.txt for http://www.example.com/

User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
Disallow: /foo.html
This example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/", except the robot called "cybermapper":
# robots.txt for http://www.example.com/

User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space

# Cybermapper knows where to go.
User-agent: cybermapper
Disallow:
The following example indicates that no robots should visit the site at all:
# go away
User-agent: *
Disallow: /
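To check how a robot might evaluate the cybermapper example, Python's standard-library `urllib.robotparser` module can be used. This is only a sketch: the helper name `build_parser` and the robot name "otherbot" are assumptions for this example, and the library implements its own reading of the format, which may include extensions not covered in this text.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE = """\
# robots.txt for http://www.example.com/

User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space

# Cybermapper knows where to go.
User-agent: cybermapper
Disallow:
"""

def build_parser(text):
    """Feed robots.txt text to the standard-library parser, one line at a time."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(EXAMPLE)
# cybermapper has an empty Disallow, so everything is open to it.
print(parser.can_fetch("cybermapper", "/cyberworld/map/index.html"))  # True
# Any other robot falls back to the '*' record.
print(parser.can_fetch("otherbot", "/cyberworld/map/index.html"))     # False
print(parser.can_fetch("otherbot", "/index.html"))                    # True
```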
file: /Techref/inet/robots.htm, 4KB, updated: 2021/3/3 09:58, owner: JMN-EFP-786