Overview
A robots.txt file is an important tool when optimising a website. It tells web robots and spiders which areas of a site they may access.
What are web robots?
Web robots, also known as crawlers or spiders, are programs that index information on the internet. Some are sent by search engines like Google, and some are sent by devious programmers to harvest information such as email addresses.
Create a robots.txt file
Create a text file and name it robots.txt. Enter the following into your robots.txt file:
User-agent: *
Disallow:
User-agent specifies which web robot the rules that follow apply to. The * wildcard character means all web robots.
Disallow specifies the area of the site robots should not index. Leaving the value empty disallows nothing.
So the above robots.txt file means all robots are allowed everywhere.
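Lines beginning with # are treated as comments and are ignored by robots, so you can annotate the file, for example:
# Allow all robots everywhere
User-agent: *
Disallow: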
Restrict access
Apply rules to all robots
User-agent: *
Disallow: /
The above means no robots are allowed anywhere.
User-agent: *
Disallow: /dev/
Disallow: /uat/
The above means all robots are allowed everywhere except in /dev/ and /uat/.
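Note that Disallow values are matched as prefixes of the URL path. For example:
User-agent: *
Disallow: /dev
would block /dev/, /dev.html and /development/ alike, so keep the trailing slash when you only want to block the /dev/ directory.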
Apply rules to specific robots
User-agent: BadRobot
Disallow: /
The above means BadRobot is not allowed anywhere.
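Groups for different robots can be combined in one file, separated by a blank line. A robot obeys the group that names it and falls back to the * group otherwise. For example:
User-agent: BadRobot
Disallow: /

User-agent: *
Disallow: /dev/
The above keeps BadRobot out of the entire site, while all other robots are only kept out of /dev/.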
Please note that there is no guarantee that a web robot will adhere to the rules in your robots.txt file; well-behaved robots do, but devious ones often ignore them.
Upload robots.txt
Once complete, upload the robots.txt file to the root directory of your website. This is normally the directory that your home page is in.
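For example, if your home page is https://www.example.com/index.html, the file should end up at https://www.example.com/robots.txt; robots will only look for it at that address.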
References
This article was made possible thanks to robotstxt.org.