A robots.txt file is an important tool when optimising a website. It tells web robots and spiders which areas of a site they may access.
What are web robots?
Web robots, also known as crawlers or spiders, are programs that index information on the internet. Some are sent by search engines like Google, and some are sent by devious programmers to harvest information such as email addresses.
Create a robots.txt file
Create a text file and name it robots.txt. Enter the following into your robots.txt file:
User-agent: *
Disallow:
User-agent names the web robot that the rules that follow apply to. The * wildcard character means all web robots.
Disallow names the area of the site that robots should not visit. An empty value disallows nothing.
So the above robots.txt file means all robots are allowed everywhere.
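One way to check how a file like this is read is Python's standard urllib.robotparser module. The sketch below is an illustration only, not part of the steps in this article; the helper name is_allowed, the robot name ExampleBot and both paths are made up.

```python
from urllib.robotparser import RobotFileParser

# The allow-all file from above, one directive per line.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, robot: str, path: str) -> bool:
    """Return True if `robot` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(robot, path)


# "ExampleBot" and both paths are hypothetical.
print(is_allowed(ALLOW_ALL, "ExampleBot", "/"))
print(is_allowed(ALLOW_ALL, "ExampleBot", "/private/page.html"))
```

Because the Disallow value is empty, both checks should come back as allowed.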
Apply rules to all robots
User-agent: *
Disallow: /
The above means no robot is allowed anywhere.
User-agent: *
Disallow: /dev/
Disallow: /uat/
The above means all robots are allowed everywhere except in the /dev/ and /uat/ directories.
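To see which addresses the /dev/ and /uat/ rules affect, they can be fed to Python's standard urllib.robotparser module. This is a rough sketch under assumptions: the helper name blocked_paths, the robot name ExampleBot and the page names are invented for illustration, and this particular parser matches rules by path prefix.

```python
from urllib.robotparser import RobotFileParser

# The two-directory rule set shown above.
RULES = """\
User-agent: *
Disallow: /dev/
Disallow: /uat/
"""


def blocked_paths(robots_txt, robot, paths):
    """Return the subset of `paths` that `robot` may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(robot, p)]


# Hypothetical pages on the site.
PAGES = ["/index.html", "/dev/build.html", "/uat/login.html", "/about/"]
print(blocked_paths(RULES, "ExampleBot", PAGES))
```

Only the pages under /dev/ and /uat/ should appear in the printed list.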
Apply rules to specific robots
User-agent: BadRobot
Disallow: /
The above means BadRobot is not allowed anywhere.
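A rule group addressed to one robot leaves other robots unaffected. The sketch below checks this with Python's standard urllib.robotparser module; the helper name may_fetch and the robot name FriendlyCrawler are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# The BadRobot rule set shown above.
RULES = """\
User-agent: BadRobot
Disallow: /
"""


def may_fetch(robot: str, path: str = "/") -> bool:
    """Return True if `robot` may fetch `path` under RULES."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(robot, path)


print(may_fetch("BadRobot"))         # the robot named in the file
print(may_fetch("FriendlyCrawler"))  # hypothetical robot not named in the file
```

The first check should report that BadRobot is blocked, while the second robot falls outside the group and is allowed.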
Please note that there is no guarantee that a web robot will adhere to the rules within your robots.txt file. Compliance is voluntary, so a badly behaved robot can simply ignore them.
Once complete, upload the robots.txt file to the root directory of your website. This is normally the directory that your home page is in.
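Robots request the file from the top level of the host, which is why it belongs in the root directory. The Python sketch below derives that address from any page address on the site; www.example.com is a placeholder domain and robots_txt_url is a name chosen for illustration.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url: str) -> str:
    """Return the address where robots look for robots.txt on this host."""
    # A leading slash makes urljoin replace the whole path of page_url.
    return urljoin(page_url, "/robots.txt")


# Placeholder address; substitute a page from your own site.
print(robots_txt_url("https://www.example.com/blog/post.html"))
# prints "https://www.example.com/robots.txt"
```

If the file is uploaded anywhere else, for example inside /blog/, robots will not find it.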
This article was made possible thanks to robotstxt.org.