What is a Robots.txt File?
The robots.txt file, or ‘The Robots Exclusion Protocol’, is a standard that gives instructions about your website to search engine bots/spiders. It is a plain text (hence the .txt) file, think Notepad, that is placed at the root of the website hierarchy.
The robots.txt file is mainly used to block or disallow bots/spiders from visiting folders or files on your website that you don’t want indexed in the search engine results.
Example of a Robots.txt File:
This code tells all robots, or User-agents, not to enter four directories (cgi-bin, images, tmp, and private) of a website:
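```
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
```

The `User-agent: *` line applies the rules to every robot, and each `Disallow` line names one directory the robots are asked to stay out of.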
How it Relates To SEO
When Google, Bing, and Yahoo are deciding what websites to rank in their search engines, they use data gathered by their “spiders” or “robots”. These robots crawl through your website to extract data that helps the search engines determine what your website is about. Before a robot starts to crawl your website, it will look for the robots.txt file, which gives the webmaster the ability to tell the robot what to look at and what to skip.
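You can see this check-before-crawl behavior with Python’s standard library, which ships a robots.txt parser. This is just a sketch using the same four-directory rules from the example above; a real search engine robot does the equivalent lookup before fetching any page:

```python
from urllib import robotparser

# The same rules as the example robots.txt in this post:
# all User-agents, four disallowed directories.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A well-behaved crawler skips blocked paths and fetches the rest.
print(rp.can_fetch("*", "/private/data.html"))  # False
print(rp.can_fetch("*", "/about.html"))         # True
```

In practice a crawler would load the rules from `http://yoursite.com/robots.txt` (via `rp.set_url(...)` and `rp.read()`) rather than from a string.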
There are many files within the root of the website that have nothing to do with the subject matter, or Keywords, of the website so there is no point in having the search engine robots document those files. For example, the files that make up a WordPress site. While those files are important and are what make the website possible, they probably do not relate to what you want your site to rank for on Google.
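For instance, a common convention (shown here only as an illustration, not a rule every WordPress site should copy) is to disallow the WordPress admin and core directories, since they contain software files rather than content you want ranked:

```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
```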
The robots.txt file is similar to a Sitemap.xml file, which I will cover in my next blog post.