Ever wondered how you can control which automated programs are allowed to view and crawl your website? I’m going to be publishing a series of posts that explain how you can do just this. Automated programs, also called crawlers, spiders, bots and agents, are used by search engine companies and intelligence-gathering websites to collect information about your website. Sometimes having a bot crawl your website is a good idea: for example, Googlebot has to crawl your website in order to work out where your pages should be ranked in Google. But of course, there are many other bots and beasties crawling your site that don’t bring any benefit to you.
In addition to contributing to slowing your website down and potentially scraping and re-using content from your pages, these bots also distort and confuse your website visitor stats. Ever looked at your website stats and found that you had 100 visitors today, only to then discover that 99 of those visitors were bots?
So here is a method, using a file called robots.txt, for allowing and disallowing bots on your website.
You can disallow all automated bots by adding this:
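For example, a robots.txt file containing just these two lines asks every compliant bot to stay out of the entire site (note this is advisory: well-behaved crawlers honour it, but it is not an access control):

```
# Apply to every bot
User-agent: *
# Disallow the whole site
Disallow: /
```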
You can disallow individual bots like this:
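For example, to disallow a single named bot (here Googlebot, whose user-agent string was mentioned above) while leaving all other bots unaffected:

```
# Apply only to Googlebot
User-agent: Googlebot
# Keep it out of the whole site
Disallow: /
```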
Alternatively, you can allow only specific bots to crawl your website by adding these references:
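A common pattern for this is to pair a blanket disallow with an empty Disallow rule for the bot you want to let in. For example, to allow only Googlebot:

```
# Googlebot: an empty Disallow means nothing is blocked
User-agent: Googlebot
Disallow:

# Every other bot: blocked from the whole site
User-agent: *
Disallow: /
```

Bots match the most specific User-agent record that applies to them, so Googlebot follows its own record and ignores the catch-all.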
Simply create and upload a text file called robots.txt to the root of your website domain.
Go to the following website to find a list of user-agent strings you can use to identify crawlers and bots.
I will be writing more on this soon.