You may think you have built a truly great website: rich in keywords, full of unique content, fully optimized for search engines and attractive to visitors. That's fine, but do you know you may be missing something? A robots.txt file. Did you include it? By the way, do you know why a robots.txt file is important?
The success of big companies lies in keeping their confidential data secret, hidden from all. They tell the world one thing and do another. This lets them carry out their plans easily and change them as the situation demands. The job of a robots.txt file is similar. It can allow or forbid a search engine to visit some or all of your web pages. A human visitor, of course, is still free to visit these pages. That being the case, search engines may see a different website than a visitor does. If you think one or more of your pages or files should not be visited by a particular search engine, or by any search engine, you can arrange that.
This is not recommended, though: your website should be built so that it does not need to shy away from search engines. Nevertheless, it is always better to know the basics of writing a robots.txt file. It will help you. As we will discuss further down, a robots.txt file is important. I repeat: don't make pages that you think should be hidden from search engines. If a search engine thinks you are up to some tricks, it may penalize your site with no ranking at all, in the worst case forever!
Every search engine has a "robot" (a software program) that does the job of visiting websites. Its purpose is to "know" a website: to find out what it is all about and gather all the information about it. Search engine robots gather this information and bring it back to their databases so that it can be shown in search results. So, if your site is not in a search engine's database, it never shows up in that engine's search results.
Web robots are sometimes referred to as web crawlers or spiders. The process of a robot visiting your website is therefore called "spidering" or "crawling". When somebody says "the search engines have spidered my website," it means search engine robots have visited their website. Each robot is known by a name and has its own IP address. The IP address is of no importance to us, but knowing the names will help, since the name is what we use when we create a robots.txt file. This is why the file is called "robots.txt". Given below is a list of the robots of some very popular search engines:
Search Engine - Robot
Alexa.com - ia_archiver
Altavista.com - Scooter (Bought by Yahoo)
UK.Altavista.com - AltaVista-Intranet (Bought by Yahoo)
Alltheweb.com - FAST-WebCrawler (Bought by Yahoo)
Excite.com - ArchitextSpider
Euroseek.net - Arachnoidea
Gendoor.com (Genealogical Search Engine) - GenCrawler
Google.com - Googlebot (http://www.google.com/bot.html)
Hotbot.com (uses Inktomi's robot) - Slurp
Inktomi.com - Slurp (slurp@inktomi.com) (Bought by Yahoo)
Infoseek.com - UltraSeek
Looksmart.com - MantraAgent
Lycos.com - Lycos_Spider_(T-Rex)
Northernlight.com - Gulliver
Nationaldirectory.com - NationalDirectory-SuperSpider
UKSearcher.co.uk - UK Searcher Spider
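Since each robot announces itself by name, you can spot robot visits in your own server logs. Here is a minimal Python sketch of that idea. It assumes the robot name appears somewhere in the User-Agent string of the request. The mapping is only a sample of the list above, and the function name and example User-Agent values are made up for illustration.

```python
# A sample of robot names from the list above; illustrative, not complete.
KNOWN_ROBOTS = {
    "ia_archiver": "Alexa.com",
    "Scooter": "Altavista.com",
    "FAST-WebCrawler": "Alltheweb.com",
    "ArchitextSpider": "Excite.com",
    "Googlebot": "Google.com",
    "Slurp": "Inktomi.com",
    "Gulliver": "Northernlight.com",
}


def identify_robot(user_agent):
    """Return (robot, search_engine) if a known robot name appears in the
    User-Agent string, otherwise None."""
    lowered = user_agent.lower()
    for robot, engine in KNOWN_ROBOTS.items():
        if robot.lower() in lowered:
            return robot, engine
    return None


# Hypothetical User-Agent values, as they might appear in a server log.
print(identify_robot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(identify_robot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

A simple substring match like this is enough for a rough count of robot visits. It cannot tell a real robot from a visitor that merely copies a robot's name.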
Writing Robots.txt:
Let's learn to write robots commands. Note that there are two ways to write them. One is to include all the commands in a text file called "robots.txt"; the other is to write the robots command in a meta tag.
We will learn both ways of writing robots commands.
Writing robots command in Meta tag:
There are 4 things you can tell a search engine robot when it requests (visits) your page:
1) Do not index this page - search engines will not index the page.
2) Do not follow any links on this page - search engines will not follow the links included in the page, i.e. they will not use this page to find and index the pages it links to.
3) Do index this page - search engines will index the page.
4) Do follow links - search engines will follow the links and index the pages that this page links to.
Note that "index" is different from "spider". A search engine first spiders a page and then indexes it. Indexing means giving the page a certain importance on the basis of its content, information, meta tags and link popularity with respect to the searched keyword. All this is decided at run time. When you tell search engines not to index a page, they know that the page exists but do not rank it. That is, a no-index page will never be shown in their search results. This does not mean a no-index page gets no visitors: it may get visitors indirectly from a page that links to it. It just gets no direct visitors from search engines.
Suppose you want search engines to index a page and also follow its links to index the linked pages. Then include the following meta tag:
<meta name="robots" content="index,follow">
Suppose you want search engines to index a page but not follow its links. Then include the following meta tag:
<meta name="robots" content="index,nofollow">
Suppose you do not want search engines to index a page, but you do want them to follow its links. Then include the following meta tag:
<meta name="robots" content="noindex,follow">
Suppose you do not want search engines to either index a particular page or follow its links. Then include the following meta tag:
<meta name="robots" content="noindex,nofollow">
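To make the four combinations concrete, here is a minimal Python sketch of how a robot might read these instructions from a page. It assumes the standard tag form <meta name="robots" content="..."> and treats a missing instruction as permission to index and follow. The class and function names are made up for illustration. This is not how any particular search engine is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names, but not values.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for word in attributes.get("content", "").split(","):
                self.directives.add(word.strip().lower())


def read_robots_meta(html):
    """Return (may_index, may_follow) for a page.

    Assumption: when no directive is present, the robot indexes the page
    and follows its links.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


page = '<html><head><title>your title</title><meta name="robots" content="noindex,follow"></head></html>'
print(read_robots_meta(page))
```

The sketch ignores any directive other than noindex and nofollow.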
Note: Google keeps a "Cached" copy of every file it spiders. It's a small snapshot of the page. Want to stop Google from doing so? Include the following meta tag:
<meta name="robots" content="noarchive">
Like any meta tag, the tags written above should be placed in the HEAD section of an HTML page:
<head>
<title>your title</title>
<meta name="robots" content="index,follow">
</head>
Creating robots.txt file:
A robots.txt file is an independent file and should be written in a plain text editor like Notepad. Do not use MS-Word or any other word processor to create robots.txt. The bottom line is that this file must have the extension ".txt", else it will be useless.
Let's begin. Open Notepad (it comes free with Microsoft Windows) and save a file with the name "robots.txt". Make sure that the extension is .txt.
By the way, did you notice that we did not use the name of any robot in the meta tag? What does that indicate? Simple: by using a meta tag you direct all search engines to do, or not do, something on a page. You have no control over any one search engine. The solution is robots.txt.
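As a preview of that per-robot control, here is a minimal Python sketch using the standard library's urllib.robotparser module. The rules, the /drafts/ folder and the example.com addresses are hypothetical, chosen only to show one robot being treated differently from the rest.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep Googlebot out of /drafts/,
# and let every other robot visit everything.
rules = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

draft_page = "http://www.example.com/drafts/page.html"
home_page = "http://www.example.com/index.html"

# can_fetch(robot_name, url) reports whether that robot may visit the URL.
print("Googlebot, draft page:", parser.can_fetch("Googlebot", draft_page))
print("Googlebot, home page:", parser.can_fetch("Googlebot", home_page))
print("Slurp, draft page:", parser.can_fetch("Slurp", draft_page))
```

Here the robot's name selects which rules apply, something a single robots meta tag addressed to all robots cannot do.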
It can always happen that you do not want a particular search engine to index a page for some reason. In that case a robots.txt file will help, even though I do not recommend such a thing. Search engines get you traffic, so why hate them? Stop them from doing their job and they will hate you. I repeat: keep your pages smart for search engines and welcome them. Fine, then why take the trouble to learn robots.txt? Why should you include a robots.txt file at all?