Automated search engine robots, sometimes called "spiders" or "crawlers", are seekers of web pages. How do they work? What do they really do? Why are they important?

You'd think, with all the fuss about indexing web pages to add to search engine databases, that robots would be great and powerful beings. Wrong. Search engine robots have only basic functionality, comparable to that of early browsers, in terms of what they can understand in a web page. Like early browsers, robots just can't do certain things. Robots don't understand frames, Flash movies, images or JavaScript. They can't enter password-protected areas, and they can't click all those buttons you have on your website. They can be stopped cold while indexing a dynamically generated URL and slowed to a stop by JavaScript navigation.
How Do Search Engine Robots Work?

Think of search engine robots as automated data retrieval programs, traveling the web to find information and links.

When you submit a web page to a search engine at its "Submit a URL" page, the new URL is added to the robot's queue of websites to visit on its next foray out onto the web. Even if you don't directly submit a page, many robots will find your site through links from other sites that point back to yours. This is one of the reasons why it is important to build your link popularity and to get links from other topical sites back to yours.
When arriving at your website, the automated robots first check to see if you have a robots.txt file. This file is used to tell robots which areas of your site are off-limits to them. Typically these may be directories containing only binaries or other files the robot doesn't need to concern itself with.
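A well-behaved robot reads robots.txt before fetching anything else. As a rough sketch of what that check looks like, Python's standard `urllib.robotparser` module can parse a robots.txt file and answer "may I fetch this URL?" questions (the directory names and example URLs below are made up for illustration):

```python
from urllib import robotparser

# A hypothetical robots.txt marking two directories as off-limits
# (the directory names are illustrative, not from any real site).
robots_txt = """\
User-agent: *
Disallow: /binaries/
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# A polite robot checks each URL against the rules before fetching it.
print(parser.can_fetch("Googlebot", "http://example.com/index.html"))        # True
print(parser.can_fetch("Googlebot", "http://example.com/binaries/app.exe"))  # False
```

In practice a crawler downloads robots.txt from the site's root once, then consults the parsed rules for every URL it considers visiting.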
Robots collect links from each page they visit, and later follow those links through to other pages. In this way, they essentially follow the links from one page to another. The entire World Wide Web is made up of links, the original idea being that you could follow links from one place to another. This is how robots get around.
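The collect-links-then-follow-them loop above can be sketched in a few lines of Python. This is a minimal illustration, not a production crawler: it walks a tiny in-memory "web" (the page names are invented) instead of making network requests, using a queue of URLs to visit and a set of pages already seen:

```python
from collections import deque
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, the way a robot harvests links."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Tiny in-memory "web" so the sketch runs without network access
# (these page names are made up for illustration).
pages = {
    "/index.html": '<a href="/about.html">About</a> <a href="/news.html">News</a>',
    "/about.html": '<a href="/index.html">Home</a>',
    "/news.html": '<a href="/about.html">About</a>',
}

queue = deque(["/index.html"])   # the robot's queue of URLs to visit
visited = set()

while queue:
    url = queue.popleft()
    if url in visited or url not in pages:
        continue
    visited.add(url)
    collector = LinkCollector()
    collector.feed(pages[url])
    queue.extend(collector.links)  # follow the collected links to other pages

print(sorted(visited))  # every page reachable by following links from /index.html
```

Starting from one submitted URL, the robot reaches every page linked from it, which is exactly why links from other sites lead crawlers to yours.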
The "smarts" about indexing pages online comes from the search engine engineers, who devise the methods used to evaluate the information the search engine robots retrieve. Once introduced into the search engine database, that information is available to searchers querying the search engine. When a search engine user enters a query, a number of quick calculations are done to make sure the search engine presents just the right set of results, giving the visitor the most relevant response to their query.
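The actual calculations are proprietary and far more sophisticated, but a toy term-frequency score gives the flavor of "quick calculations" that rank pages against a query. Everything here is invented for illustration, the index contents included:

```python
from collections import Counter

# Toy index: page text keyed by made-up URLs (purely illustrative).
index = {
    "/robots.html": "search engine robots crawl the web and index pages",
    "/seo.html": "link popularity helps search engine ranking",
    "/cooking.html": "a recipe for tomato soup",
}

def score(document, query):
    """Count occurrences of query terms in the document (a toy relevance score)."""
    words = Counter(document.lower().split())
    return sum(words[term] for term in query.lower().split())

def search(query):
    """Rank indexed pages by the toy score, best match first; drop non-matches."""
    ranked = sorted(index, key=lambda url: score(index[url], query), reverse=True)
    return [url for url in ranked if score(index[url], query) > 0]

print(search("search engine robots"))  # ['/robots.html', '/seo.html']
```

Real engines weigh hundreds of signals (link popularity among them), but the shape is the same: score every candidate page against the query, then present the best-scoring set.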
You can see which pages on your site the search engine robots have visited by looking at your server logs or the results from your log statistics program. Identifying the robots will show you when they visited your website, which pages they visited and how often they visit. Some robots are readily identifiable by their user agent names, like Google's "Googlebot"; others are a bit more obscure, like Inktomi's "Slurp". Still other robots may be listed in your logs that you cannot readily identify; some of them may even appear to be human-powered browsers.
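Spotting those user agents in a server log can be automated. As a sketch, assuming the common Apache/NGINX "combined" log format where the user agent is the last quoted field (the IP address, path and date below are made up):

```python
import re

# One combined-format access log line (IP, path and timestamp are invented).
log_line = ('66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] '
            '"GET /index.html HTTP/1.1" 200 2326 "-" '
            '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

# Robot user agent substrings named in the article, mapped to their owners.
KNOWN_BOTS = {"Googlebot": "Google", "Slurp": "Inktomi"}

def identify_robot(line):
    """Pull the user agent out of a combined-format log line and match known bots."""
    match = re.search(r'"([^"]*)"$', line)  # the user agent is the final quoted field
    if not match:
        return None
    user_agent = match.group(1)
    for marker, owner in KNOWN_BOTS.items():
        if marker in user_agent:
            return f"{marker} ({owner})"
    return "unidentified"

print(identify_robot(log_line))  # Googlebot (Google)
```

Running a filter like this over a day's log shows which robots came by, which pages they fetched, and how often; anything landing in the "unidentified" bucket is a candidate for the obscure or disguised robots mentioned above.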