Using the Power of Robots.txt
From BISAWiki
Once we have a web site up and running, we should make sure that all visiting search engines can access all the pages we want them to look at. Sometimes, though, we may want search engines not to index certain areas of the site, or even to ban a particular search engine from the site altogether. This is where a simple little two-line text file called robots.txt comes in.
Robots.txt lives in your website's main directory (on Linux web servers this is typically your /public_html/ directory) and looks something like the following:
User-agent: *
Disallow:
The first line specifies which robot the record applies to; the second line specifies which parts of the site that robot is not allowed to visit. An empty Disallow line, as above, means the robot may visit everything.
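As an illustration, a record that lets every robot in but keeps it out of two directories might look like this (the /cgi-bin/ and /private/ paths are assumed example names, not part of the original):

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

Note that several Disallow lines can follow a single User-agent line, one path prefix per line.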
If you want to give different instructions to different bots, simply repeat the lines above for each one.
For example:
User-agent: googlebot
Disallow:
User-agent: askjeeves
Disallow: /
This allows Google (user-agent name GoogleBot) to see every page and directory, while at the same time banning Ask Jeeves from the site entirely.
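If you want to check how a robots.txt file will be interpreted before deploying it, Python's standard urllib.robotparser module can parse the rules for you. The sketch below tests the two-record example above; example.com and the URL paths are placeholder values:

```python
import urllib.robotparser

# The example robots.txt from above: GoogleBot may see everything,
# Ask Jeeves is banned from the whole site.
robots_txt = """\
User-agent: googlebot
Disallow:

User-agent: askjeeves
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# GoogleBot is allowed everywhere; Ask Jeeves is allowed nowhere.
print(rp.can_fetch("googlebot", "http://example.com/index.html"))  # True
print(rp.can_fetch("askjeeves", "http://example.com/index.html"))  # False
```

The same parser can fetch a live file with rp.set_url(...) followed by rp.read(), which is handy for verifying the copy actually served from your site.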
For a reasonably up-to-date list of robot user-agent names, visit http://www.robotstxt.org/wc/active/html/index.html
Even if you want every robot to index every page of your site, it is still advisable to put a robots.txt file on it. Doing so will stop your error logs from filling up with entries from search engines trying to request a robots.txt file that doesn't exist.
To find out more about robots.txt, see the full list of resources at http://www.websitesecrets101.com/robotstxt-further-reading-resources.