Using the Power of Robots.txt

Once we have a web site up and running, we need to make sure that all visiting search engines can access all the pages we want them to look at.

Sometimes, though, we may want search engines not to index certain areas of the site, or even to ban a particular search engine from the site completely.

This is where a simple little two-line text file called robots.txt comes in.

Robots.txt resides in your web site's root directory (on Linux hosting this is typically your /public_html/ directory), so search engines can fetch it from http://www.example.com/robots.txt. It looks something like the following:

User-agent: *

Disallow:

The first line specifies which robot the rules apply to; the second line specifies whether that robot is allowed in, or which parts of the site it is not allowed to visit.
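
For instance, here is a minimal sketch (the directory names /cgi-bin/ and /private/ are just placeholders) that lets every robot in but keeps it out of two directories:

User-agent: *

Disallow: /cgi-bin/

Disallow: /private/

A record can hold several Disallow lines, and each one blocks every URL whose path starts with that prefix.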

If you'd like to set rules for multiple bots, simply repeat the lines above, one block per bot.

For example:

User-agent: googlebot

Disallow:

User-agent: askjeeves

Disallow: /

This will allow Google (user-agent name Googlebot) to visit every directory and page, while at the same time banning Ask Jeeves from the site completely.
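
Building on that, here is a sketch combining the two ideas (badbot and /admin/ are placeholders, not a real crawler name or path): one bot is banned outright while everyone else is merely kept out of a single directory. Each robot obeys the record whose User-agent matches it, falling back to the * record if none does:

User-agent: badbot

Disallow: /

User-agent: *

Disallow: /admin/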

Reasonably up-to-date lists of robot user-agent names are published online. And even if you want to allow every robot to index every page of your site, it is still very advisable to place a robots.txt file on your site. It will stop your error logs filling up with entries from search engines trying to access a robots.txt file that doesn't exist.
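
For completeness, here is a sketch of such a permissive file in full (lines beginning with # are comments, which the robots.txt format permits):

# Every robot may crawl every page; the file's presence alone

# keeps missing-robots.txt errors out of your logs.

User-agent: *

Disallow: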

To learn more about robots.txt, see one of the full listings of robots.txt resources available online.
