Writing A Good Robots.txt

I’ve been doing online marketing for about two years now, and I honestly never really thought about disallowing pages in my robots.txt. I was under the impression that the more pages you have in the search engines, the better.

However, Shoemoney recently revealed that Aaron helped him get a bunch of his blog posts out of Google’s supplemental index by changing his robots.txt file.

As Aaron points out, your site only has so much authority to pass around to internal pages. If you’re linking to pages that don’t offer much value, have little content on them, or are not SEO’d, then you are wasting that authority.

With this in mind, the goal of a robots.txt should be to present your best pages to the search engines and have them ignore the others.

As an example, let’s analyze Shoemoney’s robots.txt file with Aaron’s changes:

User-Agent: Googlebot
Disallow: /link.php
Disallow: /gallery
Disallow: /gallery2
Disallow: /gallery2/
Disallow: /gallery/
Disallow: /category/
Disallow: /page/
Disallow: /pages/
Disallow: /feed/
Disallow: /feed

The first thing this file blocks is link.php. Looking at shoemoney.com, link.php is used to cloak affiliate links, so it only makes sense to stop Google from wasting time trying to follow those links.

The next four entries block his image galleries. If you look at the gallery pages you’ll notice that most of the images don’t have descriptions associated with them, and the images themselves don’t have meaningful alt text. Google can’t tell what any of these pages are about, so having them in the index isn’t doing anybody any good.

Next are the /category/, /page/, and /pages/ entries, which are a little more interesting. I can only speculate, but my guess is that Aaron suggested these changes to eliminate duplicate navigation structures. If we look at the blog’s homepage, there are three different ways to get to the internal pages:

  1. You can follow the category links that are at the bottom of each post. However, if you followed only these links you would not find every post in the blog.
  2. You can follow the Previous Entries link on the bottom of the page and subsequent pages.
  3. You can click on each of the monthly archive links in the bottom of the first sidebar.

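It looks like Aaron decided to keep the third option and eliminate the others. Blocking /category/ shuts off the first route, and /page/ presumably covers the Previous Entries pages. This frees up a lot of time and resources that the spider can use to index more of Shoemoney’s posts.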

The last two entries block Googlebot from crawling the RSS feed. The RSS feed is really just a duplication of the homepage, and we would much rather have our nicely laid out homepage in the index than a bunch of XML.
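If you want to sanity-check rules like these before uploading them, here is a minimal sketch using Python’s standard urllib.robotparser module. The sample paths at the bottom are made up for illustration and are not taken from shoemoney.com. Also note that this parser does simple prefix matching, which may differ in the details from how Googlebot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted earlier in this post.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /link.php
Disallow: /gallery
Disallow: /gallery2
Disallow: /gallery2/
Disallow: /gallery/
Disallow: /category/
Disallow: /page/
Disallow: /pages/
Disallow: /feed/
Disallow: /feed
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, chosen only to exercise each group of rules.
sample_paths = [
    "/",
    "/2007/02/",        # a monthly archive page (assumed URL layout)
    "/link.php",
    "/gallery/photo1",
    "/category/seo/",
    "/page/2/",
    "/feed/",
]

for path in sample_paths:
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(f"{verdict:8} {path}")
```

Because the file only names Googlebot, any other user agent you pass to can_fetch is allowed everywhere, which matches how the rules are written.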

After analyzing these changes I made some similar additions to WageRank’s robots.txt. I don’t really expect to be dominating the SERPs with this month-old blog, but I think it will make a difference in the future.

So, wrapping all this up, a good robots.txt file will present your site’s content and navigation structure in a way that makes it very easy for the spiders to concentrate on what is important. Some pages you should consider blocking are login pages, contact us, terms and conditions, privacy policy, image galleries, etc.
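To make that concrete, here is a small Python sketch that writes out a robots.txt from a list of paths you have decided to block. The paths below are placeholders I picked for a typical site, not a recommendation for yours, and render_robots_txt is just a helper name I chose.

```python
def render_robots_txt(disallowed_paths, user_agent="*"):
    """Build robots.txt text with one Disallow line per path."""
    for path in disallowed_paths:
        # Disallow values are URL paths, so they should start with "/".
        if not path.startswith("/"):
            raise ValueError(f"path must start with '/': {path!r}")
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


# Placeholder paths; substitute whatever your own site actually uses.
low_value_paths = ["/login/", "/contact/", "/terms/", "/privacy/", "/gallery/"]

print(render_robots_txt(low_value_paths), end="")
```

Passing user_agent="Googlebot" instead would limit the rules to Google’s spider, the way Shoemoney’s file does.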

If you don’t think it will rank, don’t let the spiders waste your authority indexing it!





Tell Me What You Think...

Comment by Oscar
2007-02-19 04:24:06

I’ll bet he soon allows Googlebot to access those galleries.
Go on, think about it.

 
 
 
Comment by mig33Citeureup
2008-06-10 03:58:04

Confused.
Greetings, nice article. Please visit my website.

 

Copyright © 2007 WageRank.com