Blocking Dynamic URLs with Robots.txt

AndrewY

Background:

My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it results in many URL variations of the same page being crawled by Google. For example, take a standard category page:

www.mysite.com/widgets.html

...which uses a "Price" layered navigation sidebar to filter products by price. That sidebar also produces the following URLs, all of which point to the same page:

http://www.mysite.com/widgets.html?price=1%2C250

http://www.mysite.com/widgets.html?price=2%2C250

http://www.mysite.com/widgets.html?price=3%2C250

Because literally thousands of these URL variations are being indexed, I'd like to use robots.txt to disallow them.
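To make the duplication concrete, here is a minimal Python sketch (Python is used only for illustration) that splits each of the URLs above into its path and its decoded query parameters. It shows that all three share the path /widgets.html and differ only in the price value; "%2C" is simply a percent-encoded comma.

```python
from urllib.parse import urlsplit, parse_qs

# The three layered-navigation URLs from the question.
variants = [
    "http://www.mysite.com/widgets.html?price=1%2C250",
    "http://www.mysite.com/widgets.html?price=2%2C250",
    "http://www.mysite.com/widgets.html?price=3%2C250",
]

def describe(url: str) -> tuple[str, dict[str, list[str]]]:
    """Split a URL into its path and its decoded query parameters."""
    parts = urlsplit(url)
    # parse_qs percent-decodes values, so "%2C" comes back as ","
    return parts.path, parse_qs(parts.query)

for url in variants:
    path, params = describe(url)
    print(path, params)

# Every variant resolves to the same underlying page path.
distinct_paths = {describe(url)[0] for url in variants}
print(distinct_paths)  # {'/widgets.html'}
```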

Question:

Is this a wise thing to do? Or does Google account for layered navigation links by default, so I don't need to worry?
To implement it, I was going to add the following to robots.txt:

User-agent: *
Disallow: /*?
Disallow: /*=

...which would stop crawlers from fetching any dynamic URL containing a '?' or '='. Is there a better way to do this, or is this a good solution?
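One way to sanity-check the proposed rules is to simulate them. Python's built-in urllib.robotparser only does plain prefix matching and does not understand '*' wildcards inside a path, so the sketch below translates each Disallow pattern into a regular expression itself, following the wildcard rules of RFC 9309 ('*' matches any sequence of characters, a trailing '$' anchors the end). It is a simplification: it ignores Allow lines and the most-specific-match precedence a real crawler applies.

```python
import re
from urllib.parse import urlsplit

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*', a trailing '$' anchors the end, and everything
    else is matched literally as a prefix of the path plus query.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_disallowed(url: str, disallow_rules: list[str]) -> bool:
    """True if any Disallow rule matches the URL's path plus query string."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, mirroring robots.txt prefix matching.
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules)

rules = ["/*?", "/*="]
print(is_disallowed("http://www.mysite.com/widgets.html?price=1%2C250", rules))  # True
print(is_disallowed("http://www.mysite.com/widgets.html", rules))  # False
```

With these two rules, the second one adds nothing for ordinary query-string URLs, because any '=' in a query string comes after the '?'. Also keep in mind that Disallow stops crawling, not indexing: Google can still list a blocked URL it discovers through links, and it cannot see any tag placed on a page it is not allowed to fetch.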

Thank you!

TaitLarson

If you're happy for none of your URLs with query strings to be crawled, your robots.txt will work fine.

Do any of your URLs with question marks in them have links pointing to them? If so, be careful about blocking Google from crawling them: I'd expect you to lose the benefit those links would otherwise pass to your site.

AndrewY

Tait,

Thanks for the answer. I think the canonical tag would be ideal, but implementing it would require substantial changes to the site's PHP code: I have a lot of categories, and adding the tag to each one manually would be very time-consuming.

Would preventing spiders from crawling any URLs containing a "?" or "&" (which would match only dynamic URL variations) cause any problems? Or is it just not considered best practice?
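This follow-up swaps '=' for '&', which changes what gets matched. For single-wildcard rules such as /*? and /*&, a match reduces to whether the path-plus-query contains that character, so a substring check is enough to compare them. The dir=asc sorting parameter in the second URL is a hypothetical example; the sketch shows that a /*& rule would miss single-parameter URLs such as the price filters, while /*? catches both.

```python
from urllib.parse import urlsplit

def path_and_query(url: str) -> str:
    """Return the part of a URL that robots.txt rules are matched against."""
    parts = urlsplit(url)
    return (parts.path or "/") + ("?" + parts.query if parts.query else "")

urls = [
    "http://www.mysite.com/widgets.html?price=1%2C250",
    # Hypothetical sorting parameter added on top of the price filter.
    "http://www.mysite.com/widgets.html?price=1%2C250&dir=asc",
]

for url in urls:
    target = path_and_query(url)
    # A rule of the form "/*X" matches exactly when X occurs in the target,
    # because every target already starts with "/".
    print(target, "| /*? matches:", "?" in target, "| /*& matches:", "&" in target)
```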

Thanks!

TaitLarson

I don't know if there's a good robots.txt solution given your URL structure. However, you could add a rel=canonical link tag to the page's <head> to tell Google which URL is the preferred version, so that it treats all of these variations as the same page. This would help you avoid duplicate content issues.
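Since the concern earlier in the thread was adding the tag by hand to every category, note that it can instead be generated from the requested URL in a shared page template. The sketch below is in Python purely for illustration (the site itself runs PHP), and the function names are hypothetical. It assumes that every query parameter is a filter or sort that should collapse to the bare category URL; parameters that change the content, such as pagination, may need different handling.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_url(request_url: str) -> str:
    """Drop the query string and fragment so every filtered or sorted
    variant of a category page points at the bare category URL."""
    parts = urlsplit(request_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def canonical_link_tag(request_url: str) -> str:
    """Build the <link> element to emit inside the page's <head>."""
    href = escape(canonical_url(request_url), quote=True)
    return f'<link rel="canonical" href="{href}">'

print(canonical_link_tag("http://www.mysite.com/widgets.html?price=2%2C250"))
# <link rel="canonical" href="http://www.mysite.com/widgets.html">
```

Because the tag is derived from the request rather than typed per category, one change to the template covers every category page. Google treats rel=canonical as a strong hint rather than a strict directive.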


Moz Q&A is closed.


Related Questions

Robots.txt blocked internal resources Wordpress

Mass URL changes and redirecting those old URLS to the new. What is SEO Risk and best practices?

Will disallowing URL's in the robots.txt file stop those URL's being indexed by Google

Duplicate URLs ending with #!

Attack of the dummy urls -- what to do?

How to deal with URLs and tabbed content

Using 2 wildcards in the robots.txt file

Block an entire subdomain with robots.txt?
