Should I use meta noindex and robots.txt disallow?

ntcma

Hi, we have an alternate "list view" version of every one of our search results pages

The list view has its own URL, indicated by a URL parameter

I'm concerned about wasting our crawl budget on all these list view pages, which effectively double the number of pages that need crawling

When they were first launched, I had the noindex meta tag placed on all list view pages, but I'm concerned that they are still being crawled

Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or, will Googlebot/Bingbot also stop crawling that page over time? I assume that noindex still means "crawl"...

Thanks

ntcma

Hi,

Thanks, I will do some testing to confirm that this behaves how I would like it to

gazzerman1

If all of the pages are 100% not indexed, then I would block them in robots.txt. Google's John Mueller confirmed to me that Googlebot will continue to crawl every link to check whether a nofollow or noindex has changed status.

So as a result we blocked our pages with robots.txt and saw a great increase in index/crawl rates on the pages we want Google to pay attention to. It also reduces wasted server resources.
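Before deploying a rule like this, it can help to check which URLs it would actually catch. Below is a minimal Python sketch of Google-style wildcard matching (`*` for any run of characters, a trailing `$` to anchor the end). The `view=list` parameter, the example.com URLs, and the function names are all hypothetical, and the sketch ignores `Allow` rules, user-agent groups, and longest-match precedence, so treat it as a rough pre-check rather than a model of how Googlebot or Bingbot behaves.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    every other character is treated literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url, disallow_patterns):
    """Return True if the URL's path plus query matches any Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start of the string, like robots.txt prefix matching
    return any(rule_to_regex(p).match(target) for p in disallow_patterns)


# Hypothetical rules: the parameter as the first or a later query argument
rules = ["/*?view=list", "/*&view=list"]

for url in [
    "https://example.com/search?q=shoes&view=list",
    "https://example.com/search?q=shoes",
]:
    print(url, "blocked" if is_blocked(url, rules) else "allowed")
```

Two patterns are used instead of a single `/*view=list` so that an unrelated parameter such as `preview=list` is not blocked by accident.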

However, if any of the pages are already indexed and you block them in robots.txt, Googlebot will never be able to crawl them to see that they should be noindex. This means they could stay indexed permanently.

I hope that answers all your questions?

ntcma

When you say:

nofollow will tell the crawlers to not crawl the page

I believe you mean that this tells the crawlers not to crawl the links on the page. The page itself is still "crawled", is it not?

But yes, you are right to say that once the robots.txt disallow is in place, the meta tag will not be seen and will thus be moot (at which point I may as well take it off).

It would be nice to be able to say "don't crawl this and don't put it in the index"... but is there a way?

Shawn_Huber

noindex only tells the search crawlers to not include the page in the index but still allows for them to crawl the page. nofollow will tell the crawlers to not crawl the page.

robots.txt will accomplish this as well, but I think using both would be overkill.
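To confirm which directives a given page actually serves, something like the following Python sketch could be used. It reads the `content` of any `<meta name="robots">` tag from an HTML string. The class and function names and the sample markup are illustrative assumptions, and it only looks at the meta tag in the HTML, nothing else. A crawler can only see these directives if it is allowed to fetch the page in the first place.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated and case-insensitive
            self.directives.update(
                token.strip().lower()
                for token in content.split(",")
                if token.strip()
            )


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical list view page markup
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(sorted(robots_directives(page)))
```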
