Is there a way to get a list of Total Indexed pages from Google Webmaster Tools?

sparrowdog

I'm doing a detailed analysis of how Google sees and indexes our website and we have found that there are 240,256 pages in the index which is way too many. It's an e-commerce site that needs some tidying up.

I'm working with an SEO specialist to set up URL parameters and add rules to the robots.txt file so the excess pages aren't indexed (we shouldn't have more than around 3,000 - 4,000 pages), but we're struggling to find a way to get a list of these 240,256 pages. It would be helpful in deciding what to put in the robots.txt file and which URLs we should ask Google to remove.

Is there a way to get a list of the indexed URLs? We can't find it in Google Webmaster Tools.

sparrowdog

Looks like I can only do the first thousand. It's a start though. Thank you for the information.

Many of the URLs on my list, when put into Google search, are giving me 80-100 other variants I can remove by hand.

http://www.mathewporter.co.uk/list-a-domains-indexed-pages-in-google-docs/ for anyone else following.

sparrowdog

Finally getting around to doing this and noticed that when I change the start number to anything above 900, it doesn't work, i.e. it's only letting me look at the first 1,000 results for some reason.

The list of 1,000 has given me some good URLs to search from to track down the filtering feature that was generating all the garbage URLs, but I'd love to get past 1,000 if I can.

Does anyone know how?

sparrowdog

Correct. I have gone into URL Parameters already and set them to Crawl 'No URLs' for those we don't want crawled.

We haven't added the parameters listed there to the robots.txt file yet, but I will do that now. I had an initial consult today and we ran way over time when we discovered all this stuff, so I have another appointment in a couple of weeks.

We have a sitemap of all the category pages and relevant static pages on the site already and Google has those indexed nicely. We just need to get rid of the 240,000 pages it has indexed that we don't want in there (frightening I know - it's a really high number).

I greatly appreciate you taking the time to respond. Thank you.

sparrowdog

Thanks. There's a lot of auto-generated content and duplicate pages, and we've set up the robots.txt file to exclude a large number of them. Now we wait.

Very helpful and greatly appreciated. Thank you.

DeanAndrews

Hi,

I'm going to assume that as you have said it's an e-commerce site that the URL parameters are created by product variations, filters, sorts etc. If so then you must already be seeing those parameters on the URL of your site as you navigate and in your analytics or search results.

Your SEO specialist should easily be able to add those parameters to the robots.txt file. Then personally I would resubmit a sitemap for completeness and wait for the results to take effect.
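If it helps to see which parameters are generating the most variants, here is a minimal Python sketch that tallies query parameter names across a list of URLs. The sample URLs and the function name `count_query_parameters` are made up for illustration; you would feed in your own list (from analytics, a crawl, or search results).

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_query_parameters(urls):
    """Count how many URLs each query parameter name appears in."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values so that "?sort=" still counts as using "sort"
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical sample URLs; replace with your own exported list.
sample_urls = [
    "http://www.example.com/shoes?sort=price&colour=red",
    "http://www.example.com/shoes?sort=name",
    "http://www.example.com/shoes",
    "http://www.example.com/hats?colour=blue&page=2",
]

for name, count in count_query_parameters(sample_urls).most_common():
    print(f"{name}: {count}")
```

Parameters that show up on a large share of URLs are likely candidates for the URL Parameters settings or the robots.txt file.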

FedeEinhorn

Joanne,

I'm afraid there's no way to know which pages are actually indexed from your Webmaster Tools. You can use a simple search in Google, site:domain.com, and it will list "all" your indexed pages; however, there's no way to export that as a report.

You can create a report using a "hack". Log in to your Google Drive, create a new spreadsheet and use the following command to populate rows:

=importXml("https://www.google.com/search?q=site:www.yourdomainnamehere.com&num=100&start=1"; "//cite")

This will load the first 100 results. You will need to repeat the process for every further 100 results, changing the last variable from "start=1" to "start=100", then "start=200", etc. (you see where I'm going). This could really be a pain in the butt for a site of your size.
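To save typing each formula by hand, something like the following Python sketch could print them all for pasting into the spreadsheet. It assumes the start value works as a zero-based offset (so it begins at 0 rather than 1) and stops at 1,000 results, since that is as far as the search results appear to go in this thread. The function name `build_import_formulas` and the example domain are placeholders.

```python
def build_import_formulas(domain, page_size=100, max_results=1000):
    """Return one importXml formula per page of site: search results."""
    formulas = []
    # Assumption: "start" is a zero-based offset into the result list.
    for start in range(0, max_results, page_size):
        search_url = (
            "https://www.google.com/search"
            f"?q=site:{domain}&num={page_size}&start={start}"
        )
        formulas.append(f'=importXml("{search_url}"; "//cite")')
    return formulas


# Placeholder domain; swap in your own.
for formula in build_import_formulas("www.yourdomainnamehere.com"):
    print(formula)
```

With the defaults this prints ten formulas, for start=0 through start=900.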

My recommendation is that you navigate your own site, decide which pages should be removed and then create the robots.txt regardless of what Google has indexed. Once you complete your robots.txt, it will take a few weeks (or even a month) to have the blocked pages removed.
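As a rough illustration of that robots.txt step, this Python sketch turns a list of unwanted parameter names into Disallow rules. It relies on the `*` wildcard in paths, which Google's crawler supports; the parameter names, the function name `build_robots_txt` and the exact rule patterns are assumptions for the example, so check the output against your real URLs before publishing it.

```python
def build_robots_txt(blocked_parameters, user_agent="*"):
    """Build robots.txt text that blocks URLs carrying the given parameters."""
    lines = [f"User-agent: {user_agent}"]
    for name in sorted(set(blocked_parameters)):
        # Cover the parameter whether it comes first (?name=) or later (&name=).
        lines.append(f"Disallow: /*?{name}=")
        lines.append(f"Disallow: /*&{name}=")
    return "\n".join(lines) + "\n"


# Hypothetical parameter names for an e-commerce site.
print(build_robots_txt(["sort", "colour"]))
```

Each parameter gets two rules so that it is matched wherever it appears in the query string.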

Hope that helps!
