How can I get a list of every URL of a site in Google's index?

94501

I work on a site that has almost 20,000 URLs in its sitemap. Google WMT reports 28,000 indexed, and a site: search on Google shows 33,000. I'd like to find out what accounts for the difference.

Is there a way to get an Excel sheet with every URL Google has indexed for a site?

Thanks... Mike

KaneJamison

If this is still an issue you're facing, have you checked the sitemap settings to see which page types are getting included? For example, a site with a few thousand tag pages that are left out of the sitemap but not yet set to noindex could easily produce extra indexed pages like this.
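
One way to check which page types the sitemap actually covers is to count its URLs by first path segment. This is only a rough sketch: it assumes a standard sitemaps.org `<urlset>` file, and the sample XML and the `count_by_section` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlsplit

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def count_by_section(sitemap_xml):
    """Count sitemap URLs by their first path segment (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    counts = Counter()
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        path = urlsplit((loc.text or "").strip()).path
        segments = [s for s in path.split("/") if s]
        counts[segments[0] if segments else "(root)"] += 1
    return counts

# Made-up sample; in practice, read your real sitemap file instead.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://domain.com/</loc></url>
  <url><loc>https://domain.com/product/a-widget</loc></url>
  <url><loc>https://domain.com/product/b-widget</loc></url>
  <url><loc>https://domain.com/blog/hello</loc></url>
</urlset>"""

section_counts = count_by_section(sample_sitemap)
for section, n in section_counts.most_common():
    print(section, n)
```

If a section you know exists on the site (say, /tag/) never shows up in the counts, those pages are not in the sitemap and are candidates for the extra indexed URLs.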

The next step is parameterization. Is anything going on there with search URLs or product URLs, e.g. ?refid=1235134&q=search+term or ?prod=152134&variant=blue?
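
To see whether parameters are inflating the count, you could tally which query-parameter names show up in whatever URL list you have (server logs, an analytics export, a crawl). A minimal sketch, assuming you have full URLs in a Python list; the sample URLs and the `count_query_params` helper are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl

def count_query_params(urls):
    """Count how many URLs carry each query-parameter name (hypothetical helper)."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # Use a set so a parameter repeated within one URL is counted once.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts

# Made-up sample URLs modeled on the patterns mentioned above.
sample_urls = [
    "https://domain.com/search?refid=1235134&q=search+term",
    "https://domain.com/item?prod=152134&variant=blue",
    "https://domain.com/item?prod=152134&variant=red",
    "https://domain.com/about",
]

param_counts = count_query_params(sample_urls)
for name, n in param_counts.most_common():
    print(name, n)
```

Parameters that appear on thousands of URLs but never in the sitemap are a likely source of the gap.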

If you really want to scrape through Google, get the list of URLs from your sitemap and scrape queries like "inurl:domain.com/a", "inurl:domain.com/b", "inurl:domain.com/c", etc. Splitting the queries up this way should let you dig deeper than a single site: search and see what Google really has indexed. For subfolders with tons of URLs, like domain.com/product/a, you'll want to do the same thing at the subfolder level instead of the root.
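
Building that list of queries is easy to script. The sketch below only generates the query strings and does not send anything to Google; the `build_inurl_queries` helper is hypothetical, and including the digits 0-9 alongside a-z is my own assumption.

```python
import string

def build_inurl_queries(domain, subfolder=""):
    """Return one inurl: query per leading character (hypothetical helper)."""
    prefix = domain.rstrip("/")
    if subfolder:
        prefix += "/" + subfolder.strip("/")
    # Assumption: cover a-z plus 0-9 as possible first characters of a path.
    chars = string.ascii_lowercase + string.digits
    return [f"inurl:{prefix}/{c}" for c in chars]

root_queries = build_inurl_queries("domain.com")
product_queries = build_inurl_queries("domain.com", "product")

print(root_queries[:3])
print(product_queries[:3])
```

For a big subfolder, pass it as the second argument so the prefixes are generated one level down.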

KaneJamison

You can do that with a tool like Scrapebox or Outwit. Go slow, or else you'll need to use proxies to get Google to respond fast enough. As another commenter mentioned, it's probably against Google's TOS.

DJ123

You could probably write a macro to do this, although just because you could doesn't mean you should. I don't think it is advisable because you do not want to violate any terms of use for anyone. That is never a good thing.

94501

Yes, the WMT API doesn't have it. The site:xxxx.com search is where I got one of the two too-high numbers. Thanks... Mike

94501

Hi Martijn,

Thanks for the suggestions. 2.5 years of Google Analytics organic landing pages comes to 10,000 URLs: half as many as the sitemap and about a third as many as Google says are indexed. On scraping Google, do you know of a tool for that?

Thanks... Mike

Kingof5

Might be something you can get from the WMT API.

Also, to really see how many pages are indexed, do a site:xxxx.com search, go to the last page, include omitted results, go to the last page again, and add up how many you have. That's probably the most accurate number.

Martijn_Scheijbeler

Hi Mike,

There are a couple of solutions, though neither of them will give you 100% of the data. The best would be to export a list of landing pages from Google Analytics or your favorite web analytics tool, segmented by organic search / Google. This gives you a list of pages that received traffic via search and so are indexed. If you cross-reference them with your sitemaps, that might already help you out a bit. Besides that, you could crawl and scrape the URLs for a site:xxx.com search.
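
The cross-referencing step could look roughly like this. It is a sketch under some assumptions: both lists are available as full URLs (analytics exports often contain only paths, so you may need to prepend the domain first), and the `normalize` and `cross_reference` helpers and the sample data are made up for illustration.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to host + path (+ query), ignoring scheme and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    key = parts.netloc.lower() + path
    if parts.query:
        key += "?" + parts.query
    return key

def cross_reference(sitemap_urls, landing_urls):
    """Split URLs into those in both lists, sitemap only, and landing pages only."""
    sitemap = {normalize(u) for u in sitemap_urls}
    landing = {normalize(u) for u in landing_urls}
    return {
        "in_both": sitemap & landing,
        "sitemap_only": sitemap - landing,
        "landing_only": landing - sitemap,
    }

# Made-up sample data standing in for the two exports.
sitemap_urls = ["https://xxx.com/", "https://xxx.com/products/", "https://xxx.com/about"]
landing_urls = [
    "https://xxx.com/products",
    "https://xxx.com/tag/blue",
    "https://xxx.com/item?prod=152134",
]

result = cross_reference(sitemap_urls, landing_urls)
for bucket, urls in result.items():
    print(bucket, sorted(urls))
```

The "landing_only" bucket is the interesting one: pages that got organic traffic, so are indexed, but are not in the sitemap.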
