How to extract URLs from a site (without bringing the server down!)

neooptic

Hi everybody.

One of my clients is migrating to a new ecommerce platform, and we need to get a list of urls from the existing site to start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl and output a list.

However, the database and server setup is so bad that it can't handle the requests from these tools and it sends the site down. This, unsurprisingly, is one of the reasons for the migration.

Does anybody know of a way to get a full list of urls without having to make a bunch of http requests which will kill the site? Any advice would be much appreciated!

Dr-Pete

Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):

http://www.screamingfrog.co.uk/seo-spider/

It's a good tool, and nice to have around, IMO.

Dan-Petrovic

Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?
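
If a sitemap file comes out of that process, the URL list can be pulled from it offline, with no further requests to any server. The snippet below is a minimal sketch, assuming the file follows the standard sitemaps.org format with `<loc>` entries; the function name `urls_from_sitemap` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol for <urlset>, <url> and <loc>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> value from a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so this also works for sitemap index files.
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
```

Reading the saved file and passing its contents to `urls_from_sitemap` gives a plain list that can go straight into the redirect mapping spreadsheet.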

AlanMosley

Why not find the links to the site? You will only need to 301 the URLs that have external links; let the rest 404. I use Bing WMT, as it has the most complete collection IMO, and it also exports to CSV.
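
Once a link report is exported, reducing it to the unique pages that need a redirect takes only a few lines. This is a rough sketch, not the actual export format: the column name "Target URL" and the sample rows are assumptions, so adjust `column` to whatever header the CSV really uses.

```python
import csv
import io


def linked_urls(csv_file, column="Target URL"):
    """Return the sorted, de-duplicated URLs found in one column of a CSV export."""
    reader = csv.DictReader(csv_file)
    urls = set()
    for row in reader:
        # Missing or empty cells are skipped rather than treated as URLs.
        value = (row.get(column) or "").strip()
        if value:
            urls.add(value)
    return sorted(urls)


# Hypothetical sample data standing in for a real export file.
sample = io.StringIO(
    "Source URL,Target URL\n"
    "http://blog.example.org/post,http://example.com/widgets\n"
    "http://news.example.net/a,http://example.com/widgets\n"
)
print(linked_urls(sample))
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`.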

neooptic

Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?

YannickVeys

Scrape Google?
Make your own scraper and keep the requests per second really low?
Maybe the site has an automated sitemap somewhere?
Google Webmaster Tools -> download the "internal links" table
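
For the "make your own scraper" option, here is a minimal sketch in Python (rather than PHP) of a crawler that sends one request at a time and pauses between requests. The names `crawl`, `fetch_page` and `LinkParser`, the 5-second default delay and the User-Agent string are all illustrative assumptions; tune the delay to whatever the server can tolerate.

```python
import time
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Default fetcher: one plain GET, returns the body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": "url-inventory-sketch"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, delay_seconds=5.0, max_pages=10000, fetch=fetch_page, sleep=time.sleep):
    """Breadth-first crawl of one host, pausing between consecutive requests."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if fetched:
            sleep(delay_seconds)  # throttle: wait before every request after the first
        fetched += 1
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail; the URL itself stays in the list
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page counts once.
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)
```

Calling `crawl("http://www.example.com/", delay_seconds=10)` would return a sorted list of every same-host URL it discovered. The `fetch` and `sleep` parameters exist so the logic can be tried against canned pages before pointing it at the fragile live site.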
