Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to create site map for large site (ecommerce type) that has 1000's if not 100,000 of pages.
-
I know this is kind of a newbie question but I am having an amazing amount of trouble creating a sitemap for our site Bestride.com. We just did a complete redesign (look and feel, functionality, the works) and now I am trying to create a site map. Most of the generators I have used "break" after reaching some number of pages. I am at a loss as to how to create the sitemap. Any help would be greatly appreciated!
Thanks
-
I agree with Chris. With such large websites it would be advisable having a sitemap index and then splitting the index into various individual indexes such as Pages, Products, Categories, images, media, tags etc.
-
The easiest thing i can think of is to write a script that works with your dispatcher to create a site map. The format I would use is add the page and all of the "product images" on the page to the map and move to the next. At the same time I would use an auto increment variable to keep track of how many lines you have written. When you get around 50k, write out the name of the next site map file that the program will create and have them chained together this way.
-
That's a great help Chris, thank you! And thanks to all for your help!
-
Typically, a sitemap is going to include every page on the site. As Francesca said, each sitemap can be up to 50K urls and if you need multiple sitemaps then you create a sitemap index that points to the rest of the sitemaps.
-
Thanks for the feedback!
I will look into screamingfrog for sure.
@Lesley - we are using a custom platform (in house) so we don't have that functionality. The issue is that we have a lot of inventory (millions) of cars. We have built (and are releasing new functionality today) to provide internal links so that Google can crawl all the inventory easily (users can too :). My question about sitemaps has boiled down to this: Do we need to build the sitemap to include every single page (all the inventory) or do we provide a "map" so that google can find the top pages and then crawl the inventory from there. Again the site is bestride.com. If anyone wants to take a look at the site, that would be fantastic!
Thanks
-
Are you using a custom platform or an off the shelf e-commerce package? Most off the shelf packages actually have a module that can create a site map and a lot have it where you can cron it too.
-
Of course, you can also use the moz's crawl test report at http://pro.moz.com/tools/crawl-test
-
Hi Kristin,
Each sitemap.xml can support maximum 50.000 URLs. So, If you have a site with more than 100K, It'd be better to create 2 or 3 o 4 etc sitemaps.xml in order to contain all URLs. Hope it is useful.
Kind regards!
Francesca
-
You can use screamingfrog to create your sitemap. You just need to license it for crawl more than 500 URI.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does anyone know the linking of hashtags on Wix sites does it negatively or postively impact SEO. It is coming up as an error in site crawls 'Pages with 404 errors' Anyone got any experience please?
Does anyone know the linking of hashtags on Wix sites does it negatively or positively impact SEO. It is coming up as an error in site crawls 'Pages with 404 errors' Anyone got any experience please? For example at the bottom of this blog post https://www.poppyandperle.com/post/face-painting-a-global-language the hashtags are linked, but they don't go to a page, they go to search results of all other blogs using that hashtag. Seems a bit of a strange approach to me.
Technical SEO | | Mediaholix0 -
If I'm using a compressed sitemap (sitemap.xml.gz) that's the URL that gets submitted to webmaster tools, correct?
I just want to verify that if a compressed sitemap file is being used, then the URL that gets submitted to Google, Bing, etc and the URL that's used in the robots.txt indicates that it's a compressed file. For example, "sitemap.xml.gz" -- thanks!
Technical SEO | | jgresalfi0 -
Why Can't Googlebot Fetch Its Own Map on Our Site?
I created a custom map using google maps creator and I embedded it on our site. However, when I ran the fetch and render through Search Console, it said it was blocked by our robots.txt file. I read in the Search Console Help section that: 'For resources blocked by robots.txt files that you don't own, reach out to the resource site owners and ask them to unblock those resources to Googlebot." I did not setup our robtos.txt file. However, I can't imagine it would be setup to block google from crawling a map. i will look into that, but before I go messing with it (since I'm not familiar with it) does google automatically block their maps from their own googlebot? Has anyone encountered this before? Here is what the robot.txt file says in Search Console: User-agent: * Allow: /maps/api/js? Allow: /maps/api/js/DirectionsService.Route Allow: /maps/api/js/DistanceMatrixService.GetDistanceMatrix Allow: /maps/api/js/ElevationService.GetElevationForLine Allow: /maps/api/js/GeocodeService.Search Allow: /maps/api/js/KmlOverlayService.GetFeature Allow: /maps/api/js/KmlOverlayService.GetOverlays Allow: /maps/api/js/LayersService.GetFeature Disallow: / Any assistance would be greatly appreciated. Thanks, Ruben
Technical SEO | | KempRugeLawGroup1 -
Should I disavow links from pages that don't exist any more
Hi. Im doing a backlinks audit to two sites, one with 48k and the other with 2M backlinks. Both are very old sites and both have tons of backlinks from old pages and websites that don't exist any more, but these backlinks still exist in the Majestic Historic index. I cleaned up the obvious useless links and passed the rest through Screaming Frog to check if those old pages/sites even exist. There are tons of link sending pages that return a 0, 301, 302, 307, 404 etc errors. Should I consider all of these pages as being bad backlinks and add them to the disavow file? Just a clarification, Im not talking about l301-ing a backlink to a new target page. Im talking about the origin page generating an error at ping eg: originpage.com/page-gone sends me a link to mysite.com/product1. Screamingfrog pings originpage.com/page-gone, and returns a Status error. Do I add the originpage.com/page-gone in the disavow file or not? Hope Im making sense 🙂
Technical SEO | | IgorMateski0 -
When creating parent and child pages should key words be repeated in url and page title?
We are in the direct mail advertising business: PrintLabelAndMail.com Example: Parent:
Technical SEO | | JimDirectMailCoach
Postcard Direct Mail Children:
Postcard Mailings
Postcard Design
Postcard Samples
Postcard Pricing
Postcard Advantages should "postcard" be repeated in the URL and Page Title? and in this example should each of the 5 children link back directly to the parent or would it be better to "daisy chain" them using each as parent for the next?0 -
Volusion eCommerce Site 302s and Canonicalization
There have been a couple other threads concerning this topic so I apologize, but I have an iteration on the main question that has not been answered. Crawl Diagnostics is giving me a bunch of 302 temporary redirect notices. For example, here is a page title URL:
Technical SEO | | anneoaks
http://store.in-situ.com/Rugged-Conductivity-Meter-p/0073380.htm and here is the redirect:
http://store.in-situ.com/Rugged-Conductivity-Meter-p/tape-clt-meter.htm?1=1&CartID=0 The first link is actually a child product of:
http://store.in-situ.com//Rugged-Conductivity-Meter-p/tape-clt-meter.htm Volusion tech support told me they believe most of them are meta redirects but could not find any documentation on them. All the other threads concerning this have said to either change the 302s to 301s, which I don't think is possible, or to add a nofollow tag. My question is do I need to do anything if both those pages are canonical to the parent product? Should I be passing on the linkjuice if neither of those pages are of high value?0 -
Are Collapsible DIV's SEO-Friendly?
When I have a long article about a single topic with sub-topics I can make it user friendlier when I limit the text and hide text just showing the next headlines, by using expandable-collapsible div's. My doubt is if Google is really able to read onclick textlinks (with javaScript) or if it could be "seen" as hidden text? I think I read in the SEOmoz Users Guide, that all javaScript "manipulated" contend will not be crawled. So from SEOmoz's Point of View I should better make use of old school named anchors and a side-navigation to jump to the sub-topics? (I had a similar question in my post before, but I did not use the perfect terms to describe what I really wanted. Also my text is not too long (<1000 Words) that I should use pagination with rel="next" and rel="prev" attributes.) THANKS for every answer 🙂
Technical SEO | | inlinear0 -
Blank pages in Google's webcache
Hello all, Is anybody experiencing blanck page's in Google's 'Cached' view? I'm seeing just the page background and none of the content for a couple of my pages but when I click 'View Text Only' all of teh content is there. Strange! I'd love to hear if anyone else is experiencing the same. Perhaps this is something to do with the roll out of Google's updates last week?! Thanks,
Technical SEO | | A_Q
Elias0