Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
XML sitemap generator only crawling 20% of my site
-
Hi guys,
I am trying to submit the most recent XML sitemap but the sitemap generator tools are only crawling about 20% of my site. The site carries around 150 pages and only 37 show up on tools like xml-sitemaps.com. My goal is to get all the important URLs we care about into the XML sitemap.
How should I go about this?
Thanks
-
I believe it's not a significant issue if the sitemap encompasses the core framework of your website. As long as the sitemap is well-organized, omitting a few internal pages is acceptable since Googlebot will crawl all pages based on the sitemap. Take a look at the <a href="https://convowear.in">example page</a> that also excludes some pages, yet it doesn't impact the site crawler's functionality.
-
Yes Yoast on WordPress works fine for sitemap generation. I would also recommend that. Using on all of my blog sites.
-
If you are using WordPress then I would recommend to use Yoast plugin. It generates sitemap automatically regularly. I am also using it on my blog.
-
I'm using Yoast SEO plugin for my website. It generates the Sitemap automatically.
-
My new waterproof tent reviews blog facing the crawling problem. How can I fix that?
-
use Yoast or rankmath ot fix it
آموزش سئو در اصفهان https://faneseo.com/seo-training-in-isfahan/
-
Patrick wrote a list of reasons why Screaming Frog might not be crawling certain pages here: https://moz.com/community/q/screamingfrog-won-t-crawl-my-site#reply_300029.
Hopefully that list can help you figure out your site's specific issue.
-
This doesn't really answer my question of why I am not able to get all links into the XML sitemap when using xml sitemap generators.
-
I think it's not a big deal if the sitemap covers the main structure of your site. If your sitemap is constructed in a really decent structure, then missing some internal pages are acceptable because Googlebot will crawl all of your pages based on your site map. You can see the following page which also doesn't cover all of its pages, but there's no influence in terms of site crawler.
-
Thanks Boyd but unfortunately I am still missing a good chunk of URLs here and I am wondering why? Do those check on internal links in order to find these pages?
-
Use Screaming Frog to crawl your site. It is free to download the software and you can use the free version to crawl up to 500 URLs.
After it crawls your site you can click on the Sitemaps tab and generate an XML sitemap file to use.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
[Very Urgent] More 100 "/search/adult-site-keywords" Crawl errors under Search Console
I just opened my G Search Console and was shocked to see more than 150 Not Found errors under Crawl errors. Mine is a Wordpress site (it's consistently updated too): Here's how they show up: Example 1: URL: www.example.com/search/adult-site-keyword/page2.html/feed/rss2 Linked From: http://an-adult-image-hosting.com/search/adult-site-keyword/page2.html Example 2 (this surprised me the most when I looked at the linked from data): URL: www.example.com/search/adult-site-keyword-2.html/page/3/ Linked From: www.example.com/search/adult-site-keyword-2.html/page/2/ (this is showing as if it's from our own site) http://a-spammy-adult-site.com/search/adult-site-keyword-2.html Example 3: URL: www.example.com/search/adult-site-keyword-3.html Linked From: http://an-adult-image-hosting.com/search/adult-site-keyword-3.html How do I address this issue?
Intermediate & Advanced SEO | | rmehta10 -
Would you rate-control Googlebot? How much crawling is too much crawling?
One of our sites is very large - over 500M pages. Google has indexed 1/8th of the site - and they tend to crawl between 800k and 1M pages per day. A few times a year, Google will significantly increase their crawl rate - overnight hitting 2M pages per day or more. This creates big problems for us, because at 1M pages per day Google is consuming 70% of our API capacity, and the API overall is at 90% capacity. At 2M pages per day, 20% of our page requests are 500 errors. I've lobbied for an investment / overhaul of the API configuration to allow for more Google bandwidth without compromising user experience. My tech team counters that it's a wasted investment - as Google will crawl to our capacity whatever that capacity is. Questions to Enterprise SEOs: *Is there any validity to the tech team's claim? I thought Google's crawl rate was based on a combination of PageRank and the frequency of page updates. This indicates there is some upper limit - which we perhaps haven't reached - but which would stabilize once reached. *We've asked Google to rate-limit our crawl rate in the past. Is that harmful? I've always looked at a robust crawl rate as a good problem to have. Is 1.5M Googlebot API calls a day desirable, or something any reasonable Enterprise SEO would seek to throttle back? *What about setting a longer refresh rate in the sitemaps? Would that reduce the daily crawl demand? We could set increase it to a month, but at 500M pages Google could still have a ball at the 2M pages/day rate. Thanks
Intermediate & Advanced SEO | | lzhao0 -
SEO site Review
Does anyone have suggestions on places that provide in depth site / analytics reviews for SEO?
Intermediate & Advanced SEO | | Gordian0 -
Where is the best place to put a sitemap for a site with local content?
I have a simple site that has cities as subdirectories (so URL is root/cityname). All of my content is localized for the city. My "root" page simply links to other cities. I very specifically want to rank for "topic" pages for each city and I'm trying to figure out where to put the sitemap so Google crawls everything most efficiently. I'm debating the following options, which one is better? Put the sitemap on the footer of "root" and link to all popular pages across cities. The advantage here is obviously that the links are one less click away from root. Put the sitemap on the footer of "city root" (e.g. root/cityname) and include all topics for that city. This is how Yelp does it. The advantage here is that the content is "localized" but the disadvantage is it's further away from the root. Put the sitemap on the footer of "city root" and include all topics across all cities. That way wherever Google comes into the site they'll be close to all topics I want to rank for. Thoughts? Thanks!
Intermediate & Advanced SEO | | jcgoodrich0 -
Merging Sites: Will redirecting the old homepage to an internal page on the new site cause issues?
I've ended up with two sites which have similar content (but not duplicate) and target similar keywords, rather than trying to maintain two sites I would like to merge the sites together. The old site is more of a traditional niche site and targets a particular set of keywords on its homepage, the new site is more of an authority site with a magazine type homepage and targets the same set of keywords from an internal page. My question is: Should I redirect the old site's homepage to the relevant internal page on the new website...
Intermediate & Advanced SEO | | lara_dar
...or should I redirect the old site's homepage to the new site's homepage? (the old site's homepage backlinks are a mixture of partial match keyword anchor text, naked URLs and branded anchor text) I am in two minds (a & b!) (a) Redirecting to the internal page would be great for ranking as there are some decent backlinks and the content is similar (b) But usually when you do a 301 redirect the homepage usually directs to the new homepage and some of the old site's links are related to the domain rather than the keyword (e.g. http://www.site.com) and some people will be looking for the site's homepage. What do you think? Your help is much appreciated (and hope this makes sense...!)0 -
Micro sites?
Hi, I have been speaking to seo firms regarding strategies and they mentioned setting up micro sites under domains that are relevant. i.e setting up armanidoamin.co.uk and we use it as a blog type site to update all info, product reviews, news relating to armani. Whats peoples thoughts on this? Does it work? Is it worth the effort? Im not so sure but obviously looking for ideas. Cheers
Intermediate & Advanced SEO | | YNWA0 -
XML Sitemap for classifieds
I have seeon some trends for sites which do not even use XML sitemp and robots e.g. see this site. How do you see if sitemap is not used. Also for classified websites, should ad pages be included in sitemap because after certain duration those ads will be deleted and google might not be able to crawl. What do you suggest about XML sitemap for classified website.
Intermediate & Advanced SEO | | MozAddict0 -
Canonical URLs and Sitemaps
We are using canonical link tags for product pages in a scenario where the URLs on the site contain category names, and the canonical URL points to a URL which does not contain the category names. So, the product page on the site is like www.example.com/clothes/skirts/skater-skirt-12345, and also like www.example.com/sale/clearance/skater-skirt-12345 in another category. And on both of these pages, the canonical link tag references a 3rd URL like www.example.com/skater-skirt-12345. This 3rd URL, used in the canonical link tag is a valid page, and displays the same content as the other two versions, but there are no actual links to this generic version anywhere on the site (nor external). Questions: 1. Does the generic URL referenced in the canonical link also need to be included as on-page links somewhere in the crawled navigation of the site, or is it okay to be just a valid URL not linked anywhere except for the canonical tags? 2. In our sitemap, is it okay to reference the non-canonical URLs, or does the sitemap have to reference only the canonical URL? In our case, the sitemap points to yet a 3rd variation of the URL, like www.example.com/product.jsp?productID=12345. This page retrieves the same content as the others, and includes a canonical link tag back to www.example.com/skater-skirt-12345. Is this a valid approach, or should we revise the sitemap to point to either the category-specific links or the canonical links?
Intermediate & Advanced SEO | | 379seo0