Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Include or exclude noindex urls in sitemap?
-
We just added tags to our pages with thin content.
Should we include or exclude those urls from our sitemap.xml file? I've read conflicting recommendations.
-
Hi vcj and the rest of you guys
I would be very interested in learning what strategy you actually went ahead with, and the results. I have a similar issue as a result of pruning, and removing noindex pages from the sitemap makes perfect sense to me. We set a noindexed follow on several thousand pages without product descriptions/thin content and we have set things up so when we add new descriptions and updated onpage elements, the noindex is automatically reversed; which sounds perfect, however hardly any of the pages to date (3000-4000) are indexed, so looking for a feasible solution for exactly the same reasons as you.
We have better and comparable metrics and optimization than a lot of the competition, yet rankings are mediocre, so looking to improve on this.
It would be good to hear your views
Cheers
-
I'm aware of the fact Google will get to them sooner or later.
The recommendation from Gary Illyes (from Google), as mentioned in this post, was the reason for my asking the question. Not trying to outsmart Google, just trying to work within their guidelines in the most efficient way possible.
-
Just to put things into perspective,
if these URLs are all already indexed and you have used "noindex" on those pages, sooner or later google will re-crawl these pages and they will be removed. You may want to remove them from the index ASAP for some reason, but it wont really change anything. Because Google will not deindex your noindex pages just because they are in your sitemap.xml.
Google deindexes a sie only when it is time to re-crawl the page.Google never recommends using noindex in sitemaps, and google wont suggest that in their blocking search indexing results guidelines. Also Google indicates the following:
"Google will completely drop the page from search results, even if other pages link to it. If the content is currently in our index, we will remove it after the next time we crawl it. (To expedite removal, use the Remove URLs tool in Google Webmaster Tools.)"But hey! every SEO has its own take.. Some tend to try outsmart Google some not..
Good luck
-
That opens up other potential restrictions to getting this done quickly and easily. I wouldn't consider it best practices to create what is essentially a spam page full of internal links and Googlebot will likely not crawl all 4000 links if you have them all there. So now you'd be talking about maybe making 20 or so thin, spammy looking pages of 200+ internal links to hopefully fix the issue.
The quick, easy sounding options are not often the best option. Considering you're doing all of this in an attempt to fix issues that arose due to an algorithmic penalty, I'd suggest trying to follow best practices for making these changes. It might not be easy but it'll lessen your chances of having done a quick fix that might be the cause, or part of, a future penalty.
So if Fetch As won't work for you (considering lack of manpower to manually fetch 4000 pages), the sitemap.xml option might be the better choice for you.
-
Thanks, Mike.
What are your thoughts on creating a page with links to all of the pages we've Noindexed, doing a Fetch As and submitting that URL and its linked pages? Do you think Google would dislike that?
-
You could technically add them to the sitemap.xml in the hopes that this will get them noticed faster but the sitemap is commonly used for the things you want Google to crawl and index. Plus, placing them in the sitemap does not guarantee Google is going to get around to crawling your change or those specific pages. Technically speaking, doing nothing and jut waiting is equally as valid. Google will recrawl your site at some point. Sitemap.xml only helps if Google is crawling you to see it. Fetch As makes Google see your page as it is now which is like forcing part of a crawl. So technically Fetch As will be the more reliable, quicker choice though it will be more labor-intensive. If you don't have the man-hours to do a project like that at the moment, then waiting or using the Sitemap could work for you. Google even suggests using Fetch As for urls you want them to see that you have blocked with meta tags: https://support.google.com/webmasters/answer/93710?hl=en&ref_topic=4598466
-
There are too many pages to do that (unless we created a page with links to all of the Noindexed pages, then asked Google to crawl that and all linked pages, though that seems like it might be a bad approach). It's an ecommerce website and we Noindexed nearly 4,000 pages that had thin or duplicate content (manufacturer descriptions, no description on brand page, etc) and had no organic traffic in the past 90 days.
This site was hit by Panda in September 2014 and isn't ranking for things it should be – pages with better backlink profiles, higher DA/PA, better content, etc. than our competitors. Our thought is we're not ranking because of a penalty against thin/duplicate content. So we decided to Noindex these pages, improve the content on products that are selling and getting traffic, then work on improving pages that we've Noindex before switching them back to Index.
Basically following recommendations from this article: https://moz.com/blog/pruning-your-ecommerce-site
-
If the pages are in the index and you've recently added a NoIndex tag with the express purpose of getting them removed from the index, you may be better served doing crawl requests in Search Console of the pages in question.
-
Thanks for your response!
I did some more digging. This seems to contradict your suggestion:
https://twitter.com/methode/status/653980524264878080
If the goal is to have these pages removed from the index, and having them in the sitemap means they'll be picked up sooner by Google's crawler, then it seems to make sense that they should be included until they're removed from the index.
Am I misinterpreting this?
-
Hi
The reason you submit a sitemap to a searchengine is to ease and aid in crawling process for the pages that you want to get indexed. It speeds up the crawling process and lets search engine to discover all those pages that has no inner linkings to it etc..
A "noindex" tag does the opposite.
So no, you should not include noindex pages inside your sitemap files.
In general you should avoid pages that are not returning 200 also.Good luck
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Same URL for languages sub-directories
Hi All, I have a main domain and 9 different subdirectories for languages, example: www.example.com/page.html www.example.com/uk/page-uk.html www.example.com/es/page-es.html we are implementing hreflang tags for the languages, but we are thinking to get rid of the dashes on the languages URL: -uk or -es, so it will be: www.example.com/page.html www.example.com/uk/page.html www.example.com/es/page.hrml would this be a problem? to have same page names even if they are in different subdirectories? would we need to add canonical tags, at lease for the main domain URLs? www.kornferry.com/page.html Thank you, Rachel
Technical SEO | | RaquelSaiz0 -
Sitemap indexed pages dropping
About a month ago I noticed my pages indexed from my sitemap are dropping.There are 134 pages in my sitemap and only 11 are indexed. It used to be 117 pages and just died off quickly. I still seem to be getting consistant search traffic but I'm just not sure whats causing this. There are no warnings or manual actions required in GWT that I can find.
Technical SEO | | zenstorageunits0 -
Should I include tags in sitemap?
Hello All, I was wondering if you should include tags and categories in your sitemap. In the past on previous blogs I have always left tags and categories out. The reason for this is a good friend of mine who has been doing SEO for a long time and inhouse always told me that this would result in duplicate content. I thought that it would be a great idea to get some input from the SEOmoz community as this obviously has a big affect on your blog and the number of pages indexed. Any help would be great. Thanks, Luke Hutchinson.
Technical SEO | | LukeHutchinson1 -
Host sitemaps on S3?
Hey guys, I run a dynamic web service and I will start building static sitemaps for it pretty soon. The fact that my app lives in a multitude of servers doesn't make it easy to distribute frequently updated static files throughout the servers. My idea was to host the files in AWS S3 and point my robots.txt sitemap directive there. I'll use a sitemap index so, every other sitemap will be hosted on S3 as well. I could dynamically mirror the content from the files in S3 through my app, but that would be a little more resource intensive than just serving the static files from a common place. Any ideas? Thanks!
Technical SEO | | tanlup0 -
Best Practices for adding Dynamic URL's to XML Sitemap
Hi Guys, I'm working on an ecommerce website with all the product pages using dynamic URL's (we also have a few static pages but there is no issue with them). The products are updated on the site every couple of hours (because we sell out or the special offer expires) and as a result I keep seeing heaps of 404 errors in Google Webmaster tools and am trying to avoid this (if possible). I have already created an XML sitemap for the static pages and am now looking at incorporating the dynamic product pages but am not sure what is the best approach. The URL structure for the products are as follows: http://www.xyz.com/products/product1-is-really-cool
Technical SEO | | seekjobs
http://www.xyz.com/products/product2-is-even-cooler
http://www.xyz.com/products/product3-is-the-coolest Here are 2 approaches I was considering: 1. To just include the dynamic product URLS within the same sitemap as the static URLs using just the following http://www.xyz.com/products/ - This is so spiders have access to the folder the products are in and I don't have to create an automated sitemap for all product OR 2. Create a separate automated sitemap that updates when ever a product is updated and include the change frequency to be hourly - This is so spiders always have as close to be up to date sitemap when they crawl the sitemap I look forward to hearing your thoughts, opinions, suggestions and/or previous experiences with this. Thanks heaps, LW0 -
HTML Sitemap Pagination?
Im creating an a to z type directory of internal pages within a site of mine however there are cases where there are over 500 links within the pages. I intend to use pagination (rel=next/prev) to avoid too many links on the page but am worried about indexation issues. should I be worried?"
Technical SEO | | DMGoo0 -
No indexing url including query string with Robots txt
Dear all, how can I block url/pages with query strings like page.html?dir=asc&order=name with robots txt? Thanks!
Technical SEO | | HMK-NL0 -
Duplicate canonical URLs in WordPress
Hi everyone, I'm driving myself insane trying to figure this one out and am hoping someone has more technical chops than I do. Here's the situation... I'm getting duplicate canonical tags on my pages and posts, one is inside of the WordPress SEO (plugin) commented section, and the other is elsewhere in the header. I am running the latest version of WordPress 3.1.3 and the Genesis framework. After doing some testing and adding the following filters to my functions.php: <code>remove_action('wp_head', 'genesis_canonical'); remove_action('wp_head', 'rel_canonical');</code> ... what I get is this: With the plugin active + NO "remove action" - duplicate canonical tags
Technical SEO | | robertdempsey
With the plugin disabled + NO "remove action" - a single canonical tag
With the plugin disabled + A "remove action" - no canonical tag I have tried using only one of these remove_actions at a time, and then combining them both. Regardless, as long as I have the plugin active I get duplicate canonical tags. Is this a bug in the plugin, perhaps somehow enabling the canonical functionality of WordPress? Thanks for your help everyone. Robert Dempsey0