Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Duplicate without user-selected canonical excluded
-
We have pdf files uploaded in the media of wordpress and used in our website. As these pdfs are duplicate content of the original publishers, we have marked links to these pdf urls as nofollow. These pages are also disallowed in robots.txt
Now, Google Search Console has shown these pages Excluded as "Duplicate without user-selected canonical"
As it comes out we cannot use canonical tag with pdf pages so as to point to the original pdf source
If we embed a pdf viewer in our website and fetch the pdfs by passing the urls of the original publisher, would the pdfs be still read as text by google and again create duplicate content issue? Another thing, when the pdf expires and is removed, it would lead to 404 error.
If we direct our users to the third party website, then it would add up to our bounce rate.
What should be the appropriate way to handle duplicate pdfs?
Thanks
-
From what I have read, so much of the web is duplicate content so it really doesn't matter if the pdf is on other sites; let google figure it out. (example, every car brand dealer has a pdf of the same car model brochure on their dealer site) No big deal. Visitors will be landing on your site from other search relevance - the duplicate pdf doesn't matter. Just my take. Adrian
-
Sorry, I mean pdf files only
-
As the pdf pages are marked as a duplicate and not the pdf files, then you should check which page has duplicate content compared to it, and take the needed measures (canonical tags or 301 redirect) form the page with less rank to the page with more rank. Alternatively, you can edit the content so that it isn't anymore duplicate.
If I had a link to the site and duplicate pages, I would be able to give you a more detailed response.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com -
Hello Daniel
The pdfs are duplicates from another site.
The thing is that we have already disallowed the pdfs in the robots.txt file.
Now, what happened is this - We have a set of pages (let's call them content pages) which we had disallowed in the robots file as they had thin content. Those pages have links to their respective third party pdfs, which have been marked as nofollow. The pdfs are also disallowed in the robots file.
Few days back, we improved our content pages and removed them from robots file so that they can be indexed. Pdfs are still disallowed. Despite being disallowed, we have come across this issue with the pdf pages as "Duplicate without user-selected canonical."
I hope I make myself clear. Any insights now please.
-
If the pdfs are duplicate within your own site, then the best solution would be for you to link to the same document from different sources. Then you can delete the duplicated documents and 301 redirect them to the original.
If the pdfs are duplicate from another site, then disallowing them on robots.txt will stop them from being marked as a duplicate, as the crawler will not be able to access them at all. It will just take some time for them to be updated on google search console.
If however, you want to add canonical tags to the pdf documents (or other non-HTML documents), you can add it to the HTTP header through the .htaccess file. You can find a tutorial on how to do that in this article.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is it best practice to have a canonical tags on all pages
The website I'm working on has no canonical tags. There is duplicate content so rel=canonicals need adding to certain pages but is it best practice to have a tag on every page ?
Intermediate & Advanced SEO | | ColesNathan0 -
Canonical Chain
This is quite advanced so maybe Rand can give me an answer? I often have seen questions surrounding a 301 chain where only 85% of the link juice is passed on to the first target and 85% of that to the next one, up to three targets. But how about a canonical chain? What do I mean by this:? I have a client who sells lighting so I will use a real example (sans domain) I don't want 'new-product' pages appearing in SERPS. They dilute link equity for the categories they replicate and often contain identical products to the main categories and subcategories. I don't want to no index them all together I'd rather tell Google they are the same as the higher category/sub category. (discussion whether a noindex/follow tag would be better?) If I canonicalize new-products/ceiling-lights-c1/kitchen-lighting-c17/kitchen-ceiling-lights-c217 to /ceiling-lights-c1/kitchen-lighting-c17/kitchen-ceiling-lights-c217 I then subsequently discover that everything in kitchen-ceiling-lights-c217 is already in /kitchen-lighting-c17 and I decide to canonicalize those two - so I place a /kitchen-lighting-c17 canonical on /kitchen-ceiling-lights-c217. Then what happens to the new-products canonical? Is it the same rule - does it pass 85% of link equity back to the non new-product URL and 85% of that back to the category? does it just not work? or should I do noindexi/follow Now before you jump in: Let's assume these are done over a period of time because the obvious answer is: Canonicalize both back to /ceiling-lights-c1/kitchen-lighting-c17 I know that and that is not what I am asking. What if they are done in a sequence what is the real result? I don't want to patronise anyone but please read this carefully before giving an answer. Regards Nigel Carousel Projects.
Intermediate & Advanced SEO | | Nigel_Carr0 -
Googlebot being redirected but not users?
Hi, We seem to have a slightly odd issue. We noticed that a number of our location category pages were slipping off 1 page, and onto page 2 in our niche. On inspection, we noticed that our Arizona page had started ranking in place of a number of other location pages - Cali, Idaho, NJ etc. Weirdly, the pages they had replaced were no longer indexed, and would remain so, despite being fetched, tweeted etc. One test was to see when the dropped out pages had been last crawled, or at least cached. When conducting the 'cache:domain.com/category/location' on these pages, we were getting 301 redirected to, you guessed it, the Arizona page. Very odd. However, the dropped out pages were serving 200 OK when run through header checker tools, screaming frog etc. On the face of it, it would seem Googlebot is getting redirected when it is hitting a number of our key location pages, but users are not. Has anyone experienced anything like this? The theming of the pages are quite different in terms of content, meta etc. Thanks.
Intermediate & Advanced SEO | | Sayers0 -
Duplicate content due to parked domains
I have a main ecommerce website with unique content and decent back links. I had few domains parked on the main website as well specific product pages. These domains had some type in traffic. Some where exact product names. So main main website www.maindomain.com had domain1.com , domain2.com parked on it. Also had domian3.com parked on www.maindomain.com/product1. This caused lot of duplicate content issues. 12 months back, all the parked domains were changed to 301 redirects. I also added all the domains to google webmaster tools. Then removed main directory from google index. Now realize few of the additional domains are indexed and causing duplicate content. My question is what other steps can I take to avoid the duplicate content for my my website 1. Provide change of address in Google search console. Is there any downside in providing change of address pointing to a website? Also domains pointing to a specific url , cannot provide change of address 2. Provide a remove page from google index request in Google search console. It is temporary and last 6 months. Even if the pages are removed from Google index, would google still see them duplicates? 3. Ask google to fetch each url under other domains and submit to google index. This would hopefully remove the urls under domain1.com and doamin2.com eventually due to 301 redirects. 4. Add canonical urls for all pages in the main site. so google will eventually remove content from doman1 and domain2.com due to canonical links. This wil take time for google to update their index 5. Point these domains elsewhere to remove duplicate contents eventually. But it will take time for google to update their index with new non duplicate content. Which of these options are best best to my issue and which ones are potentially dangerous? I would rather not to point these domains elsewhere. Any feedback would be greatly appreciated.
Intermediate & Advanced SEO | | ajiabs0 -
Duplicate URLs ending with #!
Hi guys, Does anyone know why a site can contain duplicate URLs ending with hastag & exclamation mark e.g. https://site.com.au/#! We are finding a lot of these URLs (as duplicates) and i was wondering what they are from developer standpoint? And do you think it's worth the time and effort adding a rel canonical tag or 301 to these URLs eventhough they're not getting indexed by Google? Cheers, Chris
Intermediate & Advanced SEO | | jayoliverwright0 -
Removing duplicate content
Due to URL changes and parameters on our ecommerce sites, we have a massive amount of duplicate pages indexed by google, sometimes up to 5 duplicate pages with different URLs. 1. We've instituted canonical tags site wide. 2. We are using the parameters function in Webmaster Tools. 3. We are using 301 redirects on all of the obsolete URLs 4. I have had many of the pages fetched so that Google can see and index the 301s and canonicals. 5. I created HTML sitemaps with the duplicate URLs, and had Google fetch and index the sitemap so that the dupes would get crawled and deindexed. None of these seems to be terribly effective. Google is indexing pages with parameters in spite of the parameter (clicksource) being called out in GWT. Pages with obsolete URLs are indexed in spite of them having 301 redirects. Google also appears to be ignoring many of our canonical tags as well, despite the pages being identical. Any ideas on how to clean up the mess?
Intermediate & Advanced SEO | | AMHC0 -
Pagination duplicate title and meta description
Hello, Getting a lot of duplicate title and meta description errors via google webmaster tools. For best SEO practices, do i no-index the page/2's, page/3's...? More importantly, i see how MOZ did it by adding "page 3" to their titles such as http://moz.com/blog?page=3. Is that a better way of doing it? If so, how do i do that on Yoast SEO? Thank you so much!
Intermediate & Advanced SEO | | Shawn1240 -
Duplicate Title tags even with rel=canonical
Hello, We were having duplicate content in our blog (a replica of each post automatically was done by the CMS), until we recently implemented a rel=canonical tag to all the duplicate posts (some 5 weeks ago). So far, no duplicate content were been found, but we are still getting duplicate title tags, though the rel=canonical is present. Any idea why is this the case and what can we do to solve it? Thanks in advance for your help. Tej Luchmun
Intermediate & Advanced SEO | | luxresorts0