Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to fix duplicate content for homepage and index.html
-
Hello,
I know this probably gets asked quite a lot but I haven't found a recent post about this in 2018 on Moz Q&A, so I thought I would check in and see what the best route/solution for this issue might be. I'm always really worried about making any (potentially bad/wrong) changes to the site, as it's my livelihood, so I'm hoping someone can point me in the right direction.
Moz, SEMRush and several other SEO tools are all reporting that I have duplicate content for my homepage and index.html (same identical page).
According to Moz, my homepage (without index.html) has PA 29 and index.html has PA 15. They are both showing Status 200. I read that you can either do a 301 redirect or add rel=canonical
I currently have a 301 setup for my http to https page and don't have any rel=canonical added to the site/page. What is the best and safest way to get rid of duplicate content and merge the my non index and index.html homepages together these days? I read that both 301 and canonical pass on link juice but I don't know what the best route for me is given what I said above.
Thank you for reading, any input is greatly appreciated!
-
OK, Paul, I hear what you are saying. It's a very open and obvious diss.
I'm not sure what you are saying makes any difference to the argument that the canonical way here is not the way to go. I was explaining in the simplest way, I would not want, and I'm sure you would not want either, a live page like this - the home page, live and canonicalised.
(It's a given that the canonical works like a 301, passing link juice to the preferred version.)
So thanks but it makes no difference - delete & 301 every time.
Google is heightening its distrust of canonicals - the new Seach Console tool reveals which pages are the preferred canonical and it's something of a surprise to SEOs!
If you feel like playing top trumps again then why not PM me? - it's so much better and the uninitiated do not need to see it!
Cheers Nigel
-
A proper canonical tag does a lot more than "just be telling Google not to rank it" When used properly (i.e. pages that truly do contain the same content), the canonicalised page passes its ranking signals back to the canonical source.
I agree with Kristina - while a 301 would be preferable (it's a directive, while canonical tags are taken as suggestions), a canonical tag would be vastly better than not doing anything about the issue. At least until the dev can get the problem with the 301-redirect properly resolved.
Paul
-
It's best practice to redirect, but if that's not an option, the canonical route should help the problem a lot! You'll probably lose some link equity with this route, but it should clear up duplicate content issues from Google's side.
-
Hi Dre
If you just do a canonical then the page will still be live, you will just be telling Google not to rank it. Best practice is to remove it all together and 301. It is bad practice having more than one version of your home page, (any page) live!
Regards Nigel
-
Thank you so much for all the responses. So it sounds like 301 redirect through htaccess is the way to go. What is the difference between using the 301 through htaccess vs using rel=canonical in my case? Does the 301 provide better link juice vs rel=canonical or is canonical just not applicable in this case? Thanks for all the replies and helpful suggestions again!
EDIT: I spoke to my developer (who is hosting and maintaining my site now).. he said he tried to do 301 through htaccess but it seems to be crashing the site (and trust me he is very good at what he does). Part of the problem is that my site is VERY old (originally build about 10 years ago and NOT updated once since).. he has been slowly updating and cleaning up the site slowly and he will try to figure out why the 301 is crashing the site and not working but in the mean time how safe is it to use rel=canonical instead of a 301?
Thanks again!
-
Hi dre
Your site really shouldn't be generating an index.html in the first place but if it is you must make sure that there is a 301 in the htaccess file sending all traffic to the single homepage URL as Lynn correctly points out this will be a permanent redirect.
It is very simple to do. Both versions are treated as separate pages (as http and https) so you are essentially showing a duplicate site to Google so your rankings will be terrible until you change.
Regards Nigel
-
Hello there,
You can use .htaccess URL rewrite to remove all the .html from your URL, here's the rewrite rules.
RewriteEngine On
RewriteRule ^index.html$ / [R=301,L]
RewriteRule ^(.*)/index.html$ /$1/ [R=301,L]Once you added this rules you should also fix all your internal links make sure they link to the URL without .html
Hope this helps,
Joseph Yap
-
"I currently have a 301 setup for my http to https page" - great! Also, you should check if your inner pages redirecting from HTTP-versions to HTTPS too.
index.html should redirect to the homepage main version with 301 Permanent Redirect.
-
Google consider HTTP and HTTPS as two separate protocols. Since the contents are same on both versions, google bots consider it as duplicate content. Adding a canonical URL will solve this problem. If you have any doubts, feel free to ask.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does hover over content index well
i notice increasing cases of portfolio style boxes on site designs (especially wordpress templates) where you have an image and text appears after hover over (sorry for my basic terminology). does this text which appears after hover over have much search engine value or as it doesnt immediately appear on pageload does it carry slightly less weight like tabbed content? any advice appreciated thanks neil
On-Page Optimization | | neilhenderson0 -
No-index all the posts of a category
Hi everyone! I would like no-indexing all the posts of a specific category of my wordpress site. The problem is that the structure of my URL is composed without /category/: www.site-name.ext/date/post-name/
On-Page Optimization | | salvyy
so without /category-name/ Is possibile to disallow the indexing of all the posts of the category via robots.txt? Using Yoast Plugin I can put the no-index for each post, but I would like to put the no-index (or disallow/) a time for all the post of the category. Thanks in advance for your help and sorry for my english. Mike0 -
Duplicate content penalty
when moz crawls my site they say I have 2x the pages that I really have & they say I am being penalized for duplicate content. I know years ago I had my old domain resolve over to my new domain. Its the only thing that makes sense as to the duplicate content but would search engines really penalize me for that? It is technically only on 1 site. My business took a significant sales hit starting early July 2013, I know google did and algorithm update that did have SEO aspects. I need to resolve the problem so I can stay in business
On-Page Optimization | | cheaptubes0 -
Duplicate Content for Men's and Women's Version of Site
So, we're a service where you can book different hairdressing services from a number of different salons (site being worked on). We're doing both a male and female version of the site on the same domain which users are can select between on the homepage. The differences are largely cosmetic (allowing the designers to be more creative and have a bit of fun and to also have dedicated male grooming landing pages), but I was wondering about duplicate pages. While most of the pages on each version of the site will be unique (i.e. [male service] in [location] vs [female service] in [location] with the female taking precedent when there are duplicates), what should we do about the likes of the "About" page? Pages like this would both be unique in wording but essentially offer the same information and does it make sense to to index two different "About" pages, even if the titles vary? My question is whether, for these duplicate pages, you would set the more popular one as the preferred version canonically, leave them both to be indexed or noindex the lesser version entirely? Hope this makes sense, thanks!
On-Page Optimization | | LeahHutcheon0 -
Duplicate Content when Using "visibility classes" in responsive design layouts? - a SEO-Problem?
I have text in the right column of my responsive layout which will show up below the the principal content on small devices. To do this I use visibility classes for DIVs. So I have a DIV with with a unique style text that is visible only on large screen sizes. I copied the same text into another div which shows only up only on small devices while the other div will be hidden in this moment. Technically I have the same text twice on my page. So this might be duplicate content detected as SPAM? I'm concerned because hidden text on page via expand-collapsable textblocks will be read by bots and in my case they will detect it twice?Does anybody have experiences on this issue?bestHolger
On-Page Optimization | | inlinear0 -
Does schema.org assist with duplicate content concerns
The issue of duplicate content has been well documented and there are lots of articles suggesting to noindex archive pages in WordPress powered sites. Schema.org allows us to mark-up our content, including marking a components URL. So my question simply, is no-indexing archive (category/tag) pages still relevant when considering duplicate content? These pages are in essence a list of articles, which can be marked as an article or blog posting, with the url of the main article and all the other cool stuff the scheme gives us. Surely Google et al are smart enough to recognise these article listings as gateways to the main content, therefore removing duplicate content concerns. Of course, whether or not doing this is a good idea will be subjective and based on individual circumstances - I'm just interested in whether or not the search engines can handle this appropriately.
On-Page Optimization | | MarkCA0 -
Our sitemap is not indexed well
Hey there, Hope you guys can help. We get the following error: Nested indexing. Another Sitemap index refers to the index of sitemaps. The thing is that we cant find the error they are talking about. Thanks!!!!
On-Page Optimization | | Comunicare0 -
Avoiding "Duplicate Page Title" and "Duplicate Page Content" - Best Practices?
We have a website with a searchable database of recipes. You can search the database using an online form with dropdown options for: Course (starter, main, salad, etc)
On-Page Optimization | | smaavie
Cooking Method (fry, bake, boil, steam, etc)
Preparation Time (Under 30 min, 30min to 1 hour, Over 1 hour) Here are some examples of how URLs may look when searching for a recipe: find-a-recipe.php?course=starter
find-a-recipe.php?course=main&preperation-time=30min+to+1+hour
find-a-recipe.php?cooking-method=fry&preperation-time=over+1+hour There is also pagination of search results, so the URL could also have the variable "start", e.g. find-a-recipe.php?course=salad&start=30 There can be any combination of these variables, meaning there are hundreds of possible search results URL variations. This all works well on the site, however it gives multiple "Duplicate Page Title" and "Duplicate Page Content" errors when crawled by SEOmoz. I've seached online and found several possible solutions for this, such as: Setting canonical tag Adding these URL variables to Google Webmasters to tell Google to ignore them Change the Title tag in the head dynamically based on what URL variables are present However I am not sure which of these would be best. As far as I can tell the canonical tag should be used when you have the same page available at two seperate URLs, but this isn't the case here as the search results are always different. Adding these URL variables to Google webmasters won't fix the problem in other search engines, and will presumably continue to get these errors in our SEOmoz crawl reports. Changing the title tag each time can lead to very long title tags, and it doesn't address the problem of duplicate page content. I had hoped there would be a standard solution for problems like this, as I imagine others will have come across this before, but I cannot find the ideal solution. Any help would be much appreciated. Kind Regards5