Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Duplicate content on subdomains.
-
Hi Mozzers,
I have a site, www.xyz.com, and geo-targeted subdomains such as www.uk.xyz.com, www.india.xyz.com, and so on. All of the subdomains carry the same content as the main domain, www.xyz.com.
So I want to know how I can avoid content duplication.
Many Thanks!
-
It would probably be better (and more likely to get you responses) if you started a new question - this one is three years old. Generally, I think it depends on your scope. If you need some kind of separation (corporate, legal, technical), then separate domains or sub-domains may make sense. They're also easier to target, in some ways. However, you're right that authority may be diluted and you'll need more marketing effort against each one.
If resources are limited and you don't need each country to be a fully separate entity, then you'll probably have fewer headaches with sub-folders. I'm speaking in broad generalities, though - this is a big decision that depends a lot on the details.
-
Dear all,
I have bought 30 geo top-level domains. This is for an ecommerce project that has not launched yet (and isn't indexed by Google).
I am now at a point where I can change/consolidate all domains as sub domains or sub folders or keep things as they are.
I just worry that link building would be scattered and not focused and that it might be better to concentrate the efforts on one domain.
What are your views on this?
Many thanks.
-
Yeah - I'm really afraid that stacking all those sub-domains is going to cause you long-term issues with your link-building, and that some of those sub-domains could fragment. If the country needs to be in a sub-domain, then I think the hybrid approach (with "/shop" as a sub-folder) may cause you less trouble.
I will warn, though, that any change like this carries some risk. You'll have to put proper 301-redirects in place.
I might try the hreflang tags first, though, and see if that helps the current problem (it may take a few weeks). Changing too many aspects of the on-page SEO at once could cause you a lot of grief.
-
The "shop." pages are simply new pages that were added so products can be sold with ease. I think I might move the shop.uk.xyz.com pages to uk.xyz.com/shop/product, i.e. into a sub-folder. Do you think this will help pass link juice to those pages after the change, and would it make it easier for me to include them in the sitemap as well?
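The move described above can be sketched as a URL mapping that a 301 rule would follow. This is only a rough illustration, not a server config: the function name `shop_redirect_target` is made up for this example, and it assumes every "shop." host maps to a "/shop" sub-folder on the same country sub-domain.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def shop_redirect_target(url: str) -> Optional[str]:
    """Map shop.<country>.xyz.com/<path> to <country>.xyz.com/shop/<path>.

    Hypothetical helper: returns None when the host does not start with
    "shop.", meaning no redirect is needed for that URL.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased and drops any port
    prefix = "shop."
    if not host.startswith(prefix):
        return None
    new_host = host[len(prefix):]
    new_path = "/shop" + (parts.path or "/")
    return urlunsplit((parts.scheme, new_host, new_path, parts.query, parts.fragment))


target = shop_redirect_target("http://shop.uk.xyz.com/product1")
print(target)
```

Each old URL would then answer with a 301 whose Location header is the mapped URL, so that links pointing at the old "shop." pages carry over to the new sub-folder pages.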
-
If you have separate GWT profiles, then I think the XML sitemap may have to be under the sub-domain - Google has to be able to access it from a sub-domain URL. It doesn't have to be in the root of the sub-domain.
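For what it's worth, a per-sub-domain sitemap is just a small XML file, so it can be generated with a few lines of script. Below is a minimal sketch using Python's standard library; the `build_sitemap` helper and the example URLs are assumptions for illustration. As far as I know, a sitemap served from uk.xyz.com should normally list only uk.xyz.com URLs, unless the other host is verified in the same Webmaster Tools account.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page list for the UK sub-domain.
uk_sitemap = build_sitemap([
    "http://uk.xyz.com/",
    "http://uk.xyz.com/product1",
])
print(uk_sitemap)
```

The resulting file would be saved somewhere reachable under the sub-domain (for example uk.xyz.com/sitemap.xml) and submitted in that sub-domain's profile.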
I'm not clear on what the "shop." pages are, but stacking sub-domains like that sounds like it's getting pretty messy. Why the separation?
-
I have already created separate profiles for the sub-domains. My only worry is where to place the sitemap on the server, e.g. in the root directory of the root domain or in the root directory of the sub-domain.
Coming to (2): the pages I want to include in the sitemap are my product pages. So I want to know whether shop.uk.xyz.com pages can be included in the sitemap for uk.xyz.com, and whether they count as internal pages of uk.xyz.com.
-
It is probably best to create separate profiles in Google Webmaster Tools, because then you can target the sub-domains to the countries in question. At that point, you could also set up separate sitemaps. It'll give you a cleaner view of how each sub-domain is indexed and ranking.
I'm not sure I understand (2) - why wouldn't you include those pages in the sitemap?
-
Thank you for your input. It has really helped me understand the situation.
I will try to implement this and let you know how it goes. I also have a few more questions:
1. Do I need a separate sitemap and robots file for each sub-domain, and where should I place them on the server?
2. The sub-domains have pages like shop.uk.xyz.com/product1. Can I include those in the sitemaps? They are the pages I really want to rank for.
-
There's no perfect answer. Canonical tags would keep the sub-domains from ranking, in many cases. The cross-TLD stuff is weird, though - Google can, in some cases, ignore the canonical if they think that one sub-domain is more appropriate for the country/ccTLD the searcher is using.
Sub-domains can be tricky in and of themselves, unfortunately, because they sometimes fragment and don't pass link "juice" fully to the root domain. I generally still think sub-folders are better for cases like this, but obviously that would be a big change (and potentially risky).
You could try the rel="alternate" hreflang tags. They're similar to canonical (a bit weaker), but basically are designed to handle the same content in different languages and regions:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=189077
They're basically designed for exactly this problem. You can set the root domain to "en-US", the UK sub-domain to "en-GB" (hreflang uses the ISO country code GB, not UK), etc. I've heard generally good things, and they're low-risk, but you have to try it and see. They can be a little tricky to implement properly.
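To make the hreflang idea a bit more concrete, here is a rough sketch that prints the link elements for one product page. The `hreflang_links` helper and the "/productx" URLs are assumptions for illustration; the language-region codes follow the pattern described above. Each regional version of the page should carry the full set, including the entry that points at itself.

```python
from html import escape


def hreflang_links(alternates):
    """Build one <link rel="alternate"> element per language-region code.

    `alternates` maps an hreflang code to the URL of that regional version.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in alternates.items()
    ]


# Hypothetical URLs for the same product on each regional host.
alternates = {
    "en-us": "http://www.xyz.com/productx",
    "en-gb": "http://uk.xyz.com/productx",
    "en-in": "http://india.xyz.com/productx",
}

lines = hreflang_links(alternates)
for line in lines:
    print(line)
```

The same block of tags goes into the head of all three versions of the page, so that each version confirms the others.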
-
No, 301s and canonicals are completely different.
A 301 redirects a page to a new URL, whereas a canonical sets the preferred version of a page. For example:
301 - You have an old version of a page that looks like www.example.com/p?=153 and you want it to look like www.example.com/red-apples. You would use a 301 from the old page (www.example.com/p?=153) to the new page (www.example.com/red-apples).
Canonical - Let's go back to the red apples example. Let's say you have an ecommerce site with different ways to browse for products: one by fruit and the other by color. You'll end up with two versions of the end result, for example www.example.com/fruit/red-apples and www.example.com/red/red-apples. Since both of those pages show the same information, you don't want the engines to treat them as duplicate content, so you add a rel=canonical link element to both pages pointing to the preferred version (e.g. you might want the canonical to be www.example.com/red-apples). That's all it does: it tells the engines which version you prefer among pages that may be the same.
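The difference is easier to see side by side. The sketch below is not tied to any web framework: `redirect_301` and `page_with_canonical` are hypothetical helpers that just return a (status, headers, body) triple, using the red apples URLs from the examples above (the "http://" scheme is an assumption).

```python
def redirect_301(new_url):
    """Permanent redirect: the visitor (and the crawler) is sent to new_url."""
    return 301, {"Location": new_url}, ""


def page_with_canonical(preferred_url, body_html):
    """Normal 200 page: the visitor stays put; only a hint is added to the head."""
    head = f'<link rel="canonical" href="{preferred_url}" />'
    html = f"<html><head>{head}</head><body>{body_html}</body></html>"
    return 200, {"Content-Type": "text/html"}, html


# The old URL would answer with a redirect to the new one.
status, headers, body = redirect_301("http://www.example.com/red-apples")
print(status, headers["Location"])

# Both category URLs would serve the page itself plus the canonical hint.
status2, headers2, body2 = page_with_canonical(
    "http://www.example.com/red-apples", "<h1>Red apples</h1>"
)
print(status2, body2)
```

So with a canonical, a user landing on the duplicate URL still sees that page; only search engines are told which version to prefer.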
Back to your original post: you really don't need to "noindex", but I thought you were having a duplicate content issue and that it would solve it. (Generally, Google won't penalize you for this sort of duplicate content.)
Here is what I would do.
If you don't have Google Webmaster Tools set up already, then do so. Verify each of your subdomains (e.g. india.xyz.com, uk.xyz.com, etc.; let me know if you need help) and then set the geo target for each of them manually. (You have to do this manually because you have a gTLD and not a ccTLD.)
How to set your geo target manually:
Go to a particular version of your site in WMT (e.g. india.xyz.com) and click on "Configuration", then "Settings". The first section under "Settings" says "Geographical Target". Check the box and then use the drop-down to select "India".
Repeat this for all of your subdomains for each specific country.
This will let Google know that you are trying to target users in a specific country.
If you have the money to invest in it, I would also try to have those subdomains hosted by a server in each particular country. (strong signal for Google)
Hope it helps.
-
Thanx Darin!
I have a few doubts on this:
1. Is rel canonical like a 301 redirect? My concern is whether a user who goes to www.uk.xyz.com/productx will be redirected to www.xyz.com/product.
2. My sub-domain pages are ranking in the country-specific search engines. For example, www.uk.xyz.com is ranking for keywords in google.co.uk. So if I noindex them, I will lose my presence in the country-specific search engine.
PS: The content on the pages is all the same apart from the product currency.
-
I disagree. I said "noindex", not "nofollow". Link juice will still be passed, but the pages won't show up in the SERPs. I do agree with you, though, that the strategy as a whole seems to be a waste if the content is in fact exactly duplicated. Unless these pages are in another language, I don't see the point of this subdomain strategy.
-
A canonical will help to remove duplicate content issues and also to consolidate your link value. I don't see any issue with a cross-domain implementation.
If you add "noindex" to any of these pages, you won't get any link credit.
-
Short Answer: Set a canonical URL on the subdomain pages pointing to the root domain version, and noindex the subdomain pages.
What this does is avoid the duplicate content problem. Generally, those subdomain pages won't rank anyway because the same information is on the "main" site. You can still build links to those subdomain pages and do a strong internal link structure to help the "main" site rankings.
The only negative to this is that the pages in your subdomain won't rank. That's not necessarily a bad thing but just know they won't. But, if the pages are truly duplicate content, they won't rank anyway.
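As a rough sketch of what that short answer would put in the head of each subdomain page (other replies in this thread disagree about the noindex part, so treat the flag as optional): the helper name `head_tags_for_subdomain_page` is made up, and it assumes the root-domain page lives at the same path on www.xyz.com.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the "main" site from the question.
ROOT_HOST = "www.xyz.com"


def head_tags_for_subdomain_page(url, noindex=True):
    """Return the head tags for a subdomain page.

    The canonical points at the same path on the root domain; the robots
    meta tag is added only when noindex is True.
    """
    parts = urlsplit(url)
    canonical = urlunsplit((parts.scheme, ROOT_HOST, parts.path, parts.query, ""))
    tags = [f'<link rel="canonical" href="{canonical}" />']
    if noindex:
        tags.append('<meta name="robots" content="noindex" />')
    return tags


tags = head_tags_for_subdomain_page("http://uk.xyz.com/productx")
for tag in tags:
    print(tag)
```

Calling it with `noindex=False` gives the canonical-only variant, which keeps the subdomain pages eligible to rank in their country-specific results.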