Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Duplicate titles caused by multiple variations of the same URL
-
Hi. Can you please advise how I can overcome this issue? The Moz.com crawler is reporting hundreds of duplicate title tag errors. However, this is caused by many URLs having been indexed multiple times in Google. For example:
www.abc.com
www.abc.com/?b=123
What can I do to stop these being reported as duplicate titles, as well as duplicate content?
I was thinking maybe I could use robots.txt to block various query string parameters. I'm open to ideas and examples.
-
Depending on how you implement the canonicals, you should see a decrease in your duplicate errors, which will be replaced by canonical notices. Ideally, there won't be anything to ignore.
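As a minimal sketch of the canonical approach, assuming the preferred form of every page is `https://www.abc.com` with the query string dropped (the https scheme and the "drop every parameter" rule are assumptions, not something stated in this thread), a page template could build its `rel="canonical"` tag like this:

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumptions (not from the thread): the site is served over https and
# no query parameter changes the page content, so every parameter can go.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.abc.com"


def canonical_url(requested_url: str) -> str:
    """Collapse an absolute URL variant to the single preferred URL."""
    parts = urlsplit(requested_url)
    path = parts.path or "/"  # "http://www.abc.com" has an empty path
    # Drop the query string and fragment, so / and /?b=123 become one URL.
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, path, "", ""))


def canonical_link_tag(requested_url: str) -> str:
    """Build the <link> element that belongs in the page's <head>."""
    href = escape(canonical_url(requested_url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    for url in ("http://www.abc.com", "http://www.abc.com/?b=123"):
        print(canonical_link_tag(url))
```

Both example URLs print `<link rel="canonical" href="https://www.abc.com/">`. If some parameters do change what the page shows, they would need to be kept rather than stripped wholesale.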
-
Thank you for your response.
Does this mean that for each main page I have, i.e.
etc.,
if I put a rel="canonical" on it, I can then ignore the duplicate content messages for reported URLs such as:
abc.com/page1 (put a rel="canonical" on this page)
abc.com/page2 (put a rel="canonical" on this page)
abc.com/index.html (put a rel="canonical" on this page)
etc.?
-
*Edit: Miki beat me to it, but here's a little more explanation.
The first thing to note here is that Google's indexing doesn't actually have any effect on your Moz crawl report. All of the data you see there comes from our very own rogerbot, which crawls similarly to googlebot.
Though Google's crawler has a wide variety of ways to locate and index content, rogerbot can only crawl links on your site. If your crawl report is picking up each of these URLs, then there must be links pointing to those URLs somewhere on your site. The danger here is that Google and the other search engines will pick up those variants and not be able to determine which of them is the "real" one. That could lead to a) Google listing a URL you'd rather it didn't, or b) Google not understanding how to list your site at all.
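Because rogerbot only finds URLs by following links, a quick way to trace where the variants come from is to scan your own pages for internal links that carry a query string or point at index.html. A minimal sketch using Python's standard-library HTML parser; the sample markup and the abc.com host are placeholders, and fetching every page of the site is left out:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class VariantLinkFinder(HTMLParser):
    """Collect internal <a href> targets that look like duplicate URL variants."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.site_host = urlsplit(page_url).hostname
        self.variants: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.page_url, href)  # resolve relative links
        parts = urlsplit(absolute)
        if parts.hostname != self.site_host:
            return  # external link: not part of this site's crawl
        if parts.query or parts.path.endswith("/index.html"):
            self.variants.append(absolute)


def find_variant_links(page_url: str, html: str) -> list[str]:
    """Return internal links with a query string or an explicit index.html."""
    finder = VariantLinkFinder(page_url)
    finder.feed(html)
    finder.close()
    return finder.variants


if __name__ == "__main__":
    sample = '<a href="/?b=123">Home</a> <a href="/index.html">Home</a> <a href="/page1">Page 1</a>'
    print(find_variant_links("http://www.abc.com/", sample))
```

For the sample markup it reports the `/?b=123` and `/index.html` links and ignores `/page1`; updating those links is what stops rogerbot from discovering the variants.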
A few of these have pretty simple fixes: index.html should be 301 redirected to your root domain, for example. Rel="canonical" is very applicable here, too. Here are a couple of resources you may want to check out:
http://moz.com/learn/seo/canonicalization - Best practices article on canonicalization
http://moz.com/learn/seo/redirection - Best practices article on redirects
I hope that helps!
Matt Roney
Moz Customer Mentor
-
I would redirect all variations to www.abc.com, and also add a rel="canonical" pointing back to www.abc.com. This should solve your issues.
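A minimal sketch of the redirect side of that advice, written as a plain WSGI application: a request for `index.html`, a non-`www` host, or any URL with a query string is answered with a single 301 straight to the clean `www.abc.com` URL. The https scheme and the assumption that no query parameter changes page content are placeholders, not details from the thread; on a real site these rules would more often live in the web server's rewrite configuration.

```python
from typing import Optional

# Assumptions (not from the thread): https is the preferred scheme, and no
# query parameter changes page content, so every query string is dropped.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.abc.com"


def redirect_target(scheme: str, host: str, path: str, query: str) -> Optional[str]:
    """Return the URL to 301 to, or None if the request is already canonical."""
    clean_path = path
    if clean_path.endswith("/index.html"):
        clean_path = clean_path[: -len("index.html")]  # /index.html -> /
    bare_host = host.split(":")[0].lower()  # ignore any port number
    already_canonical = (
        scheme == CANONICAL_SCHEME
        and bare_host == CANONICAL_HOST
        and clean_path == path
        and not query
    )
    if already_canonical:
        return None
    return f"{CANONICAL_SCHEME}://{CANONICAL_HOST}{clean_path}"


def app(environ, start_response):
    """Minimal WSGI app: 301 every non-canonical variant, serve the rest."""
    target = redirect_target(
        environ.get("wsgi.url_scheme", "http"),
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO") or "/",
        environ.get("QUERY_STRING", ""),
    )
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<!doctype html><title>Page</title>"]
```

Redirecting each variant in one hop to the final URL, and using the same URL in the canonical tag, keeps the two signals consistent.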
Related Questions
-
Does Google ignore the ? in a URL?
Hi guys, I have a site where every URL ends with ?v=6cc98ba2045f. Example:
https://domain.com/products/cashmere/robes/?v=6cc98ba2045f
I'm wondering whether Google ignores what comes after the ?. Also, any idea what that is? Cheers.
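The part after `?` is the query string; here it holds a single parameter named `v`, whose purpose the question doesn't reveal. URLs that differ only in their query string can be treated as separate URLs, so a common approach is to point the canonical at the parameter-free version. A minimal sketch that strips just that one parameter and keeps any others; treating `v` as safe to drop is an assumption:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_param(url: str, name: str = "v") -> str:
    """Remove one query parameter (default: v) and keep the rest in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(drop_param("https://domain.com/products/cashmere/robes/?v=6cc98ba2045f"))
```

This prints `https://domain.com/products/cashmere/robes/`.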
Intermediate & Advanced SEO | CarolynSC
-
Should I include URLs that are 301'd or only include 200 status URLs in my sitemap.xml?
I'm not sure whether I should include old URLs (content) that are being redirected (301) to new URLs (content) in my sitemap.xml. Does anyone know whether it is best to include or leave out 301'd URLs in an XML sitemap?
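A common approach is to list only the final, 200-status URLs you want indexed and leave redirected ones out. A minimal sketch, assuming you already have each URL's status code from a crawl (the URLs and codes below are placeholders), that builds a `sitemap.xml` containing only the 200 URLs:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(statuses: dict[str, int]) -> bytes:
    """Return sitemap XML listing only URLs whose crawl status was 200."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, status in statuses.items():
        if status != 200:
            continue  # leave out 301s (and 404s, etc.)
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    crawl = {
        "https://example.com/new-page": 200,
        "https://example.com/old-page": 301,  # redirects to /new-page
    }
    print(build_sitemap(crawl).decode("utf-8"))
```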
Intermediate & Advanced SEO | Jonathan.Smith
-
Removing duplicate content
Due to URL changes and parameters on our ecommerce sites, we have a massive number of duplicate pages indexed by Google, sometimes up to 5 duplicate pages with different URLs.
1. We've instituted canonical tags site-wide.
2. We are using the parameters function in Webmaster Tools.
3. We are using 301 redirects on all of the obsolete URLs.
4. I have had many of the pages fetched so that Google can see and index the 301s and canonicals.
5. I created HTML sitemaps with the duplicate URLs, and had Google fetch and index the sitemap so that the dupes would get crawled and deindexed.
None of these seems to be terribly effective. Google is indexing pages with parameters in spite of the parameter (clicksource) being called out in GWT. Pages with obsolete URLs are indexed in spite of having 301 redirects. Google also appears to be ignoring many of our canonical tags, despite the pages being identical. Any ideas on how to clean up the mess?
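One thing worth checking before concluding that Google is ignoring the canonicals is whether every duplicate actually serves the same canonical URL in its HTML. A minimal sketch of that audit with the standard-library HTML parser; the page URLs and markup are placeholders, and fetching the live pages is left out:

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def canonical_of(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


def mismatches(pages: dict[str, str], expected: str) -> list[str]:
    """URLs whose declared canonical is missing or differs from `expected`."""
    return [url for url, html in pages.items() if canonical_of(html) != expected]
```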
Intermediate & Advanced SEO | AMHC
-
Attack of the dummy URLs -- what to do?
It occurs to me that a malicious program could set up thousands of links to dummy pages on a website:
www.mysite.com/dynamicpage/dummy123
www.mysite.com/dynamicpage/dummy456
etc. How is this normally handled? Does a developer have to check all the parameters for validity and, if they are invalid, automatically return a 301 redirect or a 404 Not Found? That would require a table lookup of acceptable URL parameters for every new visitor. I was thinking bad URLs would be rare, so it would be OK to just stop the program with a message, until I realized someone could intentionally set up links to nonexistent pages on a site.
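The usual approach is to return a 404 for slugs that don't exist rather than stopping the program or redirecting, and the "table lookup" is typically the same lookup that renders the page: if the data store has no record for the slug, there is nothing to show. A minimal sketch, where the in-memory `PAGES` dictionary is a hypothetical stand-in for whatever database backs `/dynamicpage/`:

```python
# Hypothetical data store: slug -> rendered page content.
PAGES = {"real-page": "<h1>Real page</h1>"}


def handle_dynamic_page(path: str) -> tuple[int, str]:
    """Return (HTTP status code, body) for a /dynamicpage/<slug> request."""
    prefix = "/dynamicpage/"
    if not path.startswith(prefix):
        return 404, "Not Found"
    slug = path[len(prefix):]
    content = PAGES.get(slug)  # the same lookup that renders real pages
    if content is None:
        return 404, "Not Found"  # dummy123, dummy456, ... end up here
    return 200, content
```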
Intermediate & Advanced SEO | friendoffood
-
[E-commerce] Duplicate content due to color variations (canonical/indexing)
Hello, we currently have a lot of color variations on multiple products with almost the same content. Even with our canonicals set, Moz's crawling tool seems to flag them as duplicate content. What we have done so far:
1. Chose the best-selling color variation (our "master product").
2. Added a rel="canonical" to every variation, with our "master product" as the canonical URL.
In my opinion, this should be enough to address the issue. However, given that Moz flags them as duplicates, I was wondering whether there is something else we should do. Should we add "noindex,follow" to our child products and "index,follow" to our master product? (That sounds like a heavy change to me.) Thank you in advance.
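The setup described (every color variation pointing its canonical at the best-selling "master" variation) can be sketched as below to check that each variant really serves the intended tag. The product family, URLs, and sales figures are hypothetical placeholders; the master page ends up with a self-referencing canonical.

```python
from html import escape

# Hypothetical catalogue: product family -> {color: URL}, plus sales per color.
VARIANTS = {
    "sofa-x": {
        "grey": "https://shop.example/sofa-x-grey",
        "blue": "https://shop.example/sofa-x-blue",
        "red": "https://shop.example/sofa-x-red",
    }
}
SALES = {"sofa-x": {"grey": 120, "blue": 340, "red": 45}}


def master_url(family: str) -> str:
    """Pick the best-selling color variation as the canonical target."""
    best_color = max(SALES[family], key=SALES[family].get)
    return VARIANTS[family][best_color]


def canonical_tags(family: str) -> dict[str, str]:
    """Map each variation URL to the <link rel="canonical"> it should serve."""
    target = escape(master_url(family), quote=True)
    tag = f'<link rel="canonical" href="{target}">'
    return {url: tag for url in VARIANTS[family].values()}
```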
Intermediate & Advanced SEO | EasyLounge
-
Avoiding Duplicate Content with Used Car Listings Database: Robots.txt vs Noindex vs Hash URLs (Help!)
Hi guys, we have developed a plugin that displays used vehicle listings from a centralized, third-party database. It works similarly to autotrader.com or cargurus.com, and it has two primary components:
1. Vehicle Listings pages: where the user applies various filters to narrow the listings down to the vehicle they want.
2. Vehicle Details pages: where the user views the details of a specific vehicle. These are served via Ajax, in a dialog box on the Vehicle Listings pages. Example functionality: http://screencast.com/t/kArKm4tBo
We do want the Vehicle Listings pages (#1) indexed and ranking. These pages have additional content besides the listings themselves, and the results are randomized or sliced and diced in different, unique ways. They're also updated twice per day.
We do not want to index #2, the Vehicle Details pages, because they appear and disappear all the time based on dealer inventory and don't have much value in the SERPs. Additionally, other sites such as autotrader.com, Yahoo Autos, and others draw from this same database, so we're worried about duplicate content. For instance, searching for a snippet of dealer-provided content from one specific listing that Google indexed yielded 8,200+ results: Example Google query.
We didn't originally think Google would even be able to index these pages, as they are served via Ajax. It seems we were wrong: Google has already begun indexing them. Duplicate content isn't the only issue; these pages are not meant to be visited directly! A user who navigated to one of these URLs from the SERPs would see a page that isn't styled correctly.
Now we have to choose the right way to keep these pages out of the index: robots.txt, noindex meta tags, or hash (#) internal links.
Robots.txt advantages:
1. Super easy to implement.
2. Conserves crawl budget for large sites.
3. Ensures the crawler doesn't get stuck. After all, if our website has only 500 pages that we really want indexed and ranked, and the vehicle details pages constitute another 1,000,000,000 pages, it doesn't seem to make sense to make Googlebot crawl all of them.
Robots.txt disadvantages:
1. Doesn't prevent pages from being indexed, as we've seen, probably because internal links point to them. We could nofollow those internal links to minimize indexation, but that would mean 10-25 nofollowed internal links on each Vehicle Listings page (will Google think we're PageRank sculpting?).
Noindex advantages:
1. Does prevent the vehicle details pages from being indexed.
2. Allows ALL pages to be crawled (an advantage?).
Noindex disadvantages:
1. Difficult to implement. The vehicle details pages are served via Ajax, so they have no tag. The solution would have to involve the X-Robots-Tag HTTP header and Apache, sending a noindex header based on query string variables, similar to this stackoverflow solution. That means the plugin's functionality is no longer self-contained, and (as I understand it) some hosts may not allow these kinds of Apache rewrites.
2. Forces (or rather allows) Googlebot to crawl hundreds of thousands of noindexed pages. I say "forces" because of the crawl budget required. The crawler could get stuck or lost in so many pages, and may not like crawling a site with 1,000,000,000 pages, 99.9% of which are noindexed.
3. Cannot be used in conjunction with robots.txt. After all, the crawler never reads the noindex tag on a page that robots.txt blocks.
Hash (#) URL advantages:
1. By using for links from Vehicle Listings pages to Vehicle Details pages (such as "Contact Seller" buttons), coupled with JavaScript, the crawler won't be able to follow or crawl these links.
2. Best of both worlds: crawl budget isn't overtaxed by thousands of noindexed pages, and the internal links that got robots.txt-disallowed pages indexed are gone.
3. Accomplishes the same thing as nofollowing these links, but without looking like PageRank sculpting (?).
4. Does not require complex Apache configuration.
Hash (#) URL disadvantages:
1. Is Google suspicious of sites with (some) internal links structured like this, since it can't crawl or follow them?
Initially, we implemented robots.txt, the "sledgehammer solution." We figured we'd have a happier crawler this way, as it wouldn't have to crawl zillions of partially duplicate vehicle details pages; we wanted it to be as if these pages didn't exist. However, Google seems to be indexing many of these pages anyway, probably based on the internal links pointing to them. We could nofollow those links, but we don't want it to look like we're PageRank sculpting or something like that.
If we implement noindex on these pages (itself a difficult task), we can be certain they won't be indexed. However, we would then have to remove the robots.txt disallow so the crawler can read the noindex tag on these pages. Intuitively, it doesn't make sense to me to make Googlebot crawl zillions of vehicle details pages, all of which are noindexed, where it could easily get stuck or lost. It seems like a waste of resources, and in some shadowy way bad for SEO.
My developers are pushing for the third solution: hash URLs. This works on all hosts, keeps all functionality self-contained in the plugin (unlike noindex), and conserves crawl budget while keeping the vehicle details pages out of the index (unlike robots.txt). But I don't want Google to slap us 6-12 months from now because it doesn't like links like these ().
Any thoughts or advice would be hugely appreciated, as I've been going in circles, circles, circles on this for a couple of days now. I can also provide a test site URL if you'd like to see the functionality in action.
Intermediate & Advanced SEO | browndoginteractive
-
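The noindex option and its conflict with robots.txt can be illustrated with a small sketch: the server adds an `X-Robots-Tag: noindex` response header only to vehicle-details responses, and Python's `urllib.robotparser` shows that a crawler obeying a robots.txt `Disallow` never fetches those URLs, so it never sees the header. The `/vehicles/details` path, the `vehicle_id` parameter, and the `dealer.example` host are hypothetical; this is Python rather than the Apache configuration mentioned above, and Python's robots.txt matching (simple path prefixes) is simpler than Google's.

```python
from urllib.parse import parse_qs, urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical URL scheme: details responses carry a vehicle_id parameter.
DETAILS_PARAM = "vehicle_id"


def robots_headers(url: str) -> list[tuple[str, str]]:
    """Extra response headers: noindex only the Ajax vehicle-details responses."""
    query = parse_qs(urlsplit(url).query)
    if DETAILS_PARAM in query:
        return [("X-Robots-Tag", "noindex")]
    return []


def crawler_can_see_header(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """A crawler only reads the X-Robots-Tag header if robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    details = "https://dealer.example/vehicles/details?vehicle_id=42"
    blocking = "User-agent: *\nDisallow: /vehicles/details\n"
    print(robots_headers(details))                    # header is sent...
    print(crawler_can_see_header(blocking, details))  # ...but never fetched
```

In other words, for the header to take effect the details URLs have to stay crawlable; keeping both the `Disallow` rule and the noindex header means the noindex is never read.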
Keep multiple domains or combine them?
I need some help figuring out whether I should combine multiple domains or keep them separate. I have domain1.com, domain2.com, and domain3.com; domain1.com owns domain2.com and domain3.com, and its homepage currently links to both. The business is going through some changes, and combining the domains is now on the table, as is keeping them separate as long as they link to each other. What is the best way to handle this, and is there anything else I should review before making a decision? None of them has a ton of links, and they aren't super robust, but I would just like some advice. Thanks a lot!
Intermediate & Advanced SEO | Rocket.Fuel
-
Magento: URLs for Products in Multiple Categories
I am working in Magento to build out a large e-commerce site with several thousand products. It's a great platform, but I have run into an issue with what it does to URLs when you put a product into multiple categories. Basically, "a book" in two categories would get two URLs for one product:
1) /books/a-book
2) /author-name/a-book
So I need to come up with a solution for this. It seems I have two options:
1. Disable category paths in product URLs. I found this in a Magento SEO article: 'Magento gives you the ability to add the name of categories to path for product URL's. Because Magento doesn't support this functionality very well - it creates duplicate content issues - it is a very good idea to disable this. To do this, go to System => Configuration => Catalog => Search Engine Optimization and set "Use categories path for product URL's" to "no".' This would solve the issue and be a quick fix, but I think it's a double-edged sword, because then we lose the SEO value of having our well-named categories in the URL.
2. Use canonical tags. To be fair, I'm not even sure this is possible. Even though Magento creates different URLs, and thus poses a risk of "duplicate content" being crawled, there really is only one page on the admin side. So I can't go to each of the "duplicate" pages and add a canonical tag, because those duplicate pages don't really exist on the back end. Does that make sense?
After typing this out, it seems like the best thing to do will probably be to just turn off categories in the URL from the admin side. However, I'd still love any input from the community on this. Thanks!
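The second option may be more workable than it looks: because every category path is rendered by the same product template, that template can compute the canonical from the product record rather than from the requested URL. The sketch below illustrates only that idea; it is not Magento code, and the `Product` class, its `url_key` field, and the base URL are hypothetical placeholders.

```python
from dataclasses import dataclass
from html import escape

BASE_URL = "https://www.example-bookshop.com"  # placeholder store URL


@dataclass
class Product:
    url_key: str  # hypothetical: the product's category-free URL key
    name: str


def render_product_head(product: Product, requested_path: str) -> str:
    """One template for every category path; the canonical comes from product data.

    requested_path (e.g. /books/a-book or /author-name/a-book) is deliberately
    not used, so every path that reaches this product emits the same canonical.
    """
    canonical = f"{BASE_URL}/{product.url_key}"
    return (
        f"<title>{escape(product.name)}</title>\n"
        f'<link rel="canonical" href="{escape(canonical, quote=True)}">'
    )


if __name__ == "__main__":
    book = Product(url_key="a-book", name="A Book")
    for path in ("/books/a-book", "/author-name/a-book"):
        print(render_product_head(book, path))
```

Both category paths produce identical head markup, so the category URLs can stay for visitors while search engines are pointed at one URL.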
Intermediate & Advanced SEO | Marketing.SCG