Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Block Baidu crawler?
-
Hello!
One of our websites receives a large amount of traffic from the Baidu crawler. We do not have any Chinese content or do any business with China since our market is Uk.
Is it a good idea to block the Baidu crawler in the robots.txt or could it have any adverse effects on SEO of our site?
What do you suggest?
-
I'm also trying to get this done as well, not sure if its doable on Volusion(don't use them).
Yandex actually crawls more than Baidu for me, and both don't benefit me at all(sucks when you pay for the bandwidth)
-
Thanks for that I have just looked that up-I didn't realise that this was such a common problem.
-
Hi
Further to Ally's answer, in my experiance Baidu tends to ignor the robot.txt, so just do it on the server side.
S
-
Thanks Ally for your answer, will now block Baidu
-
Hi Stefan,
You can block the Baidu crawler in in the robots.txt.
There should be no adverse affect to your site. As this is not an area you are targeting and has no future long term benerfit to your business. Blocking the crawler will mean that your server has less load to deal with from the unnecessary traffic you have been receiving.
You can block the spiders in the following ways:
- Robots.txt (below is code for Baidu)
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /- Blocking Spiders via the Apache Configuration File httpd.conf
See the below article for more details on this method
http://searchenginewatch.com/article/2067357/Bye-bye-Crawler-Blocking-the-Parasites
You may also want to check out:
I hope this helps,
Ally
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Blocking certain countries via IP address location
We are a US based company that ships only to US and Canada. We've had two issues arise recently from foreign countries (Russia namely) that caused us to block access to our site from anyone attempting to interact with our store from outside of the US and Canada. 1. The first issue we encountered were fraudulent orders originating from Russia (using stolen card data) and then shipping to a US based International shipping aggregator. 2. The second issue was a consistent flow of Russian based "new customer" entries. My question to the MOZ community is this: are their any unintended consequences, from an SEO perspective, to blocking the viewing of our store from certain countries.
Technical SEO | | MNKid150 -
Google indexing despite robots.txt block
Hi This subdomain has about 4'000 URLs indexed in Google, although it's blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch This has been the case for almost a year now, and it does not look like Google tends to respect the blocking in http://www1.swisscom.ch/robots.txt Any clues why this is or what I could do to resolve it? Thanks!
Technical SEO | | zeepartner0 -
How to block "print" pages from indexing
I have a fairly large FAQ section and every article has a "print" button. Unfortunately, this is creating a page for every article which is muddying up the index - especially on my own site using Google Custom Search. Can you recommend a way to block this from happening? Example Article: http://www.knottyboy.com/lore/idx.php/11/183/Maintenance-of-Mature-Locks-6-months-/article/How-do-I-get-sand-out-of-my-dreads.html Example "Print" page: http://www.knottyboy.com/lore/article.php?id=052&action=print
Technical SEO | | dreadmichael0 -
Block Quotes and Citations for duplicate content
I've been reading about the proper use for block quotes and citations lately, and wanted to see if I was interpreting it the right way. This is what I read: http://www.pitstopmedia.com/sem/blockquote-cite-q-tags-seo So basically my question is, if I wanted to reference Amazon or another stores product reviews, could I use the block quote and citation tags around their content so it doesn't look like duplicate content? I think it would be great for my visitors, but also to the source as I am giving them credit. It would also be a good source to link to on my products pages, as I am not competing with the manufacturer for sales. I could also do this for product information right from the manufacturer. I want to do this for a contact lens site. I'd like to use Acuvue's reviews from their website, as well as some of their product descriptions. Of course I have my own user reviews and content for each product on my website, but I think some official copy could do well. Would this be the best method? Is this how Rottentomatoes.com does it? On every movie page they have 2-3 sentences from 50 or so reviews, and not much unique content of their own. Cheers, Vinnie
Technical SEO | | vforvinnie1 -
What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
Now that Google considers subdomains as part of the TLD I'm a little leery of testing robots.txt with something like: staging.domain.com
Technical SEO | | fthead9
User-agent: *
Disallow: / in fear it might get the www.domain.com blocked as well. Has anyone had any success using robots.txt to block sub-domains? I know I could add a meta robots tag to the staging.domain.com pages but that would require a lot more work.0 -
Blocking URL's with specific parameters from Googlebot
Hi, I've discovered that Googlebot's are voting on products listed on our website and as a result are creating negative ratings by placing votes from 1 to 5 for every product. The voting function is handled using Javascript, as shown below, and the script prevents multiple votes so most products end up with a vote of 1, which translates to "poor". How do I go about using robots.txt to block a URL with specific parameters only? I'm worried that I might end up blocking the whole product listing, which would result in de-listing from Google and the loss of many highly ranked pages. DON'T want to block: http://www.mysite.com/product.php?productid=1234 WANT to block: http://www.mysite.com/product.php?mode=vote&productid=1234&vote=2 Javacript button code: onclick="javascript: document.voteform.submit();" Thanks in advance for any advice given. Regards,
Technical SEO | | aethereal
Asim0 -
How Can I Block Archive Pages in Blogger when I am not using classic/default template
Hi, I am trying to block all the archive pages of my blog as Google is indexing them. This could lead to duplicate content issue. I am not using default blogger theme or classic theme and therefore, I cannot use this code therein: Please suggest me how I can instruct Google not to index archive pages of my blog? Looking for quick response.
Technical SEO | | SoftzSolutions0 -
Is blocking RSS Feeds with robots.txt necessary?
Is it necessary to block an rss feed with robots.txt? It seems they are automatically not indexed (http://googlewebmastercentral.blogspot.com/2007/12/taking-feeds-out-of-our-web-search.html) And, google says here that it's important not to block RSS feeds (http://googlewebmastercentral.blogspot.com/2009/10/using-rssatom-feeds-to-discover-new.html) I'm just checking!
Technical SEO | | nicole.healthline0