Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
-
Now that Google considers subdomains as part of the TLD I'm a little leery of testing robots.txt with something like:
staging.domain.com
User-agent: *
Disallow: /in fear it might get the www.domain.com blocked as well. Has anyone had any success using robots.txt to block sub-domains? I know I could add a meta robots tag to the staging.domain.com pages but that would require a lot more work.
-
Just make sure that when/if you copy over the staging site to the live domain that you don't copy over the robots.txt, htaccess, or whatever means you use to block that site from being indexed and thus have your shiny new site be blocked.
-
I agree. The name of your subdomain being "staging" didn't register at all with me until Matt brought it up. I was offering a generic response to the subdomain question whereas I believe Matt focused on how to handle a staging site. Interesting viewpoint.
-
Matt/Ryan-
Great discussion, thanks for the input. The staging.domain.com is just one of the domains we don't want indexed. Some of them still need to be accessed by the public, some like staging could be restricted to specific IPs.
I realize after your discussion I probably should have used a different example of a sub-domain. On the other hand it might not have sparked the discussion so maybe it was a good example
-
.htaccess files can be placed at any directory level of a site so you can do it for just the subdomain or even just a directory of a domain.
-
Staging URL's are typically only used for testing so rather than do a deny I would recommend using a specific ALLOW for only the IP addresses that should be allowed access.
I would imagine you don't want it indexed because you don't want the rest of the world knowing about it.
You can also use HTACCESS to use username/passwords. It is simple but you can give that to clients if that is a concern/need.
-
Correct.
-
Toren, I would not recommend that solution. There is nothing to prevent Googlebot from crawling your site via almost any IP. If you found 100 IPs used by the crawler and blocked them all, there is nothing to stop the crawler from using IP #101 next month. Once the subdomain's content is located and indexed, it will be a headache fixing the issue.
The best solution is always going to be a noindex meta tag on the pages you do not wish to be indexed. If that method is too much work or otherwise undesirable, you can use the robots.txt solution. There is no circumstance I can imagine where you would modify your htaccess file to block googlebot.
-
Hi Matt.
Perhaps I misunderstood the question but I believe Toren only wishes to prevent the subdomain from being indexed. If you restrict subdomain access by IP it would prevent visitors from accessing the content which I don't believe is the goal.
-
Interesting, hadn't thought of using htaccess to block Googlebot.Thanks for the suggestion.
-
Thanks Ryan. So you don't see any issues with de-indexing the main site if I created a second robots.txt file, e.g.
http://staging.domin.com/robots.txt
User-agent: *
Disallow: /That was my initial thought but when Google announced they consider sub-domains part of the TLD I was afraid it might affect the htp://www.domain.com versions of the pages. So you're saying the subdomain is basically treated like a folder you block on the primary domain?
-
Use an .htaccess file to only allow from certain ip addresses or ranges.
Here is an article describing how: http://www.kirupa.com/html5/htaccess_tricks.htm
-
What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
Place a robots.txt file in the root of the subdomain.
User-agent: *
Disallow: /This method will block the subdomain while leaving your primary domain unaffected.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does having a sub-domain on a different server affect SEO?
I'm working with a company that has a hard-coded website on the root domain, and then a WordPress blog on a subdomain on a separate server. We're planning on implementing a hub and spoke model for their content, hosting the main hubs on the root domain and the linked articles on the blog. Is having the blog on a different server going to hinder our SEO efforts?
Technical SEO | | KaraParlin0 -
Do URLs with canonical tags get indexed by Google?
Hi, we re-branded and launched a new website in February 2016. In June we saw a steep drop in the number of URLs indexed, and there have continued to be smaller dips since. We started an account with Moz and found several thousand high priority crawl errors for duplicate pages and have since fixed those with canonical tags. However, we are still seeing the number of URLs indexed drop. Do URLs with canonical tags get indexed by Google? I can't seem to find a definitive answer on this. A good portion of our URLs have canonical tags because they are just events with different dates, but otherwise the content of the page is the same.
Technical SEO | | zasite0 -
My Homepage Won't Load if Javascript is Disabled. Is this an SEO/Indexation issue?
Hi everyone, I'm working with a client who recently had their site redesigned. I'm just going through to do an initial audit to make sure everything looks good. Part of my initial indexation audit goes through questions about how the site functions when you disable, javascript, cookies, and/or css. I use the Web Developer extension for Chrome to do this. I know, more recently, people have said that content loaded by Javascript will be indexed. I just want to make sure it's not hurting my clients SEO. http://americasinstantsigns.com/ Is it as simple as looking at Google's Cached URL? The URL is definitely being indexed and when looking at the text-only version everything appears to be in order. This may be an outdated question, but I just want to be sure! Thank you so much!
Technical SEO | | ccox10 -
Staging site and "live" site have both been indexed by Google
While creating a site we forgot to password protect the staging site while it was being built. Now that the site has been moved to the new domain, it has come to my attention that both the staging site (site.staging.com) and the "live" site (site.com) are both being indexed. What is the best way to solve this problem? I was thinking about adding a 301 redirect from the staging site to the live site via HTACCESS. Any recommendations?
Technical SEO | | melen0 -
URLs in Greek, Greeklish or English? What is the best way to get great ranking?
Hello all, I am Greek and I have a quite strange question for you. Greek characters are generally recognized as special characters and need to have UTF-8 encoding. The question is about the URLs of Greek websites. According the advice of Google webmasters blog we should never put the raw greek characters into the URL of a link. We always should use the encoded version if we decide to have Greek characters and encode them or just use latin characters in the URL. Having Greek characters un-encoded could likely cause technical difficulties with some services, e.g. search engines or other url-processing web pages. To give you an example let's look at A) http://el.wikipedia.org/wiki/%CE%95%CE%BB%CE%B2%CE%B5%CF%84%CE%AF%CE%B1which is the URL with the encoded Greek characters and it shows up in the browser asB) http://el.wikipedia.org/wiki/Ελβετία The problem with A is that everytime we need to copy the URL and paste it somewhere (in an email, in a social bookmark site, social media site etc) the URL appears like the A, plenty of strange characters and %. This link sometimes may cause broken link issues especially when we try to submit it in social networks and social bookmarks. On the other hand, googlebot reads that url but I am wondering if there is an advantage for the websites who keep the encoded URLs or not (in compairison to the sites who use Greeklish in the URLs)! So the question is: For the SEO issues, is it better to use Greek characters (encoded like this one http://el.wikipedia.org/wiki/%CE%95%CE%BB%CE%B2%CE%B5%CF%84%CE%AF%CE%B1) in the URLs or would it be better to use just Greeklish (for example http://el.wikipedia.org/wiki/Elvetia ? Thank you very much for your help! Regards, Lenia
Technical SEO | | tevag0 -
OK to block /js/ folder using robots.txt?
I know Matt Cutts suggestions we allow bots to crawl css and javascript folders (http://www.youtube.com/watch?v=PNEipHjsEPU) But what if you have lots and lots of JS and you dont want to waste precious crawl resources? Also, as we update and improve the javascript on our site, we iterate the version number ?v=1.1... 1.2... 1.3... etc. And the legacy versions show up in Google Webmaster Tools as 404s. For example: http://www.discoverafrica.com/js/global_functions.js?v=1.1
Technical SEO | | AndreVanKets
http://www.discoverafrica.com/js/jquery.cookie.js?v=1.1
http://www.discoverafrica.com/js/global.js?v=1.2
http://www.discoverafrica.com/js/jquery.validate.min.js?v=1.1
http://www.discoverafrica.com/js/json2.js?v=1.1 Wouldn't it just be easier to prevent Googlebot from crawling the js folder altogether? Isn't that what robots.txt was made for? Just to be clear - we are NOT doing any sneaky redirects or other dodgy javascript hacks. We're just trying to power our content and UX elegantly with javascript. What do you guys say: Obey Matt? Or run the javascript gauntlet?0 -
Redirecting blog.<mydomain>.com to www.<mydomain>.com\blog</mydomain></mydomain>
This is more of a technical question than pure SEO per se, but I am guessing that some folks here may have covered this and so I would appreciate any questions. I am moving from a WordPress.com-based blog (hosted on WordPress) to a WordPress installation on my own server (as suggested by folks in another thread here). As part of this I want to move from the format blog.<mydomain>.com to www.mydomain.com\blog. I have installed WordPress on my server and have imported posts from the hosted site to my own server. How should I manage the transition from first format to the second? I have a bunch of links on Facebook, etc that refer to URLs of the blog..com format so it's important that I redirect.</mydomain> I am running DotNetNuke/WordPress on my own IIS/ASP.Net servers. Thanks. Mark
Technical SEO | | MarkWill0 -
/index.php in sitemap? take it out?
Hi Everyone, The following was automatically generated at xml-sitemaps.com Should I get rid of the index.php url from my sitemap? If so, how do I go about redirecting it in my htaccess ? <url><loc>http://www.mydomain.ca/</loc></url>
Technical SEO | | RogersSEO
<url><loc>http://www.mydomain.ca/index.php</loc></url> thank you in advance, Martin0