What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?

fthead9

Now that Google considers subdomains as part of the TLD I'm a little leery of testing robots.txt with something like:

staging.domain.com
User-agent: *
Disallow: /

in fear it might get the www.domain.com blocked as well. Has anyone had any success using robots.txt to block sub-domains? I know I could add a meta robots tag to the staging.domain.com pages but that would require a lot more work.

KeriMorgret

Just make sure that when/if you copy the staging site over to the live domain, you don't also copy over the robots.txt, .htaccess, or whatever means you use to block that site from being indexed, and thus end up blocking your shiny new site.

RyanKent

I agree. The name of your subdomain being "staging" didn't register at all with me until Matt brought it up. I was offering a generic response to the subdomain question whereas I believe Matt focused on how to handle a staging site. Interesting viewpoint.

fthead9

Matt/Ryan-

Great discussion, thanks for the input. The staging.domain.com is just one of the domains we don't want indexed. Some of them still need to be accessed by the public, some like staging could be restricted to specific IPs.

I realize after your discussion I probably should have used a different example of a sub-domain. On the other hand, it might not have sparked the discussion, so maybe it was a good example.

bloggidy

.htaccess files can be placed at any directory level of a site so you can do it for just the subdomain or even just a directory of a domain.

bloggidy

Staging URLs are typically only used for testing, so rather than a deny rule I would recommend a specific allow rule for only the IP addresses that should have access.

I would imagine you don't want it indexed because you don't want the rest of the world knowing about it.

You can also use .htaccess to require a username and password. It is simple, and you can give the login to clients if that is a concern/need.
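
To make the allow-list idea concrete without tying it to a particular server's syntax, here is a minimal Python sketch of the same check. The networks and the is_allowed helper are hypothetical placeholders (documentation address ranges), not anything from an actual staging setup:

```python
import ipaddress

# Hypothetical allow-list: an office range and one home address.
# These are reserved documentation ranges, used only for illustration.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True only if client_ip falls inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    print(is_allowed("203.0.113.42"))  # inside the office range
    print(is_allowed("192.0.2.10"))    # any other visitor, crawlers included
```

The point of allowing rather than denying is that everything not explicitly listed is refused, so there is no need to know which addresses a crawler might use.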

RyanKent

Correct.

RyanKent

Toren, I would not recommend that solution. There is nothing to prevent Googlebot from crawling your site via almost any IP. If you found 100 IPs used by the crawler and blocked them all, there is nothing to stop the crawler from using IP #101 next month. Once the subdomain's content is located and indexed, it will be a headache fixing the issue.

The best solution is always going to be a noindex meta tag on the pages you do not wish to be indexed. If that method is too much work or otherwise undesirable, you can use the robots.txt solution. There is no circumstance I can imagine where you would modify your .htaccess file to block Googlebot.
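
For reference, the tag in question is `<meta name="robots" content="noindex">` in the page's head. If the worry is the work of confirming it is on every page, a small script can help. The following is only a sketch using Python's standard html.parser; the RobotsMetaFinder class and has_noindex helper are hypothetical names, and fetching the pages is left out:

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(page))
```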

RyanKent

Hi Matt.

Perhaps I misunderstood the question but I believe Toren only wishes to prevent the subdomain from being indexed. If you restrict subdomain access by IP it would prevent visitors from accessing the content which I don't believe is the goal.

fthead9

Interesting, hadn't thought of using .htaccess to block Googlebot. Thanks for the suggestion.

fthead9

Thanks Ryan. So you don't see any issues with de-indexing the main site if I created a second robots.txt file, e.g.

http://staging.domain.com/robots.txt

User-agent: *
Disallow: /

That was my initial thought, but when Google announced they consider sub-domains part of the TLD I was afraid it might affect the http://www.domain.com versions of the pages. So you're saying the subdomain is basically treated like a folder you block on the primary domain?

bloggidy

Use an .htaccess file to only allow from certain ip addresses or ranges.

Here is an article describing how: http://www.kirupa.com/html5/htaccess_tricks.htm

RyanKent

What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?

Place a robots.txt file in the root of the subdomain.

User-agent: *
Disallow: /

This method will block the subdomain while leaving your primary domain unaffected.
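
As a rough way to check this before deploying, Python's standard urllib.robotparser can evaluate each host's robots.txt separately. The sketch below assumes www.domain.com serves a permissive robots.txt; that file's contents and the may_crawl helper are hypothetical, not taken from a real site:

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from the text of one host's robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Each host serves its own file; a crawler reads them independently.
ROBOTS_BY_HOST = {
    # The file placed in the root of the subdomain.
    "staging.domain.com": parser_for("User-agent: *\nDisallow: /\n"),
    # Assumed contents for the main site: nothing disallowed.
    "www.domain.com": parser_for("User-agent: *\nDisallow:\n"),
}


def may_crawl(url: str, user_agent: str = "Googlebot") -> bool:
    """Check a URL against the robots.txt of the host it lives on."""
    host = urlsplit(url).hostname
    return ROBOTS_BY_HOST[host].can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_crawl("http://staging.domain.com/index.html"))
    print(may_crawl("http://www.domain.com/index.html"))
```

The lookup by hostname is the key design point: rules in the staging file are never consulted for a www URL.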
