Protecting staging environments

Prevent embarrassing situations by making sure you properly protect your staging environments.

The best way to go about this is using HTTP authentication.

Is your staging environment already indexed by search engines? No sweat. With this guide, you’ll learn how to quickly and effectively reverse that.

We see it happen all the time: staging or development environments containing websites that are works in progress, left available for all the world to see. And often they’re getting indexed by search engines too!

Don’t believe me?

Check out this query – inspired by Peter Nikolow’s tweet (follow him on Twitter, he’s both funny and smart!).

Why having accessible staging environments is a bad thing

Or rather, a doubly-bad thing: bad from the viewpoints of both business and SEO.

The business viewpoint

Do you want others to see your “lorem ipsum” content and laugh, or even—god forbid—read about a huge announcement such as an acquisition or rebranding that should have been kept secret until the new site was launched?

It’s unprofessional, and above all it’s not very smart. It’s even a sales tactic for certain agencies: they look for other agencies that are making these mistakes and then pitch to their clients, leveraging the embarrassing situation.

Dawn Anderson

(Indexed) staging URLs allow competitors to see future plans for development. They only need to follow a discovered development URL to access the whole staging site, meaning your future web strategy and design is revealed. A simple search in Google for site:staging.* or site:test.*, or any number of other common names dev teams use as subdomains for their staging environments, spits out a whole plethora of even well-known brands’ staging environments which haven’t been protected sufficiently.

Plus, since staging environments are often, at best, a half-built product, visitors who stumble across them will have a terrible user experience, and that’s not likely to leave a good impression.

The SEO viewpoint

Besides the embarrassment, having a staging environment that’s indexed by search engines can lead to duplicate content issues when the staging environment and the production environment are highly similar.

Having accessible and indexed staging environments is totally unnecessary, because it’s easy to prevent. In this article we’ll show you how to do it, what methods you can use, and what to do if your staging environment is already indexed.

From here forward, when we refer to staging environments, we may be referring to both development and staging environments.

When checking for duplicate content as part of a site audit, I regularly uncover staging sites that haven’t been properly protected. Sometimes these are outdated staging sites from previous versions of a site, but once in a while I uncover unlaunched development sites that contain sensitive information that’s not yet ready to be published. Unprotected staging environments aren’t just SEO issues - they’re business risks and should be handled with the appropriate care.

Kerstin Reichert

I have seen plenty of staging sites indexed. I think one of the issues is that indexing and crawling are not fully understood and that the business impact is often ignored.

Make sure to always password protect staging BEFORE you do anything else. Once indexed, it is a real pain to remove those URLs in a timely manner. They might seem like good solutions, but when it comes to fully protecting your staging environment from bots and humans alike, canonicals do not work, noindex doesn’t work, and neither does robots.txt, as there are other ways humans, Google, and other search engines can find your content.

Apart from the technical issues think about the business impact of products and services being discovered before launch or any other sensitive information. This could have very serious consequences.

If you do find your pages indexed think of an efficient way to remove them. Create lists to remove them in bulk and make sure search engines can “see” the noindex, so don’t block crawlers through robots.txt.

Sam McRoberts

I routinely run into two problems on this front: self-hosted staging sites (such as WPEngine’s automatically generated staging.domain.com), and staging sites on web developers’ own domains (e.g. client.developer.com or developer.com/client/).

In the first scenario, you run the risk of having indexable duplicate content on your own domain, and just blocking a domain via robots.txt doesn’t guarantee it won’t get indexed, especially if you or others link to it (which is common, since links to dev site resources often find their way into the site code and get missed at launch).

In the second scenario, with a mirror of your site on a developer’s domain, you run the risk of cross-domain duplicate content, and since the content was up on the developer’s domain first, there’s a risk Google may see you as the duplicator and them as the original source (not a good situation).

Always, always block bots from reaching your staging environments. Not just with a robots.txt file, but with something more secure (login/password, IP restricted access or both).

Second, always make sure your developers check all the code on your site at launch and remove any links to the staging environment. You want your internal links pointing to the live site, not somewhere else.

If you do both, you’ll be in a good spot!

What are development and staging environments?

When you’re working on a completely new website or new functionality, you don’t just do that on your live website (often called your “production environment”), because websites are easy to break. It’s a best practice to work with different, separate environments where you develop and test new functionality.

So what different environments are there besides a production environment?

  • Development environment: this is where developers initially test their code. Often they do this on their local machines, so if that’s the case, there isn’t any danger at all of this environment being accessible to others and getting indexed by search engines. If it’s not kept locally, but instead for instance on a dev.example.com subdomain, there is a risk of it being accessible to others and being indexed.
  • Staging environment (often also called the “test environment”): this is where releases are staged and new functionality is tested before a release. New content is published here so it can be checked to ensure it looks as intended. Staging environments often aren’t run locally: different team members need to be able to easily access it, and so it usually runs on a subdomain or a separate domain.

Pro-tip: since you’re reading up on protecting staging environments, it’s likely you’re going to be doing a website migration soon. To migrate flawlessly, check our website migration guide to make sure you aren’t forgetting any crucial steps in the migration process!

Security through obscurity is not a feasible strategy

Not telling anyone about your “secret” staging environment is a case of “security through obscurity”. It’s not a feasible strategy, especially not as the only layer of protection.

What if someone accidentally publishes a link to the staging environment? Or pushes some code to production that accidentally includes canonical or hreflang references to the staging environment?

Not only does this create issues in your production environment, it also leads to search engines picking up your staging environment’s scent. And they will queue it up for crawling unless you make it impossible for them to access the staging environment, or give them rules of engagement to follow.

Katherine Watier Ong

I was hired by a large enterprise site that had lost lots of traffic and rankings. It turns out that they decided to launch their beta site without HTTP authentication so that their stakeholders could look at the site for six months prior to completing their redesign.

Unfortunately, the beta site started to accumulate backlinks - over 5,000 of them from major papers. The use of the beta site also caused the main domain to lose 50,000 visitors per month in traffic when it was “relaunched”, partially due to the staging server (and other issues). The organization wound up migrating any unique content to the main domain and 301 redirecting everything. After about a year, the site is almost up to its previous traffic levels.

How to protect your development and staging environments

Now it’s clear why you need to protect your development and staging environments. But how do you do it? There are multiple ways to go about this, but which one is best?

We’ll discuss the pros and cons of every method, taking into account:

  • User-friendliness: the degree to which the method doesn’t add extra inconvenience.
  • Third-party access: the degree to which the method prevents third parties from accessing an environment.
  • SEO-friendliness: the degree to which the method keeps search engines from indexing an environment.
  • Monitoring-friendliness: the degree to which the method lets you monitor the protected environments for SEO purposes.
  • Risk of human error: the degree to which the method may lead to human errors, impacting SEO.
| Method | User-friendliness | Third-party access prevented | SEO-friendliness | Monitoring-friendliness | Risk of human error |
|---|---|---|---|---|---|
| HTTP auth | High | Yes | High | High | Low |
| VPN | Medium | Yes | High | Low | Low |
| Robots directives | High | No | Medium | Low | High |
| Robots.txt | High | No | Low | Low | High |
| Canonical links | High | No | Medium | Low | High |
| Whitelisting specific user agents | Low | Partially | Medium | High | Medium |

Method 1: HTTP authentication - your best choice 🏆

The best way to prevent both users and search engines from gaining access to your development and staging environments is to use HTTP authentication. Be sure to implement it over HTTPS, because you don’t want usernames and passwords traveling over the wire in plaintext.

We recommend whitelisting the IP addresses at your office, and providing external parties and remote team members access via a username/password combination.
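As a rough sketch of that combined setup in nginx (the IP range, realm text, and file paths below are placeholders, not recommendations):

```nginx
server {
    listen 443 ssl;
    server_name staging.example.com;

    # Let a request through if EITHER condition is satisfied:
    # it comes from a whitelisted IP, or it carries valid credentials.
    satisfy any;

    allow 203.0.113.0/24;  # placeholder: your office IP range
    deny all;

    auth_basic "Staging environment";
    auth_basic_user_file /etc/nginx/.htpasswd;

    # ... the rest of your staging configuration ...
}
```

Remote team members then authenticate with credentials created via the htpasswd utility, for example `htpasswd -c /etc/nginx/.htpasswd username`.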

This way search engines can’t access anything, and you have total control over who can see what. You can prepare your staging environment with the same robots.txt that you’ll be using on the production environment, as well as the correct robots directives and canonicals. This lets you gain a representative picture of your staging environment when you’re monitoring it for issues and changes prior to launching.

Another benefit of this is that it’s not prone to developers forgetting to publish the right robots.txt, robots directives, and canonicals on the production environment.

This is a much better approach than using robots.txt and/or robots noindex directives and canonical links, because those don’t prevent other people from accessing them, and search engines will not always honor such directives.

What’s more, when using HTTP authentication it’s still possible to use Google’s testing tools such as the AMP Test, the Mobile-Friendly Test, and the Structured Data Testing Tool. Just set up a tunnel.

How do I set up HTTP authentication?

HTTP authentication is supported out of the box by common web servers such as Apache, nginx, and IIS; consult your web server’s documentation for the exact setup steps.
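As an illustration, on Apache a minimal HTTP Basic Authentication setup can be as small as this .htaccess file (paths and realm text are placeholders):

```apacheconf
# .htaccess in the staging site's document root
AuthType Basic
AuthName "Staging environment"
AuthUserFile /var/www/.htpasswd
Require valid-user
```

The password file referenced by AuthUserFile is created with Apache’s htpasswd utility, e.g. `htpasswd -c /var/www/.htpasswd username`.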

Dean Cruddace
Cultured Digital

Password protect your staging environment, that’s it. Nothing fancy, just password protect it. Google can’t see the other side of the password protection (unless you set up a tunnel for testing purposes). Even when using a robots.txt Disallow: / and meta robots noindex, you may still end up with pages getting indexed if there are enough links and/or other signals pointing to them.

Many SEO tools, including ContentKing, are already set up to handle HTTP authentication for crawling and testing this kind of setup. You may want to crawl a bit slower than you normally would, as staging environments don’t usually have as many resources allocated or caching set up, so you need to be more careful than normal about the load you put on the system.

Tomek Rudzki

Staging environments should always be well protected. Users shouldn’t be able to view it; your shiny new staging version can have a lot of bugs, which may diminish the users’ trust. Also, it may be vulnerable to hacker attacks. Additionally, you don’t want Google to index it (in order to avoid duplicate content issues).

But how do you test a staging website using Google’s tools? What if you want to use the URL Inspection tool to ensure that Google can properly render your website? And what if you want to test your Schema.org implementation using the Structured Data Testing tool?

You have two options:

  • You can set up a tunnel (as mentioned above).
  • You can use a solution proposed by David Sottimano: whitelist Google’s IP addresses (66.249.x.x) and thus allow Googlebot to access it. Because this can be tricky, you need to limit the period for which you’ve whitelisted Google’s IP addresses, and for the entire duration of the whitelisting period you need to make sure your staging version is non-indexable!

Method 2: VPN access

VPN stands for “virtual private network.” You basically connect your local machine so that it becomes part of the company network. And once you’re part of the company network, you can access the staging environment. Anyone who’s not part of the network cannot access it. This means that neither third parties nor search engines can access the staging environment.

Having access through a VPN offers most of the benefits of HTTP authentication. However, there’s one big drawback: SEO monitoring solutions that aren’t running locally may not work out of the box, or at all. Not being able to track your development team’s progress is troublesome, and it becomes truly problematic when you’re dealing with truly big websites.

Method 3: Robots directives

Robots directives are used to communicate preferences surrounding crawling and indexing. You can for instance ask search engines not to index certain pages, and not to follow (certain) links.

You can define robots directives in a page’s HTTP header (X-Robots-Tag header), or via the meta robots directive in its <head> section. Because you’ll have other content types besides just pages on your staging environment, it’s recommended that you use the X-Robots-Tag header to make sure that PDF files, for instance, don’t get indexed.
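On nginx, for instance, a single X-Robots-Tag response header can cover every response type, HTML pages and PDFs alike. This is a sketch only, and as discussed, robots directives alone don’t actually protect the environment:

```nginx
server {
    server_name staging.example.com;

    # Applies to every response the server sends, not just HTML pages;
    # "always" also attaches the header to error responses
    add_header X-Robots-Tag "noindex, nofollow" always;

    # ... the rest of your staging configuration ...
}
```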

Robots directives, as the name implies, are meant for robots (“crawlers”). They don’t prevent third-party access. They do send a moderately strong signal to search engines not to index pages. I say “moderately” because search engines can still decide to ignore the robots directives and index your pages anyway. It’s also not a monitoring-friendly solution, as, similar to robots.txt, it may lead to false positives being reported by SEO tools.

On top of that, there’s a huge risk of human error, as staging robots directives are often accidentally carried over into the production environment.

Method 4: Robots.txt

The robots.txt file states the rules of engagement for crawlers, so by using robots.txt, you can ask search engines to keep out of your staging environment. The most common way to do this is by including the following contents in robots.txt:

User-agent: *
Disallow: /

This prevents search engine crawlers from crawling the site, but they may still index it if they find links to it. In that case, the listing typically shows the bare URL without a description, with Google noting that no information is available for the page.

Some people include the unofficial Noindex directive in their robots.txt. We don’t recommend doing this: since it’s an unofficial directive, it’s an even less reliable way to keep your staging environment out of the index than the Disallow directive.

Your robots.txt doesn’t offer any actual protection against third-party access to the site, and it throws off SEO monitoring tools as well, potentially leading to false positives. Plus, you’re creating a huge risk of human error: here once again, the robots.txt from the staging environment is often accidentally carried over into the production environment.

Dawn Anderson

Simply adding a robots.txt to staging environments, or adding the staging environment folder to a live site, isn’t a recommended strategy since you’re effectively giving competitors (or anyone else) a direct path to the environment.

Furthermore, search engines might still index URLs added to a robots.txt file, particularly if a URL is linked to (e.g. when a developer uploads a file to the live site with a link to the staging environment in it, boosting its popularity). We often see pages indexed which shouldn’t be, where the meta description simply says something along the lines of “no information is available for this page”, since the search engine can’t actually access the page (it only discovered and indexed the URL based on other signals).

Bartłomiej Kudyba

You should always prevent your staging environments from getting indexed. If they do get indexed, it could lead to duplicate content issues, or the pages could be perceived as thin content. Your “lorem ipsum” could get indexed and you would then have to take measures to undo it.

If you want to hide your staging from search engine crawlers and users, a password protected environment is a must-have.

When you need to show a temporary page to your users, you can secure it from search engine crawlers as well, but remember to think through how you will want to reveal it to users and crawlers when it’s ready.

When we released Onely, it was all about timing it with our keynote event, and we ran into some issues regarding the robots.txt rule we used to secure our temporary page. The day after the keynote event, Googlebot still couldn’t crawl our website because the robots.txt file wasn’t refreshed yet. As John Mueller stated in response to this scenario: “We cache the robots.txt for about a day…”

Method 5: Canonical links

The canonical link informs search engines of the canonical version of a page. If the staging environment is referencing the production environment, all signals should be consolidated with the production environment.

Otherwise, canonical links resemble robots directives, especially in their downsides:

  • They still let third parties access the staging environment.
  • They’re not a monitoring-friendly solution, as they may lead to false positives being reported by SEO tools.
  • There’s a risk of human error, as canonical directives from staging are sometimes accidentally carried over into the production environment.
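For reference, this is what such a canonical link looks like on a staging page, with example.com standing in for the production domain:

```html
<!-- In the <head> of https://staging.example.com/some-page/ -->
<link rel="canonical" href="https://www.example.com/some-page/" />
```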

Method 6: Whitelisting specific user agents

Whitelisting specific user agents for access to a staging environment can be used to let SEO specialists monitor it, as long as their SEO tooling supports setting custom user agents. They create a made-up user agent string and use that, while blocking all other user agents (including browsers).
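A sketch of what that could look like in nginx; the token below is deliberately made up and would be agreed on with whoever runs your SEO tooling:

```nginx
server {
    server_name staging.example.com;

    # Only requests sending the agreed-upon made-up user agent get through;
    # everything else (browsers, search engine crawlers) is refused
    if ($http_user_agent != "acme-staging-monitor") {
        return 403;
    }

    # ... the rest of your staging configuration ...
}
```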

But this isn’t a very user-friendly approach, because manual verification through your browser is made harder. It’s not a very secure approach either: when third parties know you’re working for or at company X and are aware of your user agent (perhaps because they’re a disgruntled customer), they may be able to gain access to the staging environment.

How can you find out if your staging environment is being indexed?

There are a few ways to find out whether your staging environment is being indexed. Here are the three most common ones:

Option 1: site query

If you know that your staging environment is running on a subdomain, you can try a site query such as: site:example.com -inurl:www

This query returns all the Google-indexed pages for the domain example.com except the ones containing “www”.


Option 2: Google Analytics

If you don’t know the URL of your staging environment, you can try checking in Google Analytics:

  • Navigate to Audience > Technology and choose Network.
  • Select Hostname as the Primary Dimension.
  • Look for hostnames that have a different domain, or contain subdomains such as staging, test or dev.

Option 3: Google Search Console

With the consolidation of properties in Google Search Console, it’s now much easier to spot pages that shouldn’t be indexed.

Whether your staging environment is set up on a separate domain, a subdomain, or a subfolder: if you’ve verified the domain, you’ll be able to see all pages that are indexed and all queries your domain is ranking for, right in the overviews you’re used to looking at:

  • Performance > Queries
  • Performance > Pages
  • Index > Coverage

Our special thanks go out to Rhea Drysdale and Martijn Oud for mentioning this to us!

Rhea Drysdale

Setting up Google Search Console for a dev environment will give you insights as to whether or not anything has been indexed and will give you the ability to quickly remove any URLs that accidentally sneak into the index. We then deploy a combination of robots.txt and HTTP Authentication to help keep everything out of the SERPs. URLs with credentials (https://user:[email protected]) could potentially be crawled and indexed when accidentally shared via email and collaborative docs that have loose sharing restrictions.

We need to make sure there are instructions for a crawler to not index anything from the dev server. If HTTP Authentication with a username and password is not a viable option for a client, we have found whitelisting client and company IP addresses to be very effective. Setting up ContentKing to keep track of the robots.txt file has been a lifesaver when a robots.txt is accidentally edited or deleted by a developer.

Getting your already indexed staging environment removed from the index

Uh-oh. Your staging environment has already been indexed by search engines, and you’re the one who has to fix it. Well, the good news is: if you follow the steps below, you’re good. And they’re easy.

Step 1: hide search results

Verify the staging environment in webmaster tools such as Google Search Console and Bing Webmaster Tools, and request removal of its URLs via their URL removal tools (see Google’s documentation and Bing’s documentation on this). For Google, this request is often granted within hours (Bing takes a little longer), and then your staging environment won’t show up in any search results. But here’s the catch: it’s still in Google’s and Bing’s indexes; it’s just not shown. In Google’s case, the staging environment is only hidden for 90 days, so within this timeframe you need to make sure to request removal of your pages from the search engines’ indexes in the right way: via the robots noindex directive.

Step 2: applying the noindex directive and getting pages recrawled

Make sure you apply the robots noindex directive on every page in your staging environment. To speed up the process of search engines recrawling these pages, submit an XML sitemap. Now watch your server logs for search engine crawlers’ requests for your previously indexed (and now “noindexed”) pages, to make sure they’ve “gotten the message.”
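For the HTML pages themselves, the directive is a one-liner in each page’s <head> (non-HTML content such as PDFs needs the equivalent X-Robots-Tag response header instead):

```html
<!-- On every page of the staging environment, until deindexing is complete -->
<meta name="robots" content="noindex" />
```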

In most cases, these 90 days are enough time to signal to search engines that they should remove the staging-environment pages from their index. But if they aren’t, just rinse and repeat.

Once it’s all done, protect the staging environment using HTTP authentication to make sure this doesn’t happen again and remove the XML sitemap from Google Search Console and Bing Webmaster Tools.

Suganthan Mohanadasan

You can add a “noindex” tag to the indexed pages on the staging environment and submit a new sitemap via Google Search Console and Bing Webmaster Tools. The reason I recommend submitting the sitemap is to ensure search engines can see the “noindex” tag. Once the search engines re-crawl the indexed pages on the staging environment, they should start removing these pages from their indexes. Once that’s done, you can get rid of the sitemap and go ahead and secure the staging environment by blocking search engines from re-crawling (and indexing) it. The best way to do this is to add HTTP authentication.
