How to Remove URLs from Google Search in a Snap!
While many SEOs are mainly focused on getting their content indexed quickly by Google, the very opposite – getting it removed quickly – is frequently needed too.
Maybe your entire staging environment got indexed, sensitive content that never should have been accessible to Google got indexed, or spam pages added as a result of your website getting hacked are surfacing in Google – whatever it is, you’ll want those URLs to be removed quickly, right?
In this guide, we’ll explain exactly how to achieve that.
Here are the most common situations where you need to remove URLs from Google quickly:
- You’re dealing with duplicate or outdated content
- Your staging environment has been indexed
- Your site has been hacked and contains spam pages
- Sensitive content has accidentally been indexed
In this article, we’ll take a detailed look at all of these situations and how to get these URLs removed as soon as possible.
How to remove URLs with duplicate or outdated content
Having duplicate or outdated content on your website is arguably the most common reason for removing URLs from Google.
Most outdated content holds no value for your visitors, but it can still hold value from an SEO point of view. Meanwhile, duplicate content can significantly hurt your SEO performance, as Google may be confused about which URL to index and rank.
Let ContentKing keep an eye out for duplicate and outdated content on your sites.
The particular actions you need to take to remove these URLs from Google depend on the context of the pages you want removed, as we'll explain below.
When content needs to remain accessible to visitors
Sometimes URLs need to remain accessible to visitors, but you don’t want Google to index them, because they could actually hurt your SEO. This applies to duplicate content for instance.
Let's take an example:
You run an online store, and you’re offering t-shirts that are exactly the same except for their different colors and sizes. The product pages don’t have unique product descriptions; they just each have a different name and image.
In this case, Google may consider the content of these product pages to be near-duplicate.
Having near-duplicate pages leads to Google both having to decide which URL to choose as the canonical one to index and spending your precious crawl budget on pages that don’t add any SEO value.
In this situation, you have to signal to Google which URLs need to be indexed, and which need to be removed from the index. Your best course of action for a URL depends on these factors:
- The URL has value: if the URL is receiving organic traffic and/or incoming links from other sites, canonicalize it to the preferred URL that you want to have indexed. Google will then assign its value to the preferred URL, while the other URLs remain accessible to your visitors.
- The URL has no value: if the URL isn't receiving organic traffic and doesn't have incoming links from other sites, implement the noindex robots tag. This sends Google a clear message not to index the URL, so it won't be shown on the search engine results pages (SERPs). It's important to understand that in this case, Google won't consolidate any value to another URL.
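The two options above boil down to two small HTML snippets placed in the `<head>` of the variant pages. The URLs below are placeholders, not from the original article:

```html
<!-- Variant page WITH traffic or links: point Google at the preferred
     URL so its value is consolidated there. -->
<link rel="canonical" href="https://www.example.com/t-shirt/" />

<!-- Variant page WITHOUT traffic or links: keep it out of the index
     entirely (no value is consolidated in this case). -->
<meta name="robots" content="noindex" />
```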
Having lots of low-quality, thin, or duplicate content can negatively impact your SEO efforts. If you have duplicate content issues, you don't necessarily need to remove the offending pages; you can canonicalise these pages instead if they're needed for other reasons. You could also merge the duplicated pages to create a stronger, higher-quality piece of content. I recently purged content on a website and saw a 32% increase in organic traffic for the entire website.
If you want to avoid duplicate content issues on product variants, it's essential to build a solid SEO strategy and be ready to adapt if you see the need for change.
Suppose your catalog consists solely of simple (child) products where each product represents a specific variation. In that case, you will surely want to index them all, even though the differences between product variations aren't significant. Still, you will need to closely monitor their performance and, if any duplicate content issues emerge, introduce parent products to your online store. Once you start showing parent products on the frontend, you need to adjust your indexing strategy.
When you have both parent and child products visible on the frontend as separate items, I strongly suggest implementing the same rel="canonical" on all products to avoid duplicate content issues. In these circumstances, the preferred version should be the parent product that serves as a collection of all product variants. This change will not only improve your store's SEO, but also give a significant boost to its UX, since your customers will be able to find their desired product variant more easily.
All of this, of course, applies only to products with the same or very similar content. If you have unique content on all product pages, each page should have a self-referencing canonical URL.
When content shouldn’t remain accessible to visitors
If there's outdated content on your website that no one should see, there are two possible ways to handle it, depending on the context of the URLs:
- If the URLs have traffic and/or links: implement 301 redirects to the most relevant URLs on your website. Avoid redirecting to irrelevant URLs, as Google might consider these to be soft-404 errors. This would lead to Google not assigning any value to the redirect target.
- If the URLs don’t have any traffic and/or links: return the HTTP 410 status code, telling Google that the URLs were permanently removed. Google's usually very quick to remove the URLs from its index when you use the 410 status code.
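A minimal sketch of both options in an nginx server block. The paths are hypothetical, not from the original article:

```nginx
# Outdated URL that still has traffic or links: 301 it to the most
# relevant live page so users and link value are passed along.
location = /old-guide/ {
    return 301 /new-guide/;
}

# Outdated URL with no traffic or links: a 410 tells Google it is
# permanently gone, which usually gets it dropped from the index fast.
location = /retired-page/ {
    return 410;
}
```

The same logic can be expressed in Apache with `Redirect 301` and `Redirect 410` directives; what matters is the status code, not the server.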
Once you've implemented the redirects, submit the old sitemap to Google Search Console alongside the new one and leave it there for 3–4 months. This way, Google will pick up the redirects quickly and the new URLs will begin to show in the SERPs.
Remove cached URLs with Google Search Console
Google usually keeps a cached copy of your pages, which may take quite a long time to be updated or removed. If you want to prevent visitors from seeing the cached copy of a page, use the "Clear cached URL" option in Google Search Console's Removals tool.
Please note that you can instruct Google not to keep cached copies of your pages by using the noarchive meta robots tag.
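The noarchive instruction is a one-line meta tag in the page's `<head>`:

```html
<!-- Ask search engines not to store a cached copy of this page -->
<meta name="robots" content="noarchive">
```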
How to remove staging environment URLs
Staging and acceptance environments are used for testing releases and approving them. These environments are not meant to be accessible and indexable for search engines, but they often mistakenly are – and then you end up with staging-environment URLs (“staging URLs” from here on out) that have been indexed by Google.
It happens; live and learn.
In this section we’ll explain how to quickly and effectively get those pesky staging URLs out of Google!
When staging URLs aren’t outranking production URLs
In most cases, your staging URLs won't outrank production URLs. If this is the case for you too, just follow the steps to remedy this issue. Otherwise, skip to the next section.
When staging URLs are outranking production URLs
If your staging URLs are outranking your production URLs, you need to ensure Google assigns the staging URLs’ signals to the production URLs, while at the same time also ensuring that visitors don’t end up on the staging URLs.
When deploying website changes to live, talk to your developers about making sure the process is 100% bulletproof. There are some “SEO bits” that could easily harm your website’s progress if not managed correctly. These involve:
- Robots.txt file.
- Web server config files.
- Files you use for the meta tag deployment process (to protect your staging environment from getting indexed and your live website from getting de-indexed).
- JS files involved in content and DOM rendering.
I’ve seen healthy websites drop in Google’s SERPs just because, during the deployment process, the robots.txt file was overwritten by the staging version containing a Disallow: / directive – or the other way around: the indexing floodgates were opened because important directives were removed.
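The most reliable way to keep a staging environment out of Google in the first place is to put it behind HTTP authentication, with a noindex header as a second line of defense. A sketch for an Apache staging vhost; the file paths and realm name are assumptions, not from the original article:

```apache
# Staging vhost only — never deploy these directives to production.

# HTTP Basic auth keeps both visitors and crawlers out entirely.
AuthType Basic
AuthName "Staging"
AuthUserFile /etc/apache2/.htpasswd
Require valid-user

# Belt and braces: noindex anything that slips through
# (requires mod_headers).
Header set X-Robots-Tag "noindex, nofollow"
```

Because these directives live in the staging config only, a deployment that accidentally syncs robots.txt to production can't open the floodgates the way the scenario above describes.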
How to remove spam URLs
If your website has been hacked, and it contains a ton of spam URLs, you should get rid of them as quickly as possible so that they don’t (further) hurt your SEO performance, and your trustworthiness in the eyes of your visitors.
Follow the steps below to quickly reverse the damage.
Step 1: Use Google Search Console's Removals Tool
Google’s Removals Tool helps you quickly remove spammy pages from Google’s SERPs. Keep in mind that this tool doesn’t deindex the pages – it only temporarily hides them.
Step 2: Remove the spam URLs and serve a 410
Restore your website’s previous state by restoring a backup. Run updates, and then add additional security measures to ensure your site isn’t vulnerable anymore. Then check whether all the spam URLs are gone from your website. It’s best to return a 410 HTTP status code when they are requested, to make it abundantly clear that these URLs are gone and will never return.
Step 3: Create an additional XML sitemap
Include the spam URLs in a separate XML sitemap and submit it to Google Search Console. This way, Google can quickly “eat through” the spam URLs, and you can easily monitor the removal process via Google Search Console.
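If the hack left behind hundreds of spam URLs, writing that sitemap by hand is tedious. A small Python sketch that generates one from a list of URLs (the URLs below are placeholders):

```python
from xml.sax.saxutils import escape


def build_removal_sitemap(urls):
    """Return a minimal XML sitemap listing the given URLs,
    suitable for submitting to Google Search Console."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


# Hypothetical spam URLs pulled from your server logs or a crawl:
spam_urls = [
    "https://www.example.com/spam-page-1",
    "https://www.example.com/spam-page-2",
]
print(build_removal_sitemap(spam_urls))
```

Save the output as something like `removals.xml`, upload it to your site, and submit it in Google Search Console; once the URLs report as removed, you can delete the sitemap again.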
Spam URLs can seriously hurt your SEO performance. Let ContentKing alert you about any suspicious growth of pages on your website before it’s too late.
How to remove URLs with sensitive content
If you collect sensitive data, such as customer details or resumes from job applicants, on your website it’s vital to keep them safe. Under no circumstances should this data get indexed by Google – or any other search engine for that matter.
However, mistakes are made, and sensitive content can find its way into Google’s search results. No sweat though: we’ll explain how to get this content removed from Google quickly.
Step 1: Use Google Search Console's Removals Tool
Hiding URLs with sensitive content through the GSC’s removal tool is the fastest way to get Google to stop showing them in its SERPs. However, keep in mind that the tool merely hides the submitted pages for 180 days; it doesn’t remove them from Google's index.
Step 2: Remove the content and serve a 410
If you don’t need to have the sensitive content on your website anymore, you can delete the URLs and return the 410 HTTP status code. That tells Google the URLs have been permanently removed.
Step 3: Use an additional XML sitemap
To control and monitor the process of removing URLs with sensitive content, add them to a separate XML sitemap, and submit it in Google Search Console.
Step 4: Prevent sensitive-data leaks from happening
Take appropriate security measures to prevent sensitive content from getting indexed and leaked again.
Don’t forget about your non-HTML files!
If you apply a noindex tag to your pages, Google can sometimes still find assets and attachments that you don't want to be discoverable, such as PDFs and images. To make sure these aren't indexed, you'll need to use the X-Robots-Tag: noindex HTTP header instead. However, there is a challenge with using robots headers: testing and monitoring them. Thankfully, ContentKing makes this easy!
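A sketch of what that header could look like in an Apache config, assuming mod_headers is enabled (the file-extension list is an example, adjust it to your own assets):

```apache
# Non-HTML files can't carry a <meta name="robots"> tag, so send the
# equivalent instruction as an HTTP response header instead.
<FilesMatch "\.(pdf|docx?|jpe?g|png|gif)$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```

You can verify the header is actually being served with `curl -I https://www.example.com/file.pdf` and checking for the `X-Robots-Tag` line in the response.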
How to remove content that's not on your site
If you're finding that other websites are using your content, here are several ways to remove it from Google.
Reach out to the website owner
The first thing you should do is reach out to the people running the website. In many of these cases, "the intern" mistakenly copied your content and they'll take swift action. You can ask them to point a cross-domain canonical at your content along with a link, to 301 redirect it to your own URL, or to just remove it altogether.
What if the website’s owners aren’t responding or refuse to take any action?
If the website’s owners aren’t cooperative, you have a few ways to ask Google to remove it:
- For outdated content that's no longer live on the other site, you can use Google's Remove Outdated Content tool.
- For legal violations, you can submit a legal removal request to Google under applicable law.
- If you have found content violating your copyright, you can file a DMCA takedown request with Google.
How to remove images from Google Search
While it's not recommended to use the robots.txt file to remove indexed pages from Google Search, Google does recommend using it to remove indexed images.
We know this sounds confusing. Google's documentation isn't very clear on this: their general guidance on removing pages – covering both HTML and non-HTML files – includes the line "Do not use robots.txt as a blocking mechanism." At the same time, their documentation on removing images recommends using robots.txt rules for exactly that purpose.
So, how do you go about removing these images?
Say some images in the folder /images/secret/ have been accidentally indexed. Here's how to remove them:
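A sketch of the robots.txt rule for this scenario, using the example folder above:

```text
# Remove indexed images under /images/secret/ from Google Images.
# Use "User-agent: Googlebot" instead to exclude them from all
# Google searches.
User-agent: Googlebot-Image
Disallow: /images/secret/
```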
The next time Googlebot downloads your robots.txt file, it'll see the Disallow directive for the images and remove them from the index.
It is not possible to add a noindex meta tag to an image. You could use the X-Robots-Tag response header to specify noindex; however, Google recommends relying on the Removals tool or blocking the problematic image URL with robots.txt instead.
Luckily, this is the one time a disallow in robots.txt will work to remove URLs from the index – and it's recommended by Google for non-emergency image removal.
We can exclude the images from just Google Image Search by specifying the user agent Googlebot-Image, or from all Google searches by specifying Googlebot.
There are plenty of situations in which you'll want to remove URLs from Google quickly.
Keep in mind there's no one-size-fits-all solution, as each situation requires a different approach. And if you've been reading between the lines, you'll have noticed that most of the situations in which you need to remove URLs can actually be prevented.
Forewarned is forearmed!