Meta Robots Tag Guide

June 8, 2021

Meta robots tags are an essential tool for steering search engines’ crawling and indexing behavior, and for controlling your snippets in the SERP.

In this article we'll explain how to do this, how interpretation and support differ across search engines, and how the meta robots tag relates to the X-Robots-Tag and the robots.txt file.

What is the meta robots tag?

The meta robots tag gives site owners power over search engines’ crawling and indexing behavior and how their snippets are served in search engine result pages (SERPs).

The meta robots tag goes into the <head> section of your HTML and is just one of the meta tags that live there.

Arguably the most well-known meta robots tag is the one telling search engines not to index a page:

<meta name="robots" content="noindex,follow" />

You can provide the same instructions by including them in the HTTP header using the X-Robots-Tag. The X-Robots-Tag is often used to prevent non-HTML content such as PDF and images from being indexed.

Meta robots directives

We prefer to talk about meta robots directives instead of meta robots tags, because calling them “meta robots tags” is incorrect (see “anatomy of the meta element” below).

Meta robots directives are not to be confused with robots.txt directives. These are two different ways of communicating with search engines about different aspects of their crawling and indexing behavior. But they do influence one another, as we’ll see further down the article.

Anatomy of the meta element

A visual of the anatomy of the meta robots tag element

Let’s use the meta robots directive example mentioned above to explain what’s what:

  • The entire code snippet is called the meta element.
  • The <meta and /> are the opening and closing tags.
  • There’s an attribute called name with the value robots. robots applies to all crawlers but can be replaced with a specific user-agent.
  • And then there’s an attribute called content with the value noindex,follow. noindex,follow can be replaced with other directives.
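To make this anatomy concrete, here's a minimal sketch that pulls the name and content attributes out of meta elements using Python's built-in html.parser; the class name is ours, not a standard API.

```python
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    """Collect the name/content pairs of all <meta> elements on a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> elements also end up here, because
        # handle_startendtag() delegates to handle_starttag() by default.
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                # The name value identifies the crawler ("robots" = all crawlers).
                self.meta[attributes["name"].lower()] = attributes["content"]

parser = MetaRobotsParser()
parser.feed('<meta name="robots" content="noindex,follow" />')
print(parser.meta)  # {'robots': 'noindex,follow'}
```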

Why is it important?

Firstly, meta robots directives give you much-needed control over search engines’ crawling and indexing behavior. Without any direction, search engines will try to crawl and index all the content they come across. That’s their default behavior.

Secondly, search engines will generate a snippet for your URLs when they rank them. They take your meta description as input, though they will often instead come up with their own snippet—based on your page’s content—if they think it’ll perform better.


Now, let’s look at a few applications of the meta robots directives in protecting your SEO performance:

  • Prevent a duplicate content issue by applying the meta robots noindex directive to PPC landing pages and on-site search result pages. Note that robots directives will not pass on any authority and relevancy like the canonical URL would.
  • Prevent search engines from indexing content that should never be indexed because you’re providing discounts or some other offer that you don’t want to be available to the entire world.
  • Remove sensitive content that has been indexed: if search engines have indexed content they should never have indexed in the first place, apply the meta robots noindex directive to remove the content from their indices. You can use the same technique when fixing crawler traps.
  • Selectively apply the meta robots noindex directive to discontinued products to keep providing users with a good user experience.

We deliberately have not listed making sure staging environments don’t get indexed by search engines as a use case.

That's because using HTTP Auth is a much better solution, as it prevents access from both users and search engines and keeps the meta robots noindex directive from carrying over to the production site.

Meta robots syntax explained

Before we dig in deeper, let’s cover some of the basics:

The syntax is not case sensitive

Meta robots directives are not case sensitive, meaning the examples below are all valid:

<meta name="robots" content="noindex,follow" />
<meta name="ROBOTS" content="noindex,follow" />
<meta name="robots" content="NOINDEX,FOLLOW" />

Separating directives with commas for Google

For Google, you need to separate directives with a comma; a space alone doesn’t cut it. For example, Google won’t correctly interpret this:

<meta name="robots" content="noindex follow" />

Spaces after commas not required

You’re not required to use spaces after commas between directives. So, the examples below are both valid:

<meta name="robots" content="noindex,follow" />
<meta name="robots" content="noindex, follow" />
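Taken together, these syntax rules mean a directive string can be normalized by lowercasing it and splitting on commas, trimming any surrounding spaces. A hedged Python sketch (the function name is ours):

```python
def parse_directives(content):
    """Normalize a meta robots content value into a list of directives.

    Directives are case-insensitive and comma-separated; spaces
    around the commas are optional.
    """
    return [d.strip().lower() for d in content.split(",") if d.strip()]

# All three equivalent spellings normalize to the same result:
print(parse_directives("noindex,follow"))   # ['noindex', 'follow']
print(parse_directives("NOINDEX,FOLLOW"))   # ['noindex', 'follow']
print(parse_directives("noindex, follow"))  # ['noindex', 'follow']
```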

Now, let’s move on to the directives themselves!

Meta robots directives in detail

In this section we’ll cover the most common meta robots directives you’ll come across in the wild and what exactly they mean. We'll focus primarily on directive support in Google, as it is the dominant search engine.


Meta robots “all”

By default, search engines will crawl and index any content they come across, unless specified otherwise. If you want to explicitly define that this is allowed, you can do so with the following directive:

<meta name="robots" content="all" />

Meta robots “index”

While not necessary because it’s default behavior, if you want to make it explicit to search engines that they are allowed to index a page, you can do so with the meta robots directive below.

<meta name="robots" content="index" />

Meta robots “index,follow”

Oftentimes the index directive is combined with the follow directive, leading to:

<meta name="robots" content="index,follow" />

These directives essentially mean the same thing as the one above that only states index, since follow is default search engine behavior as well.

Meta robots “noindex”

The meta robots noindex directive tells search engines not to index a page. Here’s what the meta robots noindex directive looks like:

<meta name="robots" content="noindex" />

The example above tells search engines they shouldn’t index the page, but they should feel free to follow all its links, because it’s not explicitly stated they shouldn’t.

The noindex directive carries a lot of weight, so when search engines find it, they are quick to remove content from their index. The other side of the coin is that it’s tough to get this content re-indexed when, for example, you’ve accidentally applied the noindex directive.


Meta robots “noindex,follow”

You’ll frequently find meta robots noindex being combined with the follow directive. It tells search engines not to index the page—but that it’s fine to follow the links:

<meta name="robots" content="noindex,follow" />

At the risk of sounding like a broken record, <meta name="robots" content="noindex" /> and <meta name="robots" content="noindex,follow" /> mean the same thing, since follow is default search engine crawler behavior.

Meta robots “noindex,nofollow”

You can also combine the meta robots noindex directive with a nofollow meta directive (not to be confused with the nofollow link attribute):

<meta name="robots" content="noindex,nofollow" />

The noindex,nofollow combination tells search engines not to index the page and not to follow the links on the page, meaning no link authority should be passed on either.

Search engines significantly decrease re-crawling of a noindexed page, which essentially leads to a noindex,nofollow situation, because links on a page that doesn’t get crawled are not followed.

Meta robots “none”

The meta robots none directive is actually a shortcut for noindex,nofollow, which we covered just above. Here’s what the meta robots none directive looks like:

<meta name="robots" content="none" />

It’s not used very often, and folks often think it means the exact opposite: index,follow.

So be careful with this one!

Meta robots “noarchive”

The meta robots noarchive directive prevents search engines from presenting a cached version of a page in the SERP. If you don’t specify the noarchive directive, search engines may just go ahead and serve a cached version of the page. So again, this is an opt-out directive.

Here’s what the noarchive directive looks like:

<meta name="robots" content="noarchive" />

It’s frequently combined with other directives though. For example, you’ll commonly see it used together with the noindex and nofollow directives:

<meta name="robots" content="noindex,nofollow,noarchive" />

This means search engines shouldn’t index the page, shouldn’t follow any of its links and shouldn’t cache the page either.

Meta robots “nosnippet”

The meta robots nosnippet directive tells search engines not to show a text snippet (usually drawn from the meta description) or video preview for the page.

Here’s what the nosnippet directive looks like:

<meta name="robots" content="nosnippet" />

If we were to apply the meta robots nosnippet directive to our redirects article, the snippet would then look like this:

Screenshot of a SERP snippet where the nosnippet directive has been used

Search engines may still show an image thumbnail if they think this results in a better user experience. For Google, this applies to regular Web Search, Google Images, and Google Discover. The nosnippet directive also functions as a noarchive directive.

If the nosnippet directive is not included, Google will generate a text snippet and video preview on its own.

On Google specifically, you can prevent some of your page’s content from showing up in a snippet by using the data-nosnippet HTML attribute. While it’s not a meta robots directive, it is closely related to them, so we need to touch on this here.

The data-nosnippet HTML attribute can be used on span, div, and section elements. Here’s an example:

<p>This may be shown in a snippet, while 
<span data-nosnippet>this will not be shown in a snippet</span>.</p>

Learn more about the data-nosnippet attribute here.

Meta robots “max-snippet”

The meta robots max-snippet directive tells search engines to limit the page’s snippet (generally drawn from the page’s meta description) to a specified number of characters.

Here's an example where the snippet will have a maximum length of 50 characters:

<meta name="robots" content="max-snippet:50" />

Meta robots “max-snippet:0”

When you specify max-snippet:0, you’re telling search engines not to show a snippet—essentially the same as the meta robots nosnippet directive we just described above:

<meta name="robots" content="max-snippet:0" />

Meta robots “max-snippet:-1”

When you specify max-snippet:-1, you’re explicitly telling search engines they can determine the snippet’s length themselves, which is their default behavior:

<meta name="robots" content="max-snippet:-1" />
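The three cases (a positive character limit, 0 for no snippet, and -1 for no limit) can be summed up in a small Python sketch; the helper name is ours:

```python
def max_snippet_limit(directive):
    """Interpret a max-snippet directive.

    Returns the maximum snippet length in characters, 0 when no
    snippet should be shown, or None for -1 (the search engine
    decides, which is the default behavior).
    """
    name, _, value = directive.partition(":")
    if name.strip().lower() != "max-snippet":
        raise ValueError("not a max-snippet directive")
    limit = int(value)
    return None if limit == -1 else limit

print(max_snippet_limit("max-snippet:50"))  # 50
print(max_snippet_limit("max-snippet:0"))   # 0, same effect as nosnippet
print(max_snippet_limit("max-snippet:-1"))  # None, the engine decides
```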

Since October 2019, sites classified as “European press publications” are shown in Google without a snippet by default. You can opt in to having your snippets shown by using the max-snippet and max-image-preview directives, or you can choose to remove your site from the list of European press publications via Google Search Console. Learn more about this here.

If you’re using the Yoast SEO plugin on WordPress, you’ll find that it automatically opts in by adding the following snippet: <meta name="robots" content="index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1" />

Less-important meta robots directives

Now we’ve arrived at the less important meta robots directives, which we’ll only touch on briefly.

What goes for the other meta robots directives goes for these too: if they aren’t defined, search engines will do as they please.

Here’s what the directives signal to search engines:

  • unavailable_after: "remove a page from your index after a specific date". The date should be specified in a widely adopted format, such as ISO 8601. The directive is ignored if no valid date/time is specified; by default, content has no expiration date. It’s basically a timed noindex directive, so be careful when using it.
  • noimageindex: "don’t index the images on this page".
  • max-image-preview: "define a maximum size for the image preview for a page, with possible values: none, standard and large".
  • max-video-preview: "define a maximum for the preview length of videos on the page".
  • notranslate: "don't offer a translated version of the page in your search results".

How can you combine meta robots directives?

In addition to being able to combine directives, you can also provide directives to different crawlers. Each crawler will obey the sum of the directives provided to it; in other words, the directives stack.

To illustrate how, let’s look at an example:

<meta name="robots" content="nofollow" />
<meta name="googlebot" content="noindex" />

These directives are interpreted as follows:

  • Google: noindex,nofollow
  • Other search engines: nofollow
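The stacking behavior boils down to taking the union of the generic robots directives and the crawler-specific ones. A hedged Python sketch (the helper and the data shape are ours):

```python
def effective_directives(meta, crawler):
    """Return the directives a crawler will obey: the generic
    'robots' directives plus its crawler-specific ones (they stack)."""
    return meta.get("robots", set()) | meta.get(crawler, set())

# The example from above: a generic nofollow plus a Google-only noindex.
meta = {"robots": {"nofollow"}, "googlebot": {"noindex"}}
print(effective_directives(meta, "googlebot") == {"noindex", "nofollow"})  # True
print(effective_directives(meta, "bingbot") == {"nofollow"})               # True
```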

How do search engines interpret conflicting directives?

As you can imagine, when you start stacking directives, it’s easy to mess up. If a scenario presents itself where there are conflicting directives, Google will default to the most restrictive one.

Take for example the following directives:

<meta name="robots" content="index" />
<meta name="googlebot" content="noindex" />

Verdict: Google will err on the side of caution and not index the page.

But, the way conflicting directives are interpreted can differ among search engines. Let’s take another example:

<meta name="robots" content="index" />
<meta name="robots" content="noindex" />

Google will not index this page, but Yandex will do the exact opposite and index it.
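Google's tie-breaking rule can be modeled as dropping the permissive directive whenever its restrictive counterpart is also present. Note that this sketch models Google only; as the Yandex example above shows, other engines may resolve the same conflict the other way. The helper is ours:

```python
# Permissive directive -> its restrictive counterpart.
CONFLICTS = {"index": "noindex", "follow": "nofollow"}

def resolve(directives):
    """Resolve conflicting signals the way Google does: when both a
    permissive and a restrictive form appear, keep the restrictive one."""
    chosen = set(directives)
    for permissive, restrictive in CONFLICTS.items():
        if permissive in chosen and restrictive in chosen:
            chosen.discard(permissive)
    return chosen

print(resolve({"index", "noindex"}))  # {'noindex'}
print(resolve({"index", "follow"}))   # no conflict, both kept
```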

So keep this in mind, and make sure that your robots directives work right for the search engines that are important to you.

X-Robots-Tag—the HTTP header equivalent

Non-HTML files such as images and PDF files don’t have an HTML source that you can include a meta robots directive in. If you want to signal your crawling and indexing preferences to search engines for these files, your best bet is to use the X-Robots-Tag HTTP header.

Let’s briefly touch on HTTP headers.

When a visitor or search engine requests a page from a web server, and the page exists, the web server typically responds with three things:

  1. HTTP Status Code: the three-digit response to the client’s request (e.g. 200 OK).
  2. HTTP Headers: headers containing for example the content-type that was returned and instructions on how long the client should cache the response.
  3. HTTP Body: the body (e.g. HTML, CSS, JavaScript etc.), which is used to render and display the page in a browser.

The X-Robots-Tag can be included in the HTTP headers. Here’s a screenshot of a page’s HTTP response headers taken from Chrome Web Inspector, for a page that returns an X-Robots-Tag: noindex:

Screenshot of Chrome Web Inspector showing an HTTP response that includes X-Robots-Tag: noindex
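For Google, an X-Robots-Tag value may optionally be prefixed with a user agent, as in X-Robots-Tag: googlebot: noindex; without a prefix, the directives apply to all crawlers. Here's a hedged Python sketch that splits the two apart, using a list of known directive names to avoid misreading the colon in values like max-snippet:50 (the helper is ours):

```python
# Directive names that can legitimately appear before a colon.
KNOWN_DIRECTIVES = {
    "all", "index", "follow", "noindex", "nofollow", "none",
    "noarchive", "nosnippet", "notranslate", "noimageindex",
    "max-snippet", "max-image-preview", "max-video-preview",
    "unavailable_after",
}

def parse_x_robots_tag(value):
    """Split an X-Robots-Tag value into (user_agent, directives).

    '*' means the directives apply to all crawlers.
    """
    head, sep, tail = value.partition(":")
    if sep and head.strip().lower() not in KNOWN_DIRECTIVES:
        # The part before the colon is a user agent, not a directive.
        return head.strip().lower(), [d.strip() for d in tail.split(",") if d.strip()]
    return "*", [d.strip() for d in value.split(",") if d.strip()]

print(parse_x_robots_tag("noindex, nofollow"))   # ('*', ['noindex', 'nofollow'])
print(parse_x_robots_tag("googlebot: noindex"))  # ('googlebot', ['noindex'])
```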

So how does this work in practice?

Configuring X-Robots-Tag on Apache

For example, if you're using the Apache web server, and you’d like to add a noindex,nofollow X-Robots-Tag to the HTTP response for all of your PDF files, add the following snippet to your .htaccess file or httpd.conf file:

<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex,nofollow"
</Files>

Or perhaps you want to make images of file types PNG, JPG, JPEG, and GIF non-indexable:

<Files ~ "\.(png|jpe?g|gif)$">
Header set X-Robots-Tag "noindex"
</Files>

Configuring X-Robots-Tag on nginx

Meanwhile on the nginx web server, you need to edit a site’s .conf file.

To remove all PDF files from search engines’ indices across an entire site, use this:

location ~* \.pdf$ {
  add_header X-Robots-Tag "noindex, nofollow";
}

And to noindex the images, use this:

location ~* \.(png|jpe?g|gif)$ {
  add_header X-Robots-Tag "noindex";
}

Note that tweaking your web server configuration can negatively impact your entire website’s SEO performance. Unless you're comfortable with making changes to your web server's configuration, it’s best to leave these changes to your server administrator.

Because of this, we highly recommend monitoring your sites with ContentKing. Our platform immediately flags any changes so that you can revert the changes before they have a negative impact on your SEO performance.

SEO best practices for robots directives

Stick to these best practices around robots directives:

  • Avoid conflicting robots directives: avoid using both meta robots and X-Robots-Tag directives to signal your crawling and indexing preferences for your pages, as it’s easy to mess up and send conflicting instructions. It’s fine to use meta robots directives on pages and X-Robots-Tag for your images and PDFs though—just make sure you're not using both methods of delivering robots directive instructions on the same file.
  • Don’t disallow content with important robots directives: if you disallow content using your robots.txt, search engines won’t be able to pick up that content’s preferred robots directives. Say for example you apply the noindex directive to a page, and go on to disallow access to that same page. Search engines won’t be able to see the noindex, and they may still keep the page in their index for a long time.
  • Don’t combine noindex directive with canonical URL: a page that has both a noindex directive and a canonical to another page is confusing for search engines. In rare cases, this results in the noindex being carried over to the canonical target. Learn more.
  • Don’t apply noindex directive to paginated pages: because search engines (Google especially) understand paginated pages well, they treat them differently and won’t see them as duplicate content. And keep in mind that in practice, over time a noindex directive becomes a noindex,nofollow, closing the door on a discovery path for the content that’s linked through paginated pages. Learn more.
  • No hreflang to pages with noindex: hreflang signals to search engines which content variants are available to different audiences, sending a signal that these need to be indexed. Therefore, avoid referencing pages that have a noindex directive.
  • Don’t include pages with noindex in XML sitemap: pages that shouldn’t be indexed, shouldn’t be included in your XML sitemap either, since the XML sitemap is used to signal to search engines which pages they should crawl and index.

Meta robots vs X-Robots-Tag vs Robots.txt

Meta robots directives, the X-Robots-Tag, and robots.txt all have their own unique uses. To summarize what we’ve covered so far, here’s what each can be used for:

                       Meta robots          X-Robots-Tag         Robots.txt
Used on                HTML files           Non-HTML files       Any file
Scope                  Per page             Unlimited            Unlimited
Influences crawling    Yes (nofollow only)  Yes (nofollow only)  Yes
Influences indexing    Yes                  Yes                  Yes*
Consolidates signals   No                   No                   No

* Content that’s disallowed in the robots.txt generally will not get indexed. But in rare cases, this can still happen.

Support across search engines

It’s not just the interpretation of conflicting robots directives that can differ per search engine. The directives supported, and the support for their delivery method (HTML or HTTP header), can vary too. In the table below, “Yes” means both the HTML and HTTP header implementations are supported, “No” means neither is supported, and if only one of the two is supported, that’s spelled out.

Directive     Google   Bing/Yahoo   Yandex
all           Yes      No           Only meta robots
index         Yes      Yes          Only meta robots
follow        Yes      Yes          Only meta robots
noindex       Yes      Yes          Yes
nofollow      Yes      Yes          Yes
none          Yes      No           Yes
noarchive     Yes      Yes          Yes
nosnippet     Yes      Yes          No
max-snippet   Yes      Yes          No

And now, on to the less important ones:

Directive           Google   Bing/Yahoo   Yandex
unavailable_after   Yes      No           No
noimageindex        Yes      No           No
max-image-preview   Yes      Yes          No
max-video-preview   Yes      Yes          No
notranslate         Yes      No           No

Wrapping up and moving on

Solid technical SEO is all about sending search engines the right signals. And the meta robots directive is just one of those signals.

So continue learning how to take search engines by the hand with our guide on Controlling Crawling & Indexing!

ContentKing Academy Content Team

Steven is ContentKing's VP of Community. This means he's involved in everything community and content marketing related. Right where he wants to be. He gets a huge kick out of making websites rank and loves to talk SEO, content marketing and growth.

Jessica is a content marketer for ContentKing. Her days are spent writing marketing content, cycling around canals in Amsterdam and attempting to master the Dutch language.