Meta Robots Tag Guide
Meta robots tags are an essential tool to improve search engines' crawling and indexing behavior, and to control your snippets in the SERP.
In this article we'll explain how to do this, how interpretation and support differ per search engine, and how the meta robots tag relates to the X-Robots-Tag and the robots.txt file.
What is the meta robots tag?
The meta robots tag gives site owners power over search engines’ crawling and indexing behavior and how their snippets are served in search engine result pages (SERPs).
The meta robots tag goes into the <head> section of your HTML and is just one of the meta tags that live there.
Arguably the most well-known meta robots tag is the one telling search engines not to index a page:
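As a minimal illustration, using the same element this guide dissects in its anatomy section, such a tag placed in the page's <head> could look like this:

```html
<!-- Tells all crawlers not to index this page, while still following its links -->
<meta name="robots" content="noindex,follow" />
```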
You can provide the same instructions by including them in the HTTP header using the X-Robots-Tag. The X-Robots-Tag is often used to prevent non-HTML content such as PDFs and images from being indexed.
Meta robots directives
We prefer to talk about meta robots directives instead of meta robots tags, because calling them “meta robots tags” is incorrect (see “anatomy of the meta element” below).
Meta robots directives are not to be confused with robots.txt directives. These are two different ways of communicating with search engines about different aspects of their crawling and indexing behavior. But they do influence one another, as we’ll see further down the article.
Anatomy of the meta element
Let's use the meta robots directive example mentioned above, <meta name="robots" content="noindex,follow" />, to explain what's what:
- The entire code snippet is called the meta element.
- <meta and /> are the opening and closing tags.
- There's an attribute called name with the value robots. The value robots applies to all crawlers but can be replaced with a specific user-agent.
- And then there's an attribute called content with the value noindex,follow. The value noindex,follow can be replaced with other directives.
Why is it important?
Firstly, meta robots directives give you much-needed control over search engines’ crawling and indexing behavior. Without any direction, search engines will try to crawl and index all the content they come across. That’s their default behavior.
Secondly, search engines will generate a snippet for your URLs when they rank them. They take your meta description as input, though they will often instead come up with their own snippet—based on your page’s content—if they think it’ll perform better.
Now, let’s look at a few applications of the meta robots directives in protecting your SEO performance:
- Prevent a duplicate content issue by applying the meta robots noindex directive to PPC landing pages and on-site search result pages. Note that robots directives will not pass on any authority and relevancy like the canonical URL would.
- Prevent search engines from indexing content that should never be indexed because you're providing discounts or some other offer that you don't want to be available to the entire world.
- Remove sensitive content that has been indexed: if search engines have indexed content they should never have indexed in the first place, apply the meta robots noindex directive to remove the content from their indices. You can use the same technique when fixing crawler traps.
- Selectively apply the meta robots noindex directive to discontinued products to keep providing users with a good user experience.
Meta robots syntax explained
Before we dig in deeper, let’s cover some of the basics:
- The syntax is not case sensitive
- Separating directives with commas is required for Google
- Spaces after commas are not required
The syntax is not case sensitive
Meta robots directives are not case sensitive, meaning the examples below are all valid:
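A short sketch of what case-insensitivity means in practice. The three variants are our own illustration and should be treated identically:

```html
<!-- All three variants carry the same instruction -->
<meta name="robots" content="noindex,follow" />
<meta name="ROBOTS" content="noindex,follow" />
<meta name="robots" content="NOINDEX,FOLLOW" />
```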
Separating directives with commas for Google
For Google, you need to separate directives with a comma. A space doesn’t cut it:
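To make the difference concrete, here is an illustrative pair. The first form is the one to avoid, the second is the comma-separated form Google expects:

```html
<!-- Avoid: directives separated by a space only -->
<meta name="robots" content="noindex follow" />

<!-- Use: directives separated by a comma -->
<meta name="robots" content="noindex,follow" />
```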
Spaces after commas not required
You’re not required to use spaces after commas between directives. So, the examples below are both valid:
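For instance, the following two illustrative elements differ only in the space after the comma:

```html
<!-- Without a space after the comma -->
<meta name="robots" content="noindex,follow" />

<!-- With a space after the comma -->
<meta name="robots" content="noindex, follow" />
```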
Now, let’s move on to the directives themselves!
Meta robots directives in detail
In this section we'll cover the most common meta directives you'll come across in the wild and what exactly they mean. We'll focus primarily on directive support from Google, as it is the dominant search engine.
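Here are the directives we'll cover: all, index, index,follow, noindex, noindex,follow, noindex,nofollow, none, noarchive, nosnippet and max-snippet.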
Meta robots “all”
By default, search engines will crawl and index any content they come across, unless specified otherwise. If you want to explicitly define that this is allowed, you can do so with the following directive:
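A minimal sketch of the element, assuming it sits in the page's <head>:

```html
<!-- Explicitly allows crawling and indexing, which is the default anyway -->
<meta name="robots" content="all" />
```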
Meta robots “index”
While not necessary because it’s default behavior, if you want to make it explicit to search engines that they are allowed to index a page, you can do so with the meta robots directive below.
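Written out, the explicit version would look roughly like this:

```html
<!-- Explicitly allows indexing, which is the default anyway -->
<meta name="robots" content="index" />
```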
Meta robots “index,follow”
The index directive is frequently combined with the follow directive, leading to:
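A sketch of the combined form:

```html
<!-- Index the page and follow its links: both are default behavior -->
<meta name="robots" content="index,follow" />
```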
These directives essentially mean the same thing as the one above that only states index, since follow is default search engine behavior as well.
Meta robots “noindex”
The meta robots noindex directive tells search engines not to index a page. Here's what the meta robots noindex directive looks like:
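The element, as it is also quoted later in this guide:

```html
<!-- Do not index this page; links may still be followed by default -->
<meta name="robots" content="noindex" />
```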
The example above tells search engines they shouldn’t index the page, but they should feel free to follow all its links, because it’s not explicitly stated they shouldn’t.
The noindex directive carries a lot of weight, so when search engines find it, they are quick to remove content from their index. The other side of the coin is that it's tough to get this content re-indexed when, for example, you've accidentally applied the noindex directive.
Meta robots “noindex,follow”
You'll frequently find meta robots noindex being combined with the follow directive. It tells search engines not to index the page, but that it's fine to follow the links:
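The combined element looks like this:

```html
<!-- Do not index the page, but do follow its links -->
<meta name="robots" content="noindex,follow" />
```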
At the risk of sounding like a broken record, <meta name="robots" content="noindex" /> and <meta name="robots" content="noindex,follow" /> mean the same thing, since follow is default search engine crawler behavior.
Meta robots “noindex,nofollow”
You can also combine the meta robots noindex directive with a nofollow meta directive (not to be confused with the nofollow link attribute):
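A sketch of that combination:

```html
<!-- Do not index the page and do not follow any of its links -->
<meta name="robots" content="noindex,nofollow" />
```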
The noindex,nofollow combination tells search engines not to index the page and not to follow the links on the page, meaning no link authority should be passed on either.
Meta robots “none”
The meta robots none directive is actually a shortcut for noindex,nofollow, which we covered just above. Here's what the meta robots none directive looks like:
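In element form, this would be roughly:

```html
<!-- Shortcut for noindex,nofollow -->
<meta name="robots" content="none" />
```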
It's not used very often, and folks often think it means the exact opposite: all.
So be careful with this one!
Meta robots “noarchive”
The meta robots noarchive directive prevents search engines from presenting a cached version of a page in the SERP. If you don't specify the noarchive directive, search engines may just go ahead and serve a cached version of the page. So again, this is an opt-out directive.
Here's what the noarchive directive looks like:
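A minimal sketch of the element on its own:

```html
<!-- Do not offer a cached copy of this page in search results -->
<meta name="robots" content="noarchive" />
```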
It's frequently combined with other directives though. For example, you'll commonly see it used together with the noindex and nofollow directives (noindex,nofollow,noarchive).
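A sketch of noarchive combined with noindex and nofollow, matching the interpretation given right after it:

```html
<!-- Do not index, do not follow links, and do not cache the page -->
<meta name="robots" content="noindex,nofollow,noarchive" />
```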
This means search engines shouldn’t index the page, shouldn’t follow any of its links and shouldn’t cache the page either.
Meta robots “nosnippet”
The meta robots nosnippet directive tells search engines not to show a text snippet (usually drawn from the meta description) or video preview for the page.
Here's what the nosnippet directive looks like:
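In element form, roughly:

```html
<!-- Do not show a text snippet or video preview for this page -->
<meta name="robots" content="nosnippet" />
```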
If we were to apply the meta robots nosnippet directive to our redirects article, its search result would be shown without a text snippet.
Search engines may still show an image thumbnail if they think this results in a better user experience. For Google, this applies to regular Web Search, Google Images, and Google Discover. The
nosnippet directive also functions as a
If the nosnippet directive is not included, Google will generate a text snippet and video preview on its own.
Meta robots “max-snippet”
The meta robots max-snippet directive tells search engines to limit the page's snippet (generally drawn from the page's meta description) to a specified number of characters.
Here's an example where the snippet will have a maximum length of 50 characters:
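A sketch of that element, with the limit written after a colon:

```html
<!-- Limit the text snippet to at most 50 characters -->
<meta name="robots" content="max-snippet:50" />
```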
Meta robots “max-snippet:0”
When you specify max-snippet:0, you're telling search engines not to show a snippet, which is essentially the same as the meta robots nosnippet directive we just described above:
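Written out as an element:

```html
<!-- A maximum of 0 characters: no text snippet at all -->
<meta name="robots" content="max-snippet:0" />
```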
Meta robots “max-snippet:-1”
When you specify max-snippet:-1, you're explicitly telling search engines they can determine the snippet's length themselves, which is their default behavior:
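Written out as an element:

```html
<!-- No limit: search engines pick the snippet length themselves -->
<meta name="robots" content="max-snippet:-1" />
```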
Less-important meta robots directives
Now we’ve arrived at the less important meta robots directives, which we’ll only touch on briefly.
What goes for the other meta robots directives goes for these too: if they aren’t defined, search engines will do as they please.
Here’s what the directives signal to search engines:
- unavailable_after: "remove a page from your index after a specific date". The date should be specified in a widely adopted format, such as ISO 8601. The directive is ignored if no valid date/time is specified. By default there is no expiration date for content. It's basically a timed noindex directive, so be careful when using it.
- noimageindex: "don't index the images on this page".
- max-image-preview: "define a maximum size for the image preview for a page", with possible values none, standard and large.
- max-video-preview: "define a maximum for the preview length of videos on the page".
- notranslate: "don't offer a translated version of the page in your search results".
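To give a feel for how these look in a page, here is a hedged sketch. The date, the preview size and the number of seconds are illustrative values we picked, not recommendations:

```html
<!-- Drop the page from the index after an (assumed) ISO 8601 date -->
<meta name="robots" content="unavailable_after: 2030-01-01" />

<!-- Do not index the images on this page -->
<meta name="robots" content="noimageindex" />

<!-- Allow large image previews (other values: none, standard) -->
<meta name="robots" content="max-image-preview:large" />

<!-- Limit video previews to an (assumed) 30 seconds -->
<meta name="robots" content="max-video-preview:30" />

<!-- Do not offer a translated version in search results -->
<meta name="robots" content="notranslate" />
```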
How can you combine meta robots directives?
In addition to being able to combine directives, you can also provide directives to different crawlers. Each crawler will use the sum of the directives provided to them, that is: they stack.
To illustrate how, let’s look at an example:
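One possible illustration of stacking, assuming one element addressed to all crawlers and one addressed to Google's crawler only:

```html
<!-- Applies to every crawler -->
<meta name="robots" content="nofollow" />

<!-- Applies to Google's crawler only, on top of the directive above -->
<meta name="googlebot" content="noindex" />
```

Under the stacking rule described above, Google would apply both directives, while other crawlers would only see nofollow.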
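With nofollow addressed to all crawlers (name="robots") and noindex addressed to Google only (name="googlebot"), the directives are interpreted as follows:
- Google: noindex,nofollow
- Other search engines: nofollow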
How do search engines interpret conflicting directives?
As you can imagine, when you start stacking directives, it’s easy to mess up. If a scenario presents itself where there are conflicting directives, Google will default to the most restrictive one.
Take for example the following directives:
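A hypothetical element that contradicts itself, chosen by us purely to illustrate the point:

```html
<!-- Conflicting: asks for both noindex and index at once -->
<meta name="robots" content="noindex,index" />
```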
Verdict: Google will err on the side of caution and not index the page.
But, the way conflicting directives are interpreted can differ among search engines. Let’s take another example:
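One hypothetical pair of conflicting elements that would fit this description (the exact values are our assumption):

```html
<!-- Conflicting: the first allows everything, the second forbids indexing -->
<meta name="robots" content="all" />
<meta name="robots" content="noindex,follow" />
```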
Google will not index this page, but Yandex will do the exact opposite and index it.
So keep this in mind, and make sure that your robots directives work right for the search engines that are important to you.
X-Robots-Tag—the HTTP header equivalent
Non-HTML files such as images and PDF files don't have an HTML source that you can include a meta robots directive in. If you want to signal your crawling and indexing preferences to search engines for these files, your best bet is to use the X-Robots-Tag HTTP header.
Let’s briefly touch on HTTP headers.
When a visitor or search engine requests a page from a web server, and the page exists, the web server typically responds with three things:
- HTTP Status Code: the three-digit response to the client's request (e.g. 200 OK).
- HTTP Headers: headers containing, for example, the content-type that was returned and instructions on how long the client should cache the response.
- HTTP Body: the body (e.g. the HTML of the page).
The X-Robots-Tag can be included in the HTTP Headers. (Screenshot: a page's HTTP response headers in Chrome Web Inspector, including an X-Robots-Tag header.)
So how does this work in practice?
Configuring X-Robots-Tag on Apache
For example, if you're using the Apache web server, and you'd like to add a noindex,nofollow X-Robots-Tag to the HTTP response for all of your PDF files, add the following snippet to your .htaccess file or your server configuration file:
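A minimal sketch, assuming Apache's mod_headers module is enabled. The FilesMatch pattern is our assumption for matching PDF files by extension:

```apache
# Send "X-Robots-Tag: noindex, nofollow" with every response for a .pdf file
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```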
Or perhaps you want to make images of file types
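A comparable sketch for images, again assuming mod_headers is enabled. The file types listed (PNG, JPEG, GIF) and the choice of noindex are assumptions made for illustration:

```apache
# Send "X-Robots-Tag: noindex" with every response for common image types
<FilesMatch "\.(png|jpe?g|gif)$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```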
Configuring X-Robots-Tag on nginx
Meanwhile on the nginx web server, you need to edit the site's configuration file.
To remove all PDF files from search engines’ indices across an entire site, use this:
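A minimal sketch, assuming the block is placed inside the relevant server block. The location pattern is our assumption for matching PDF files by extension:

```nginx
# Send "X-Robots-Tag: noindex, nofollow" for every .pdf file (case-insensitive match)
location ~* \.pdf$ {
  add_header X-Robots-Tag "noindex, nofollow";
}
```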
And to noindex the images, use this:
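A comparable sketch for images, also placed inside the server block. The file types listed are assumptions for illustration:

```nginx
# Send "X-Robots-Tag: noindex" for common image types (case-insensitive match)
location ~* \.(png|jpe?g|gif)$ {
  add_header X-Robots-Tag "noindex";
}
```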
Note that tweaking your web server configuration can negatively impact your entire website’s SEO performance. Unless you're comfortable with making changes to your web server's configuration, it’s best to leave these changes to your server administrator.
Because of this, we highly recommend monitoring your sites with ContentKing. Our platform immediately flags any changes so that you can revert the changes before they have a negative impact on your SEO performance.
SEO best practices for robots directives
Stick to these best practices around robots directives:
- Avoid conflicting robots directives: avoid using both meta robots and X-Robots-Tag directives to signal your crawling and indexing preferences for your pages, as it’s easy to mess up and send conflicting instructions. It’s fine to use meta robots directives on pages and X-Robots-Tag for your images and PDFs though—just make sure you're not using both methods of delivering robots directive instructions on the same file.
- Don't disallow content with important robots directives: if you disallow content using your robots.txt, search engines won't be able to pick up that content's preferred robots directives. Say for example you apply the noindex directive to a page, and go on to disallow access to that same page. Search engines won't be able to see the noindex, and they may still keep the page in their index for a long time.
- Don't combine noindex directive with canonical URL: a page that has both a noindex directive and a canonical to another page is confusing for search engines. In rare cases, this results in the noindex being carried over to the canonical target.
- Don't apply noindex directive to paginated pages: because search engines (Google especially) understand paginated pages well, they treat them differently and won't see them as duplicate content. And keep in mind that in practice, over time a noindex directive becomes a noindex,nofollow, closing the door on a discovery path for the content that's linked through paginated pages.
- No hreflang to pages with noindex: hreflang signals to search engines which content variants are available to different audiences, sending a signal that these need to be indexed. Therefore, avoid referencing pages that have a noindex directive in your hreflang annotations.
- Don’t include pages with noindex in XML sitemap: pages that shouldn’t be indexed, shouldn’t be included in your XML sitemap either, since the XML sitemap is used to signal to search engines which pages they should crawl and index.
Meta robots vs X-Robots-Tag vs Robots.txt
The Meta robots directives, X-Robots-Tag and robots.txt all have their own unique uses. To summarize what we’ve covered so far, here’s what they can be used for:
| |Meta robots|X-Robots-Tag|Robots.txt|
|Used on|HTML files|Non-HTML files|Any file|
* Content that’s disallowed in the robots.txt generally will not get indexed. But in rare cases, this still can happen.
Support across search engines
It's not just the interpretation of conflicting robots directives that can differ per search engine. The directives supported and the support for their delivery method (HTML or HTTP header) can vary too. If a cell in the table below has a green checkmark, both HTML and HTTP header implementations are supported. If there's a red cross, neither is supported. If only one is supported, it's explained.
|all||Only meta robots|
|index||Only meta robots|
|follow||Only meta robots|
And now, on to the less important ones:
Wrapping up and moving on
Solid technical SEO is all about sending search engines the right signals. And the meta robots directive is just one of those signals.
So continue learning how to take search engines by the hand with our guide on Controlling Crawling & Indexing!