What you can expect from this article
The meta robots tag instructs search engines which pages you want them to index and how. This article provides an in-depth look at some of the intricacies of this tag and, more importantly, shows you how to put it to work for you today.
What is a meta robots tag?
The meta robots tag allows you to fine-tune what content search engines should index and display to users within SERPs (search engine result pages). The meta robots tag can be found in the HTML source of a page, and looks something like this:
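For instance, a meta robots tag placed in a page's `<head>` might look like this (an illustrative example):

```html
<meta name="robots" content="noindex, follow">
```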
This specific example tells all search engines not to index the page, but to follow the links it finds on the page and to pass on link authority. These instructions (noindex, follow) are called search engine directives and will be explained shortly.
Why you should care about the meta robots tag
Whether you’re a website owner or an SEO specialist, you need to be able to clearly signal to search engines how you want your websites indexed. The meta robots tag makes that possible.
Even though search engines have come a long way in understanding websites, when it comes to indexing you don’t want to leave it up to their algorithms to determine what pages ought to be indexed and which ones not. That alone is reason enough to make the meta robots tag an essential part of your SEO toolbox.
The meta robots tag is often used to combat duplicate content. Duplicate content consists of identical or very similar pages that are accessible through multiple URLs, which sends conflicting signals to search engines and essentially confuses them.
That said, there are other, often better, mechanisms available to prevent duplicate content issues, such as canonical URLs and robots.txt. Still, there are some specific use cases for the meta robots tag, which we'll take a look at in a moment. But first, let's learn which directives are available to instruct search engines.
The meta robots tag directives
One thing that makes the meta robots tag so effective is its level of versatility. Here’s a list of all the directives you have at your disposal to signal your indexing preferences to search engines:
noindex directive signals to search engine robots not to return a page within search results when queried.
nofollow directive indicates to search engine robots that all links within a page should not be followed and should not pass on link authority.
none directive signals to search engine robots that this page should basically be ignored. It's sometimes used as a shortcut for the combined noindex,nofollow directives, as well.
Protip: when you're using either the none directive or the noindex,nofollow directives, it's recommended to disallow access to the page altogether using your robots.txt file.
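Such a robots.txt rule could look like this (the `/private-page/` path is a hypothetical example):

```txt
User-agent: *
Disallow: /private-page/
```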
noarchive directive prevents search engines from presenting a cached version of the specified page.
nosnippet directive prevents search engines from displaying snippets in the SERPs and additionally prevents search engines from caching the page.
noimageindex directive prevents search engines from indexing the images on a page. Please note that if an image is also placed on another page which doesn't have the noimageindex directive, it will still be indexed. This is essential to be aware of when doing image SEO.
noodp directive stopped search engines from pulling the page's description from DMOZ (an open content directory of links maintained by volunteers) as the snippet for your page within SERPs. DMOZ closed down in May 2017, so this directive no longer serves any purpose.
notranslate directive tells search engines to not offer a translated version of the page within SERPs.
unavailable_after directive tells search engines not to display the page after a designated date/time, which must be specified in RFC 850 format.
index and all directives signal to search engine robots that you'd like them to index the page. You don't have to indicate this, as it's the default for search engines: unless you specify a different directive, search engines will index the page and follow its links.
follow and all directives tell search engines to follow the links on the page and to pass on link authority. Just like the index directive, it's the default behavior, so there's no absolute need to specify it.
Situations in which you would use the meta robots tag
To be honest: although the meta robots tag is a powerful way to instruct search engines how to deal with your content, it’s usually not the go-to mechanism to do so. If you don’t want to have a page indexed by search engines, it’s usually more advisable to use a canonical link or to disallow access to the page entirely through robots.txt. However, if for any reason you’re unable to employ those solutions the meta robots tag is a good method to achieve the same goal: preventing duplicate content issues.
Furthermore, a specific use-case for the meta robots tag is when dealing with place-holder pages. Sometimes you need to publish a page which is not fully finalized yet and for the time being contains “thin content”. In these cases you might not want to have the page indexed yet and the meta robots tag is a proper solution for preventing this.
Combining meta robots tag directives
It's pretty common to want to send multiple commands to the search engines that visit your page, and combining meta robots tag directives in a single tag is by far the best way to do that. You can create one multi-directive instruction using any directives that don't contradict each other.
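A combined, multi-directive instruction might look like this (the particular directives chosen here are illustrative):

```html
<meta name="robots" content="noindex, nofollow, noarchive">
```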
Then there are situations that call for signalling different directives to different crawlers. The directives below, for instance, yield a noindex,nofollow directive when the page is crawled by Google and Bing, while other search engines will ignore the noindex directive altogether.
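A sketch of such crawler-specific tags, targeting Google's and Bing's crawlers by name while giving everyone else only a nofollow:

```html
<meta name="googlebot" content="noindex, nofollow">
<meta name="bingbot" content="noindex, nofollow">
<meta name="robots" content="nofollow">
```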
Note that if a scenario presents itself where there are competing directives, the crawlers will default to the most restrictive directive (similar to the robots.txt file).
The X-Robots-Tag HTTP header
When you're dealing with non-HTML files, such as images and PDF files, that you don't want indexed by search engines, the X-Robots-Tag HTTP header is your best bet. When a webserver responds to a request from a visitor's browser or a search engine, it doesn't just send along the "body content" but also HTTP headers. By sending the X-Robots-Tag HTTP header, the webserver can give specific indexing directives to search engines, even for non-HTML files.
For example, if you're using the Apache web server and you'd like to add a noindex,nofollow X-Robots-Tag to the HTTP response for all of your .PDF files, you'd set the configuration as follows:
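A minimal sketch of that Apache configuration, assuming the mod_headers module is enabled:

```apacheconf
# Send the X-Robots-Tag header for every file ending in .pdf
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```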
Alternatively you can do the same for images of file types png, jpg and gif:
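The same pattern, extended to cover those image file types:

```apacheconf
# Send the X-Robots-Tag header for png, jpg/jpeg and gif files
<FilesMatch "\.(png|jpe?g|gif)$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```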
Note that setting up the X-Robots-Tag header usually requires changes in your webserver configuration and when incorrectly set up can negatively impact your entire website. Unless you’re comfortable with making changes to your webserver’s configuration it’s best-advised to leave these changes to your server administrator.
Meta robots tag vs X-Robots-Tag header vs robots.txt
So there are a few different ways to let search engines know your preferences around indexing, and each serves its own purpose. But when should you use which one? To help answer that question, here's a quick rundown of each method (the meta robots tag, the X-Robots-Tag header, and the robots.txt file) and where it makes sense to use it.
Meta robots tag: use the meta robots tag to signal your indexing preferences for individual pages. Based on this tag, search engine bots may leave a page out of their index entirely, or determine which of its links to follow and which not to follow.
X-Robots-Tag header: the X-Robots-Tag header is similar to the meta robots tag, but instead of specifying the instructions in the HTML source of your pages, you specify them at the webserver level. For non-HTML files such as PDF files and images, it's the only way to signal indexing preferences, so that's what it's mostly used for.
Robots.txt: the robots.txt file is used to signal your preferences around access to your pages for search engines. It’s important to understand that if you prevent access to your pages, search engines will never be able to index that content properly.
Frequently asked questions about meta robots tags
Some frequently asked questions about meta robots tags:
- What if there are spaces between commands in the meta robots tag?
- What if there are no commas in the meta robots tag?
- Are commands case-sensitive?
- How do I see the X-Robots-Tag header?
- Will search engines still crawl pages that have a meta robots tag?
1. What if there are spaces between commands in the meta robots tag?
Don't worry: all major search engines automatically ignore spacing between the commands, so extra spaces make no difference to how the directive is interpreted.
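For example, these two tags are treated identically:

```html
<meta name="robots" content="noindex, follow">
<meta name="robots" content="noindex,follow">
```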
2. What if there are no commas in the meta robots tag?
It's best to use commas in the meta robots tag. Bing claims they don't care one way or another, but Google does, and that alone is reason enough to use them. Here's an example of how NOT to do it:
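An illustrative tag with the comma left out, which Google may not parse as two separate directives:

```html
<!-- Missing comma between directives: avoid this -->
<meta name="robots" content="noindex nofollow">
```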
3. Are commands case-sensitive?
Nope. Google, Yahoo, and Bing recognize the commands within the directive even if they're randomly upper-cased and lower-cased. Case in point:
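These mixed-case and lowercase tags are interpreted the same way:

```html
<meta name="robots" content="NoIndex, NoFollow">
<!-- is interpreted the same as -->
<meta name="robots" content="noindex, nofollow">
```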
4. How do I see the X-Robots-Tag header?
The X-Robots-Tag header can be viewed within the HTTP headers of the server's response. Since inspecting raw HTTP headers in your browser is a fairly technical endeavour, it's recommended to use a tool like ContentKing to see these.
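If you're comfortable on the command line, you can also inspect the response headers with curl (the URL below is a hypothetical example):

```shell
# Fetch only the HTTP response headers (a HEAD request)
curl -I https://www.example.com/whitepaper.pdf

# In the output, look for a line like:
#   X-Robots-Tag: noindex, nofollow
```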
5. Will search engines still crawl pages that have a meta robots tag?
Yes, unless you place another directive instructing the bots to NOT crawl specific pages within your site through your robots.txt file.
The meta robots tag is one of several mechanisms to combat duplicate content issues. Unless you're dealing with place-holder content, it's usually better to employ the canonical URL or robots.txt methods. But if those options are for any reason off the table, the meta robots tag is a surefire way to take more control over the way search engines index and present your website.