Preventing Thin and Duplicate Content
Thin content is content that adds little to no value for your visitor. Thin content comes in many forms:
- Pages that have no body content
- Pages that have little body content, say 50 words or less
- Pages that have identical content (also called duplicate content)
- Pages that have nearly identical content (also called near duplicate content)
If pages aren’t significantly different from one another, search engines may choose not to rank these pages highly or even ignore them completely. You’ll end up competing with your own pages.
Having a lot of thin content is also a signal to search engines that your site isn’t in good shape, and may limit your SEO success.
Common causes of thin and duplicate content on eCommerce websites include:
- Product filters
- Faceted navigation
- Preferred amount of products per page
- Product variants
- Having products in multiple categories
Fortunately, there are good countermeasures for thin content at your disposal.
Upon interacting with product filters you drill down to a subset of products. From a user’s point of view this is great: define search criteria and quickly navigate to a desired product. Quick and easy.
eCommerce websites save these search criteria in the URL, so users can easily go back to them or share them with others. And this is where duplicate content comes in: you can generate a virtually unlimited number of different URLs with all of the filter criteria. Unless you tell them not to, search engines will crawl these pages. You don’t want these pages showing up in the search results, but search engines could nonetheless end up spending their valuable crawling time on them. For each domain, search engines have a so-called crawl budget: the amount of attention they can give your website. Ideally, you want them to spend it on pages that you actually want to show up in the search engine results page (SERP).
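To get a feel for how quickly filter URLs multiply, here is a minimal Python sketch. The filter names, their values, and the `/televisions/` URL are made-up assumptions for illustration, not taken from a real shop.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical filters for one category page; None means "filter not applied".
FILTERS = {
    "brand": [None, "acme", "globex", "initech"],
    "size": [None, "32", "43", "55", "65"],
    "color": [None, "black", "silver"],
}

def filtered_urls(base_url, filters):
    """Return every distinct URL reachable by combining the filter values."""
    names = list(filters)
    urls = []
    for combo in product(*(filters[name] for name in names)):
        # Keep only the filters that are actually applied in this combination.
        params = [(name, value) for name, value in zip(names, combo) if value is not None]
        urls.append(base_url + ("?" + urlencode(params) if params else ""))
    return urls

urls = filtered_urls("https://www.example.com/televisions/", FILTERS)
print(len(urls))  # 4 * 5 * 3 = 60 URLs for a single category page
```

Three small filters already turn one category page into 60 crawlable URLs, and every extra filter multiplies that number again.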
Prevent search engines from accessing filtered URLs by disallowing them in your robots.txt file. Find out which URL pattern occurs in each of the filtered URLs and add a Disallow directive for that pattern.
We often see URLs like
You can take away the confusion by making sure search engines never reach those filtered URLs by adding a Disallow directive to your robots.txt file:
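As a rough sketch, assuming the filters are plain query parameters (the names `brand` and `size` below are hypothetical), the Disallow rules could be generated like this in Python. The `*` wildcard is honoured by major search engines such as Google, but check the documentation of any other crawler you care about.

```python
def filter_disallow_rules(param_names, user_agent="*"):
    """Build robots.txt lines that block URLs containing the given query parameters."""
    lines = [f"User-agent: {user_agent}"]
    for name in param_names:
        # The parameter can be the first one in the query string (?name=) ...
        lines.append(f"Disallow: /*?{name}=")
        # ... or follow another parameter (&name=).
        lines.append(f"Disallow: /*&{name}=")
    return "\n".join(lines) + "\n"

# Hypothetical filter parameters; replace them with the ones your shop uses.
print(filter_disallow_rules(["brand", "size"]))
```

Two rules per parameter are used because a filter can appear either directly after the `?` or after another parameter.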
Faceted navigation enables you to drill down to a subset of the products in a category.
Take for instance the product category “Televisions”. Facets of this category may be subcategories like “LCD Televisions” and “Plasma Televisions”.
Often, you want your faceted pages to be indexed. It’s important that search engines can distinguish between the different facets, so make sure to include body content on these pages.
For more information, see Google’s article on faceted navigation.
Product categories with a lot of products are often paginated. A total of 300 products may be divided over 10 separate pages. These pages are all very similar; the only difference is the products shown. For search engines this is basically the same content, only reshuffled, so they may treat it as thin content and disregard it. To avoid this, you want to let search engines know that these 10 paginated pages are in fact a sequence of pages, related to one another.
You can inform search engines that you’re using pagination by clearly defining the relationship between these pages. This is done using the link attributes rel="next" and rel="prev". It’s recommended that paginated pages have a self-referencing canonical URL.
Page 1 only has a next page. These are the relations that need to be defined in the source:
Page 2 has a link to the previous page, and to the next page:
Page 3 only has a link to the previous page because it’s the last one in the sequence.
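The three cases above can be sketched in Python as follows. The category URL and the `?page=N` scheme are assumptions for illustration; your platform may build paginated URLs differently. Note that Google has stated it no longer uses rel="next" and rel="prev" as an indexing signal, although the markup remains valid and other consumers may still read it.

```python
from html import escape

def pagination_links(base_url, page, total_pages):
    """Return the <link> elements for one page in a paginated sequence."""
    if not 1 <= page <= total_pages:
        raise ValueError("page must be between 1 and total_pages")

    def page_url(n):
        # Assumption: page 1 lives at the bare category URL, later pages use ?page=N.
        return base_url if n == 1 else f"{base_url}?page={n}"

    # Every paginated page gets a self-referencing canonical URL.
    links = [f'<link rel="canonical" href="{escape(page_url(page))}">']
    if page > 1:
        links.append(f'<link rel="prev" href="{escape(page_url(page - 1))}">')
    if page < total_pages:
        links.append(f'<link rel="next" href="{escape(page_url(page + 1))}">')
    return links

for line in pagination_links("https://www.example.com/televisions/", 2, 3):
    print(line)
```

Page 1 only gets a `next` link, the last page only gets a `prev` link, and every page in between gets both.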
Preferred amount of products per page
A lot of eCommerce websites offer users the possibility to choose how many products are displayed on each page. Often you’ll see something like this:
Based on what a visitor chooses, the website generates three new URLs:
So now there are three other versions of the original page.
This is considered duplicate content and can therefore cause problems. Search engines will find all four of the pages and be confused about which page to serve.
In the scenario above, adding the following directive to your robots.txt file fixes this issue:
Sorting is a popular feature on product category pages. For example, it enables users to sort based on price. For a user this is a very useful feature, but if not implemented correctly, this can lead to confused search engines.
www.example.com/product-category/ - regular overview of products
www.example.com/product-category/?sort=priceHigh - overview of products sorted by price, from high to low.
www.example.com/product-category/?sort=priceLow - overview of products sorted by price, from low to high.
These three URLs show the same products, just in a different order. This leads to duplicate content and should therefore be dealt with appropriately.
Exclude URLs used for sorting products through your robots.txt file. Adding the following directive to your robots.txt file fixes this for the example above:
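To check which URLs a rule would block, the sketch below implements simplified wildcard matching in Python (`*` matches any run of characters, a trailing `$` anchors the end of the URL). The rule `/*?sort=` is an assumption chosen to fit the example URLs above; the sketch ignores Allow rules and rule precedence, so treat it as an illustration rather than a full robots.txt parser.

```python
import re

def rule_to_regex(rule_path):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape everything literally, then turn the escaped '*' back into "match anything".
    pattern = re.escape(rule_path).replace(r"\*", ".*")
    return re.compile(pattern + ("$" if anchored else ""))

def is_disallowed(path_and_query, disallow_rules):
    """True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in disallow_rules)

rules = ["/*?sort="]  # assumed rule: Disallow: /*?sort=
for url in ["/product-category/",
            "/product-category/?sort=priceHigh",
            "/product-category/?sort=priceLow"]:
    print(url, "blocked" if is_disallowed(url, rules) else "crawlable")
```

With this rule the regular overview stays crawlable while both sorted versions are blocked.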
Products such as clothing are often available in multiple sizes and colours. One shoe model may have 32 variants (8 sizes with 4 colours each). We call these product variant pages.
Often these product variant pages don’t contain a sufficient amount of unique body content and only the photos are different, so for search engines these products are very similar. This means more duplicate content.
www.example.com/product-category/product/ - main product page
www.example.com/product-category/product/variant-s/ - product in size S
www.example.com/product-category/product/variant-m/ - product in size M
www.example.com/product-category/product/variant-l/ - product in size L
www.example.com/product-category/product/variant-xl/ - product in size XL
If you are not going to write sufficient unique body content for any of the product variant pages, prevent duplicate content by canonicalizing URLs 2-5 to the main product page (URL 1).
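A minimal Python sketch of that canonicalization, assuming the canonical target is the main product page, that variant URLs always end in a `variant-…/` segment as in the list above, and that the site is served over https:

```python
import re

# Matches a trailing "variant-<something>/" path segment, as in the example URLs.
VARIANT_SEGMENT = re.compile(r"variant-[^/]+/$")

def canonical_url_for(url):
    """Strip the variant segment so every variant maps to the main product page."""
    return VARIANT_SEGMENT.sub("", url)

def canonical_link(url):
    """Return the <link> element to place in the <head> of the given page."""
    return f'<link rel="canonical" href="{canonical_url_for(url)}">'

print(canonical_link("https://www.example.com/product-category/product/variant-xl/"))
```

The main product page maps to itself, so it ends up with a self-referencing canonical URL.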
Having products in multiple categories
In eCommerce, it’s also quite common to have products that fall under multiple categories.
Take car batteries, for example. Your batteries are categorized based on voltage and amperage levels, as well as on what car models they work in. You can easily end up reaching one specific battery through 5 different paths, meaning that on a lot of eCommerce websites each battery would have 5 URLs as well. The result? You guessed it: more duplicate content.
Make sure that if a product is in multiple categories, one is marked as its primary category. The primary category defines the URL at which you want this product to be indexed. So if you have a product that’s available in five categories, you may have the following URLs:
www.example.com/audi/battery/ - product in primary product category
www.example.com/volkswagen/battery/ - product in different product category
www.example.com/voltage/12v/battery/ - product in voltage product category
www.example.com/amperage/60ah/battery/ - product in amperage product category
www.example.com/new/battery/ - product in “new products” category
If URL 1 is the primary URL, then URLs 2 - 5 should all have this canonical URL:
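One way to model this is to store the primary category on the product itself and derive the canonical URL from it. The `Product` class below is a hypothetical Python sketch, not the data model of any particular platform; the https scheme is also an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Product:
    slug: str
    primary_category: str          # category whose URL should be indexed
    other_categories: tuple = ()   # every other category the product appears in

    def url_in(self, category):
        return f"https://www.example.com/{category}/{self.slug}/"

    def all_urls(self):
        """Every URL the product can be reached on, primary first."""
        return [self.url_in(c) for c in (self.primary_category, *self.other_categories)]

    def canonical_url(self):
        """The single URL that all of the product's pages should point to."""
        return self.url_in(self.primary_category)

battery = Product("battery", "audi", ("volkswagen", "voltage/12v", "amperage/60ah", "new"))
for url in battery.all_urls():
    print(url, "->", battery.canonical_url())
```

Because the canonical URL is derived from one field, moving a product to a different primary category updates the canonical on all of its URLs at once.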
Preserve Crawl Budget
Considering that search engines have billions of pages to crawl, it makes sense that they need to prioritize. On top of that, the resources and capabilities of hosting platforms differ a lot per website. Together, these two factors determine your overall crawl budget.
Crawl budget is basically the amount of the search engine’s attention your website deserves and can handle. Each website has an assigned crawl budget, and you need to spend this crawl budget wisely. You want search engines to focus on pages you want to rank with. When your crawl budget is exhausted, search engines will stop crawling and return at a later stage. Having your crawl budget spent on pages that search engines can’t index, or on pages you don’t want search engines to index can drastically hinder your SEO strategy.
Let’s say you have 30,000 pages that are accessible to search engines. On top of that, there are an additional 1,000 pages that deliver a 404 message and another 9,000 pages redirecting to other pages. This leaves us with a grand total of 40,000 pages that search engines can find. On top of that, of those 40,000 pages, only 5,000 of them are indexable for search engines.
So what’s wrong with this picture? Only 12.50% (5,000/40,000) of your pages should show up in search engines. Theoretically, search engines would focus 87.50% of their attention on pages that you don’t want them to find in the first place. This is a lot of wasted crawl budget.
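The arithmetic behind these percentages, as a small Python sketch you can rerun with your own numbers (the function name is just an illustration):

```python
def crawl_budget_breakdown(accessible, not_found, redirected, indexable):
    """Return (total URLs found, share that is indexable, share that is wasted)."""
    total = accessible + not_found + redirected
    indexable_share = indexable / total
    return total, indexable_share, 1 - indexable_share

total, useful, wasted = crawl_budget_breakdown(
    accessible=30_000, not_found=1_000, redirected=9_000, indexable=5_000
)
print(f"{total:,} URLs found: {useful:.2%} indexable, {wasted:.2%} wasted")
# prints "40,000 URLs found: 12.50% indexable, 87.50% wasted"
```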
- Control crawl budget using your robots.txt file. Restrict search engines from accessing sections that are of no concern to them.
- Update or remove links to avoid linking to pages that are redirected (3xx), cannot be found (4xx), or return server errors (5xx).
- Decrease the page load time of your website so search engines can crawl more pages within the budget allocated to your domain.
Search engines are rewarding websites that provide great user experience more and more, and mobile-friendliness plays an important part in that.
In late 2015, the number of mobile searches on Google surpassed the number of desktop searches, and that number has continued to grow ever since. Mobile-friendliness isn’t just important from an SEO point of view: if you want to have a successful eCommerce website, appealing to mobile users is crucial.
The two most common ways to accommodate mobile visitors are:
- A responsive design, which means the website automatically adjusts to accommodate the user’s device.
- A website dedicated and fully optimized to mobile use. Because it’s costly to maintain two websites, this approach is only worth it in cases where the majority of your traffic is from mobile users, and a responsive design would limit your ability to provide adequate service.
Regardless of how you cater to mobile users, it’s a good idea to run your website through Google’s mobile-friendly testing tool just to be sure everything is set up correctly.
The most popular choice: responsive website?
From an SEO standpoint, having a responsive website is usually your best option for serving mobile users. You only have one URL per page to promote, so you don’t have to worry about consolidating link and relevancy signals across the desktop and mobile versions of your pages, as you would with separate desktop and mobile websites.
Implementing separate desktop and mobile websites
If you do choose to go with two separate, dedicated websites, you want to make the relationship between the two clear so search engines point users of each device type to the website dedicated to them. On top of that, you don’t want any duplicate content issues, considering that both websites show the same information. To facilitate this, search engines came up with the rel="alternate" media="x" attribute. For the sake of ease, let’s call it the mobile attribute from here on out.
The mobile attribute is part of the <link> tag and lets you define an alternative version of your page.
Don’t confuse the mobile attribute with the hreflang attribute, which is used for translated versions of your page.
Let’s look at an example to demonstrate how the mobile attribute works. Say your desktop website is running on https://www.example.com and your mobile website is running on https://m.example.com.
On the desktop page
In the HTML of a desktop page, define the mobile version of the page:
This means that the mobile website should be served when the width of the user’s device is less than 640 pixels.
In the HTML of a mobile page, define the desktop version of the page:
Having the canonical URL there prevents duplicate content.
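The pair of annotations can be sketched in Python as follows. The sketch assumes both websites use the same path for a given page (the `/televisions/` path is hypothetical) and uses a `max-width: 640px` media query to express the 640-pixel breakpoint mentioned above.

```python
DESKTOP_ORIGIN = "https://www.example.com"
MOBILE_ORIGIN = "https://m.example.com"
MOBILE_MEDIA_QUERY = "only screen and (max-width: 640px)"

def desktop_head_link(path):
    """<link> for the desktop page, pointing at its mobile counterpart."""
    return (f'<link rel="alternate" media="{MOBILE_MEDIA_QUERY}" '
            f'href="{MOBILE_ORIGIN}{path}">')

def mobile_head_link(path):
    """<link> for the mobile page, pointing back at the desktop page as canonical."""
    return f'<link rel="canonical" href="{DESKTOP_ORIGIN}{path}">'

print(desktop_head_link("/televisions/"))
print(mobile_head_link("/televisions/"))
```

Generating both tags from the same path helps keep the two annotations in sync, which matters because each one only makes sense together with the other.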
Google also supports the mobile attribute through XML sitemaps.
Accelerated Mobile Pages (AMP)
While we’re talking about mobile-friendliness, it’s important to bring up Accelerated Mobile Pages (AMP) as well. The vision behind the AMP project is to deliver a better user experience to mobile users through a mobile-first approach and fast-loading pages.
The important thing to know about AMP is that, by general consensus, it is not recommended for eCommerce websites. The AMP guidelines are often too strict to accommodate all of the functionality you need for visitors to go through the entire checkout process.
In most cases having a responsive design is your best option.
Offer users consistent content and functionality across platforms
Be sure to offer desktop and mobile users the same content and functionality. Google has announced they’ll switch to a “mobile-first index” sometime in 2018, meaning that your mobile website, rather than the desktop one, will be leading in Google’s algorithm.
For responsive or dedicated mobile websites that reduce the amount of content they’re showing their mobile users, this is bad news. Google isn’t ready to switch to the mobile-first index just yet, but you want to get ready for when they do—and sooner rather than later.
Historically, eCommerce websites have used HTTPS for pages in the checkout process. Several years ago, Google started pushing the adoption of HTTPS across your entire website. Serving your entire website through HTTPS plays a minor role in Google’s algorithm, so while it may help a little, it’s not going to provide a significant competitive edge in terms of SEO.
A more important reason to adopt HTTPS is its improved security. To hammer this point home, all Google Chrome versions released after January 2017 show a warning in the address bar if websites containing form fields aren’t served over a secure connection.
This may scare off potential customers, so here’s another reason for you to have your eCommerce website running on HTTPS.
- Serve your entire website over HTTPS
- When migrating from HTTP to HTTPS, be sure to do a proper URL migration.
Studies have shown that fast-loading pages decrease bounce rates and raise conversion rates. Amazon found that their revenue increased by 1% for every 100ms decrease in load time. On top of that, page speed plays a (small) role in Google’s algorithm, so having fast-loading pages is important enough to take into account.
Optimizing for page speed can be a bit of a technical endeavor and there are hundreds of tiny things you can tweak to squeeze every millisecond out of your pages. In practice, focusing on the following three best practices will usually get you 90% of the way there.
Use a Content Delivery Network
Traditionally a single webhost was responsible for hosting the website all by itself. This webhost was located somewhere on our planet, and whenever somebody visited your website from the other side of the globe their connection had quite some ground to cover.
Content Delivery Networks (CDNs) changed this. Instead of hosting your website in a single place, your content is distributed all around the world and when somebody visits your website they connect to the endpoint (called an “edge”) that is closest to their location, decreasing the latency to reach your website.
Furthermore CDNs are optimized to serve content fast by caching your pages. By serving a static copy of your pages, they significantly lower the subsequent load time. What’s more, CDNs are usually much more capable of handling large traffic volumes. This means that where a severe traffic peak may cause your single-hosted website to slow down or simply crash from overload, CDNs happily keep serving your content.
Nowadays CDNs are both affordable and relatively easy to implement, so there’s hardly a reason not to use them.
Optimize your assets
To prevent slowing down your website, you need to optimize your assets:
- Make sure the images you serve are optimized for the device that’s used to visit your website. It doesn’t make sense to serve a 4 megapixel image to a mobile visitor.
- Minify and compress your JS and CSS to optimize their size, further reducing load time.
- Load your assets at the right moment. A good example of this is the product photos on your category pages: you don’t need to load them immediately, slowing down the rendering of your pages. Instead, load and display them once the rest of the page has been loaded, a technique called “lazy loading”. You can do the same with parts of your CSS and JS files: only immediately load what you absolutely need to display the page in the browser, then load the rest.
Use browser caching
Most pages on your website rely on a shared set of assets, such as CSS stylesheets, JS code, and common images such as your website’s logo. The problem is that browsers happily re-request any asset they encounter, meaning that when a visitor navigates from one page to another their browser again loads the logo, the CSS stylesheets, and the JS code. Such a waste!
To prevent this, your webserver can send a signal to the browser to keep a cached copy of the assets and re-use them in the future. This is done by sending a cache header together with the asset, instructing the browser to re-use the asset until a certain expiry time.
How to set this up differs from platform to platform, so make sure to inquire with your developer how to implement this on your specific website.
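As a platform-neutral illustration, the Python sketch below picks a `Cache-Control` header based on the file extension. The list of extensions and the 30-day lifetime are assumptions; your developer may prefer different values or set the header in the web server configuration instead.

```python
from datetime import timedelta

# Assumed set of static asset types that rarely change between page views.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")

def cache_headers(path, max_age=timedelta(days=30)):
    """Return the caching header to send along with the requested path."""
    if path.lower().endswith(STATIC_EXTENSIONS):
        # Browsers may re-use their cached copy until max_age seconds have passed.
        seconds = int(max_age.total_seconds())
        return {"Cache-Control": f"public, max-age={seconds}"}
    # Everything else must be revalidated with the server before re-use.
    return {"Cache-Control": "no-cache"}

print(cache_headers("/assets/logo.png"))
print(cache_headers("/product-category/"))
```

Long cache lifetimes work best when asset file names change whenever their content changes, so visitors never get stuck with an outdated stylesheet or script.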
Structured data refers to applying markup to your pages so search engines understand them better. See it as describing your pages in a language search engines understand. Google, for their part, is the king of understanding and using structured data.
Typically, describing your content is done through schema.org. Schema.org supports a variety of schemas, the most interesting ones for eCommerce websites being:
- Product: to describe products
- Reviews: to describe reviews
- Corporate Contacts: to describe your organization
- Breadcrumbs: to signal you use breadcrumbs
- Sitelinks Searchbox: to signal you want to show a searchbox in Google’s results driven by your own website’s search engine
Structured data creates lots of opportunities for creative SEO specialists, so use it to stand out from the competition.
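As an illustration, the Python sketch below builds schema.org Product markup with a price range, serialized as JSON-LD (one of the formats search engines accept for schema.org data). The product name, description, and prices are invented for the example.

```python
import json

def product_data(name, description, low_price, high_price, currency, offer_count):
    """Describe a product and its price range using schema.org vocabulary."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "offers": {
            "@type": "AggregateOffer",
            "lowPrice": f"{low_price:.2f}",
            "highPrice": f"{high_price:.2f}",
            "priceCurrency": currency,
            "offerCount": offer_count,
        },
    }

def json_ld_script(data):
    """Wrap the data in the <script> element that goes into the page's HTML."""
    return '<script type="application/ld+json">' + json.dumps(data, indent=2) + "</script>"

# Hypothetical product used purely for illustration.
tv = product_data("Example 55-inch LCD Television", "A 55-inch LCD television.",
                  399, 549, "EUR", 4)
print(json_ld_script(tv))
```

Generating the markup from the same data that renders the page keeps the structured data and the visible content in agreement.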
An XML sitemap is a file consisting of all of the pages you want search engines to crawl and index. The file is structured using the XML standard, hence the name.
- Make sure your XML sitemap is dynamically generated. When pages are added and removed, your XML sitemap needs to be set up to reflect this.
- Include all of your indexable pages returning HTTP status 200.
- Don’t include more than 50,000 URLs in one XML sitemap. It’s often recommended to split up XML sitemaps so they don’t get too big. In Google Search Console, this enables you to see how many URLs are indexed out of each XML sitemap.
- If you have multiple XML sitemaps, be sure to create an XML sitemap index listing all XML sitemaps so search engines can find them easily.
- Reference your XML sitemap(s) in your robots.txt file. This enables search engines to quickly find your XML sitemap.
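The checklist above can be sketched with Python’s standard library. The file names (`sitemap-1.xml`, `sitemap-index.xml`) and the base URL are assumptions; the 50,000-URL limit comes from the list above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000

def _document(root_tag, child_tag, locations):
    """Serialize a <urlset> or <sitemapindex> document with one <loc> per entry."""
    root = ET.Element(root_tag, xmlns=SITEMAP_NS)
    for location in locations:
        ET.SubElement(ET.SubElement(root, child_tag), "loc").text = location
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

def build_sitemaps(urls, base_url, max_urls=MAX_URLS_PER_SITEMAP):
    """Return {filename: xml_bytes} for the sitemap files plus a sitemap index."""
    files = {}
    for start in range(0, len(urls), max_urls):
        name = f"sitemap-{start // max_urls + 1}.xml"
        files[name] = _document("urlset", "url", urls[start:start + max_urls])
    # The index lists the absolute URL of every sitemap file created above.
    files["sitemap-index.xml"] = _document(
        "sitemapindex", "sitemap", [base_url + name for name in files]
    )
    return files

# Hypothetical product URLs; in practice these come from your catalogue.
product_urls = [f"https://www.example.com/product-{i}/" for i in range(120_000)]
sitemaps = build_sitemaps(product_urls, "https://www.example.com/")
print(sorted(sitemaps))
```

With 120,000 URLs this produces three sitemap files and one index. The index can then be referenced from robots.txt with a line such as `Sitemap: https://www.example.com/sitemap-index.xml`.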
If you have a multilingual eCommerce website, you need to make sure search engines know which part of the website to serve to which users. For Google and Yandex you can do this using the rel="alternate" hreflang attribute. For the sake of brevity, we’ll call the rel="alternate" hreflang attribute simply the hreflang attribute from here on.
The hreflang attribute is part of the <link> tag and lets you define a translated version of your page. Make sure not to confuse the hreflang attribute with the rel="alternate" media attribute, which is used to signal a mobile version of your pages.
The hreflang attribute supports both language targeting and targeting a combination of languages and regions.
The hreflang attribute can be defined using:
- HTML link elements in the <head>
- HTTP header
- XML sitemap
If you’re unable to implement the hreflang attribute, you can define your targeting preferences using Google Search Console and Bing Webmaster Tools. If you are able to set up hreflang, be sure your preferences in Google Search Console and Bing Webmaster Tools don’t conflict.
The anatomy of the hreflang attribute
The hreflang attribute consists of two parts:
- Audience targeting: the definition of the language or a combination of language and geographical location
- The URL to show to your target audience
When defining the hreflang attribute, you reference each translated version of the page using the
You can define a fallback page if no page is available for the audience you’re targeting. This is done using the x-default value.
Let’s look at an example:
This tells search engines that the English part of your website is available through
www.example.com and the Spanish part through
The x-default value tells search engines to serve the main website when they’re unable to serve the language the visitor is searching in.
Here’s a more advanced example with targeting for a combination of languages and geographic locations.
Let’s say your website is available in German and you’re targeting Germany, Austria, and Switzerland. Let’s also assume your German version of the website targeting Germany will be served as your fallback.
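A minimal Python sketch of the link elements for this scenario. The hreflang values (`de-DE`, `de-AT`, `de-CH`) follow the language-region format; the `/at/` and `/ch/` URL paths are placeholders assumed for illustration, since only the hreflang values carry meaning for search engines.

```python
# Assumed URLs for the three German-language versions of the website.
GERMAN_ALTERNATES = {
    "de-DE": "https://www.example.de/",
    "de-AT": "https://www.example.de/at/",
    "de-CH": "https://www.example.de/ch/",
}

def hreflang_links(alternates, default_url):
    """Return one <link> per language-region version plus the x-default fallback."""
    links = [
        f'<link rel="alternate" hreflang="{code}" href="{url}">'
        for code, url in alternates.items()
    ]
    links.append(f'<link rel="alternate" hreflang="x-default" href="{default_url}">')
    return links

# The version targeting Germany doubles as the fallback.
for line in hreflang_links(GERMAN_ALTERNATES, GERMAN_ALTERNATES["de-DE"]):
    print(line)
```

The same set of four link elements goes into the head of all three versions, so each version references itself and the other two.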
- Make sure your hreflang definitions are bi-directional, meaning each reference should go both ways. For instance, when
https://www.example.de/au/ needs to reference
- Avoid conflicting targeting. Conflicting targeting could arise when you make mistakes in the references between pages. For instance, when
https://www.example.de/ may define that it’s targeting just Germany, but if
https://www.example.de/ for the whole German language there’s a conflict.
- Define language and region combinations correctly. Always double check if the combination is correct, and that you used them in the right order (language-region).
- Always set the
- Make sure to use the canonical URL together with the hreflang attribute. Together they work to clearly communicate to search engines the relationships within your website. Only include URLs in the hreflang attribute that have a self-referencing canonical.
- Use absolute URLs when defining the hreflang attribute. Absolute URLs are less prone to misinterpretation by search engines than relative URLs.
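The bi-directional requirement from the list above lends itself to an automated check. The Python sketch below assumes you have already collected, for each page, the set of URLs it lists as hreflang alternates; the URLs in the example are hypothetical.

```python
def missing_return_links(pages):
    """Find hreflang references that are not reciprocated.

    `pages` maps each URL to the set of URLs it lists as hreflang alternates.
    Returns sorted (page, should_reference) pairs where the return link is missing.
    """
    missing = []
    for source, targets in pages.items():
        for target in targets:
            if target != source and source not in pages.get(target, set()):
                missing.append((target, source))
    return sorted(missing)

# Hypothetical crawl result: the /at/ page forgot to reference the main page.
crawl = {
    "https://www.example.de/": {"https://www.example.de/", "https://www.example.de/at/"},
    "https://www.example.de/at/": {"https://www.example.de/at/"},
}
print(missing_return_links(crawl))
```

An empty result means every reference goes both ways; each pair in a non-empty result names a page and the return link it still needs.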
Pro-tip: multilingual websites aren’t just about content—the goal here is to provide a complete user experience, including cultural references and currencies.
The eCommerce arena is an extremely competitive one, especially when it comes to SEO. Making a mistake can set your progress back weeks, if not months. This checklist gives you an arsenal of on-page SEO tactics to take on the competition and solidify your position in the market.
Whether you’re preparing your eCommerce website for launch, or just giving it a facelift, addressing all the issues presented in this article will make sure your launch is a solid one. Then you’ll be able to sit back and track the results of all your efforts.
It’s important to remember, though, that SEO is a continuous process. If you want it to succeed, your SEO strategy needs regular maintenance, fine-tuning, and even a little tender, loving care. Launching your website, or changes, is just the beginning. After that, the game is on and it’s up to you to continuously improve and expand.
Now that you’ve invested the resources into getting your SEO strategy off to a good start, you’ll want to set up monitoring to keep tabs on the health of your website. Enter ContentKing. This real-time SEO monitoring tool is designed to keep your strategy moving forward by tracking your content to help you defuse unforeseen SEO “surprises” and even prevent them in the first place.