CLOAKING: ALL YOU NEED TO KNOW
Have you ever wondered how some sites manage to show search engines one thing and human visitors another? It’s possible, using a technique known as cloaking.

In this guide, I’ll show you how it works and what the risks are if you’re detected – as well as some things that can help protect against them.
What Is Cloaking?
Cloaking is the practice of showing different content to human visitors than you show search engines. It is sometimes confused with “duplicate content,” but the two are different problems: duplicate content is the same content at multiple URLs, while cloaking is different content at the same URL.
Sometimes this can be accidental – for example, a page deep within your site may load different versions depending on the visiting browser’s User-Agent. Other times, it’s deliberate.
For example, say you want to rank for “best local restaurants.” You’ve found an awesome review site with great coverage of restaurants in your area, but licensing their content for reprint is too expensive. A cloaker’s “solution” is to serve search engine crawlers a keyword-rich page built around that topic while showing human visitors something entirely different. The rankings are earned by content that real users never actually see – which is exactly why search engines treat this as deception.
How Does Cloaking Work?
Cloaking depends on your web server identifying who is making each request. The mechanism is the set of HTTP request headers: when you visit a site, your web browser (e.g., Google Chrome, Mozilla Firefox) sends several pieces of identifying information to that website along with its request for content.
These headers include the User-Agent string, which identifies your browser type and version, your operating system, and often your device, and the Referer header, which names the page that linked to the one you’re visiting. The server also sees your IP address, though that comes from the network connection itself rather than from a header.
By analyzing these signals, a web server can identify who is requesting content. This allows a publisher to show different content to different users, based on who they are.
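As a minimal sketch of what “analyzing the headers” means in practice, the function below takes a dictionary of request headers and pulls out the identifying signals discussed above. The classification rule is illustrative only – real mobile detection is far more involved – and the function name is hypothetical.

```python
# Sketch: how a server might read request headers to identify the requester.
# The header names (User-Agent, Referer) are standard HTTP; the "is_mobile"
# rule is a deliberately crude illustration, not production logic.

def describe_request(headers: dict) -> dict:
    """Summarize the identifying signals in a set of HTTP request headers."""
    user_agent = headers.get("User-Agent", "")
    return {
        "user_agent": user_agent,
        "referrer": headers.get("Referer", ""),  # note HTTP's historic misspelling
        "is_mobile": "Mobile" in user_agent or "Android" in user_agent,
    }

# Example: headers roughly as sent by an iPhone browser arriving from Google
profile = describe_request({
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
                  "AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148",
    "Referer": "https://www.google.com/",
})
```

Everything a server needs for cloaking is already present in this profile – which is why the technique requires no special tooling, just a conditional on the server.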
One common example of this is showing a mobile-specific version of a website. In the US and UK, more searches now happen on mobile devices than on desktop computers, and in South Korea the share is even higher – over 70% of searches come from mobile. It’s becoming increasingly important for websites to have mobile-friendly versions of their pages.
When you visit a website with an “m.” at the start of the URL – m.example.com, for example – you’re seeing a dedicated mobile version of that site. Sites that maintain separate desktop and mobile versions typically use distinct URLs like this (m.example.com, or a path such as example.com/m) and annotate them so search engines understand that the two serve the same content to different devices.
But what if a site doesn’t want to maintain two separate URLs, and instead serves different content from the same address? That’s where User-Agent detection comes in: the server decides which version to send based on who is asking. Done transparently, with equivalent content for every requester, this is ordinary dynamic serving; done deceptively, so that search engines see something visitors never do, it’s cloaking.
When someone requests the address in a phone browser, the server identifies them as a mobile user (based on their User-Agent string) and serves up the mobile content. When Googlebot requests the same URL, the server recognizes its User-Agent string and can serve up the desktop content instead.
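The dispatch just described can be sketched in a few lines. This is purely illustrative – the function and page names are hypothetical, and doing this deceptively violates Google’s guidelines.

```python
# Illustrative sketch of User-Agent dispatch: one page for Googlebot,
# another for phones, the default for everyone else. Page names are
# hypothetical placeholders.

def select_page(user_agent: str) -> str:
    ua = user_agent.lower()
    if "googlebot" in ua:
        return "desktop.html"   # what the crawler is shown
    if "mobile" in ua or "android" in ua:
        return "mobile.html"    # what phone users are shown
    return "desktop.html"       # default for desktop browsers
```

Note that the crawler check comes first: Google’s smartphone crawler also contains “Mobile” in its User-Agent, so order matters when singling it out.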
It can even be more granular than that. The User-Agent string identifies not just “mobile vs. desktop” but the specific browser and device family, so a server that sees an iPhone User-Agent can respond with different content than it would for an Android phone or a desktop computer.
Cloaking exploits exactly this mechanism: the server singles out particular requesters – most importantly, search engine crawlers – and serves them content that other visitors never see.
What’s the problem?
Cloaking can be used for a number of purposes – showing device-specific versions of a page, serving up ads, localizing content… and also showing search engines completely different content than real users would see.
Detecting it is often straightforward: request the same page once with a Googlebot User-Agent string and once with a regular browser’s, then compare the two responses. If the page served to Googlebot differs substantially from the page served to real users, cloaking is the likely explanation.
Cloaking and Google Penalties
Because cloaking is an attempt to artificially alter search results, it can lead to serious problems, including manual penalties. Again, we see the importance of following Google’s Webmaster Guidelines.
Too many people talk about cloaking as if it were completely undetectable, or unimportant – but that’s simply not true. Cloaking should be avoided because it violates the Webmaster Guidelines and leads to real problems with Google, not treated as something you can get away with.
What does this mean for SEO?
There are many ways to try to mislead Google, but in the end it isn’t worth it. If you genuinely need different versions of a page, maintaining properly annotated separate mobile and desktop URLs is far more practical – search engines handle that arrangement well, and it carries none of cloaking’s risk.
Google has always been against any “artificial” methods of manipulating search results and continues to get better at identifying them. Those who follow the rules have nothing to worry about – but cloaking is not a safe gray area, and sites caught using it have been demoted or removed from the index entirely.
Either way, Google is not oblivious to cloak-like behavior – so stay out of trouble by following the Webmaster Guidelines and focusing on creating high-quality, unique content.
Types of Cloaking
There are various types of cloaking:
1. Content-based (and Googlebot based) cloaking:
Content-based cloaking means serving different page content to Googlebot than to real users. A couple of common examples:
- Returning a generic “default” page with a 200 status when a crawler requests a non-existent URL, instead of a proper 404 – the crawler indexes content that real users will never reach.
- Serving crawlers the full text of a page while showing real users a paywall or registration wall at the same URL – the address is identical, but the content depends on who is asking.
2. Header-based (and Googlebot/user agent-based) cloaking:
Header-based cloaking is where a server sends out different content to someone depending on the HTTP header that they send. Header-based cloaking covers a wide variety of things, including:
- Browser User-Agent string – this is probably the most common form of header-based cloaking, typically used to serve different content to desktop and mobile users (e.g., desktops see the normal site while phones see a simplified mobile version), and also to show search engine crawlers different content than real users see.
- IP address – strictly speaking this comes from the connection rather than a header, but it can be used the same way: the server checks the requester’s IP against known crawler address ranges, or looks it up in a GeoIP database to determine location, and serves content accordingly.
- User-Agent string – as mentioned above, but it can also target specific bots rather than broad device classes, for example serving special content only to a particular crawler and not to users.
- Referer – this header names the page that linked to the one being requested (e.g., a link to your WordPress blog post in someone’s comments), and it’s possible to adjust content based on it, such as showing a different page to visitors arriving from search results.
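The Referer-based variant in the list above is the simplest of these to express in code. The page names below are hypothetical placeholders; the point is only that a single conditional on one header is enough to split traffic.

```python
# Sketch of Referer-based serving: the response depends on where the
# visitor came from. Page names are hypothetical.

def page_for_referrer(referer: str) -> str:
    """Pick a page based on the Referer header value."""
    if "google." in referer:
        return "landing-for-search-visitors.html"
    return "regular-page.html"
```

This same pattern is used legitimately (e.g., greeting search visitors with a relevant banner) and deceptively (showing search visitors a different page entirely) – the code is identical; the intent is what differs.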
3. Server-side (script-based) cloaking
Using server-side scripts to deliver different content based on factors such as:
- User-Agent string – server-side scripts (e.g., Perl/CGI or PHP) can serve different content based on the requester’s browser: Googlebot gets one thing while a mobile user gets something else.
- Referer – it’s possible to serve different results depending on where the visitor came from, for example showing one page to visitors arriving from a Google search and another to everyone else.
- HTTP verb – different content can be served for different HTTP verbs (e.g., GET vs. POST), and different redirect types (e.g., a 301 vs. a 302) can be returned depending on who is asking.
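The verb-based factor above can be sketched as a tiny dispatch function. The status codes and bodies are hypothetical placeholders, but the shape – branching on the request method before deciding what to return – is exactly what a server-side script does.

```python
# Sketch of HTTP-verb-based serving: the same URL responds differently
# to GET and POST. Responses are hypothetical placeholders.

def respond(method: str) -> tuple[int, str]:
    """Return a (status code, body-or-location) pair for a request method."""
    if method == "GET":
        return 200, "full page content"
    if method == "POST":
        return 303, "/thank-you"   # redirect after a form submission
    return 405, "method not allowed"
```

Since crawlers overwhelmingly issue GET requests, a server that behaves differently on POST is invisible to them by default – which is why verb-based differences rarely trigger cloaking detection on their own.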
How is cloaking detected?
It is possible to detect cloaking using a variety of tools, including:
- Sitebulb Crawler – crawls your site with an additional Googlebot user-agent string and records what you’re serving to users versus to Googlebot. This can be extremely useful, as it detects not only that a page is different but what’s different about it.
- Sitebulb URL Comparison Tool – takes two URLs, tells you what the differences are, and lets you download the results in bulk.
- Screaming Frog SEO Spider – also crawls your site with an additional Googlebot user-agent string and records what you’re serving to users versus to Googlebot. It’s good for crawling your entire site quickly and finding where pages differ, but isn’t as granular as the Sitebulb Crawler (see above).
- Lookee – lets you input two URLs and shows you what’s different about them (including all headers and response codes) in a clear table.
- Ghostery – this is great for identifying if third-party widgets, such as AdSense, are being used in different ways on different pages.
Cloaking, done the wrong way, generally invites Google penalties. This guide has covered what you need to know about where the line sits – and why the safest practice is to stay well clear of it.