There are many definitions and uses of the word “canonical.” However, to search engines, “canonical” refers specifically to the indicator of the preferred version of the URL for each page of content on a website. Canonicalization directs search engines to the version that should be indexed and referenced when reflecting a webpage’s ranking in search results. Google’s guidelines recommend keeping a website’s URL structure as simple as possible.
When search engines find multiple URLs pointing to a single page of content, it can “confuse” them as to which is the appropriate match for a given search. This, in turn, can create “duplicate content.” On most occasions, search engines do their best to pick the ideal match for a given search query. However, if they cannot determine the best match from the variations that they find, they may not return to crawl any of them.
This illustrates why URL canonicalization is so important to effective SEO – and why duplicate content should be avoided at all costs.
Examples of Canonical Issues
There are many ways that duplicate content can be introduced on a website. Before we explain the methods to correct canonical issues, note that it pays to get in front of these issues proactively and to make sure that the URLs (Uniform Resource Locators) on your site are SEO-friendly. Allowing the search engines to crawl and index multiple URLs to the same page of content weakens page authority by splitting the value gained through inbound links across those URLs.
The good news is that if duplicate URLs are created, they are easy to identify and easy to fix, and it is just as easy to help search engines identify the URLs you prefer they index.
Protocol and Host Name
Canonicalization issues can occur at any point in the URL structure, and any point where multiple variations are introduced can create problems. Common examples include using both http and https, using www and non-www, or variations in the page names themselves.
The following URL variations could potentially point to the same page, so it is important to find and correct any of these on your website. One way to do so is to go to www.google.com or www.bing.com and search for “site:website.com,” replacing website.com with your website address. Look for any variations within the URL structure at the homepage and deeper levels within the website. The variations could be minor so here are some examples:
If your pages use a server-side technology such as PHP, ASP, ASP.NET, or ColdFusion, you could even see multiple versions of the homepage itself. Use the “site:” example above to look for variations of the homepage URL (index.php, default.asp, default.aspx, or index.cfm, respectively).
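As a rough illustration of how these variations collapse to one address, the Python sketch below maps protocol, host, and default-document variants onto a single preferred URL. The preferred scheme and host, the alias list, and the function name `preferred_url` are assumptions made for this example, not settings from any particular platform.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferences for illustration; substitute your site's own.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.website.com"
# Hosts treated as aliases of the preferred host (assumed for this sketch).
HOST_ALIASES = {"website.com", "www.website.com"}
# Default-document names mentioned in the article.
INDEX_FILES = {"index.php", "default.asp", "default.aspx", "index.cfm"}


def preferred_url(url: str) -> str:
    """Map protocol, host, and homepage-file variations to one preferred URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in HOST_ALIASES:
        return url  # Not one of our hosts; leave it untouched.
    path = parts.path or "/"
    # Treat /index.php, /default.aspx, etc. as the directory itself.
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_FILES:
        path = head + "/"
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, path, parts.query, parts.fragment)
    )


variants = [
    "http://website.com",
    "http://www.website.com/",
    "https://website.com/index.php",
    "https://www.website.com/default.aspx",
]
unique = {preferred_url(u) for u in variants}
print(unique)  # a single entry means all variants share one preferred URL
```

Running a crawl export or a list of “site:” results through a function like this is one way to spot how many distinct addresses actually point at the same page.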
The preferred domain (variations of www vs non-www, and http vs https) can be managed easily in Google with a Google Search Console account.
Parameters and Session IDs
URL inconsistency doesn’t just apply to homepages, and in many cases, it’s a site-wide nightmare caused by unsavvy Ecommerce suites, content management systems, and blogging software. Pages may be accessible via several different URLs:
I won’t go into a lot of detail here, but the important takeaway is that session IDs, parameters, and tracking IDs can cause duplicate content issues. This is because search engines will see numerous URLs with the same content every time they crawl your site.
Instead of tracking and session IDs, it is better to use your web analytics referrer and navigation path reports. If you absolutely must use session IDs, parameters, and/or tracking IDs, change your software to use a hash symbol (“#”) instead of a question mark. Search engines ignore everything after the hash, so you’ll avoid confusion.
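To make the duplication concrete, here is a minimal Python sketch that strips session and tracking parameters so that differently decorated URLs reduce to the same address. The parameter names in `IGNORED_PARAMS` and the sample URLs are assumptions for illustration; real platforms use their own names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameter names assumed for illustration only.
IGNORED_PARAMS = {
    "sessionid", "sid", "phpsessid",
    "utm_source", "utm_medium", "utm_campaign", "ref",
}


def strip_tracking(url: str) -> str:
    """Drop session/tracking parameters and the fragment; sort what remains."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # a stable order so ?a=1&b=2 and ?b=2&a=1 compare equal
    # The fragment is dropped because search engines ignore it anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


a = strip_tracking("https://www.website.com/product?id=1234&sessionid=abc123")
b = strip_tracking("https://www.website.com/product?utm_source=news&id=1234#reviews")
print(a == b)
```

Sorting the remaining parameters is a design choice in this sketch: it also catches duplicates that differ only in parameter order.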
Parameters can be controlled in Google Search Console and Bing Webmaster Tools.
Using Capitalization and Lower Case
Duplicate content can also be introduced by not controlling the letter case used in the URL structure. The search engines see both of these example pages as unique and valid paths to the same webpage:
The solution: It’s relatively easy to set up a rewrite rule that forces uppercase into lower case, just to make sure that this issue isn’t being introduced on the website.
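The logic behind such a rule can be sketched in Python as follows. This is not server configuration, only an illustration of the decision the rule makes; the function name `lowercase_redirect` and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(url: str):
    """Return the lowercase 301 target for a URL, or None if none is needed."""
    parts = urlsplit(url)
    host, path = parts.netloc.lower(), parts.path.lower()
    if host == parts.netloc and path == parts.path:
        return None  # Already lowercase; serve the page normally.
    # The query string is left alone: parameter values may be case-sensitive.
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


print(lowercase_redirect("https://www.website.com/Products/Blue-Widget.html"))
print(lowercase_redirect("https://www.website.com/products/blue-widget.html"))
```

The first call yields a redirect target and the second yields `None`, so only mixed-case requests incur the extra hop.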
At one time, duplicate content was such a major issue for all the search engines that they developed a shared solution and made it available to all webmasters. In early 2009, Google, Yahoo, and Microsoft announced support for a new link element to make correcting duplicate URLs easier. It was called the canonical tag. Since then, search engines have become very good about honoring this tag.
A great example of where a canonical tag would be needed within an Ecommerce platform can be seen in the hypothetical variations below:
Each of these represents a valid path to the same product “1234”; however, together they would likely create duplicate content. To solve this problem using a canonical tag, the following should be added to the HTML:
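As a minimal sketch of what that markup looks like, the Python helper below builds the link element for a page’s head section. The product URL is a hypothetical placeholder and `canonical_link` is a name invented for this example; the `rel="canonical"` link element itself is the standard form of the tag.

```python
from html import escape


def canonical_link(preferred_url: str) -> str:
    """Build the <link rel="canonical"> element for a page's <head>."""
    # Escape the URL so characters such as & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'


# Hypothetical preferred URL for product 1234.
tag = canonical_link("https://www.website.com/product/1234")
print(tag)
```

Every variation of the product URL would carry this same element, so each one points search engines back to the single preferred address.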
Another common example occurs when the same product is referenced in multiple departments and categories. For example, the exact same product “aspirin” could be presented on the website in categories for drugs, health, vitamins, and cold remedies:
In this case, the following canonical tag can be set to establish the preferred path and ensure that only one is indexed and returned in the search results:
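One way to check that every category path really declares the same preferred URL is to parse each page and collect its canonical tag. The sketch below uses Python’s standard `html.parser`; the category paths, the preferred URL, and the in-memory `pages` dictionary are assumptions standing in for pages you would fetch from your own site.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html_text: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonicals


# Hypothetical preferred URL and category paths for the aspirin example.
PREFERRED = "https://www.website.com/health/aspirin"
pages = {
    path: f'<html><head><link rel="canonical" href="{PREFERRED}" /></head></html>'
    for path in ("/drugs/aspirin", "/health/aspirin",
                 "/vitamins/aspirin", "/cold-remedies/aspirin")
}
mismatched = [
    path for path, html_text in pages.items()
    if find_canonicals(html_text) != [PREFERRED]
]
print(mismatched)  # an empty list means every variant declares the same canonical
```

A page that declares no canonical, more than one, or a different URL would show up in `mismatched` and deserve a closer look.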
A 301 redirect, or “permanent redirect,” is the best practice to solve for known or anticipated URL canonicalization issues. These should be implemented when duplicate URLs to the same page of content are discovered in a search engine index, or they can be applied proactively when you are aware that an issue will occur following a domain change or site migration.
The 301 redirects are necessary in the following three instances:
- A URL on your website is changing, has been indexed by the search engines, and has the potential for organic search traffic.
- There are existing backlinks, and you want to pass their link value along to the new URL.
- There is the potential for continued direct traffic through the old link, for example when you are migrating a website to a brand landing page on the main website.
Redirects can be safely removed once none of the three conditions above applies. They require server resources and can cause performance issues as the rules continue to stack up, so they should be reviewed on a regular basis to eliminate any that are no longer needed. It is not uncommon to see 301 redirects still in place two website redesigns later, using valuable bandwidth.
Most content marketers follow the best practice of having only one redirect in place per URL. However, there may be instances where more than one “hop” is needed. There is no hard rule for how many 301 redirects can be chained together. Google’s preference is a single hop, although multiple hops are acceptable and will continue to be crawled by Googlebot.
However, multiple 301 redirects can adversely affect performance. Although it has been reported that they do not result in a PageRank drop, a recent “accidental SEO test” from Wayfair uncovered a situation where over 10,000 redirects resulted in a 15% loss in organic traffic.
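When reviewing a growing list of rules, it can help to measure how many hops each old URL takes and to collapse chains into single hops. The Python sketch below does this over a plain dictionary of source-to-target rules; the paths, the `max_hops` limit, and the function names are assumptions for illustration, not thresholds published by any search engine.

```python
def resolve(redirects: dict, url: str, max_hops: int = 10):
    """Follow a redirect map and return (final_url, number_of_hops)."""
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError(f"redirect loop or chain too long at {url}")
        seen.add(url)
    return url, hops


def flatten(redirects: dict) -> dict:
    """Rewrite every rule to point straight at its final destination."""
    return {source: resolve(redirects, source)[0] for source in redirects}


# Hypothetical rules left over from two site redesigns.
rules = {"/old-page": "/newer-page", "/newer-page": "/newest-page"}
print(resolve(rules, "/old-page"))
print(flatten(rules))
```

After flattening, every old URL reaches its destination in one hop, which matches the single-hop preference described above.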
Managing 301 Redirects on Apache Servers
Probably the fastest and most widely used method of correcting URL inconsistencies, 301 redirects tell search engines that the content has been permanently moved to another address. If your web server runs Apache, a simple rewrite rule added to your .htaccess file will handle everything for you. Here’s an example of a rule that redirects all non-www requests to the www version. This is just an example – your web developer will know the exact code to use:
So what does this rule do? Basically, the ‘(.*)$’ tells the web server to take anything that comes after http://website.com and append it to the end of http://www.website.com (which is the ‘$1’ part), and then redirect to that URL. For more details and specifics on exactly how this works and how you can create custom rewrite rules for your website, visit this page.
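The capture-and-append behavior can be imitated in Python to see what the pattern does. This is only an analogy, not Apache configuration: Python’s `re` module stands in for the rewrite engine, `\1` plays the role of ‘$1’, and the function name `redirect_for` is invented for this sketch. It also assumes the leading slash has already been removed from the path before matching, as is the case for rules placed in an .htaccess file.

```python
import re

RULE_PATTERN = re.compile(r"^(.*)$")        # capture the whole requested path
RULE_TARGET = r"http://www.website.com/\1"  # \1 is Python's spelling of $1


def redirect_for(host: str, path: str):
    """Return (301, location) for non-www requests, or None otherwise."""
    if host.lower() != "website.com":
        return None  # Already on the www host; no redirect needed.
    # Strip the leading slash, then substitute the captured path into the target.
    location = RULE_PATTERN.sub(RULE_TARGET, path.lstrip("/"), count=1)
    return 301, location


print(redirect_for("website.com", "/about/contact.html"))
print(redirect_for("www.website.com", "/about/contact.html"))
```

The first call returns a 301 pointing at the www address with the same path appended, and the second returns `None` because the request is already on the preferred host.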
Alternatively, you may also set up 301 redirects in the Apache config file, named httpd.conf.
Managing 301 Redirects with WordPress
If you have a WordPress site, managing 301 redirects is easy – as there are many plugins available. One of the highest rated plugins is Redirection, which earned on average 4.2 stars (out of 5) from frequent users (with more than 70% of users awarding it a full five stars).
The bottom line is that it is up to your team to determine whether your website has canonicalization issues, and to fix them to ensure that your site delivers solid performance in organic searches. Although search engines are getting much better at managing duplicate content, it is in your organization’s best interest to remain in control of your website performance – rather than to leave its fate in the hands of the search engines.
I urge you to check out another article, which discusses the best method to fight keyword cannibalization and duplicate content with a canonical SEO strategy.
Editor’s Note: This article was originally published in October 2010 and has been updated substantially for accuracy and comprehensiveness.