URL Canonicalization

I recently read about SEO and canonical URLs, and I have come to think that these days we need to understand them if we want our websites to be compatible with, and friendly to, search engines.

1. What is canonicalization?

<link rel="canonical" href="http://www.example.com/" />

Simply put, the canonical URL is the URL that you want visitors to see. It is normally the simplest and most representative of all the URLs that lead to the same page.

2. Why is URL canonicalization important?

Why bother with canonicalization? What is the point, since all of these URLs lead to the same page anyway? It matters because, if we care about how easily people can find our site through search engines, we need to understand it clearly.

Examples of different URLs that point to the same page

Some URLs contain tracking parameters:
http://www.example.com/product/computer/?l_id=home_page
http://www.example.com/product/computer/?l_id=browsing_history
http://www.example.com/product/computer/

All three URLs above point to the same web page; the first two only add a parameter to track where the visitor came from.
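To make this concrete, here is a minimal Python sketch that reduces the three URLs above to a single form by dropping the tracking parameter. The function name and the TRACKING_PARAMS set are my own assumptions for illustration; a real site would list whatever tracking parameters it actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: l_id is the only tracking parameter, as in the examples above.
TRACKING_PARAMS = {"l_id"}


def strip_tracking_params(url):
    """Return the URL with known tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


urls = [
    "http://www.example.com/product/computer/?l_id=home_page",
    "http://www.example.com/product/computer/?l_id=browsing_history",
    "http://www.example.com/product/computer/",
]

# All three variants collapse to one URL.
canonical_forms = {strip_tracking_params(u) for u in urls}
print(canonical_forms)
```

Parameters that change the page content (for example a page number) should stay out of TRACKING_PARAMS, otherwise genuinely different pages would be merged.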

Another case is:

http://example.com/black-shoes
https://example.com/black-shoes
http://www.example.com/black-shoes

Here the server is configured to serve the same content with or without the www subdomain, and over both HTTP and HTTPS.
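As a rough sketch of how these variants could be mapped onto one form, the Python snippet below rewrites the scheme and host to a preferred pair. The choice of https and www.example.com as the preferred values is an assumption made for this example only; the helper name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for this sketch: the preferred scheme and host of the site.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
# Hosts that serve the same content as the preferred host.
HOST_ALIASES = {"example.com", "www.example.com"}


def to_preferred_origin(url):
    """Rewrite scheme and host to the preferred ones, keeping the rest."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in HOST_ALIASES:
        return url  # not one of our hosts, leave it untouched
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
    )


variants = [
    "http://example.com/black-shoes",
    "https://example.com/black-shoes",
    "http://www.example.com/black-shoes",
]
for url in variants:
    print(url, "->", to_preferred_origin(url))
```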

OK, I am starting to see your point, you might say, but why is it a problem for the same page to have different URLs?

The reasons are:

  1. Before a search engine such as Google shows links to the searcher, it has to make sure that different results show different content. Search engines, and we as searchers, hate to see duplicate content at different URLs.

    So what do search engines do?

    • They have to consolidate link signals for duplicate or similar content. A canonical URL helps search engines consolidate the information they have for the individual URLs (such as links to them) on a single, preferred URL. This means that links from other sites to http://www.example.com/product/computer/?l_id=home_page get consolidated with links to http://www.example.com/product/computer/

    • Without that hint, tracking a single page across many URLs is often a bigger challenge for the search engine than it is for you.

  2. You might also have a preferred URL that you want people to see.

  3. If you syndicate your content for publication on other domains, you want to consolidate page ranking on your preferred URL.
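The consolidation of link signals described in point 1 can be pictured with a toy Python example. This is only an illustration of the idea, not how any search engine actually works; the link counts are made-up numbers and drop_query is a hypothetical helper.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

# Hypothetical inbound link counts per URL variant (made-up numbers).
inbound_links = {
    "http://www.example.com/product/computer/?l_id=home_page": 3,
    "http://www.example.com/product/computer/?l_id=browsing_history": 2,
    "http://www.example.com/product/computer/": 5,
}


def drop_query(url):
    """Simplified canonical form for this sketch: the URL without its query."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Add up the links of every variant under its canonical URL.
consolidated = Counter()
for url, count in inbound_links.items():
    consolidated[drop_query(url)] += count

print(dict(consolidated))
# {'http://www.example.com/product/computer/': 10}
```

Instead of three URLs competing with 3, 2 and 5 links each, the preferred URL is credited with all 10.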

This is where the canonical URL comes in.

3. How to set the canonical URL

We can tell search engines which URL we prefer by doing the following:

1. Set preferred domain

Whether it is:

http://example.com or

http://www.example.com

You can tell Google your preferred domain as follows:

  • On the Search Console Home page, click the site you want.
  • Click the gear icon, and then click Site Settings.
  • In the Preferred domain section, select the option you want.

2. Indicate the preferred URL with the rel="canonical" link element

Mark up the canonical page and any other variants with a rel="canonical" link element. Add a <link> element with the attribute rel="canonical" to the <head> section of these pages:

<link rel="canonical" href="http://www.example.com/product/computer/" />

This indicates the preferred URL to use to access the computer product page, so that search results will be more likely to show users that URL structure. (Note: Google attempts to respect this, but cannot guarantee it in all cases.)

Avoid errors: use absolute paths rather than relative paths with the rel="canonical" link element.

Use this structure: http://www.example.com/product/computer/

Not this: /product/computer/
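One way to check a page against this advice is to read the canonical link back out of the HTML and verify that it is absolute. The sketch below uses only Python's standard library; the class and function names are my own, and the sample page is invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical href found in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def is_absolute(url):
    """True if the URL has both a scheme and a host."""
    if not url:
        return False
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


# Invented sample page for this sketch.
page = """<html><head>
<title>Computer</title>
<link rel="canonical" href="http://www.example.com/product/computer/" />
</head><body></body></html>"""

href = find_canonical(page)
print(href, is_absolute(href))
```

A relative value such as /product/computer/ would make is_absolute return False, which is the mistake the advice above warns about.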

3. Use 301 redirects for URLs that are not canonical

Suppose your page can be reached in multiple ways:

https://example.com/home

https://home.example.com

https://www.example.com

It's a good idea to pick one of those URLs as your preferred (canonical) destination, and use 301 redirects to send traffic from the other URLs to your preferred URL. A server-side 301 redirect is the best way to ensure that users and search engines are directed to the correct page. The 301 status code means that a page has permanently moved to a new location.
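As an illustration, a server-side 301 redirect could look like the following minimal WSGI application in Python. I am assuming here that https://www.example.com/ is the URL chosen as canonical and that everything on home.example.com should go to the home page; in practice this is often done in the web server configuration instead.

```python
# Assumption for this sketch: the preferred (canonical) home page URL.
CANONICAL_HOME = "https://www.example.com/"


def redirect_target(host, path):
    """Return the canonical URL to redirect to, or None if no redirect is needed."""
    if host == "example.com" and path == "/home":
        return CANONICAL_HOME
    if host == "home.example.com":
        return CANONICAL_HOME
    return None


def app(environ, start_response):
    """WSGI app: answer non-canonical URLs with a permanent redirect."""
    host = environ.get("HTTP_HOST", "")
    path = environ.get("PATH_INFO", "/")
    target = redirect_target(host, path)
    if target is not None:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"home page\n"]
```

The app can be tried locally with the standard library's wsgiref.simple_server.make_server, or run under any WSGI server.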

4. Indicate how to handle dynamic parameters

Use Parameter Handling to tell Google about any parameters you would like ignored. Ignoring certain parameters can reduce duplicate content in Google's index, and make your site more crawlable. For example, if you specify that the parameter l_id should be ignored, Google will consider http://www.example.com/product/computer/?l_id=browsing_history to be the same as http://www.example.com/product/computer/

5. Specify a canonical link in your HTTP header

If you can configure your server, you can use rel="canonical" HTTP headers to indicate the canonical URL for HTML documents and other files such as PDFs. Say your site makes the same PDF available via different URLs (for example, for tracking purposes), like this:

https://www.example.com/downloads/book.pdf

https://www.example.com/downloads/partner-1/book.pdf

https://www.example.com/downloads/partner-2/book.pdf

https://www.example.com/downloads/partner-3/book.pdf

In this case, you can use a rel="canonical" HTTP header to specify to Google the canonical URL for the PDF file, as follows:

Link: <https://www.example.com/downloads/book.pdf>; rel="canonical"

Google currently supports these link header elements for Web Search only.
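A small Python sketch of building that header on the server side follows. In the standard Link header syntax (RFC 8288) the target URL is wrapped in angle brackets. The helper names and the rule for recognising the PDF paths are assumptions for this example.

```python
import posixpath

# The canonical URL of the PDF, taken from the example above.
CANONICAL_PDF = "https://www.example.com/downloads/book.pdf"


def canonical_link_header(canonical_url):
    """Build a (name, value) pair for a rel="canonical" Link header."""
    return ("Link", f'<{canonical_url}>; rel="canonical"')


def headers_for(path):
    """Response headers for a request path (hypothetical routing rule)."""
    headers = [("Content-Type", "application/pdf")]
    # Every copy of book.pdf under /downloads/ points at the same canonical URL.
    if path.startswith("/downloads/") and posixpath.basename(path) == "book.pdf":
        headers.append(canonical_link_header(CANONICAL_PDF))
    return headers


for name, value in headers_for("/downloads/partner-1/book.pdf"):
    print(f"{name}: {value}")
```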

6. Prefer HTTPS over HTTP for canonical URLs

Google prefers HTTPS pages over equivalent HTTP pages as canonical, except when there are conflicting signals such as the following:

  • The HTTPS page has an invalid SSL certificate.

  • The HTTPS page contains insecure dependencies.

  • The HTTPS page is blocked by robots.txt (and the HTTP page is not).

  • The HTTPS page redirects users to or through an HTTP page.

  • The HTTPS page has a rel="canonical" link to the HTTP page.

  • The HTTPS page contains a noindex robots meta tag.

Although Google's systems prefer HTTPS pages over HTTP pages by default, you can ensure this behavior by taking any of the following actions:

  • Add 301 or 302 redirects from the HTTP page to the HTTPS page.

  • Add a rel="canonical" link from the HTTP page to the HTTPS page.

  • Implement HSTS (HTTP Strict Transport Security).
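Two of these actions, the redirect and HSTS, can be sketched together in a few lines of Python. The function name is hypothetical, and the max-age of one year with includeSubDomains is just a commonly used setting that I am assuming here, not a required value.

```python
# Assumption: a one-year HSTS policy that also covers subdomains.
HSTS_VALUE = "max-age=31536000; includeSubDomains"


def https_policy(scheme, host, path):
    """Return (status, headers) that steer clients and crawlers to HTTPS."""
    if scheme == "http":
        # Plain HTTP request: redirect permanently to the HTTPS equivalent.
        return "301 Moved Permanently", [("Location", f"https://{host}{path}")]
    # HTTPS request: serve the page and tell the browser to keep using HTTPS.
    return "200 OK", [("Strict-Transport-Security", HSTS_VALUE)]


print(https_policy("http", "www.example.com", "/black-shoes"))
print(https_policy("https", "www.example.com", "/black-shoes"))
```

Browsers only honour the Strict-Transport-Security header when it arrives over HTTPS, which is why the sketch sends it only on the HTTPS branch.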

4. Conclusion

I hope this gives you some idea of how important canonical URLs are, and that you will try applying them in your own projects.


All rights reserved
