
[5.3][SEF] Elimination of duplicate pages, SEO improvements: redirects for IDs and www, canonical links #44310

Open
6 of 9 tasks
universewrld opened this issue Oct 18, 2024 · 10 comments

Comments

@universewrld

universewrld commented Oct 18, 2024

Is your feature request related to a problem? Please describe.

Even though Joomla 5.2 has some new settings to eliminate duplicate pages, the CMS still produces duplicate pages (see #44263).
I suggest reducing the number of duplicate pages in @joomla.

Describe the solution you'd like

New options for the System - SEF plugin:

  1. Redirect pages with ID (articles, categories) to the version of pages without ID
  2. Redirect from the version of a page with www to the version of a page without www (or vice versa)
  3. Canonical links for pagination pages (for pages like ?start=10)
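As a rough illustration of option 1, here is a minimal Python sketch. It assumes that an ID-bearing URL carries the ID as a numeric prefix on a path segment (e.g. /blog/12-my-article) and that the ID-free version simply drops that prefix. `id_free_redirect` is a hypothetical helper, not part of Joomla (which is written in PHP and resolves IDs through its component routers).

```python
import re
from typing import Optional

# Assumption: an ID-bearing path segment looks like "<numeric id>-<alias>", e.g. "12-my-article".
ID_SEGMENT = re.compile(r"^(\d+)-(.+)$")


def id_free_redirect(path: str) -> Optional[str]:
    """Return the ID-free path to 301-redirect to, or None if the path has no ID segment."""
    segments = path.split("/")
    changed = False
    for i, segment in enumerate(segments):
        match = ID_SEGMENT.match(segment)
        if match:
            segments[i] = match.group(2)  # keep only the alias part
            changed = True
    return "/".join(segments) if changed else None
```

Pattern matching like this would misfire on aliases that legitimately start with digits and a hyphen (e.g. "2024-review" becomes "review"). That is one reason a real implementation would more likely look the alias up through the router instead of rewriting paths blindly.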

What Joomla has already done to reduce duplicate pages:

  • Search Engine Friendly URLs
  • Use URL Rewriting
  • Force HTTPS
  • Strict handling of index.php
  • Trailing slash for URLs
  • Strict Routing
  • WWW redirect
  • IDs redirect
  • Canonical links

Additional context

This will help eliminate almost all duplicate pages for search engines.
The fewer duplicate pages processed by search robots like @google, @google-gemini, @microsoft, @openai, etc., the less energy data centers consume, and the smaller the impact on the environment and climate change.

What is canonicalization - https://developers.google.com/search/docs/crawling-indexing/canonicalization
Redirects and Google Search - https://developers.google.com/search/docs/crawling-indexing/301-redirects
How to specify a canonical URL with rel="canonical" and other methods - https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls

@fgsw

fgsw commented Oct 19, 2024

Duplicate of #44263

@universewrld
Author

> Duplicate of #44263

This is not a duplicate; this is a feature request.
The previous issue was a bug report. Do you see the difference?

You can close the previous issue #44263, but not this one.

@alikon alikon added the Feature label Oct 19, 2024
@richard67
Member

> Redirect from the version of a page with www to the version of a page without www

What if I want it vice versa, i.e. redirect non-www to www?

@universewrld
Author

> Redirect from the version of a page with www to the version of a page without www
>
> What if I want it vice versa, redirect non-www to www?

Yes, that's what I meant.
There should be an option for WWW:

  • don't use redirect
  • redirect to WWW
  • redirect to no WWW
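A minimal Python sketch of how these three WWW modes might pick a redirect target. `WwwMode` and `www_redirect_target` are hypothetical names for illustration, and URLs with user info (`user@host`) are not handled.

```python
from enum import Enum
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


class WwwMode(Enum):
    NONE = "none"              # don't use redirect
    TO_WWW = "to_www"          # example.com -> www.example.com
    TO_NON_WWW = "to_non_www"  # www.example.com -> example.com


def www_redirect_target(url: str, mode: WwwMode) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    netloc = parts.netloc  # host plus optional port, e.g. "www.example.com:8080"
    has_www = netloc.lower().startswith("www.")
    if mode is WwwMode.TO_WWW and not has_www:
        netloc = "www." + netloc
    elif mode is WwwMode.TO_NON_WWW and has_www:
        netloc = netloc[len("www."):]
    else:
        return None  # redirects disabled, or the host is already in the preferred form
    # Keep scheme, path and query so the redirect lands on the equivalent page.
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

This naive version prepends `www.` to any host that lacks it, so in TO_WWW mode a subdomain such as shop.example.com would become www.shop.example.com. A real option would probably need to compare against the site's configured base domain.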

@Mich-es

Mich-es commented Oct 30, 2024

Hi - huge problem with J5.2

Google has a problem with J5.2. Joomla now appends a rel="canonical" to every junk page, and Google no longer knows which page is the original. This is a serious bug and should be fixed as soon as possible. On a site with around 12,000 URLs, I got 4,800 duplicate-content pages with rel="canonical" within 3 days. This must be solved in J5.2.1, not in J5.3!

Greetings Mitches

@simbus82
Contributor

> Canonical links for pagination pages (for pages like ?start=10)

Brrr, this is a really wrong approach.
[Image: Screenshot_20241030-221113.jpg]

You should canonicalise paginated pages only to a "view all" page, not to a page that shows only a limited number of child pages.

@universewrld
Author

universewrld commented Nov 4, 2024

> Canonical links for pagination pages (for pages like ?start=10)
>
> Brrr, this is a really wrong approach.
> [Image: Screenshot_20241030-221113.jpg]
>
> You should canonicalise paginated pages only to a "view all" page, not to a page that shows only a limited number of child pages.

The image shows that all paginated pages, such as /blog/page2, /blog/page3, etc., should specify /blog as the canonical first page.

In Joomla, this means that pages such as /blog?start=10 and /blog?start=25 would point to /blog as the canonical blog page.
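The mapping proposed here can be sketched in a few lines of Python: the canonical URL of any blog page is the same URL with the `start` pagination parameter removed. `first_page_canonical` is a hypothetical helper that illustrates the proposal as described; it is not existing Joomla code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def first_page_canonical(url: str) -> str:
    """Canonical URL under the proposal: every paginated page points at the first page."""
    parts = urlsplit(url)
    # Drop only the pagination offset; keep any other query parameters unchanged.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "start"]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
```

For example, both https://example.com/blog?start=10 and https://example.com/blog?start=25 map to https://example.com/blog.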

@simbus82
Contributor

simbus82 commented Nov 4, 2024

It's the same thing. Forgive my bluntness, but these are basic SEO concepts.

The "/blog" page on a Joomla site is typically already limited in the number of items (e.g., it shows intros to the latest 10 blog posts), so it's NOT suitable to become the canonical for a "/blog?start=10".

If you set a canonical tag that always points to /blog, you're telling search engines that all paginated pages (/blog?start=10, /blog?start=20, etc.) are identical to /blog. This is absolutely incorrect and causes a host of issues, including:

Loss of Content Indexing

Search engines will ignore pages after the first one (/blog, which shows ONLY 10 links to the underlying posts), because the canonical tag says that all paginated content (whether you like it or not, ?start=10 is pagination) is a duplicate of the initial page. In practice, search engines would only see the first 10 articles in the blog section, overlooking content on subsequent pages.

Redistribution of Link Juice to a Limited URL, Resulting in Loss of Link Juice to Posts Beyond the Tenth

When different pages all point to a single URL via an incorrect canonical (e.g., all paginated blog pages point to /blog), search engines concentrate the link juice on the declared canonical URL (/blog), ignoring the other pages.
As a result, the paginated pages (like /blog?start=10, /blog?start=20, etc.) lose their individual authority and fail to pass link juice to the posts or content within or beneath those pages.

Sitemap?

And even if you submit all blog post links in a sitemap to Google or other search engines, this doesn’t automatically guarantee that those links will be indexed correctly or receive the right amount of link juice (authority) if the canonical tag isn’t set up properly.

The sitemap is merely a list that helps search engines discover your URLs, but it doesn’t determine which version of a URL is considered the primary one, nor how internal links pass authority (link juice) within the site. If you have an incorrect canonical setup (for example, all paginated pages pointing to /blog as the canonical), you’re telling search engines that only the canonical page ("/blog") is the main version of all content.
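For contrast, here is a minimal Python sketch of an alternative consistent with this argument. Each paginated page keeps its own `start` offset in its canonical URL (a self-referencing canonical), so /blog?start=10 is not folded into /blog. `self_canonical` and the `IGNORED_PARAMS` set are hypothetical. Which parameters are safe to drop depends on the site, and treating `start=0` as the first page is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters assumed never to change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def self_canonical(url: str) -> str:
    """Canonical URL that keeps each paginated page distinct from the first page."""
    parts = urlsplit(url)
    query = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key in IGNORED_PARAMS:
            continue  # tracking noise, not a different page
        if key == "start" and value in ("", "0"):
            continue  # assumed to render the same list as the bare URL
        query.append((key, value))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
```

Here /blog?start=10 stays its own canonical URL, while /blog?start=0 collapses to /blog.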

@universewrld
Author

What is canonicalization

Canonicalization is the process of selecting the representative –canonical– URL of a piece of content. Consequently, a canonical URL is the URL of a page that Google chose as the most representative from a set of duplicate pages. Often called deduplication, this process helps Google show only one version of the otherwise duplicate content in its search results.

There are many reasons why a site may have duplicate content:

  • Region variants: for example, a piece of content for the USA and the UK, accessible from different URLs, but essentially the same content in the same language
  • Device variants: for example, a page with both a mobile and a desktop version
  • Protocol variants: for example, the HTTP and HTTPS versions of a site
  • Site functions: for example, the results of sorting and filtering functions of a category page
  • Accidental variants: for example, the demo version of the site is accidentally left accessible to crawlers

SOURCE: https://developers.google.com/search/docs/crawling-indexing/canonicalization

@simbus82 you are completely wrong!
Pages with pagination are duplicate content, and @google says so!
This is duplicate content caused by site functions, as described in the official Google Help!


@simbus82
Contributor

simbus82 commented Nov 5, 2024

You really have no understanding of what pagination is, or of the contexts in which it should or should not be handled with a canonical tag.

https://developers.google.com/search/blog/2013/04/5-common-mistakes-with-relcanonical

https://developers.google.com/search/docs/specialty/ecommerce/pagination-and-incremental-page-loading

https://yoast.com/rel-canonical/

https://cognitiveseo.com/blog/19204/canonical-urls-seo/

https://searchengineland.com/pagination-strategies-in-the-real-world-81204

But think what you want; I gain nothing from wasting time convincing you of something this basic for a junior SEO.
I hope the team is more competent than you and does not approve this nonsense.

I'm always happy to help, drawing on 20 years of web development and digital marketing and over 10 years as an SEO specialist on six-figure projects.

You need to start studying; you're really a newbie in SEO.
I recommend the book The Art of SEO.

Projects
None yet
Development

No branches or pull requests

7 participants