Missing index.html within link generated to designate the canonical URL #483
Comments
Observations
We did a few orientation flights on this topic together with @msbt, and came to the conclusion that RTD might have deprecated the "canonical_url" thing already, as it might only have been required for early versions of Sphinx (<1.8) and the RTD of that time. Today, it is advised to use
... but not define it:
Thoughts
In this case, the section in readthedocs-insert.html.tmpl might actually be a backward-compatibility thing?
References I
We are not sure if each one of them is relevant. However, all are about fixing or improving the situation with respect to canonical links, in one way or another. In this spirit, I am enumerating them here, because there is a chance we missed something on the upgrade path since Sphinx 1.8 (~10 years ago?).
References II
Also discovered those, from 2023.
Through some cleanups and refactorings, we removed some configuration overhead, and fixed the issue described above, still using a few Crate-specific workarounds. The improvements have been released with version 0.31.2. Thanks for your excellent support, @msbt! 💯
Problem
On a page rendered from index.rst/index.md files, like this one about built-in functions, the index.html page name is omitted from the rendered <link rel="canonical" ...> tag. This flaw causes all sorts of downstream problems.
Details
@msbt is outlining more details about the problem. Thanks!
The massive number of non-indexed pages is a result of our docs setup. The top 2 non-Google issues (Alternate page with proper canonical tag and Page with redirect) are mostly because of the redirect chains and versioning that we have in place.
If you take this URL as an example:
https://cratedb.com/docs/crate/reference/en/master/general/builtins/subquery-expressions.html
The page also exists in these (and probably some more) versions:
https://cratedb.com/docs/crate/reference/en/5.6/general/builtins/subquery-expressions.html
https://cratedb.com/docs/crate/reference/en/5.5/general/builtins/subquery-expressions.html
Both links above have this URL set as canonical:
https://cratedb.com/docs/crate/reference/en/latest/general/builtins/subquery-expressions.html
This will obviously result in a lot of unindexed pages; not sure if this can be fixed, since it's not really broken.
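To make the versioning behaviour above concrete, here is a minimal Python sketch of how a versioned page URL maps to its canonical counterpart. It is an illustration only, not how RTD or the theme computes the value: the function name `canonical_for_version` is hypothetical, and the sketch assumes the URL layout seen in the examples, where the version segment directly follows the language segment (`en`).

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_for_version(url: str, language: str = "en",
                          canonical_version: str = "latest") -> str:
    """Replace the version path segment with the canonical version.

    Assumption: the path looks like .../<language>/<version>/<page>,
    as in the example URLs quoted in this issue.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # The version segment is assumed to follow the language segment.
    lang_index = segments.index(language)
    segments[lang_index + 1] = canonical_version
    return urlunsplit(parts._replace(path="/".join(segments)))


versioned = ("https://cratedb.com/docs/crate/reference/en/5.6/"
             "general/builtins/subquery-expressions.html")
print(canonical_for_version(versioned))
```

Every versioned copy of a page collapses to the same `latest` URL this way, which is why search engines report the versioned copies as alternates rather than indexing them.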
To show an example of your links, this URL shows as not-indexed because of "Alternate page with proper canonical tag":
https://cratedb.com/docs/guide/install/cloud/aws/index.html
If you inspect that URL, you can see the "User-declared canonical" is:
https://cratedb.com/docs/guide/install/cloud/aws/ (which is indexed)
So the index.html gets omitted by RTD, and every docs page ending with index.html gets a not-indexed issue attached to it. Can we maybe add that to the canonical URL to avoid that, @amotl?
References
index.html in canonical URLs #482
/cc @matkuliak, @michaelkremmel
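The index.html mismatch reported above can be checked with a small Python sketch. It only illustrates the symptom, using the example URLs from this issue; the helper name `canonical_drops_index` is hypothetical and not part of Sphinx, RTD, or the theme.

```python
def canonical_drops_index(page_url: str, canonical_url: str) -> bool:
    """Return True when the canonical URL is the page URL minus a trailing index.html."""
    suffix = "index.html"
    return (
        page_url.endswith("/" + suffix)
        and canonical_url == page_url[: -len(suffix)]
    )


page = "https://cratedb.com/docs/guide/install/cloud/aws/index.html"
declared = "https://cratedb.com/docs/guide/install/cloud/aws/"

# The declared canonical differs from the page URL only by the missing index.html.
print(canonical_drops_index(page, declared))
```

A check like this could be run over a crawl of rendered pages to list every page whose declared canonical differs from its own URL only by the omitted file name.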