This repository has been archived by the owner on Dec 7, 2023. It is now read-only.

Doc Bug: documents hosted by gitbook are not indexed by search engines #212

Open
jaxxzer opened this issue Apr 24, 2019 · 10 comments

Comments

@jaxxzer
Contributor

jaxxzer commented Apr 24, 2019

Screenshot from 2019-04-23 20-30-24

There was an issue about this in gitbook's GitHub repository, but they have since removed the issues list entirely: https://github.com/GitbookIO/gitbook

We have emailed gitbook, but that conversation has not yet been helpful.

I'm not sure if this issue exists on their newer documentation platform (which is version controlled, but not using git).

@hamishwillee
Collaborator

hamishwillee commented Apr 29, 2019

I don't see those headers on the same page (using latest Chrome):
image

Is there something special you had to do to see them?

@jaxxzer
Contributor Author

jaxxzer commented Apr 29, 2019

They've fixed it!

@jaxxzer jaxxzer closed this as completed Apr 29, 2019
@jaxxzer
Contributor Author

jaxxzer commented Apr 29, 2019

On further inspection, it looks like the behavior may not be any different than when this issue was opened. @hamishwillee:

  • I'm using google chrome also
  • the noindex header appears in the associated .html file for each content page
  • random pages are more likely to have the noindex tag
  • refreshing the page causes the x-cache to say HIT, and the noindex disappears (fetching from CDN?)

I'm not sure exactly how all of this works. We are reaching out to gitbook support for an update.
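For anyone who wants to reproduce the check described above, here is a minimal sketch in Python. It assumes the noindex directive may arrive either as an `X-Robots-Tag` response header or as a `<meta name="robots">` tag in the page, since the thread does not say which one gitbook was sending. The function name `noindex_sources` and the sample headers are made up for illustration; feed it the headers and HTML from whatever HTTP client you use.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def noindex_sources(headers, html):
    """Return where a noindex directive was found: 'header', 'meta', both, or neither."""
    found = []
    # Header names are case-insensitive, so normalise before looking them up.
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    if "noindex" in lowered.get("x-robots-tag", ""):
        found.append("header")
    parser = _RobotsMetaParser()
    parser.feed(html)
    if any("noindex" in d for d in parser.directives):
        found.append("meta")
    return found


# Hypothetical responses: a cache miss carrying noindex, then a cache hit without it.
miss = {"X-Cache": "MISS", "X-Robots-Tag": "noindex"}
hit = {"X-Cache": "HIT"}
page = "<html><head><title>Docs</title></head><body></body></html>"
print(noindex_sources(miss, page))  # ['header']
print(noindex_sources(hit, page))   # []
```

Running this against the first and second fetch of the same URL should show whether the directive really disappears once the response comes from the cache.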

@jaxxzer jaxxzer reopened this Apr 29, 2019
@hamishwillee
Collaborator

hamishwillee commented Apr 30, 2019

Thanks @jaxxzer . I'll wait on your/gitbook response. I'm not too concerned because if you do check google, you can see that pages are indeed searchable.

@Williangalvani
Contributor

Williangalvani commented May 7, 2019

FYI. This is in Google Console
Screenshot from 2019-05-07 12-56-59

My first contact with the Gitbook team was on March 22 (via the site form), and they replied on April 18.
I replied to that email on April 19, and again on April 29, with no answer.
This might take a while.

All they said was

Hi William, my sincere apologies about this late reply.

Are you still having the issue? Usually, indexation with Google shouldn't be a huge problem even if we know there's room for improvements on the topic and we're working on this.

I'll be here if you need anything.

@jaxxzer
Contributor Author

jaxxzer commented May 7, 2019

@hamishwillee It looks like you will still get some hits on google, but they are not currently indexed so the situation is not ideal/optimized.

@hamishwillee
Collaborator

@jaxxzer I don't own the domains personally so I can't run the same tools. However I ran google search using site:xxxxx for PX4 User guide and MAVlink.io.
Mavlink.io is hosted on legacy.gitbook.com and shows quite a few missing pages, i.e. you are correct that indexing is not working properly.
PX4 is showing up about the right number of pages, and I can see recent reindexing of pages.

So as a point of information, yes this is indeed a problem for gitbook hosted docs, but it does not appear to be a problem if you self host.

I could move my other libraries to self hosting I guess. We just build them using jenkins and then host on github pages. The big pain is if you support versions - because on github pages there is no way to redirect to a specific version to keep links all working nicely (so we just duplicate one of the trees).

@hamishwillee
Collaborator

hamishwillee commented May 8, 2019

PS Have you looked at other doc toolchains yet? Interested to chat through your findings. While we won't change now - as the toolchain does most of what we want - we may need to in future.

@jaxxzer
Contributor Author

jaxxzer commented May 14, 2019

I've messed with jekyll and sphinx, I'm not a fan. We are satisfied with mkdocs (simple, yet flexible) as a replacement for gitbook: http://docs.bluerobotics.com/ping-viewer/

As an intermediate solution for the indexing, I think we will build our gitbook website, then put the output on gh-pages.

> The big pain is if you support versions - because on github pages there is no way to redirect to a specific version to keep links all working nicely (so we just duplicate one of the trees).

I may be able to help you conceive some ways around this. Can you point me to your current solution re 'versions'?

@hamishwillee
Collaborator

hamishwillee commented May 15, 2019

Hi @jaxxzer

Thanks very much. Re jekyll, there are some reasonable docs themes now, but to me it feels quite immature. I love sphinx, but no one loves reStructuredText.

mkdocs looks OK. Like any system the issue for me is migration costs and the things it might not do that you already use.
For example, we'd have to change all the notes we use to admonitions. What annoys me most about that is that the rendering of gitbook notes works well in github, but I doubt that is true for mkdocs markdown. Other big pain points are any tables.
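As a rough idea of what that migration step could look like, here is a small Python sketch that rewrites note blockquotes into mkdocs admonitions. It assumes the notes are written as a blockquote starting with a bold label (`> **Note** ...`), which is a guess about the source markup, and it targets the `!!! note` syntax of the Python-Markdown `admonition` extension, which has to be enabled in the mkdocs configuration. The function name `notes_to_admonitions` is made up for illustration.

```python
import re

# Assumed gitbook-style note: a blockquote whose first line starts with a bold label,
# e.g. "> **Note** text". Adjust the pattern if your notes are written differently.
_NOTE_START = re.compile(r"^>\s*\*\*(Note|Tip|Warning)\*\*:?\s*(.*)$")


def notes_to_admonitions(markdown):
    """Convert '> **Note** ...' blockquotes into '!!! note' admonition blocks."""
    out = []
    in_note = False
    for line in markdown.splitlines():
        match = _NOTE_START.match(line)
        if match:
            in_note = True
            out.append(f"!!! {match.group(1).lower()}")
            if match.group(2):
                out.append("    " + match.group(2))
        elif in_note and line.startswith(">"):
            # Continuation line of the same blockquote: indent it under the admonition.
            body = line[1:].strip()
            out.append("    " + body if body else "")
        else:
            # Anything else (including ordinary blockquotes) is passed through unchanged.
            in_note = False
            out.append(line)
    return "\n".join(out)


source = "Intro\n\n> **Note** Use the stable branch.\n> It is tested.\n\nNext"
print(notes_to_admonitions(source))
```

This only covers the simple case. Nested lists or code inside notes would need more care, and the converted files would no longer render as notes on GitHub, which is the drawback mentioned above.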

Have you tried to migrate much of your content?

  • What were the main pain points?
  • Did search work out of the box?
  • Is there support for a version picker? This is something useful in gitbook because it allows old prebuilt versions to display the most current version options.
  • Is there support for internationalisation? (There doesn't seem to be ...)

I will certainly keep this in mind, but would also look at hugo at the point a decision was made. I won't be doing anything short term, because even though gitbook isn't maintained it does pretty much everything we need right now. The trigger to move would be that the content size outgrows gitbook build capability OR we move to a non SSG solution.

> I may be able to help you conceive some ways around this. Can you point me to your current solution re 'versions'?

That would be awesome. Is this sufficient? : https://github.com/PX4/px4_user_guide/wiki#cutting-a-new-version

I was thinking about replacing the root/stable version with just a link to the other versions. This would be cleaner, but would break all existing links from elsewhere and ruin google search ratings.
Another alternative would be to generate pages with a manual redirect. A bit ugly, but could be a reasonable intermediate step as the redirects would eventually be understood by google.
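A manual redirect page of this kind could be generated along the following lines. This is only a sketch: it uses a `<meta http-equiv="refresh">` tag with a zero delay plus a visible fallback link, and it adds a `rel="canonical"` link as an extra hint for search engines, which is my own assumption and not something discussed in this thread. The function name `redirect_page` and the example URL are hypothetical.

```python
from html import escape


def redirect_page(target_url):
    """Return a minimal HTML page that forwards visitors to target_url."""
    # Escape the URL so characters such as & and " are safe inside attributes.
    url = escape(target_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '  <meta charset="utf-8">\n'
        f'  <link rel="canonical" href="{url}">\n'
        f'  <meta http-equiv="refresh" content="0; url={url}">\n'
        "  <title>Redirecting</title>\n"
        "</head>\n<body>\n"
        f'  <p>This page has moved to <a href="{url}">{url}</a>.</p>\n'
        "</body>\n</html>\n"
    )


# Hypothetical target: the same page inside a versioned tree.
print(redirect_page("https://example.com/v1.8/en/index.html"))
```

One such stub would be written at each old root path, so existing links keep resolving while the real content lives only in the versioned trees.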

@mavlink mavlink deleted a comment from Jacob-Si Sep 16, 2023

3 participants