Mapping fulltext to book images via annotations #29

Open
mekarpeles opened this issue Jun 8, 2018 · 7 comments

Comments

@mekarpeles
Member

mekarpeles commented Jun 8, 2018

For a public/unrestricted book (e.g. https://archive.org/details/TheGeometry) one can get the fulltext for each page (with word regions) via the following API:

https://api.archivelab.org/books/<identifier>/pages/<page#>/ocr?mode=words

e.g.
https://api.archivelab.org/books/TheGeometry/pages/10/ocr?mode=words

One can also get the results grouped by paragraph by omitting the ?mode=words query parameter.
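A minimal Python sketch of calling the two endpoint forms described above. The URL pattern is taken from this comment; the helper names `ocr_url` and `fetch_ocr` are hypothetical, and the response is returned as raw bytes because its schema is not documented in this thread.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the URLs quoted in this issue.
API_ROOT = "https://api.archivelab.org/books"


def ocr_url(identifier: str, page: int, words: bool = True) -> str:
    """Build the OCR URL for one page (hypothetical helper).

    words=True appends ?mode=words (word regions); words=False leaves it
    off, which per the comment above returns results by paragraph.
    """
    url = f"{API_ROOT}/{quote(identifier, safe='')}/pages/{page}/ocr"
    return (url + "?mode=words") if words else url


def fetch_ocr(identifier: str, page: int, words: bool = True,
              timeout: float = 30.0) -> bytes:
    """Fetch the raw response body for one page (hypothetical helper).

    The body is returned undecoded; inspect it before assuming a format.
    """
    with urlopen(ocr_url(identifier, page, words), timeout=timeout) as resp:
        return resp.read()
```

For example, `fetch_ocr("TheGeometry", 10)` would request the word-level URL shown above, assuming the service is still reachable.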

cc: @num170r

@jcmundy

jcmundy commented Mar 11, 2019

Thank you for providing this! I see five numbers for each word when I follow your link. I am used to seeing x, y, w, h. What is the fifth number?

@mekarpeles
Member Author

Not sure! @rchrd2 ?

@rchrd2

rchrd2 commented Mar 12, 2019

Unfortunately, I don't know either. I haven't modified the search highlighting code. You may need to reverse engineer it a bit using a production book.

The code that processes the search results (using the archive.org api, not the archivelabs one) is here https://github.com/internetarchive/bookreader/blob/master/BookReader/plugins/plugin.search.js#L206

@amandelman

Does this issue also cover indexing the annotations to make them available in IIIF search?

@mekarpeles
Member Author

Nope -- we expose raw (e.g. OCR) data but don't map it via any search API. Feel free to extend the current service to achieve this.

We do / did have an experimental annotations service:
https://pragma.archivelab.org/
https://github.com/archivelabs/pragma.archivelab.org

But I'm not sure if it's still working.

Here is a demo of when it worked:
https://www.youtube.com/watch?v=FtcajyRQnqM

@amandelman

Awesome. We'll add this to our backlog now that we have a little more clarity on the issue. Thank you!

@hadro
Collaborator

hadro commented Mar 17, 2023

Related to the IIIF v3 rewrite underway, specifically #80.

