Truncated ( ... ) download still happening even after #133 fix #150

Closed
mukuntharajaa opened this issue Nov 3, 2019 · 80 comments

Comments

@mukuntharajaa

I am on the master branch, updated to the Oct 14, 2019 commit, and I am still seeing truncated chapter downloads.

Book id: 9781491908419

In Chapter 2, the second page of Item 5 shows "..." and Item 6 is missing altogether.

Please let me know, if any further information is required.

@brookscl

brookscl commented Nov 3, 2019

Same here.

@vavdoshka

+1

@spac-valentin

+1

@phamhoangtuan

+1

@sorinescu

+1

@manfredlotz

I have the same issue

@Azarakhsh

+1

@ghistes

ghistes commented Nov 10, 2019

I have the same problem.

It seems to me that the problem is the login. Even though logging in returns a 200 response code, you never get a sessionid cookie, so when you request the chapters you are treated as not logged in, which results in the truncation. At least that's how it looked to me when I was trying to understand what is going on (not sure if it helps...).
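
One way to check this diagnosis is to look at the cookies the login response actually sets, rather than at its status code. The following is only a minimal sketch using the standard library: the helper names are hypothetical, and `sessionid` is simply the cookie name mentioned above, not something verified against the site.

```python
from http.cookies import SimpleCookie


def cookie_names(set_cookie_headers):
    """Collect the cookie names found in a list of Set-Cookie header values."""
    names = set()
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        names.update(parsed.keys())
    return names


def login_set_session(status_code, set_cookie_headers, session_cookie="sessionid"):
    """True only if the response was a 200 AND it set the session cookie."""
    return status_code == 200 and session_cookie in cookie_names(set_cookie_headers)
```

A 200 response that sets only, say, a CSRF cookie would make `login_set_session` return False, which matches the "logged in but treated as anonymous" symptom described here.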

@elrob

elrob commented Nov 11, 2019

The above PR fixes this issue for me

@manfredlotz

Your fix worked fine for me too. Thanks a lot for your work!

@manfredlotz

I tried a couple of downloads, and mostly the epubs are not really usable. @elrob : However, this is not the fault of your fix.

@milktea02

Still having issues :( even with #152

@elrob

elrob commented Nov 12, 2019

I tried a couple of downloads, and mostly the epubs are not really usable. @elrob : However, this is not the fault of your fix.

@manfredlotz This issue is about the truncation of output as if you're not logged in. The PR I created doesn't change anything in the epub creation. I have had no issues with three books I've since tested with. Definitely usable for me. Can you give me an example of a book you've had issues with? And what those issues are?

@elrob

elrob commented Nov 12, 2019

Still having issues :( even with #152

@milktea02 What issues are you having? Are they related to truncation (this github issue tracks the truncation problem)?

@mukuntharajaa
Author

Still having issues :( even with #152

@milktea02 What issues are you having? Are they related to truncation (this github issue tracks the truncation problem)?

I have tried the same book again ( 9781491908419 ). I am able to see contents now without ellipsis. But when I click chapter 6, it takes me to last page of chapter 6 properly, but shows chapter 5 as highlighted on the left hand side layout.

Guess this is some minor stuff.

@elrob

elrob commented Nov 12, 2019

I have tried the same book again ( 9781491908419 ). I am able to see contents now without ellipsis. But when I click chapter 6, it takes me to last page of chapter 6 properly, but shows chapter 5 as highlighted on the left hand side layout.

Guess this is some minor stuff.

@mukuntharajaa Thanks for the response. If it is an issue you would like to raise and get fixed, then I recommend creating a new GitHub issue for it. This GitHub issue is about the truncation of chapters due to authentication issues. So for now, if/when @lorenzodifuccia accepts #152, this GitHub issue would be fixed.

@varta2014

can we try this code please
thank you

@elrob

elrob commented Nov 12, 2019

can we try this code please
thank you

@varta2014 If you want to try my change before it is merged into this repository then you can just pull it from https://github.com/elrob/safaribooks

@manfredlotz

@elrob Unfortunately, I don't remember which book download I tried. I know that FBReader crashed when opening the epub. The last downloads I did were ok.

@milktea02

milktea02 commented Nov 12, 2019

Still having issues :( even with #152

@milktea02 What issues are you having? Are they related to truncation (this github issue tracks the truncation problem)?

@elrob Tried Clean Code (9780136083238) and still get truncation. I'm logging in via SSO if that might be the issue.

@AsimShakour

I am having truncation with book: 9781119449270 in this area: https://learning.oreilly.com/library/view/professional-c-7/9781119449270/fintro.xhtml

Thanks

@elrob

elrob commented Nov 13, 2019

Still having issues :( even with #152

@milktea02 What issues are you having? Are they related to truncation (this github issue tracks the truncation problem)?

@elrob Tried Clean Code (9780136083238) and still get truncation. I'm logging in via SSO if that might be the issue.

@milktea02 I have updated my change to restore the code that I thought was unnecessary. It was unnecessary for me but I'm not using SSO. Maybe you can try the latest version of my branch and see if it works for you now. I don't have SSO so I can't test it myself.

@AsimShakour Are you using SSO too? Maybe that's the problem. Can you also try with the latest change I have made (updated just now).

@varta2014

@elrob thank you, the code works perfectly!

@vikdean

vikdean commented Nov 13, 2019

Still having issues :( even with #152

@milktea02 What issues are you having? Are they related to truncation (this github issue tracks the truncation problem)?

@elrob Tried Clean Code (9780136083238) and still get truncation. I'm logging in via SSO if that might be the issue.

@milktea02 I have updated my change to restore the code that I thought was unnecessary. It was unnecessary for me but I'm not using SSO. Maybe you can try the latest version of my branch and see if it works for you now. I don't have SSO so I can't test it myself.

@AsimShakour Are you using SSO too? Maybe that's the problem. Can you also try with the latest change I have made (updated just now).

Just tested it with 9780135262047; SSO works, but it still downloads the books partially.

@brookscl

For those of you still having trouble: delete the Books directory that is created for the downloads. Then retry your download. I found that the tool will not re-download chapters it thinks are already there. I was able to download book 9781119558439 without any problems. Not familiar, but it seemed complete.
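
Clearing the cached download before retrying can be scripted. This is a rough sketch only: it assumes the download directory for a book has the book ID somewhere in its name, which is my assumption about the layout and not confirmed here, and `clear_cached_book` is a hypothetical helper, not part of the tool.

```python
import shutil
from pathlib import Path


def clear_cached_book(books_dir, book_id):
    """Delete download directories whose name contains book_id; return their names."""
    removed = []
    root = Path(books_dir)
    if not root.is_dir():
        return removed
    for path in sorted(root.iterdir()):
        # Only touch directories that mention this book ID.
        if path.is_dir() and book_id in path.name:
            shutil.rmtree(path)
            removed.append(path.name)
    return removed
```

Calling `clear_cached_book("Books", "9781119558439")` before re-running the download would force every chapter to be fetched again, leaving other books in the directory untouched.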

@vikdean

vikdean commented Nov 13, 2019

For those of you still having trouble: delete the Books directory that is created for the downloads. Then retry your download. I found that the tool will not re-download chapters it thinks are already there. I was able to download book 9781119558439 without any problems. Not familiar, but it seemed complete.

Tried it 3 times in a row, issue is still the same for 9780135262047

@mukuntharajaa
Author

For those of you still having trouble: delete the Books directory that is created for the downloads. Then retry your download. I found that the tool will not re-download chapters it thinks are already there. I was able to download book 9781119558439 without any problems. Not familiar, but it seemed complete.

Tried it 3 times in a row, issue is still the same for 9780135262047

I have also tried downloading this ebook and accessed random pages; @elrob's fix is working fine.

@vikdean

vikdean commented Nov 14, 2019

For those of you still having trouble: delete the Books directory that is created for the downloads. Then retry your download. I found that the tool will not re-download chapters it thinks are already there. I was able to download book 9781119558439 without any problems. Not familiar, but it seemed complete.

Tried it 3 times in a row, issue is still the same for 9780135262047

I have also tried downloading this ebook and accessed random pages; @elrob's fix is working fine.

Check the chapter beginnings... it only captures a couple of lines, the rest is truncated...
Also, what's the epub size for you? Mine is 3MB
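
Checking chapter by chapter can be automated, since an EPUB is a zip archive. The sketch below is a heuristic only: it assumes chapters are stored as .xhtml or .html files, the 500-character threshold is arbitrary, and `suspicious_chapters` is a hypothetical helper name. It flags chapter files that contain very little text or that end in an ellipsis.

```python
import re
import zipfile


def suspicious_chapters(epub_path, min_chars=500):
    """Return (file name, text length) for chapter files that look truncated."""
    flagged = []
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            if not name.endswith((".xhtml", ".html")):
                continue
            markup = epub.read(name).decode("utf-8", errors="replace")
            text = re.sub(r"<[^>]+>", " ", markup)  # crude tag stripping
            text = " ".join(text.split())           # collapse whitespace
            if len(text) < min_chars or text.endswith(("...", "\u2026")):
                flagged.append((name, len(text)))
    return flagged
```

Short front-matter pages will be flagged too, so the output is a list of files to inspect by hand rather than proof of truncation.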

@azmatsiddique

Please provide images or a video showing how to get the cookies.json file from the browser inspector on a Mac.

@obar1

obar1 commented Jul 7, 2020

@azmatsiddique ahhaha are you serious

@darshanmnyk

Thanks a lot guys! Works splendidly.

@MuhammedElGanzory

Thanks a lot guys! Works splendidly.

Can you help me? I'm trying to download too!

@MuhammedElGanzory

How do I solve this?
Traceback (most recent call last):
  File "C:\Users\LuckyMoon\Downloads\safarinew\safaribooks.py", line 10, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
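
This traceback means the third-party `requests` package is not installed for the Python interpreter that ran the script. A small sketch for checking that before running the tool follows; `missing_modules` is a hypothetical helper, and the list passed to it only contains the module named in the traceback.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["requests"])
    if missing:
        print("Install with: python -m pip install " + " ".join(missing))
    else:
        print("All listed modules are importable.")
```

Running pip through `python -m pip` installs into the same interpreter that will run the script, which avoids the common case of installing into a different Python.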

@dan-r95

dan-r95 commented Feb 9, 2021

I think the only cookie you really need is orm-jwt.
In Chrome, navigate to the Cookies tab and search for orm-jwt.

@EmanuelMtzV

@azmatsiddique ahhaha are you serious

I can't get the cookies.json file either. Any clue where to get it?

@akriaueno

Below script works for me.
Paste this script into console of Chrome DevTools and get cookies.

console.log(JSON.stringify(document.cookie.split(';').map(c => c.split('=')).map(i => [i[0].trim(), i[1].trim()]).reduce((r, i) => {r[i[0]] = i[1]; return r;}, {})))

@albertocavalcante

@vikdean
I think I've found the problem.
Using document.cookie from the console does not include the HttpOnly cookies and they are definitely required.
I can't work out how to access these via the console but I was able to find a way to get them that isn't too painful.

  1. Login as usual to https://learning.oreilly.com/
  2. Open the developer tools with F12
  3. Go to Network tab in the developer tools
  4. Access the profile page in the browser: https://learning.oreilly.com/profile/
  5. In the Network tab, click on the request to /profile/ (it should be the first one)
  6. Click on the Cookies tab in the request information
  7. Right-click on the Request cookies text and choose Copy All
  8. Paste this into the cookies.json file and then remove the outer section of the JSON document
  9. Run the script without passing credentials: python3 safaribooks.py 9780135262047

p.s. sudo is not necessary.
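
Step 8 ("remove the outer section of the JSON document") can be done with a few lines of Python instead of by hand. This sketch assumes the pasted text is a JSON object with a single wrapper key around the name-to-value cookie mapping; the wrapper key name used in the example is a guess, and `unwrap_cookie_export` is a hypothetical helper.

```python
import json


def unwrap_cookie_export(text):
    """Return the inner name -> value mapping from a pasted cookie export."""
    data = json.loads(text)
    # If the paste looks like {"<some label>": {...cookies...}}, drop the outer layer.
    if isinstance(data, dict) and len(data) == 1:
        (inner,) = data.values()
        if isinstance(inner, dict):
            return inner
    return data


pasted = '{"Request cookies": {"orm-jwt": "abc", "groot_sessionid": "def"}}'
print(json.dumps(unwrap_cookie_export(pasted)))
```

A document that is already flat is returned unchanged, so running it twice does no harm.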

I was unable to right-click Request cookies and find a Copy All option. Instead, I went to the Headers tab, scrolled to Request Headers, right-clicked cookie, and clicked Copy value.

Then, I executed sso_cookies.py passing the clipboard content as argument, wrapped in double quotes.
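
For anyone curious what such a conversion involves, here is a rough standalone sketch of turning a copied Cookie header value into the flat JSON mapping that cookies.json holds. It is not the repository's sso_cookies.py; the function names are hypothetical and the cookie values in the usage note are made up.

```python
import json


def cookie_header_to_dict(header):
    """Turn 'a=1; b=2' (a Cookie request header value) into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name:           # skip empty or malformed fragments
            cookies[name] = value  # partition keeps any '=' inside the value
    return cookies


def save_cookies(header, path="cookies.json"):
    """Write the parsed cookies to `path` as a JSON object."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(cookie_header_to_dict(header), handle)
```

For example, `save_cookies("orm-jwt=aaa.bbb.ccc; groot_sessionid=xyz")` would write both cookies to cookies.json. Because the header is copied from the request itself, it includes HttpOnly cookies that `document.cookie` cannot see.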


@munish259272

@vikdean
I think I've found the problem.
Using document.cookie from the console does not include the HttpOnly cookies and they are definitely required.
I can't work out how to access these via the console but I was able to find a way to get them that isn't too painful.

  1. Login as usual to https://learning.oreilly.com/
  2. Open the developer tools with F12
  3. Go to Network tab in the developer tools
  4. Access the profile page in the browser: https://learning.oreilly.com/profile/
  5. In the Network tab, click on the request to /profile/ (it should be the first one)
  6. Click on the Cookies tab in the request information
  7. Right-click on the Request cookies text and choose Copy All
  8. Paste this into the cookies.json file and then remove the outer section of the JSON document
  9. Run the script without passing credentials: python3 safaribooks.py 9780135262047

p.s. sudo is not necessary.

I think something is not right or just changed. I tried both this repo's master and yours, @elrob, with no success. The Developer Tools Network tab inside the cookies section (profile page) won’t show any httpOnly cookie. Don’t know if this is just me.

@villancikos
I have just tested again with my fork of the repo. It is working fine for me when I download the cookies following the instructions above. httpOnly doesn't refer to the name of a cookie. Two of the cookies groot_sessionid and orm-rt are set to httpOnly=true so it means some other methods of downloading the cookies don't work. The method above does work for me in firefox. If you're still having an issue, can you explain where it is going wrong and I might be able to help.

Hi @elrob . Thanks for your answer. First, I know httpOnly is a type of cookie. It is strange that my Chrome does not "tick" them in the dev tools.
I tried adding the common cookies using the javascript output and manually these two cookies: groot_sessionid and orm-rt but I am still getting a truncated epub.
BTW I am using your repo on the master branch.
Just for reference, the book id is 9781491973783.

This does not work anymore

@domrany64

I could use it perfectly fine once I got past the trouble of obtaining the cookies in the right structure.
To get the cookies, I'm using a Chrome extension called Cookie-Editor.

  1. Open the O'Reilly website in Chrome and log in using SSO.
  2. Open Cookie-Editor.
  3. Click EXPORT, the rightmost icon at the bottom of Cookie-Editor's window.
  4. Paste the cookies (which are now in the clipboard) into an editor.
  5. Find "name": "orm-jwt" in the text and copy the value from that section.
  6. Create the cookies.json file like this: {"orm-jwt": "XXX"}, where XXX is the value copied in step 5.
  7. Run python3 safaribooks.py XXXXXXXXXXX and enjoy the EPUB book.
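
Steps 4 to 6 can also be scripted. The sketch below assumes the export is a JSON array of objects that each carry "name" and "value" fields, as the "name": "orm-jwt" fragment in step 5 suggests; `cookies_json_from_export` is a hypothetical helper, not part of the tool or the extension.

```python
import json


def cookies_json_from_export(export_text, wanted=("orm-jwt",)):
    """Build the {"name": "value"} mapping from an exported list of cookie objects."""
    exported = json.loads(export_text)
    found = {c["name"]: c["value"] for c in exported if c.get("name") in wanted}
    missing = [name for name in wanted if name not in found]
    if missing:
        # Fail loudly if the export does not contain the cookie we need.
        raise KeyError("not in export: " + ", ".join(missing))
    return found


export = json.dumps([
    {"name": "orm-jwt", "value": "XXX", "domain": ".oreilly.com"},
    {"name": "other", "value": "1"},
])
print(json.dumps(cookies_json_from_export(export)))
```

The printed line is what would go into cookies.json. Raising on a missing cookie makes it obvious when an export did not capture orm-jwt at all.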

@Marakai

Marakai commented Oct 14, 2022

Below script works for me. Paste this script into console of Chrome DevTools and get cookies.

console.log(JSON.stringify(document.cookie.split(';').map(c => c.split('=')).map(i => [i[0].trim(), i[1].trim()]).reduce((r, i) => {r[i[0]] = i[1]; return r;}, {})))

Using this with Chrome in mid 2022 and it seems to be the easiest approach by far. Copy-pasted into cookies.json in the script directory and it works like a charm!

@MrDandas

MrDandas commented Nov 7, 2022

I might add a PR later, but for now I've put together my own solution based on browser_cookie3. I just log in through the browser and let the library grab the cookies from it:

#            self.session.cookies.update(json.load(open(COOKIES_FILE)))
            self.session.cookies = browser_cookie3.firefox(domain_name='oreilly.com')

@Metal-Milonga

Below script works for me. Paste this script into console of Chrome DevTools and get cookies.

console.log(JSON.stringify(document.cookie.split(';').map(c => c.split('=')).map(i => [i[0].trim(), i[1].trim()]).reduce((r, i) => {r[i[0]] = i[1]; return r;}, {})))

Using this with Chrome in mid 2022 and it seems to be the easiest approach by far. Copy-pasted into cookies.json in the script directory and it works like a charm!

Worked again with Chrome in July 2023. I copied the returned JSON to cookies.json, and the download was successful.

@KonScanner

@domrany64's answer still works!

@eurubkov

eurubkov commented Jan 8, 2024

I am not finding the "name": "orm-jwt", or just "orm-jwt" in the cookies at all. Is it still working for others?

@eurubkov

eurubkov commented Jan 8, 2024

Looks like it's the Cookie-Editor extension that didn't get all the cookies.
Here's how I got it instead:
Right-click -> Inspect Element -> Application -> Cookies -> select the O'Reilly website -> orm-jwt, then copy the value from there.
Then created the cookies.json as mentioned above and it worked.

@yuletide

yuletide commented May 30, 2024

This method generally worked for me per @albertocavalcante's post

This tool is a lifesaver for anyone who gets access to this site via their library (like SFPL) since the institutional login UI is completely useless and doesn't save your reading progress, lists, or anything else. Not sure how anyone uses the site at all without a tool like this one

@henryheim

Like some others, I have an SSO login to O'Reilly and had trouble with the script as it is in the master branch. @albertocavalcante's method, combined with the dev tools one-liner by @akriaueno, fixed everything and the tool works easily now. Just save the output of the dev tools script into cookies.json in the safaribooks directory and run the script without any auth parameters.

Thank you both for your debugging and guides.

@tuha1994

tuha1994 commented Nov 7, 2024

@azmatsiddique ahhaha are you serious

I can't get the cookies.json file either. Any clue where to get it?

Create a file 'cookies.json' in the same folder as safaribooks, then press F12, open the Console tab, and paste the following string to get the cookies. Copy all the returned values and save them to cookies.json. Then run the command 'python safaribooks.py xxxxx', where xxxxx is your document code.

@jjmca

jjmca commented Nov 16, 2024

For people hitting the error "[#] Authentication issue: unable to access profile page.", I had to make a modification to allow redirects. Setting allow_redirects=True here solved my issue, and the tool then worked flawlessly. Great tool, thank you!

response = getattr(self.session, "post" if is_post else "get")(
    url,
    data=data,
    allow_redirects=True,
    **kwargs
)
