Questions about modifying content within the html file #32

Open
cupofjoe0 opened this issue Oct 14, 2024 · 1 comment

cupofjoe0 commented Oct 14, 2024

Forgive me, I'm very new to this kind of thing.

I've read through all the links and documents, but I'm still lost.

I'm using the command line tool.

I've managed to download the tweets I need from an account and everything is working fine. I have two questions and I hope they're not too dumb haha.

I like the HTML file that is downloaded with the tweets, but my questions are about what's inside the file:

  1. Is there a way to only have the text/html tweets and not the application/json ones within the html file?
  2. Is there an option to only have the "Click to load the iframe from Archived Tweet" menu, and not the three others?

Thanks for helping!

Owner

claromes commented Nov 2, 2024

Hi. Sorry for the delay in responding, and feel free to ask any questions.

  1. is there a way to only have the text/html tweets and not the application/json ones within the html file?

It’s partially possible.
If the tweet still exists, I can return its text, which is saved in the CSV file. If the tweet no longer exists but the archived copy is returned in JSON format, I can retrieve the text from it. Currently, the library doesn’t do this because of the API’s rate limiting, which restricts the number of requests and can lead to requests being blocked. The function exists but isn’t used.
If the archived copy is returned as HTML, it’s not possible, as I would need to write a scraper for each Twitter/X UI type.

To address this, I’m considering an extra module that extracts the text from the iframe using AI… I could work on something like that; it’s a bit labor-intensive but sounds interesting.
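In the meantime, one possible workaround is to filter the exported CSV by MIME type, so that only the text/html rows remain. Note that this filters the CSV, not the generated HTML file. The following is only a minimal sketch: the column name `archived_mimetype`, the sample URLs, and the helper names `filter_rows_by_mimetype` and `filter_csv` are assumptions for illustration, not a documented part of the library, so check the header of your own CSV first.

```python
import csv
import io


def filter_rows_by_mimetype(rows, mimetype="text/html", column="archived_mimetype"):
    """Return only the rows whose MIME type column equals `mimetype`.

    `column` is an assumed column name; adjust it to match your CSV header.
    """
    wanted = mimetype.strip().lower()
    return [
        row for row in rows
        if (row.get(column) or "").strip().lower() == wanted
    ]


def filter_csv(src, dst, mimetype="text/html", column="archived_mimetype"):
    """Copy rows from the file-like `src` to `dst`, keeping one MIME type.

    Returns the number of rows written (excluding the header).
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = filter_rows_by_mimetype(reader, mimetype, column)
    writer.writerows(kept)
    return len(kept)


if __name__ == "__main__":
    # In-memory demo with made-up rows; real files would be opened with
    # open(path, newline="", encoding="utf-8") instead.
    sample = (
        "archived_tweet_url,archived_mimetype\n"
        "https://example.org/a,text/html\n"
        "https://example.org/b,application/json\n"
    )
    out = io.StringIO()
    count = filter_csv(io.StringIO(sample), out)
    print(count)
    print(out.getvalue())
```

Matching is done on the exact MIME type string (case-insensitive), so passing `mimetype="application/json"` would keep the JSON rows instead.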

  2. Is there an option to only have the "Click to load the iframe from Archived Tweet" menu, and not the three others?

The idea with the other three is that if the original option no longer exists (as links may have changed since they were saved), it would be possible to access them another way. This is common with images saved in 2011 (the tweet link doesn’t exist, but the image link does).

To streamline the view, I could implement an option to return only iframes with original links and another option to return everything.
