🚀 Feature: Scrape for specific news, like all articles on AI, JavaScript, etc. #4

Open
2 tasks done
artknight opened this issue Aug 5, 2024 · 9 comments

@artknight

🔖 Feature description

I would love to be able to specify what I am looking for (AI, JS, Vue, etc.) and have the script get me all the posts related to that particular topic.

🎤 Pitch

In my use case, I would love to get the JS, Vue3 and AI news.

👀 Have you spent some time to check if this issue has been raised before?

  • I checked and didn't find a similar issue

🏢 Have you read the Code of Conduct?

@bharatr21
Owner

@artknight thanks for reaching out and proposing a feature!
Can you help narrow the scope of this feature a little?
a) Is it a simple keyword search? E.g., it should search for exactly "JS, Vue3 and AI". This can be added relatively quickly.
b) Does it also involve boolean operators such as AND and OR? E.g., "JS OR Vue3 OR AI" should return results belonging to any of the topics in the list ['JS', 'Vue3', 'AI'].
Feature b) could take a while and needs further follow-up to narrow down the requirements (Which boolean operators are needed? Should there be an escaping mechanism for when "and" and "or" are themselves keywords? etc.)

@artknight
Author

artknight commented Aug 5, 2024

So, I am running a newsletter (dailysandbox.com) that focuses on specific technologies: JS, CSS, AI, Node, Postgres, and Bash. I would love to have a script that gives me an aggregate of the stories posted on HN for each of these topics. That way I could easily review the results and possibly use them in the newsletter.

It would be even more awesome if I could combine keywords such as JS + AI and get only the posts that have both :)
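One possible client-side way to get "only the posts that have both" is to search each keyword separately and keep only the stories that show up in every result set. This is a minimal sketch, not the project's implementation: `search_all` is a hypothetical name, and it assumes a hypothetical `search` callable that returns the hits for a single term, each carrying an Algolia-style `objectID`.

```python
from typing import Callable

# Hypothetical: a function that returns the hits for one search term.
SearchFn = Callable[[str], list[dict]]


def search_all(terms: list[str], search: SearchFn) -> list[dict]:
    """Return only hits that appear in the results for every term (client-side AND)."""
    if not terms:
        return []
    # Start from the first term's hits, preserving their order.
    candidates = search(terms[0])
    for term in terms[1:]:
        # Keep a candidate only if this term's results contain the same story ID.
        ids = {hit["objectID"] for hit in search(term)}
        candidates = [hit for hit in candidates if hit["objectID"] in ids]
    return candidates
```

A caveat of this design: it only intersects the hits each search actually returned, so a story ranked outside the fetched page for one of the terms would be missed.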

@bharatr21
Owner

bharatr21 commented Aug 5, 2024

At a quick glance, the search on https://hackernews.com/ is actually https://hn.algolia.com/ (the API endpoint I'll consume for the scraper), which seems to be powered by Algolia (https://www.algolia.com/developers/). They do seem to support boolean operators, so this should be feasible 👍
I'm just wondering how to accept input for an AND search query. For OR, I would suggest a list of topics, for example ['JS', 'CSS'] to mean JS OR CSS. For AND, I'm not sure what to do, since allowing free-form strings can get really messy in terms of string parsing and implementation details.
@artknight What are your thoughts here?
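For reference, a per-term request against the HN Search API might look like the sketch below. It assumes the public endpoint `https://hn.algolia.com/api/v1/search` with the `query`, `tags` and `hitsPerPage` parameters and a JSON response containing a `hits` list; the function names are hypothetical and not the scraper's actual code. It does not depend on any boolean query syntax, so an OR over a list like ['JS', 'CSS'] can be done by calling it once per term.

```python
import json
import urllib.parse
import urllib.request

# Public HN Search API endpoint served by Algolia.
HN_SEARCH_URL = "https://hn.algolia.com/api/v1/search"


def build_search_url(term: str, hits_per_page: int = 50) -> str:
    """Build a search URL for a single term, restricted to stories."""
    params = {
        "query": term,
        "tags": "story",  # only return stories, not comments
        "hitsPerPage": hits_per_page,
    }
    return f"{HN_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def search_stories(term: str, hits_per_page: int = 50) -> list[dict]:
    """Fetch one page of matching stories and return the raw hits."""
    with urllib.request.urlopen(build_search_url(term, hits_per_page), timeout=10) as resp:
        payload = json.load(resp)
    return payload.get("hits", [])
```

Usage would be something like `for hit in search_stories("Vue3"): print(hit["title"])`; only the first page of results is fetched here, so paging (the API's `page` parameter) would be needed for a full aggregate.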

@artknight
Author

Yeah, I see what you mean 🤔. For now let's focus on just the OR operator. Let me think about the AND a little more. Thank you for being so responsive!!

@bharatr21
Owner

bharatr21 commented Aug 6, 2024

Sorry, there will be a slight modification to the expected input. I was planning to accept a list of search terms to run one by one, so ['JS', 'AI'] would mean searching for JS, followed by searching for AI. The OR operator would be expressed as a nested list: ['JS', 'AI', ['Vue', 'React']] would search for JS, then search for AI, then search for "Vue OR React", and aggregate the results. For now, let's start simple and support only one level of nesting for the OR operator.

@artknight let me know if this works.
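A minimal sketch of how this input format could be interpreted, assuming a hypothetical `search` callable that takes one term and returns Algolia-style hits with an `objectID` field. `normalize_spec` and `run_spec` are illustrative names only. The OR inside a nested list is done client-side by merging per-term results and dropping duplicate stories, rather than through Algolia query syntax.

```python
from typing import Callable, Union

# A spec item is either a single term or a flat list of terms to OR together.
SpecItem = Union[str, list[str]]
# Hypothetical: a function that returns the hits for one search term.
SearchFn = Callable[[str], list[dict]]


def normalize_spec(spec: list[SpecItem]) -> list[list[str]]:
    """Turn ['JS', ['Vue', 'React']] into [['JS'], ['Vue', 'React']].

    Only one level of nesting is accepted, matching the agreed scope.
    """
    groups = []
    for item in spec:
        if isinstance(item, str):
            groups.append([item])
        elif isinstance(item, list) and all(isinstance(t, str) for t in item):
            groups.append(list(item))
        else:
            raise ValueError(f"unsupported spec item (max one level of nesting): {item!r}")
    return groups


def run_spec(spec: list[SpecItem], search: SearchFn) -> dict[str, list[dict]]:
    """Search each group separately; terms inside a group are ORed client-side."""
    results = {}
    for group in normalize_spec(spec):
        seen = set()
        merged = []
        for term in group:
            for hit in search(term):
                # The same story can match several terms; keep it only once.
                if hit["objectID"] not in seen:
                    seen.add(hit["objectID"])
                    merged.append(hit)
        results[" OR ".join(group)] = merged
    return results
```

For example, with a fake `search` where "Vue" returns story 2 and "React" returns stories 2 and 3, `run_spec(["JS", ["Vue", "React"]], search)` yields a "Vue OR React" entry containing stories 2 and 3 once each.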

@artknight
Author

Yeah, that will work. How quickly do you think you can implement it?

@bharatr21
Owner

I'll need at least a day (perhaps more, because I'm juggling other responsibilities at my university).
I'd welcome a PR too if you have the bandwidth to implement this!

@bharatr21 self-assigned this on Aug 6, 2024
@artknight
Author

How is it going? Did you make any headway?

@bharatr21
Owner

Still working on it; it should be done this week.
