Few suggestions #8

Open
monk1337 opened this issue Jun 8, 2022 · 1 comment


monk1337 commented Jun 8, 2022

The project is fantastic; here are a few suggestions:

  1. It would be good to have separate repos for the redditflow data APIs and the redditflow model APIs. Sometimes developers want to extract only the data and use their own model, and sometimes they want to use the models with different data. Combining the two makes the repo larger, and if I only want to scrape data, I still need to install torch, sentence-transformers, sentencepiece, etc. (For reference, see Hugging Face's dataset API and model API.)

  2. Update the redditflow docs to cover how to extract data based on a single keyword, and how to extract all comments and posts from a single subreddit.

  3. Organize the nfflow repo around a set of base functions that can be reused for other platform APIs, such as Twitter.

  4. Add ML intelligence to data fetching and scraping (for example, OpenAI's CLIP).

  5. It could also include Elasticsearch to fetch data faster from the downloaded archive.
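
To make suggestions 1 and 3 a bit more concrete, here is a minimal sketch of one way the split could look. Every name in it (`BaseFetcher`, `InMemoryFetcher`, `load_model_backend`) is hypothetical and not part of nfflow or redditflow today; the in-memory fetcher only stands in for a real Reddit or Twitter client.

```python
import importlib
from abc import ABC, abstractmethod
from typing import Iterable, List


class BaseFetcher(ABC):
    """Hypothetical platform-agnostic base class (not an existing nfflow API)."""

    @abstractmethod
    def fetch(self, keyword: str) -> Iterable[str]:
        """Yield raw text items that match the keyword."""

    def collect(self, keyword: str, limit: int = 100) -> List[str]:
        # Shared logic lives here, so each platform only implements fetch().
        items: List[str] = []
        for item in self.fetch(keyword):
            if len(items) >= limit:
                break
            items.append(item)
        return items


class InMemoryFetcher(BaseFetcher):
    """Stand-in for a Reddit or Twitter fetcher, backed by a plain list."""

    def __init__(self, posts: List[str]):
        self.posts = posts

    def fetch(self, keyword: str) -> Iterable[str]:
        # Case-insensitive substring match on each stored post.
        return (p for p in self.posts if keyword.lower() in p.lower())


def load_model_backend():
    # Imported lazily, so a data-only install does not need torch
    # or sentence-transformers unless the model APIs are actually used.
    try:
        return importlib.import_module("sentence_transformers")
    except ImportError as exc:
        raise RuntimeError("Install the model extras to use the model APIs") from exc
```

With something like this, `InMemoryFetcher(posts).collect("python")` works without any ML dependency installed, and only code that calls `load_model_backend()` needs the heavier packages.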

Here is a simple overview of integrating OpenAI's CLIP project into nfflow:

  • Download image data from different sources
  • Use Colab to load the data and train OpenAI's CLIP model to convert the images into vectors
  • Save the vectors to the user's Google Drive
  • Perform evaluation (search queries) over the downloaded data

This could be automated end to end if both the training on Colab and the fetching of vectors from Drive can be automated.
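
The last step of the overview (searching over the saved vectors) could look roughly like the sketch below. It assumes the image vectors have already been produced by CLIP and loaded into a dict keyed by file name; the function names and the toy two-dimensional vectors are illustrative only, and real CLIP embeddings are much longer.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine of the angle between two vectors; 0.0 if either has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    # Score every stored vector against the query and keep the best top_k.
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index standing in for vectors loaded from Drive (assumed data).
index = {
    "cat.jpg": [1.0, 0.0],
    "dog.jpg": [0.0, 1.0],
    "kitten.jpg": [0.9, 0.1],
}
results = search([1.0, 0.0], index, top_k=2)
```

A brute-force scan like this is fine for a small archive; for larger ones, an index such as the Elasticsearch idea in suggestion 5 would be the natural replacement for `search`.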

@abhijithneilabraham
Member

Awesome! Will look into this soon.
