Support Distributed Machines #3

abhijithneilabraham · 2022-05-05T04:26:17Z

Scraping should be made available across distributed machines to make it faster.
A few ideas for implementing this:

Split the config by time period. E.g., with 4 machines, the range between the start and end time could be split into 4 periods, each handled by one machine.
Use Docker images and pull them onto multiple machines.

raaghavrm · 2022-06-16T03:20:11Z

Hey,
I would like to contribute to this feature. Could you please assign it to me?
Also, could you please elaborate on the feature a little more?
Reference: Aviyel
Thanks

abhijithneilabraham · 2022-06-16T05:42:35Z

Hi @Raaghav4243 !

Sure! I hope you understand this might be quite a long task, but I will guide you through the requirements if you wish to take this forward.

As of now, redditflow runs only on a single machine, which does both the scraping and the filtering. This can be time-consuming. If a researcher has multiple cloud machines and wants to split the task across them, it can be done as follows:

Take the time period from start_time to end_time in the config, divide it into time frames, and create a new config for each machine, with start_time and end_time set according to the split. Then copy the scripts to the respective cloud machines over SSH from Python, and run them remotely on those machines.
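
A minimal sketch of the time split described above, in case it helps to see it concretely. It assumes the config is a plain dict whose start_time and end_time values are strings in the form "YYYY-MM-DD HH:MM:SS"; that format and the function name split_config are assumptions for illustration, not redditflow's actual API.

```python
from copy import deepcopy
from datetime import datetime

# Assumed timestamp format; adjust to whatever the real config uses.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_config(config, n_machines):
    """Return n_machines copies of config, each covering an equal time slice."""
    if n_machines < 1:
        raise ValueError("n_machines must be at least 1")

    start = datetime.strptime(config["start_time"], TIME_FORMAT)
    end = datetime.strptime(config["end_time"], TIME_FORMAT)
    if end <= start:
        raise ValueError("end_time must be later than start_time")

    step = (end - start) / n_machines  # length of one slice (a timedelta)
    configs = []
    for i in range(n_machines):
        part = deepcopy(config)  # keep every other setting unchanged
        part_start = start + step * i
        # Pin the last slice to the original end to avoid rounding drift.
        part_end = end if i == n_machines - 1 else start + step * (i + 1)
        part["start_time"] = part_start.strftime(TIME_FORMAT)
        part["end_time"] = part_end.strftime(TIME_FORMAT)
        configs.append(part)
    return configs
```

Adjacent slices share a boundary timestamp here. Whether that needs adjusting depends on whether the scraper treats end_time as inclusive, which this sketch does not assume either way.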

Reference for ssh connection via python: https://github.com/paramiko/paramiko
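
A rough sketch of how the copy-and-run step could look with paramiko. The host, user, key file, remote directory, and the idea that the script takes the config path as a command-line argument are all assumptions for illustration; the helper names are hypothetical and not part of redditflow.

```python
import os
import shlex


def build_remote_command(remote_script, remote_config, python_bin="python3"):
    """Build a shell-safe command line; passing the config as an argument is an assumption."""
    return " ".join(shlex.quote(p) for p in (python_bin, remote_script, remote_config))


def run_on_machine(host, username, key_filename, local_script, local_config,
                   remote_dir="/tmp"):
    """Upload a script and its config to one machine, run it, and return the result."""
    import paramiko  # third-party: pip install paramiko

    remote_script = f"{remote_dir}/{os.path.basename(local_script)}"
    remote_config = f"{remote_dir}/{os.path.basename(local_config)}"

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # only connect to hosts already in known_hosts
    client.connect(host, username=username, key_filename=key_filename)
    try:
        # Copy the script and the per-machine config over SFTP.
        sftp = client.open_sftp()
        sftp.put(local_script, remote_script)
        sftp.put(local_config, remote_config)
        sftp.close()

        # Run the script remotely and collect its output.
        command = build_remote_command(remote_script, remote_config)
        _, stdout, stderr = client.exec_command(command)
        out = stdout.read().decode()
        err = stderr.read().decode()
        exit_status = stdout.channel.recv_exit_status()
        return exit_status, out, err
    finally:
        client.close()
```

Calling run_on_machine once per machine, each with its own split config, would cover the workflow described above. Running those calls in parallel (for example with threads) is left out to keep the sketch short.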

Here's another reference project where such distributed configurations and remote connections were done: https://github.com/autonomio/jako

Happy coding!

abhijithneilabraham added the "enhancement" (New feature or request) label on May 5, 2022
