how to run RL using multi-nodes in cluster #1133

HYB777 · 2024-05-02T01:50:51Z

How do I use RayVecEnv on a cluster? I want to run my RL code with multi-node training. I'm new to Ray; are there any demo scripts?

MischaPanch · 2024-05-03T15:05:58Z

Hi @HYB777. This is a Ray configuration issue: as long as you configure Ray on a multi-node cluster, call ray.init appropriately, and use RayVecEnv, things should work.

That being said, I haven't personally tested this on a multi-node cluster yet.

Since we're not Ray developers, I think this question is outside the scope of support from the tianshou team. However, if you encounter tianshou-specific issues on the cluster, feel free to let us know!

Ray has a large community and a lot of documentation; I suggest you start there. If you want to contribute a multi-node example, I'm happy to review a PR.

destin-v · 2024-07-10T19:01:34Z

If you want to run RayVecEnv on a cluster, you have to set up multiple Ray worker nodes and connect all of them to the Ray head node. With the Ray CLI this is typically done by running `ray start --head` on the head node and `ray start --address=<head-ip>:6379` on each worker node; your script then connects to the running cluster with ray.init. Here is an example that collects the IP address of every node that runs one of its tasks. If this runs on your multi-node server, you will be able to do the same with RayVecEnv.

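```python
import socket
import time
from collections import Counter

import ray


@ray.remote
def f():
    # Short sleep so that tasks get spread over the cluster instead of finishing instantly
    time.sleep(0.001)
    # Report the IP address of the node this task ran on
    return socket.gethostbyname(socket.gethostname())


def main(address: str):
    # Replace with your head node's address, e.g. "<head-ip>:6379"
    ray.init(address=address)

    futures = [f.remote() for _ in range(10000)]
    ip_addresses = ray.get(futures)
    # Count how many tasks ran on each node
    for ip_address, num_tasks in Counter(ip_addresses).items():
        print(" {} tasks on {}".format(num_tasks, ip_address))


if __name__ == "__main__":
    # "auto" connects to the cluster this machine joined via `ray start`
    main("auto")
```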

MischaPanch added the question ("Further information is requested") and documentation labels on May 3, 2024.
