This is an attempt to implement the training-phase code for two agents in a competitive environment using the PPO algorithm; the code is still incomplete.
I imported the competitive environment from OpenAI's repo and revised the PPO algorithm from baselines to adapt it to the competitive setting. The dependencies are simply the dependencies of those two repos, so baselines etc. have to be installed first.
My idea is simple: in the `compete_learn` function in `train_run.py`, I use lists to store the states, rewards, and other rollout info of the two agents, and alternate learning between the two agents (see the sketch below). It's a naive idea, so advice is welcome. Thank you!
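
For reference, here is a minimal sketch of the alternating-update loop described above. The `PPOAgent` class, its `act`/`update` methods, and the two-agent `env.step` interface (tuples of per-agent observations, rewards, and done flags) are all assumptions for illustration, not the actual APIs from baselines or the competition environments:

```python
import numpy as np

class PPOAgent:
    """Placeholder for a per-agent PPO learner.
    In the real code this would wrap the revised baselines PPO."""
    def __init__(self, act_dim):
        self.act_dim = act_dim

    def act(self, obs):
        # stand-in for sampling an action from the current policy
        return np.random.uniform(-1.0, 1.0, self.act_dim)

    def update(self, obs, acts, rews):
        # stand-in for one PPO optimization phase on the collected rollout
        pass

def compete_learn(env, agents, n_iters=100, horizon=2048):
    """Alternate PPO updates between two agents in a shared env.
    Assumes env.reset() -> (obs0, obs1) and env.step(actions) ->
    (obs_tuple, reward_tuple, done_tuple, info)."""
    for it in range(n_iters):
        learner = it % 2                      # alternate which agent trains
        obs_buf, act_buf, rew_buf = ([], []), ([], []), ([], [])
        obs = env.reset()
        for _ in range(horizon):
            actions = tuple(a.act(o) for a, o in zip(agents, obs))
            obs_next, rewards, dones, _ = env.step(actions)
            for i in range(2):                # store per-agent rollout info
                obs_buf[i].append(obs[i])
                act_buf[i].append(actions[i])
                rew_buf[i].append(rewards[i])
            obs = env.reset() if all(dones) else obs_next
        # only the current learner updates; the opponent's policy stays frozen
        agents[learner].update(obs_buf[learner], act_buf[learner], rew_buf[learner])
```

One note on this design: because only one agent updates per iteration while its opponent is frozen, the learner is always chasing a moving but piecewise-stationary target, which is the usual motivation for alternating (rather than simultaneous) self-play updates.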