
MlpPolicy network output layer softmax activation for continuous action space problem? #1190

Open
wbzhang233 opened this issue Jan 15, 2024 · 2 comments

Comments

@wbzhang233

For a continuous action space problem, we can use the PPO/A2C algorithm to predict continuous actions, but I want to use a custom softmax as the output activation function with net_arch=[256, 256]. I have read and tested the tutorial post. When I run the code below, the resulting action does not sum to one, so the softmax does not take effect. I found the action_net in model.policy, but I could not set softmax as its custom activation function.

import torch
from stable_baselines3 import PPO

# env is the user's continuous-action environment
policy_kwargs = {
    "activation_fn": torch.nn.Softmax,
    "net_arch": [256, 256],
}
model = PPO('MlpPolicy', env, policy_kwargs=policy_kwargs, verbose=1)

How can I use softmax as a custom activation function on the action output layer?
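
For reference, in stable-baselines3 the activation_fn entry of policy_kwargs is only applied to the hidden layers of the MLP extractor; for a continuous action space the output head (model.policy.action_net) is a plain linear layer that produces the mean of a Gaussian distribution, so no activation is applied to the action itself. One possible workaround is to apply the softmax to the action inside the environment rather than inside the policy. The sketch below is only an illustration, assuming a Gymnasium environment and a hypothetical SoftmaxActionWrapper name:

import numpy as np
import gymnasium as gym

class SoftmaxActionWrapper(gym.ActionWrapper):
    # Hypothetical wrapper: convert the raw continuous action into a probability vector
    def action(self, action):
        # numerically stable softmax over the action vector
        z = action - np.max(action)
        e = np.exp(z)
        return e / e.sum()

env = SoftmaxActionWrapper(env)  # the policy still outputs unconstrained actions; the env sees probabilities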

@wbzhang233
Author

I want the action of PPO to represent a probability distribution, so I need to use softmax as the activation function.
It is entirely a continuous action space problem.
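
If the softmax really has to live inside the policy, one possible (untested) sketch is to wrap the existing action_net with a Softmax layer after the model is created; note that the sampled action still includes Gaussian noise, so only the distribution mean, not the sampled action, will sum to one:

import torch.nn as nn

# assumes model is the PPO instance created above
model.policy.action_net = nn.Sequential(
    model.policy.action_net,   # original linear head producing the Gaussian mean
    nn.Softmax(dim=-1),        # constrain the mean to the probability simplex
)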

@rambo1111

#1192
