
MlpPolicy network output layer softmax activation for continuous action space problem? #1190

Open
wbzhang233 opened this issue Jan 15, 2024 · 2 comments

Comments

@wbzhang233

For a continuous action space problem, we can use the PPO/A2C algorithm to predict continuous actions, but I want to use a custom softmax as the output activation function with net_arch=[256, 256]. I have read and tested the tutorial post. When I run the code below, the resulting action does not sum to one, so the softmax does not take effect. I found the action_net in model.policy, but I could not set softmax as its custom activation function.

import torch
from stable_baselines3 import PPO

# env is the user's continuous-action environment
policy_kwargs = {
    "activation_fn": torch.nn.Softmax,
    "net_arch": [256, 256],
}
model = PPO('MlpPolicy', env, policy_kwargs=policy_kwargs, verbose=1)

How can I use softmax as a custom activation function on the action output layer?
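
For reference, in stable-baselines3 the activation_fn entry of policy_kwargs is only applied to the hidden layers of the MLP extractor; for a continuous action space the output head (model.policy.action_net) is a plain linear layer that produces the mean of a Gaussian distribution, so no activation is applied to the action itself. One possible workaround is to apply the softmax to the action inside the environment rather than inside the policy. The sketch below is only an illustration, assuming a Gymnasium environment and a hypothetical SoftmaxActionWrapper name:

import numpy as np
import gymnasium as gym

class SoftmaxActionWrapper(gym.ActionWrapper):
    # Hypothetical wrapper: convert the raw continuous action into a probability vector
    def action(self, action):
        # numerically stable softmax over the action vector
        z = action - np.max(action)
        e = np.exp(z)
        return e / e.sum()

env = SoftmaxActionWrapper(env)  # the policy still outputs unconstrained actions; the env sees probabilities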

@wbzhang233
Author

I want the action of PPO to represent a probability distribution, so I need to use softmax as the activation function.
It is entirely a continuous action space problem.
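
If the softmax really has to live inside the policy, one possible (untested) sketch is to wrap the existing action_net with a Softmax layer after the model is created; note that the sampled action still includes Gaussian noise, so only the distribution mean, not the sampled action, will sum to one:

import torch.nn as nn

# assumes model is the PPO instance created above
model.policy.action_net = nn.Sequential(
    model.policy.action_net,   # original linear head producing the Gaussian mean
    nn.Softmax(dim=-1),        # constrain the mean to the probability simplex
)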

@rambo1111

#1192
