Support dim != 1 for softmax w/o using permute #845
Conversation
This pull request was exported from Phabricator. Differential Revision: D47732875
Summary:
Pull Request resolved: facebookincubator#845

This is a port of PyTorch's softmax implementation. Notable differences:

* We use fast_exp & fast_max instead of std::max and std::exp. This is probably the reason why we are (very marginally) faster.
* We don't use higher-precision types for accumulator values (it doesn't look like the dim=-1 softmax code does this either).
* We propagate the reduction dim size & inner size as constants.

We seem to be very marginally slower than PT for small batch sizes and very marginally faster for large ones.

I have named this new softmax implementation "softmaxGeneral" since it is able to handle arbitrary reduction dimensions, even though we are only using it for the `dim > 1` case.

Differential Revision: D47732875
fbshipit-source-id: 5118fed5cf6457bd9d27f553245c3b9985403f78
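For intuition only (this is not code from the PR, and the kernel itself is CUDA): a minimal NumPy sketch of the outer/dim/inner decomposition a general-dim softmax uses, which is what the "reduction dim size" and "inner size" constants refer to. The function name `softmax_general` and the sanity check are illustrative assumptions; the real kernel uses `fast_exp`/`fast_max` intrinsics where this sketch calls `np.exp`/`max`.

```python
# Hypothetical sketch, not the actual kernel: model the input as
# (outer_size, dim_size, inner_size) and reduce over the middle axis.
import numpy as np

def softmax_general(x: np.ndarray, dim: int) -> np.ndarray:
    dim = dim % x.ndim
    outer_size = int(np.prod(x.shape[:dim], dtype=np.int64))
    dim_size = x.shape[dim]                                       # reduction dim size (a compile-time constant in the kernel)
    inner_size = int(np.prod(x.shape[dim + 1:], dtype=np.int64))  # inner size (likewise)

    v = x.reshape(outer_size, dim_size, inner_size)
    # Subtract the max before exponentiating for numerical stability;
    # the kernel would use fast_max / fast_exp here instead.
    m = v.max(axis=1, keepdims=True)
    e = np.exp(v - m)
    out = e / e.sum(axis=1, keepdims=True)
    return out.reshape(x.shape)

# Sanity check against reducing over the last dim after moving the axis.
x = np.random.rand(2, 5, 3).astype(np.float32)
ref = np.moveaxis(softmax_general(np.moveaxis(x, 1, -1), -1), -1, 1)
assert np.allclose(softmax_general(x, 1), ref, atol=1e-6)
```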
This pull request has been merged in 318111f.
Summary: Now that facebookincubator#845 has landed, the backend supports softmax with `dim != -1` directly, and the fx converter no longer needs the workaround from facebookincubator#395.

Differential Revision: D48248330
fbshipit-source-id: fad534f63b642ecbf79a90f7fae3c4cc9ad4dadf
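For context, here is a hedged sketch of the kind of permute workaround that is no longer needed (it is not the actual converter code from facebookincubator#395): move the reduction dim to the end, run a last-dim softmax, and permute back. The helper name `softmax_via_permute` is hypothetical.

```python
# Hypothetical illustration of the old lowering; with #845 the backend
# accepts the reduction dim directly and no permutes are inserted.
import torch

def softmax_via_permute(x: torch.Tensor, dim: int) -> torch.Tensor:
    dim = dim % x.dim()
    if dim == x.dim() - 1:
        return torch.softmax(x, dim=-1)
    perm = [d for d in range(x.dim()) if d != dim] + [dim]   # move `dim` last
    inverse = [perm.index(d) for d in range(x.dim())]        # undo the permute
    return torch.softmax(x.permute(perm), dim=-1).permute(inverse)

x = torch.randn(2, 5, 3)
assert torch.allclose(softmax_via_permute(x, 1), torch.softmax(x, dim=1))
```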