This project investigates the use of tanh, rather than the logistic function (sigmoid), as the basis for formulating softmax. To do this we use the equality between tanh and sigmoid defined over scalars, and from it define hypermax over a vector (this naive form does not work).
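The equality in question is presumably the standard scalar identity relating tanh and the logistic function; under that assumption, the naive vector extension of hypermax (softmax in place of sigmoid, applied elementwise) would be:

$$
\tanh(x) = 2\,\sigma(2x) - 1, \qquad \sigma(x) = \frac{1}{1 + e^{-x}}
$$

$$
\operatorname{hypermax}(\mathbf{x})_i = 2\,\operatorname{softmax}(2\mathbf{x})_i - 1
$$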
The motivation for doing so is twofold. First, it creates a more generalized form of attention that allows for negation rather than just rejection: Transformer hidden states can be pushed away from one another in addition to being pulled towards one another. Second, prior NLP work before the Transformer found tanh preferable to sigmoid due to its empirically improved gradients.
It appears this may not be mathematically sound as currently formulated: the sum of all outputs of hypermax is $2 - n$, a behavior which is not desired.
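Under the naive extension sketched above, the deficit follows directly from softmax summing to one:

$$
\sum_{i=1}^{n} \operatorname{hypermax}(\mathbf{x})_i = \sum_{i=1}^{n} \bigl( 2\,\operatorname{softmax}(2\mathbf{x})_i - 1 \bigr) = 2 - n
$$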
There are two properties of softmax which we want to emulate:
- The sum of the outputs is equal to one (a well-conditioned sum of outputs)
- Each output is bounded in $(0, 1)$
In the creation of hypermax we want:
- The sum of the outputs to equal a constant $c$ (ideally still 1)
- Each output to be bounded in $(-1, 1)$
Correction:
Key properties of this correction:
- The sum of the outputs is 0
- Each output is bounded in $(-1, 1)$
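Since the corrected formula is not reproduced here, the sketch below uses one candidate that satisfies both properties listed above, the difference of two softmaxes; this is an illustrative assumption, not necessarily the project's actual correction:

```python
import torch
import torch.nn.functional as F

def hypermax_candidate(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Difference of two softmaxes: outputs sum to 0 and lie in (-1, 1)."""
    # Each softmax term is in (0, 1), so the difference is in (-1, 1);
    # both softmaxes sum to 1 along `dim`, so the sums cancel to 0.
    return F.softmax(x, dim=dim) - F.softmax(-x, dim=dim)

if __name__ == "__main__":
    x = torch.randn(4, 8)
    y = hypermax_candidate(x)
    print(y.sum(dim=-1))                    # approximately zero per row
    print(y.min().item(), y.max().item())   # strictly inside (-1, 1)
```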
- Run tests (done)
- Investigate OOM issues: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch
- Test with MLA model?
- Update repo with results