This is a little toy project that experiments with using a generative adversarial network (GAN) to generate random matrices in $SL(2,\mathbb{Z})$. Along the way, the goals are to:
- Learn how to implement a GAN in PyTorch.
- Write clean and performant PyTorch & Lightning code.
- Discover some neat math along the way.
Let $SL(2,\mathbb{Z})$ denote the group of $2\times 2$ integer matrices with determinant $1$ (unimodular matrices). Elements of $SL(2,\mathbb{Z})$ act on bases of the integer lattice $\mathbb{Z}^2$: applying a unimodular matrix to a basis produces another basis of the same lattice.
The problem of randomly generating unimodular matrices in an 'unbiased' way therefore arises when one needs to select a random lattice. This is a problem of significant importance. One notable application is in lattice-based cryptography, where the strength of a cryptographic protocol is derived from the difficulty of certain problems concerning lattices. For example, this is an essential part of a lattice-based cryptosystem, where the random generation of high-dimensional unimodular matrices (usually over finite fields) takes the role of the random generation of large primes in RSA. Ensuring high quality of the random lattice generation algorithm is essential to avoid leaving the cryptosystem vulnerable to various attacks. Indeed, a number of algorithms exist for randomly selecting a unimodular matrix, but in terms of statistical properties they are not exactly equivalent, leading to empirical disparities in the resilience of their associated cryptosystems. Besides cryptography, random unimodular matrices also see application in the study of lattices and lattice algorithms in general.
The focus of this repository, however, is just on a simple toy problem in low dimensions: the generation of random elements of $SL(2,\mathbb{Z})$.
The natural first obstacle is that $SL(2,\mathbb{Z})$ is an infinite discrete group, so there is no uniform distribution over it to sample from; any sampling algorithm must in practice restrict to a finite subset, for example matrices whose entries are bounded by some $N$.
We first address the naive approach to the sampling algorithm, sample-and-replace:
- Sample a $2\times 2$ integer matrix $M$ uniformly at random from the hypercube $[-N,N]^4$.
- If $\det M = 1$, keep $M$; otherwise, discard it and sample again until an element of $SL(2,\mathbb{Z})$ is found.
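As a minimal sketch of this procedure (the function name `sample_and_replace` is my own choice, not from the repository):

```python
import random

def sample_and_replace(N, rng=random):
    """Rejection-sample an element of SL(2, Z) with entries in [-N, N].

    Repeatedly draws a 2x2 integer matrix uniformly from [-N, N]^4 and
    keeps it only if its determinant is exactly 1.
    """
    while True:
        a, b, c, d = (rng.randint(-N, N) for _ in range(4))
        if a * d - b * c == 1:
            return [[a, b], [c, d]]

M = sample_and_replace(5)
assert M[0][0] * M[1][1] - M[0][1] * M[1][0] == 1
```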
Although in theory every element of $SL(2,\mathbb{Z})$ with entries in $[-N,N]$ can eventually be produced this way, the acceptance rate is tiny: matrices with determinant exactly $1$ make up a vanishing fraction of the hypercube as $N$ grows, so almost all samples are discarded.
Therefore, in terms of practicality, we are restricted to algorithms that are guaranteed to generate elements of $SL(2,\mathbb{Z})$ by construction.
In this repository we experiment with an option that, as far as I am aware, has not been explored previously. As a quick summary, the method works as follows:
- Generate a dataset consisting of a large number of previously generated elements of $SL(2,\mathbb{Z})$.
- Train a generative adversarial network to produce new elements of $SL(2,\mathbb{Z})$, using the dataset as training data for the discriminator.
In particular, I would like to see if it is possible to train the GAN on a dataset generated by a biased random matrix generator without the GAN inheriting its biases as well.
The random matrix generator I have in mind uses the fact that $SL(2,\mathbb{Z})$ is generated by the two matrices
$$
S = \begin{pmatrix}
0 & -1 \\
1 & 0
\end{pmatrix},
\quad T = \begin{pmatrix}
1 & 1 \\
0 & 1
\end{pmatrix}.
$$
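As a quick sanity check on these generators: $S^2 = (ST)^3 = -I$, so $S$ has order $4$ and $ST$ has order $6$ in $SL(2,\mathbb{Z})$. A few lines of NumPy confirm this:

```python
import numpy as np

S = np.array([[0, -1], [1, 0]])
T = np.array([[1, 1], [0, 1]])
ST = S @ T  # the product S*T, used in the random walk below

I = np.eye(2, dtype=int)
assert np.array_equal(np.linalg.matrix_power(S, 2), -I)   # S^2 = -I, so S has order 4
assert np.array_equal(np.linalg.matrix_power(ST, 3), -I)  # (ST)^3 = -I, so ST has order 6
assert round(np.linalg.det(S)) == 1 and round(np.linalg.det(T)) == 1
```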
In addition, the pair $S$ and $ST$ also generates $SL(2,\mathbb{Z})$ (since $T = S^{-1} \cdot ST$), which suggests the following random-walk algorithm with a length parameter $N$:
- Generate uniformly at random a number $\ell$ in the range $0,\ldots,N$.
- Perform a random walk in $SL(2,\mathbb{Z})$ of length $\ell$ starting at the identity, where at each step we right-multiply by either $S$ or $ST$ with equal probability.
- Return the endpoint.
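A plain-Python sketch of this walk (function names are my own, not the repository's):

```python
import random

# Generators used by the walk: S and the product S*T.
S = [[0, -1], [1, 0]]
ST = [[0, -1], [1, 1]]

def matmul2(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
        [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]],
    ]

def random_walk_matrix(N, rng=random):
    """Walk of length l ~ Uniform{0,...,N} from the identity,
    right-multiplying by S or ST with equal probability at each step."""
    length = rng.randint(0, N)
    M = [[1, 0], [0, 1]]  # identity matrix
    for _ in range(length):
        M = matmul2(M, S if rng.random() < 0.5 else ST)
    return M

M = random_walk_matrix(50)
assert M[0][0] * M[1][1] - M[0][1] * M[1][0] == 1  # endpoint always has det 1
```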
This algorithm is guaranteed to generate elements of $SL(2,\mathbb{Z})$, since the endpoint of the walk is a product of group elements; however, the distribution it produces over matrices is biased.
A relativistic GAN architecture with gradient penalty was selected after some experimentation with other architectures such as WGAN-GP. To generate the training dataset, I wrote a simple program that generates random walks in $SL(2,\mathbb{Z})$ (see `matrix_methods.py`). The program can efficiently generate on the order of $10^7$ unimodular matrices from random walks of length $10^3$.
For both the generator and discriminator, a simple fully connected network was selected, with explicit encoding of polynomial features of degree 2. The determinant of a $2\times 2$ matrix is itself a polynomial of degree 2 in the entries, so the discriminator should in principle be able to express the determinant condition directly in these features.
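To illustrate why degree-2 features help (a sketch of the idea, not necessarily the repository's exact encoding): once all degree-2 monomials of the entries are appended, the determinant $ad - bc$ becomes a *linear* function of the feature vector.

```python
import numpy as np

def degree2_features(m):
    """Map a flattened 2x2 matrix (a, b, c, d) to (entries, all degree-2 monomials)."""
    x = np.asarray(m, dtype=float)
    # Upper-triangular outer product gives a^2, ab, ac, ad, b^2, bc, bd, c^2, cd, d^2.
    quadratic = np.outer(x, x)[np.triu_indices(4)]
    return np.concatenate([x, quadratic])  # 4 + 10 = 14 features

# The determinant ad - bc is the dot product with a fixed weight vector
# that selects the ad and bc monomials, so a single linear layer can compute it.
w_det = np.zeros(14)
w_det[7] = 1.0   # coefficient of ad
w_det[9] = -1.0  # coefficient of bc

m = np.array([3.0, 1.0, 2.0, 1.0])  # det = 3*1 - 1*2 = 1
assert np.isclose(w_det @ degree2_features(m), 1.0)
```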
Raw training of the GAN architecture on the matrix data did not appear to result in learning. It was clear that the GAN could not learn the condition that all entries must be integers. However, it also did not appear to be able to learn the determinantal condition. I changed the goalposts slightly and encoded the integrality and determinant conditions explicitly as loss functions for the generator, with the hope that the network would use the training data to encourage diverse outputs mimicking the training data distribution.
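The explicit determinant and integrality losses might look like the following sketch; the exact penalty forms here (a squared determinant error and a smooth $\sin^2$ integrality penalty) are my assumptions, not necessarily what the repository implements.

```python
import numpy as np

def determinant_loss(batch):
    """Penalize deviation of det M from 1; batch has shape (n, 4) = rows (a, b, c, d)."""
    a, b, c, d = batch.T
    det = a * d - b * c
    return np.mean((det - 1.0) ** 2)

def integrality_loss(batch):
    """Penalize distance of each entry from the nearest integer,
    via the smooth periodic penalty sin^2(pi * x), which vanishes on integers."""
    return np.mean(np.sin(np.pi * batch) ** 2)

exact = np.array([[1.0, 1.0, 0.0, 1.0]])  # the matrix T: integer entries, det = 1
assert np.isclose(determinant_loss(exact), 0.0)
assert np.isclose(integrality_loss(exact), 0.0)

off = np.array([[1.5, 1.0, 0.0, 1.0]])  # non-integer entry, det = 1.5
assert determinant_loss(off) > 0 and integrality_loss(off) > 0
```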
Unfortunately, although the network was now capable of generating floating-point matrices that approximately satisfy both the integrality and determinant conditions, training suffered from severe mode collapse: the generator produced only a small number of distinct matrices rather than mimicking the diversity of the training distribution.
It is unclear to me what would be necessary to resolve the mode collapse problem. The discriminator architecture should in theory be sufficient to produce a perfect discriminator, given that the polynomial relationship defining membership in $SL(2,\mathbb{Z})$ has degree 2 and is therefore directly expressible in the encoded features.