Includes:

1. Differentiable Neural Computers (DNC)
2. Sparse Access Memory (SAM)
3. Sparse Differentiable Neural Computers (SDNC)
- Install
- From source
- Architecture
- Usage
- DNC
- SDNC
- SAM
- Tasks
- Copy task (with curriculum and generalization)
- Generalizing Addition task
- Generalizing Argmax task
- Code Structure
- General noteworthy stuff
This is an implementation of Differentiable Neural Computers (DNCs), described in the paper Hybrid computing using a neural network with dynamic external memory (Graves et al.), and of Sparse DNCs (SDNCs) and Sparse Access Memory (SAM), described in Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes.
```bash
pip install dnc
```
```bash
git clone https://github.com/ixaxaar/pytorch-dnc
cd pytorch-dnc
pip install -r ./requirements.txt
pip install -e .
```
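If the install worked, the three memory modules documented below should be importable (a quick smoke test):

```python
# Smoke test: DNC, SDNC and SAM are the three modules covered in the Usage section.
from dnc import DNC, SDNC, SAM
```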
For using fully GPU based SDNCs or SAMs, install FAISS:
```bash
conda install faiss-gpu -c pytorch
```
`pytest` is required to run the tests.
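For example, assuming the suite lives in the `./tests` folder (see Code Structure below):

```bash
pip install pytest
pytest ./tests
```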
DNC constructor parameters:
| Argument | Default | Description |
|---|---|---|
| `input_size` | `None` | Size of the input vectors |
| `hidden_size` | `None` | Size of hidden units |
| `rnn_type` | `'lstm'` | Type of recurrent cells used in the controller |
| `num_layers` | `1` | Number of layers of recurrent units in the controller |
| `num_hidden_layers` | `2` | Number of hidden layers per layer of the controller |
| `bias` | `True` | Bias |
| `batch_first` | `True` | Whether data is fed batch first |
| `dropout` | `0` | Dropout between layers in the controller |
| `bidirectional` | `False` | If the controller is bidirectional (Not yet implemented) |
| `nr_cells` | `5` | Number of memory cells |
| `read_heads` | `2` | Number of read heads |
| `cell_size` | `10` | Size of each memory cell |
| `nonlinearity` | `'tanh'` | If using `'rnn'` as `rnn_type`, non-linearity of the RNNs |
| `gpu_id` | `-1` | ID of the GPU, -1 for CPU |
| `independent_linears` | `False` | Whether to use independent linear units to derive the interface vector |
| `share_memory` | `True` | Whether to share memory between controller layers |
Following are the forward pass parameters:
| Argument | Default | Description |
|---|---|---|
| `input` | - | The input vector `(B*T*X)` or `(T*B*X)` |
| `hidden` | `(None, None, None)` | Hidden states `(controller hidden, memory hidden, read vectors)` |
| `reset_experience` | `False` | Whether to reset memory |
| `pass_through_memory` | `True` | Whether to pass through memory |
```python
import torch
from dnc import DNC

rnn = DNC(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  batch_first=True,
  gpu_id=0
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors) = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```
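To carry memory across consecutive calls, feed the returned hidden tuple back in and reset only at the start of a new episode (a minimal sketch using the `rnn` constructed above):

```python
# Sketch: persist memory across forward calls; reset_experience wipes it.
hidden = (None, None, None)
for step in range(3):
    x = torch.randn(10, 4, 64)  # (batch, time, input_size) since batch_first=True
    output, hidden = rnn(x, hidden, reset_experience=(step == 0))
```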
The `debug` option causes the network to return its memory hidden vectors (as numpy `ndarray`s) for the first batch of each forward step. These vectors can be analyzed or visualized, for example using visdom.
```python
import torch
from dnc import DNC

rnn = DNC(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  batch_first=True,
  gpu_id=0,
  debug=True
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors), debug_memory = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```
Memory vectors returned by forward pass (`np.ndarray`):

| Key | Y axis (dimensions) | X axis (dimensions) |
|---|---|---|
| `debug_memory['memory']` | `layer * time` | `nr_cells * cell_size` |
| `debug_memory['link_matrix']` | `layer * time` | `nr_cells * nr_cells` |
| `debug_memory['precedence']` | `layer * time` | `nr_cells` |
| `debug_memory['read_weights']` | `layer * time` | `read_heads * nr_cells` |
| `debug_memory['write_weights']` | `layer * time` | `nr_cells` |
| `debug_memory['usage_vector']` | `layer * time` | `nr_cells` |
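Any of these matrices can be plotted directly as a heatmap, e.g. with visdom (a sketch; assumes a visdom server is running, as described in the copy task section below):

```python
# Sketch: plot one debug matrix (requires `python -m visdom.server`).
from visdom import Visdom

viz = Visdom()
# Rows are layer * time, columns are nr_cells * cell_size.
viz.heatmap(debug_memory['memory'], opts={'title': 'memory'})
```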
SDNC constructor parameters:
| Argument | Default | Description |
|---|---|---|
| `input_size` | `None` | Size of the input vectors |
| `hidden_size` | `None` | Size of hidden units |
| `rnn_type` | `'lstm'` | Type of recurrent cells used in the controller |
| `num_layers` | `1` | Number of layers of recurrent units in the controller |
| `num_hidden_layers` | `2` | Number of hidden layers per layer of the controller |
| `bias` | `True` | Bias |
| `batch_first` | `True` | Whether data is fed batch first |
| `dropout` | `0` | Dropout between layers in the controller |
| `bidirectional` | `False` | If the controller is bidirectional (Not yet implemented) |
| `nr_cells` | `5000` | Number of memory cells |
| `read_heads` | `4` | Number of read heads |
| `sparse_reads` | `4` | Number of sparse memory reads per read head |
| `temporal_reads` | `4` | Number of temporal reads |
| `cell_size` | `10` | Size of each memory cell |
| `nonlinearity` | `'tanh'` | If using `'rnn'` as `rnn_type`, non-linearity of the RNNs |
| `gpu_id` | `-1` | ID of the GPU, -1 for CPU |
| `independent_linears` | `False` | Whether to use independent linear units to derive the interface vector |
| `share_memory` | `True` | Whether to share memory between controller layers |
Following are the forward pass parameters:
| Argument | Default | Description |
|---|---|---|
| `input` | - | The input vector `(B*T*X)` or `(T*B*X)` |
| `hidden` | `(None, None, None)` | Hidden states `(controller hidden, memory hidden, read vectors)` |
| `reset_experience` | `False` | Whether to reset memory |
| `pass_through_memory` | `True` | Whether to pass through memory |
```python
import torch
from dnc import SDNC

rnn = SDNC(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  sparse_reads=4,
  batch_first=True,
  gpu_id=0
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors) = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```
The `debug` option causes the network to return its memory hidden vectors (as numpy `ndarray`s) for the first batch of each forward step. These vectors can be analyzed or visualized, for example using visdom.
```python
import torch
from dnc import SDNC

rnn = SDNC(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  batch_first=True,
  sparse_reads=4,
  temporal_reads=4,
  gpu_id=0,
  debug=True
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors), debug_memory = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```
Memory vectors returned by forward pass (`np.ndarray`):

| Key | Y axis (dimensions) | X axis (dimensions) |
|---|---|---|
| `debug_memory['memory']` | `layer * time` | `nr_cells * cell_size` |
| `debug_memory['visible_memory']` | `layer * time` | `sparse_reads+2*temporal_reads+1 * nr_cells` |
| `debug_memory['read_positions']` | `layer * time` | `sparse_reads+2*temporal_reads+1` |
| `debug_memory['link_matrix']` | `layer * time` | `sparse_reads+2*temporal_reads+1 * sparse_reads+2*temporal_reads+1` |
| `debug_memory['rev_link_matrix']` | `layer * time` | `sparse_reads+2*temporal_reads+1 * sparse_reads+2*temporal_reads+1` |
| `debug_memory['precedence']` | `layer * time` | `nr_cells` |
| `debug_memory['read_weights']` | `layer * time` | `read_heads * nr_cells` |
| `debug_memory['write_weights']` | `layer * time` | `nr_cells` |
| `debug_memory['usage']` | `layer * time` | `nr_cells` |
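Because SDNC reads are sparse, `debug_memory['read_positions']` records which cells were actually touched; a quick way to see the coverage (a sketch, reusing `debug_memory` and the `nr_cells=100` from the example above):

```python
import numpy as np

# Each row holds the sparse_reads + 2*temporal_reads + 1 cell indices
# read at one (layer, time) step.
positions = debug_memory['read_positions'].astype(int)
print(f"{len(np.unique(positions))} of 100 cells were ever read")
```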
SAM constructor parameters:
| Argument | Default | Description |
|---|---|---|
| `input_size` | `None` | Size of the input vectors |
| `hidden_size` | `None` | Size of hidden units |
| `rnn_type` | `'lstm'` | Type of recurrent cells used in the controller |
| `num_layers` | `1` | Number of layers of recurrent units in the controller |
| `num_hidden_layers` | `2` | Number of hidden layers per layer of the controller |
| `bias` | `True` | Bias |
| `batch_first` | `True` | Whether data is fed batch first |
| `dropout` | `0` | Dropout between layers in the controller |
| `bidirectional` | `False` | If the controller is bidirectional (Not yet implemented) |
| `nr_cells` | `5000` | Number of memory cells |
| `read_heads` | `4` | Number of read heads |
| `sparse_reads` | `4` | Number of sparse memory reads per read head |
| `cell_size` | `10` | Size of each memory cell |
| `nonlinearity` | `'tanh'` | If using `'rnn'` as `rnn_type`, non-linearity of the RNNs |
| `gpu_id` | `-1` | ID of the GPU, -1 for CPU |
| `independent_linears` | `False` | Whether to use independent linear units to derive the interface vector |
| `share_memory` | `True` | Whether to share memory between controller layers |
Following are the forward pass parameters:
| Argument | Default | Description |
|---|---|---|
| `input` | - | The input vector `(B*T*X)` or `(T*B*X)` |
| `hidden` | `(None, None, None)` | Hidden states `(controller hidden, memory hidden, read vectors)` |
| `reset_experience` | `False` | Whether to reset memory |
| `pass_through_memory` | `True` | Whether to pass through memory |
```python
import torch
from dnc import SAM

rnn = SAM(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  sparse_reads=4,
  batch_first=True,
  gpu_id=0
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors) = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```
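A minimal training-step sketch around the `rnn` above (the target and loss are placeholders; the target shape assumes the output matches the input shape, as the copy task relies on):

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(rnn.parameters(), lr=0.001)

x = torch.randn(10, 4, 64)
target = torch.randn(10, 4, 64)  # placeholder target, assumed to match the output shape

output, hidden = rnn(x, (None, None, None), reset_experience=True)
loss = F.mse_loss(output, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```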
The `debug` option causes the network to return its memory hidden vectors (as numpy `ndarray`s) for the first batch of each forward step. These vectors can be analyzed or visualized, for example using visdom.
```python
import torch
from dnc import SAM

rnn = SAM(
  input_size=64,
  hidden_size=128,
  rnn_type='lstm',
  num_layers=4,
  nr_cells=100,
  cell_size=32,
  read_heads=4,
  batch_first=True,
  sparse_reads=4,
  gpu_id=0,
  debug=True
)

(controller_hidden, memory, read_vectors) = (None, None, None)

output, (controller_hidden, memory, read_vectors), debug_memory = \
  rnn(torch.randn(10, 4, 64), (controller_hidden, memory, read_vectors), reset_experience=True)
```
Memory vectors returned by forward pass (`np.ndarray`):

| Key | Y axis (dimensions) | X axis (dimensions) |
|---|---|---|
| `debug_memory['memory']` | `layer * time` | `nr_cells * cell_size` |
| `debug_memory['visible_memory']` | `layer * time` | `sparse_reads+2*temporal_reads+1 * nr_cells` |
| `debug_memory['read_positions']` | `layer * time` | `sparse_reads+2*temporal_reads+1` |
| `debug_memory['read_weights']` | `layer * time` | `read_heads * nr_cells` |
| `debug_memory['write_weights']` | `layer * time` | `nr_cells` |
| `debug_memory['usage']` | `layer * time` | `nr_cells` |
The copy task, as described in the original paper, is included in the repo.
From the project root:
```bash
python ./tasks/copy_task.py -cuda 0 -optim rmsprop -batch_size 32 -mem_slot 64 # (like original implementation)
python ./tasks/copy_task.py -cuda 0 -lr 0.001 -rnn_type lstm -nlayer 1 -nhlayer 2 -dropout 0 -mem_slot 32 -batch_size 1000 -optim adam -sequence_max_length 8 # (faster convergence)
```
For SDNCs:
```bash
python ./tasks/copy_task.py -cuda 0 -lr 0.001 -rnn_type lstm -memory_type sdnc -nlayer 1 -nhlayer 2 -dropout 0 -mem_slot 100 -mem_size 10 -read_heads 1 -sparse_reads 10 -batch_size 20 -optim adam -sequence_max_length 10
```
and for curriculum learning for SDNCs:
```bash
python ./tasks/copy_task.py -cuda 0 -lr 0.001 -rnn_type lstm -memory_type sdnc -nlayer 1 -nhlayer 2 -dropout 0 -mem_slot 100 -mem_size 10 -read_heads 1 -sparse_reads 4 -temporal_reads 4 -batch_size 20 -optim adam -sequence_max_length 4 -curriculum_increment 2 -curriculum_freq 10000
```
For the full set of options, see:
```bash
python ./tasks/copy_task.py --help
```
The copy task can be used to debug memory using Visdom.
Additional step required:
```bash
pip install visdom
python -m visdom.server
```
Open http://localhost:8097/ on your browser, and execute the copy task:
```bash
python ./tasks/copy_task.py -cuda 0
```
The visdom dashboard shows memory as a heatmap for batch 0 every `-summarize_freq` iterations.
The adding task is as described in this GitHub pull request.
This task:

- creates one-hot vectors of size `input_size`, each representing a number
- feeds a sentence of them to the network
- the outputs of the network are added to get the sum of the decoded outputs
The task first trains the network on sentences of length ~100, and then tests whether the network generalizes to lengths ~1000.
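A toy illustration of the input encoding described above (hypothetical values, not the task's actual code):

```python
import torch

input_size = 3
numbers = [2, 0, 1]                       # the "sentence" of numbers
one_hot = torch.eye(input_size)[numbers]  # one one-hot row of size input_size per number
target = sum(numbers)                     # the decoded outputs should sum to this
```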
```bash
python ./tasks/adding_task.py -cuda 0 -lr 0.0001 -rnn_type lstm -memory_type sam -nlayer 1 -nhlayer 1 -nhid 100 -dropout 0 -mem_slot 1000 -mem_size 32 -read_heads 1 -sparse_reads 4 -batch_size 20 -optim rmsprop -input_size 3 -sequence_max_length 100
```
The second adding task is similar to the first one, except that the network's output at the last time step is expected to be the argmax of the input.
```bash
python ./tasks/argmax_task.py -cuda 0 -lr 0.0001 -rnn_type lstm -memory_type dnc -nlayer 1 -nhlayer 1 -nhid 100 -dropout 0 -mem_slot 100 -mem_size 10 -read_heads 2 -batch_size 1 -optim rmsprop -sequence_max_length 15 -input_size 10 -iterations 10000
```
- DNCs:
  - `dnc/dnc.py` - Controller code.
  - `dnc/memory.py` - Memory module.
- SDNCs:
  - `dnc/sdnc.py` - Controller code, inherits `dnc.py`.
  - `dnc/sparse_temporal_memory.py` - Memory module.
  - `dnc/flann_index.py` - Memory index using kNN.
- SAMs:
  - `dnc/sam.py` - Controller code, inherits `dnc.py`.
  - `dnc/sparse_memory.py` - Memory module.
  - `dnc/flann_index.py` - Memory index using kNN.
- Tests:
  - All tests are in the `./tests` folder.
- SDNCs use the FLANN approximate nearest neighbour library, with its Python binding `pyflann3`, and FAISS.
FLANN can be installed either from pip (automatically as a dependency), or from source (e.g. for multithreading via OpenMP):
```bash
# install openmp first: e.g. `sudo pacman -S openmp` for Arch.
git clone git://github.com/mariusmuja/flann.git
cd flann
mkdir build
cd build
cmake ..
make -j 4
sudo make install
```
FAISS can be installed using:
```bash
conda install faiss-gpu -c pytorch
```
FAISS is much faster, has a GPU implementation, and is interoperable with pytorch tensors. We try to use FAISS by default, and fall back to FLANN when it is absent.
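A quick way to check which backend your environment will provide (a sketch; the library itself handles this fallback internally, and the `pyflann` module name is the assumed import of the `pyflann3` binding):

```python
try:
    import faiss  # GPU-capable, interoperable with pytorch tensors
    print('FAISS available')
except ImportError:
    import pyflann  # FLANN via its python binding
    print('falling back to FLANN')
```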
- `nan`s in the gradients are common; try with different batch sizes.
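Another common mitigation, standard practice rather than something this repo does for you, is to clip gradient norms before the optimizer step:

```python
import torch

# Sketch: clip gradient norms to tame exploding/NaN-prone gradients.
# The 50.0 threshold is hypothetical; tune per task.
def training_step(rnn, optimizer, loss):
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=50.0)
    optimizer.step()
```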
Repos referred to for creation of this repo: