- All work and no play makes Jack a great framework!
- All work and no play makes Jack a great framework!
- All work and no play makes Jack a great framework!
Jack the Reader -- or jack, for short -- is a framework for building and testing models on a variety of tasks that require reading comprehension.
To get started, please see How to Install and Run and then you may want to have a look at the notebooks. Lastly, for a high-level explanation of the ideas and vision, see Understanding Jack the Reader.
To illustrate how jack works, we will show how to train a question answering model.
First, download SQuAD and GloVe embeddings:
```shell
$ data/SQuAD/download.sh
$ data/GloVe/download.sh
$ # Although we support the native GloVe format, it is recommended to use a memory-mapped format, which allows embeddings to be loaded only as needed.
$ python3 ./bin/mmap-cli.py data/GloVe/glove.840B.300d.txt data/GloVe/glove.840B.300d.memory_map_dir
```
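To see why the memory-mapped format helps, here is a minimal sketch of the idea. Note this is not Jack's actual `memory_map_dir` layout; it only illustrates, with NumPy's `memmap`, how a mapped file lets you read individual embedding vectors without loading the whole matrix into RAM:

```python
# Minimal sketch of lazy, memory-mapped embedding lookup.
# NOT Jack's actual memory_map_dir layout -- illustration only.
import os
import tempfile

import numpy as np

dim = 4
vocab = ["the", "cat", "sat"]
path = os.path.join(tempfile.mkdtemp(), "embeddings.dat")

# Write a small embedding matrix to disk once (row i = vector for vocab[i]).
np.arange(len(vocab) * dim, dtype=np.float32).reshape(len(vocab), dim).tofile(path)

# Memory-map the file: rows are paged in from disk only when indexed,
# so the full matrix never has to fit in RAM.
embeddings = np.memmap(path, dtype=np.float32, mode="r", shape=(len(vocab), dim))
cat_vector = embeddings[vocab.index("cat")]  # only this row is read
print(cat_vector)  # [4. 5. 6. 7.]
```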
Train a FastQA model:
```shell
$ python3 bin/jack-train.py with train='data/SQuAD/train-v1.1.json' dev='data/SQuAD/dev-v1.1.json' reader='fastqa_reader' \
> repr_dim=300 dropout=0.5 batch_size=64 seed=1337 loader='squad' save_dir='./fastqa_reader' epochs=20 \
> with_char_embeddings=True embedding_format='memory_map_dir' embedding_file='data/GloVe/glove.840B.300d.memory_map_dir' vocab_from_embeddings=True
```
or shorter, using our prepared config:
```shell
$ python3 bin/jack-train.py with config='./conf/qa/fastqa.yaml'
```
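The prepared config simply bundles the same options as the long command above. As a rough sketch only (the authoritative values live in `conf/qa/fastqa.yaml`), such a config collects the flags as YAML keys:

```yaml
# Sketch assembled from the CLI flags above -- consult
# conf/qa/fastqa.yaml for the actual file.
reader: fastqa_reader
loader: squad
train: data/SQuAD/train-v1.1.json
dev: data/SQuAD/dev-v1.1.json
repr_dim: 300
dropout: 0.5
batch_size: 64
epochs: 20
with_char_embeddings: true
embedding_format: memory_map_dir
embedding_file: data/GloVe/glove.840B.300d.memory_map_dir
vocab_from_embeddings: true
save_dir: ./fastqa_reader
```

Any key can still be overridden on the command line after `with config=...`.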
Note, you can add a flag `tensorboard_folder=.tb/fastqa` to write arbitrary TensorBoard summaries to a provided path (here `.tb/fastqa`). Your summaries are automatically fetched in the train loop, so all you need to do is write them in your TF model code.
A copy of the model is written into the `save_dir` directory after each training epoch when performance improves. These can be loaded using the commands below, or see e.g. the showcase notebook.
Want to train another model? No problem, we have a fairly modular QAModel implementation which allows you to stick together your own model. There are examples in `conf/qa/` (e.g., `bidaf.yaml`, or our own creation `jack_qa.yaml`). We recommend using one of our own creations, `jack_qa*.yaml`, which are fast while achieving very good results. These models are defined solely in their configs, i.e., there is no implementation in code. This is possible through our `ModularQAModel`.
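To give a flavor of what "defined solely in the config" means, here is a purely hypothetical sketch; the module names and structure below are invented for illustration, so consult `conf/qa/jack_qa.yaml` for a real `ModularQAModel` definition:

```yaml
# Purely illustrative -- module names and structure are invented;
# see conf/qa/jack_qa.yaml for a real ModularQAModel config.
reader: modular_qa_reader
model:
  encoder_layer:                 # a stack of reusable encoder modules
    - input: question
      module: lstm
    - input: support
      module: lstm
    - input: [question, support] # match question against support
      module: attention_matching
  answer_layer:                  # scores answer span start/end
    module: mlp
```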
For more information on extractive question answering please have a look here.
If all of that is too cumbersome for you and you just want to play, why not download a pretrained model:
```shell
$ # we still need GloVe in memory mapped format, ignore the next 2 commands if already downloaded and transformed
$ data/GloVe/download.sh
$ python3 ./bin/mmap-cli.py data/GloVe/glove.840B.300d.txt data/GloVe/glove.840B.300d.memory_map_dir
$ wget https://www.dropbox.com/s/qb796uljoqj0lvo/fastqa.zip?dl=1
$ unzip fastqa.zip && mv fastqa fastqa_reader
```
```python
from jack import readers
from jack.core import QASetting

fastqa_reader = readers.reader_from_file("./fastqa_reader")

support = """It is a replica of the grotto at Lourdes,
France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858.
At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome),
is a simple, modern stone statue of Mary."""

answers = fastqa_reader([QASetting(
    question="To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?",
    support=[support]
)])
```
First, download SNLI:
```shell
$ # we still need GloVe in memory mapped format, ignore the next 2 commands if already downloaded and transformed
$ data/GloVe/download.sh
$ python3 ./bin/mmap-cli.py data/GloVe/glove.840B.300d.txt data/GloVe/glove.840B.300d.memory_map_dir
$ ./data/SNLI/download.sh
```
Then, for instance, train a Decomposable Attention model:
```shell
$ python3 bin/jack-train.py with reader='dam_snli_reader' loader=snli train='data/SNLI/snli_1.0/snli_1.0_train.jsonl' \
> dev='data/SNLI/snli_1.0/snli_1.0_dev.jsonl' test='data/SNLI/snli_1.0/snli_1.0_test.jsonl' \
> save_dir='./dam_reader' repr_dim=300 epochs=20 seed=1337 dropout=0.5 batch_size=64 \
> embedding_format='memory_map_dir' embedding_file='data/GloVe/glove.840B.300d.memory_map_dir' vocab_from_embeddings=True
```
or the short version:
```shell
$ python3 bin/jack-train.py with config='./conf/nli/dam.yaml'
```
Note, you can easily change the model to one of the other implemented NLI readers; just check out our configurations in `conf/nli/`. You will find, for instance, an ESIM reader (`esim.yaml`) which is realized using our `ModularNLIReader`, similar to question answering. You can quickly stick together your own model in a config like that. Available modules can be found here.
Note, you can add a flag `tensorboard_folder=.tb/dam_reader` to write TensorBoard summaries to a provided path (here `.tb/dam_reader`).
A copy of the model is written into the `save_dir` directory after each training epoch when performance improves. These can be loaded using the commands below, or see e.g. the showcase notebook.
```python
from jack import readers
from jack.core import QASetting

dam_reader = readers.reader_from_file("./dam_reader")

answers = dam_reader([QASetting(
    question="The boy plays with the ball.",  # hypothesis
    support=["The boy plays with the ball."]  # premise
)])
```
We are thankful for support from:
- Comply with the PEP 8 Style Guide
- Make sure all your code runs from the top level directory, e.g.:
  ```shell
  $ pwd
  /home/pasquale/workspace/jack
  $ python3 bin/jack-train.py [..]
  ```