- All work and no play makes Jack a great framework!
- All work and no play makes Jack a great framework!
- All work and no play makes Jack a great framework!
Jack the Reader -- or jack, for short -- is a framework for building and testing models on a variety of tasks that require reading comprehension.
To get started, please see How to Install and Run and then you may want to have a look at the notebooks. Lastly, for a high-level explanation of the ideas and vision, see Understanding Jack the Reader.
To illustrate how jack works, we will show how to train a question answering model.
First, download SQuAD and GloVe embeddings:
```shell
$ data/SQuAD/download.sh
$ data/GloVe/download.sh
$ # Although we support the native GloVe format, it is recommended to use a memory-mapped format, which allows embeddings to be loaded only as needed.
$ python3 ./bin/mmap-cli.py data/GloVe/glove.840B.300d.txt data/GloVe/glove.840B.300d.memory_map_dir
```
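To see why the memory-mapped format helps, here is a minimal sketch of the idea. Note this is not Jack's actual `memory_map_dir` layout; it only illustrates, with NumPy's `memmap`, how a mapped file lets you read individual embedding vectors without loading the whole matrix into RAM:

```python
# Minimal sketch of lazy, memory-mapped embedding lookup.
# NOT Jack's actual memory_map_dir layout -- illustration only.
import os
import tempfile

import numpy as np

dim = 4
vocab = ["the", "cat", "sat"]
path = os.path.join(tempfile.mkdtemp(), "embeddings.dat")

# Write a small embedding matrix to disk once (row i = vector for vocab[i]).
np.arange(len(vocab) * dim, dtype=np.float32).reshape(len(vocab), dim).tofile(path)

# Memory-map the file: rows are paged in from disk only when indexed,
# so the full matrix never has to fit in RAM.
embeddings = np.memmap(path, dtype=np.float32, mode="r", shape=(len(vocab), dim))
cat_vector = embeddings[vocab.index("cat")]  # only this row is read
print(cat_vector)  # [4. 5. 6. 7.]
```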
Train a FastQA model:
```shell
$ python3 bin/jack-train.py with train='data/SQuAD/train-v1.1.json' dev='data/SQuAD/dev-v1.1.json' reader='fastqa_reader' \
> repr_dim=300 dropout=0.5 batch_size=64 seed=1337 loader='squad' save_dir='./fastqa_reader' epochs=20 \
> with_char_embeddings=True embedding_format='memory_map_dir' embedding_file='data/GloVe/glove.840B.300d.memory_map_dir' vocab_from_embeddings=True
```
or shorter, using our prepared config:
```shell
$ python3 bin/jack-train.py with config='./conf/qa/fastqa.yaml'
```
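The prepared config simply bundles the same options as the long command above. As a rough sketch only (the authoritative values live in `conf/qa/fastqa.yaml`), such a config collects the flags as YAML keys:

```yaml
# Sketch assembled from the CLI flags above -- consult
# conf/qa/fastqa.yaml for the actual file.
reader: fastqa_reader
loader: squad
train: data/SQuAD/train-v1.1.json
dev: data/SQuAD/dev-v1.1.json
repr_dim: 300
dropout: 0.5
batch_size: 64
epochs: 20
with_char_embeddings: true
embedding_format: memory_map_dir
embedding_file: data/GloVe/glove.840B.300d.memory_map_dir
vocab_from_embeddings: true
save_dir: ./fastqa_reader
```

Any key can still be overridden on the command line after `with config=...`.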
Note, you can add a flag `tensorboard_folder=.tb/fastqa` to write arbitrary TensorBoard summaries to a provided path (here `.tb/fastqa`). Your summaries are automatically fetched in the train loop, so all you need to do is write them in your TF model code.
A copy of the model is written into the `save_dir` directory after each training epoch when performance improves. These can be loaded using the commands below, or see e.g. the showcase notebook.
Want to train another model? No problem, we have a fairly modular QAModel implementation which allows you to stick together your own model. There are examples in `conf/qa/` (e.g., `bidaf.yaml`, or our own creation `jack_qa.yaml`). We recommend using one of our own creations, `jack_qa*.yaml`, which are fast while achieving very good results. These models are defined solely in their configs, i.e., there is no implementation in code. This is possible through our `ModularQAModel`.
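To give a flavor of what "defined solely in the config" means, here is a purely hypothetical sketch; the module names and structure below are invented for illustration, so consult `conf/qa/jack_qa.yaml` for a real `ModularQAModel` definition:

```yaml
# Purely illustrative -- module names and structure are invented;
# see conf/qa/jack_qa.yaml for a real ModularQAModel config.
reader: modular_qa_reader
model:
  encoder_layer:                 # a stack of reusable encoder modules
    - input: question
      module: lstm
    - input: support
      module: lstm
    - input: [question, support] # match question against support
      module: attention_matching
  answer_layer:                  # scores answer span start/end
    module: mlp
```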
For more information on extractive question answering please have a look here.
If all of that is too cumbersome for you and you just want to play, why not download a pretrained model:
```shell
$ # we still need GloVe in memory mapped format, ignore the next 2 commands if already downloaded and transformed
$ data/GloVe/download.sh
$ python3 ./bin/mmap-cli.py data/GloVe/glove.840B.300d.txt data/GloVe/glove.840B.300d.memory_map_dir
$ wget https://www.dropbox.com/s/qb796uljoqj0lvo/fastqa.zip?dl=1
$ unzip fastqa.zip && mv fastqa fastqa_reader
```
```python
from jack import readers
from jack.core import QASetting

fastqa_reader = readers.reader_from_file("./fastqa_reader")

support = """It is a replica of the grotto at Lourdes,
France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858.
At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome),
is a simple, modern stone statue of Mary."""

answers = fastqa_reader([QASetting(
    question="To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?",
    support=[support]
)])
```
First, download SNLI:
```shell
$ # we still need GloVe in memory mapped format, ignore the next 2 commands if already downloaded and transformed
$ data/GloVe/download.sh
$ python3 ./bin/mmap-cli.py data/GloVe/glove.840B.300d.txt data/GloVe/glove.840B.300d.memory_map_dir
$ ./data/SNLI/download.sh
```
Then, for instance, train a Decomposable Attention model:
```shell
$ python3 bin/jack-train.py with reader='dam_snli_reader' loader=snli train='data/SNLI/snli_1.0/snli_1.0_train.jsonl' \
> dev='data/SNLI/snli_1.0/snli_1.0_dev.jsonl' test='data/SNLI/snli_1.0/snli_1.0_test.jsonl' \
> save_dir='./dam_reader' repr_dim=300 epochs=20 seed=1337 dropout=0.5 batch_size=64 \
> embedding_format='memory_map_dir' embedding_file='data/GloVe/glove.840B.300d.memory_map_dir' vocab_from_embeddings=True
```
or the short version:
```shell
$ python3 bin/jack-train.py with config='./conf/nli/dam.yaml'
```
Note, you can easily change the model to one of the other implemented NLI readers; just check out our configurations in `conf/nli/`. You will find, for instance, an ESIM reader (`esim.yaml`) which is realized using our `ModularNLIReader`, similar to question answering. You can quickly stick together your own model in a config like that. Available modules can be found here.
Note, you can add a flag `tensorboard_folder=.tb/dam_reader` to write TensorBoard summaries to a provided path (here `.tb/dam_reader`).
A copy of the model is written into the `save_dir` directory after each training epoch when performance improves. These can be loaded using the commands below, or see e.g. the showcase notebook.
```python
from jack import readers
from jack.core import QASetting

dam_reader = readers.reader_from_file("./dam_reader")

answers = dam_reader([QASetting(
    question="The boy plays with the ball.",  # hypothesis
    support=["The boy plays with the ball."]  # premise
)])
```
We are thankful for support from:
- Comply with the PEP 8 Style Guide
- Make sure all your code runs from the top level directory, e.g.:
  ```shell
  $ pwd
  /home/pasquale/workspace/jack
  $ python3 bin/jack-train.py [..]
  ```