Jack the Reader

A reading comprehension framework.
  • All work and no play makes Jack a great framework!
  • All work and no play makes Jack a great framework!
  • All work and no play makes Jack a great framework!

Jack the Reader -- or jack, for short -- is a framework for building and testing models on a variety of tasks that require reading comprehension.

To get started, please see How to Install and Run and then you may want to have a look at the notebooks. Lastly, for a high-level explanation of the ideas and vision, see Understanding Jack the Reader.

Quickstart Examples - Training and Usage of a Question Answering System

To illustrate how jack works, we will show how to train a question answering model.

Extractive Question Answering on SQuAD

First, download SQuAD and GloVe embeddings:

$ data/SQuAD/download.sh
$ data/GloVe/download.sh
$ # Although the native GloVe format is supported, we recommend the memory-mapped format, which loads embeddings only as needed.
$ python3 ./bin/mmap-cli.py data/GloVe/glove.840B.300d.txt data/GloVe/glove.840B.300d.memory_map_dir 
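
The conversion step above writes the embeddings into a directory of memory-mapped arrays. As a rough illustration of why this matters, here is a minimal sketch in plain numpy (not jack's actual on-disk format; the file name and sizes are made up):

import numpy as np

# Hypothetical sketch: the GloVe 840B.300d matrix (~2.2M x 300 float32)
# is about 2.6 GB. A memory map keeps it on disk and pages in only the
# rows that are actually accessed, instead of loading everything upfront.
vocab_size, dim = 2_200_000, 300
embeddings = np.memmap("embeddings.dat", dtype=np.float32, mode="r",
                       shape=(vocab_size, dim))
vector = embeddings[4242]  # only this row's pages are read from disk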

Train a FastQA model:

$ python3 bin/jack-train.py with train='data/SQuAD/train-v1.1.json' dev='data/SQuAD/dev-v1.1.json' reader='fastqa_reader' \
> repr_dim=300 dropout=0.5 batch_size=64 seed=1337 loader='squad' save_dir='./fastqa_reader' epochs=20 \
> with_char_embeddings=True embedding_format='memory_map_dir' embedding_file='data/GloVe/glove.840B.300d.memory_map_dir' vocab_from_embeddings=True

or shorter, using our prepared config:

$ python3 bin/jack-train.py with config='./conf/qa/fastqa.yaml'

Note that you can add the flag tensorboard_folder=.tb/fastqa to write arbitrary TensorBoard summaries to the provided path (here .tb/fastqa). Your summaries are fetched automatically in the train loop, so all you need to do is create them in your TF model code.
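
For instance, a scalar summary created anywhere in the model graph is enough. A sketch assuming TensorFlow 1.x, which jack builds on; the placeholder stands in for whatever loss your model computes:

import tensorflow as tf

# Stand-in for a loss computed by your model.
per_example_loss = tf.placeholder(tf.float32, [None], name="per_example_loss")
loss = tf.reduce_mean(per_example_loss)

# Any summary defined in the graph is fetched by jack's train loop and
# written under the tensorboard_folder path.
tf.summary.scalar("train/loss", loss)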

A copy of the model is written to the save_dir directory after every training epoch in which performance improves. It can be loaded with the commands below; see also, e.g., the showcase notebook.

Want to train another model? No problem: our fairly modular QAModel implementation lets you stick together your own. There are examples in conf/qa/ (e.g., bidaf.yaml, or our own creation jack_qa.yaml). We recommend one of our jack_qa*.yaml creations, which are fast while achieving very good results. These models are defined solely in their configs, i.e., there is no implementation in code; this is made possible by our ModularQAModel. For more information on extractive question answering, have a look here.
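
In code, the implemented readers are exposed through a registry in jack.readers; the snippet below assumes the registry is a module-level dict named readers (an assumption from our reading of the jack source):

from jack import readers

# Assumed registry: a dict mapping reader names such as 'fastqa_reader'
# to their factory functions.
print(sorted(readers.readers.keys()))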

If all of that is too cumbersome for you and you just want to play, why not download a pretrained model:

$ # we still need GloVe in memory mapped format, ignore the next 2 commands if already downloaded and transformed
$ data/GloVe/download.sh
$ python3 ./bin/mmap-cli.py data/GloVe/glove.840B.300d.txt data/GloVe/glove.840B.300d.memory_map_dir 
$ wget https://www.dropbox.com/s/qb796uljoqj0lvo/fastqa.zip?dl=1
$ unzip fastqa.zip && mv fastqa fastqa_reader
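
Then load the reader in Python and ask it a question:
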
from jack import readers
from jack.core import QASetting

fastqa_reader = readers.reader_from_file("./fastqa_reader")

support = """"It is a replica of the grotto at Lourdes, 
France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. 
At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), 
is a simple, modern stone statue of Mary."""

answers = fastqa_reader([QASetting(
    question="To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?",
    support=[support]
)])
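
The call returns one prediction per QASetting. How the answer object prints, and whether top-k candidates are nested in a list, depends on the reader, but for this SQuAD-style question the extracted span should be "Saint Bernadette Soubirous":

# One prediction per input QASetting; inspect it to see the extracted span.
print(answers[0])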

Recognizing Textual Entailment on SNLI

First, download SNLI:

$ # we still need GloVe in memory mapped format, ignore the next 2 commands if already downloaded and transformed
$ data/GloVe/download.sh
$ python3 ./bin/mmap-cli.py data/GloVe/glove.840B.300d.txt data/GloVe/glove.840B.300d.memory_map_dir 
$ ./data/SNLI/download.sh

Then train, for instance, a Decomposable Attention model:

$ python3 bin/jack-train.py with reader='dam_snli_reader' loader=snli train='data/SNLI/snli_1.0/snli_1.0_train.jsonl' \
> dev='data/SNLI/snli_1.0/snli_1.0_dev.jsonl' test='data/SNLI/snli_1.0/snli_1.0_test.jsonl' \
> save_dir='./dam_reader' repr_dim=300 epochs=20 seed=1337 dropout=0.5 batch_size=64 \
> embedding_format='memory_map_dir' embedding_file='data/GloVe/glove.840B.300d.memory_map_dir' vocab_from_embeddings=True

or the short version:

$ python3 bin/jack-train.py with config='./conf/nli/dam.yaml'

Note that you can easily swap in any of the other implemented NLI readers; just check out the configurations in conf/nli/. You will find, for instance, an ESIM reader (esim.yaml), which is realized with our ModularNLIReader, analogous to question answering. You can quickly stick together your own model in such a config; available modules can be found here.

Note that you can add the flag tensorboard_folder=.tb/dam_reader to write TensorBoard summaries to the provided path (here .tb/dam_reader).

A copy of the model is written to the save_dir directory after every training epoch in which performance improves. It can be loaded with the commands below; see also, e.g., the showcase notebook.

from jack import readers
from jack.core import QASetting

dam_reader = readers.reader_from_file("./dam_reader")

answers = dam_reader([QASetting(
    question="The boy plays with the ball.",  # Hypothesis
    support=["The boy plays with the ball."]  # Premise
)])
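
As in the QA example, the call returns one prediction per QASetting; an NLI reader predicts one of entailment, neutral, or contradiction:

# Premise and hypothesis are identical here, so the expected label is
# "entailment".
print(answers[0])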

Support

We are thankful for support from our sponsors.

Developer guidelines

Make sure all your code runs from the top-level directory of the repository, e.g.:
$ pwd
/home/pasquale/workspace/jack
$ python3 bin/jack-train.py [..]
