added sem-tags #21

Open
wants to merge 12 commits into
base: master
14 changes: 14 additions & 0 deletions README.md
@@ -46,6 +46,7 @@ coqc coqlib.v
```

Our system assigns semantics to CCG structures. At the moment, we support C&C for English, and Jigg for Japanese.
If you are working with templates that require semantic tags, you will also need a universal semantic tagger.

### Installing [C&C parser](http://www.cl.cam.ac.uk/~sc609/candc-1.00.html) (for English)

@@ -72,6 +73,19 @@ Simply do:

The command above will download Jigg, its models, and create the file `ja/jigg_location.txt` where the path to Jigg is specified. That is all.

### Installing [semtagger](https://github.com/ginesam/semtagger) (for English, optional)

You can optionally download and install a semantic tagger by running the following
script from the ccg2lambda directory:

```bash
./en/install_semtagger.sh
```

This will generate a file `en/semtagger_location.txt` with the path to the semantic tagger.
After downloading, you must train a tagging model by following the instructions in the
[semtagger repository](https://github.com/ginesam/semtagger).
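
Before running the pipeline, it can help to confirm that the location file exists and that the directory it names contains `run.sh`, which is the file `en/emnlp2015exp.sh` looks for. The following is a minimal sketch under that assumption; the function name `check_semtagger` is hypothetical and is not part of ccg2lambda or semtagger.

```bash
#!/bin/bash
# Hypothetical helper (not part of ccg2lambda): verify the semtagger setup.
# Prints the semtagger directory and returns 0 when the location file names
# a directory that contains run.sh; returns 1 or 2 otherwise.
check_semtagger() {
  local location_file="${1:-en/semtagger_location.txt}"
  if [ ! -f "$location_file" ]; then
    echo "missing: $location_file" >&2
    return 1
  fi
  local semtagger_dir
  semtagger_dir="$(cat "$location_file")"
  if [ ! -f "$semtagger_dir/run.sh" ]; then
    echo "run.sh not found in: $semtagger_dir" >&2
    return 2
  fi
  echo "$semtagger_dir"
}
```

Calling `check_semtagger && echo "semtagger found"` from the ccg2lambda directory reports whether the tagger is in place. It does not check that a tagging model has been trained.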

## Using the Semantic Parser

Let's assume that we have a file `sentences.txt` with one sentence per line,
25 changes: 25 additions & 0 deletions en/emnlp2015exp.sh
@@ -65,6 +65,12 @@ parser_cmd="${parser_dir}/bin/candc \
--candc-printer xml \
--input"

# Set a variable with the location of the semtagger tool (if used)
semtagger_dir=""
if [ -f en/semtagger_location.txt ]; then
  semtagger_dir=`cat en/semtagger_location.txt`
fi

# These variables contain the names of the directories where intermediate
# results will be written.
plain_dir=${dataset}"_plain"
@@ -121,6 +127,25 @@ for f in ${plain_dir}/*.tok; do
    python en/candc2transccg.py ${parsed_dir}/${base_filename}.candc.xml \
      > ${parsed_dir}/${base_filename/.tok/}.xml
  fi
  # inject semantic tag information when using semtagger
  if [ -n "$semtagger_dir" ]; then
    if [ -f "$semtagger_dir"/run.sh ]; then
      cp ${parsed_dir}/${base_filename/.tok/}.xml \
        ${parsed_dir}/${base_filename/.tok/}.xml.old
      python scripts/xml2conll.py ${parsed_dir}/${base_filename/.tok/}.xml.old \
        > ${parsed_dir}/${base_filename/.tok/}.off
      . ${semtagger_dir}/run.sh --predict \
        --input ${parsed_dir}/${base_filename/.tok/}.off \
        --output ${parsed_dir}/${base_filename/.tok/}.sem
      python scripts/xml_add_stag.py \
        ${parsed_dir}/${base_filename/.tok/}.xml.old \
        ${parsed_dir}/${base_filename/.tok/}.sem \
        ${parsed_dir}/${base_filename/.tok/}.xml
      rm -f ${parsed_dir}/${base_filename/.tok/}.xml.old
      rm -f ${parsed_dir}/${base_filename/.tok/}.off
      rm -f ${parsed_dir}/${base_filename/.tok/}.sem
    fi
  fi
done
echo
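
The tagging step in this hunk repeats the expansion `${base_filename/.tok/}` to derive four file names from one input. The sketch below spells out that naming, assuming `base_filename` holds a name ending in `.tok`, as the loop over `${plain_dir}/*.tok` suggests. The function `semtag_files` is hypothetical and is not part of the PR.

```bash
#!/bin/bash
# Hypothetical helper: list the files the semantic tagging step touches
# for one input file, in the order the pipeline uses them.
semtag_files() {
  local parsed_dir="$1" base_filename="$2"
  # ${var/pattern/} deletes the first match of a glob pattern;
  # the "." is literal here, so ".tok" is removed from the name.
  local stem="${base_filename/.tok/}"
  echo "${parsed_dir}/${stem}.xml.old"  # backup of the parsed XML
  echo "${parsed_dir}/${stem}.off"      # output of scripts/xml2conll.py
  echo "${parsed_dir}/${stem}.sem"      # predictions written by run.sh
  echo "${parsed_dir}/${stem}.xml"      # final XML with semantic tags
}
```

For example, `semtag_files fracas.xml_parsed problem.tok` prints paths that start with `fracas.xml_parsed/problem`. Only the last file remains after the step, since the three intermediate files are removed with `rm -f`.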

11 changes: 9 additions & 2 deletions en/fracas.md
@@ -1,6 +1,6 @@
# Running the RTE pipeline on FraCas.

First, ensure that you have downloaded the C&C parser and written its location to the file `en/candc_location.txt`. If you want to use semantic templates with semantic tags, also ensure that you have downloaded semtagger, written its location to the file `en/semtagger_location.txt`, and trained a tagging model.

Second, you need to download the copy of [FraCaS provided by MacCartney and Manning (2007)](http://www-nlp.stanford.edu/~wcmac/downloads/fracas.xml):

@@ -16,7 +16,14 @@ git checkout tags/fracas
./en/emnlp2015exp.sh en/semantic_templates_en_emnlp2015.yaml fracas.xml
```

If you are using semantic tags in your templates, you can similarly do:

```bash
git checkout semtag-fracas
./en/emnlp2015exp.sh en/semantic_templates_en_semtags_emnlp2015.yaml fracas.xml
```

The scripts will:

1. Extract the plain text corresponding to the hypotheses and conclusions of all fracas problems. These hypotheses and conclusions are stored in a different file for each fracas problem, under the directory `fracas.xml_plain`. The gold entailment judgment is stored in files `fracas.xml_plain/*.answer`.
2. Parse the hypotheses and conclusions using C&C parser, and save them under the directory `fracas.xml_parsed`.
10 changes: 10 additions & 0 deletions en/install_semtagger.sh
@@ -0,0 +1,10 @@
#!/bin/bash
#
# Download semtagger from https://github.com/ginesam/semtagger

semtagger_url="https://github.com/ginesam/semtagger.git"
semtagger_dir="$(pwd)/semtagger"

# Clone the tagger and record its location for en/emnlp2015exp.sh
git clone "$semtagger_url" "$semtagger_dir"
echo "$semtagger_dir" > en/semtagger_location.txt
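
Running the install script a second time makes `git clone` fail, because the target directory already exists and is not empty. A re-runnable variant could skip the clone when a checkout is already present. This is a sketch only; the function `install_semtagger` and its three parameters are hypothetical and not part of the PR.

```bash
#!/bin/bash
# Hypothetical re-runnable variant of the install step (a sketch, not the PR's script).
# Arguments: repository URL, target directory, location file to write.
install_semtagger() {
  local url="$1" dir="$2" location_file="$3"
  # Clone only when no checkout exists yet; git clone refuses a non-empty target.
  if [ ! -d "$dir/.git" ]; then
    git clone "$url" "$dir" || return 1
  fi
  # Record the location either way, so the pipeline can find the tagger.
  echo "$dir" > "$location_file"
}
```

From the ccg2lambda directory, it could be called as `install_semtagger "https://github.com/ginesam/semtagger.git" "$(pwd)/semtagger" en/semtagger_location.txt`.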
