mynlp · ginesiametlle · Jul 5, 2018 · Jul 26, 2018 · Jul 26, 2018 · Jul 26, 2018
diff --git a/README.md b/README.md
@@ -46,6 +46,7 @@ coqc coqlib.v
 ```
 
 Our system assigns semantics to CCG structures. At the moment, we support C&C for English, and Jigg for Japanese.
+If you are working with templates that require semantic tags, you will also need a universal semantic tagger.
 
 ### Installing [C&C parser](http://www.cl.cam.ac.uk/~sc609/candc-1.00.html) (for English)
 
@@ -72,6 +73,19 @@ Simply do:
 
 The command above will download Jigg, its models, and create the file `ja/jigg_location.txt` where the path to Jigg is specified. That is all.
 
+### Installing [semtagger](https://github.com/ginesam/semtagger) (for English, optional)
+
+You can optionally download and install a semantic tagger by running the following
+script from the ccg2lambda directory:
+
+```bash
+./en/install_semtagger.sh
+```
+
+This will generate a file `en/semtagger_location.txt` with the path to the semantic tagger.
+Note that after downloading, you must follow the [semtagger instructions](https://github.com/ginesam/semtagger) to train a
+tagging model.
+
 ## Using the Semantic Parser
 
 Let's assume that we have a file `sentences.txt` with one sentence per line,

diff --git a/en/emnlp2015exp.sh b/en/emnlp2015exp.sh
@@ -65,6 +65,12 @@ parser_cmd="${parser_dir}/bin/candc \
     --candc-printer xml \
     --input"
 
+# Set a variable with the location of the semtagger tool (if used)
+semtagger_dir=""
+if [ -f en/semtagger_location.txt ]; then
+  semtagger_dir=$(cat en/semtagger_location.txt)
+fi
+
 # These variables contain the names of the directories where intermediate
 # results will be written.
 plain_dir=${dataset}"_plain"
@@ -121,6 +127,25 @@ for f in ${plain_dir}/*.tok; do
     python en/candc2transccg.py ${parsed_dir}/${base_filename}.candc.xml \
       > ${parsed_dir}/${base_filename/.tok/}.xml
   fi
+  # inject semantic tag information when using semtagger
+  if [ -n "$semtagger_dir" ]; then
+    if [ -f "$semtagger_dir"/run.sh ]; then
+        cp ${parsed_dir}/${base_filename/.tok/}.xml \
+           ${parsed_dir}/${base_filename/.tok/}.xml.old
+        python scripts/xml2conll.py ${parsed_dir}/${base_filename/.tok/}.xml.old \
+               > ${parsed_dir}/${base_filename/.tok/}.off
+        . ${semtagger_dir}/run.sh --predict \
+          --input ${parsed_dir}/${base_filename/.tok/}.off \
+          --output ${parsed_dir}/${base_filename/.tok/}.sem
+        python scripts/xml_add_stag.py \
+               ${parsed_dir}/${base_filename/.tok/}.xml.old \
+               ${parsed_dir}/${base_filename/.tok/}.sem \
+               ${parsed_dir}/${base_filename/.tok/}.xml
+        rm -f ${parsed_dir}/${base_filename/.tok/}.xml.old
+        rm -f ${parsed_dir}/${base_filename/.tok/}.off
+        rm -f ${parsed_dir}/${base_filename/.tok/}.sem
+    fi
+  fi
 done
 echo
 

diff --git a/en/fracas.md b/en/fracas.md
@@ -1,6 +1,6 @@
 # Running the RTE pipeline on FraCas.
 
-First, ensure that you have downloaded C&C parser and wrote its location in the file `en/candc_location.txt`.
+First, ensure that you have downloaded the C&C parser and written its location to the file `en/candc_location.txt`. If you want to use semantic templates with semantic tags, also ensure that you have downloaded semtagger, written its location to the file `en/semtagger_location.txt`, and trained a tagging model.
 
 Second, you need to download the copy of [FraCaS provided by MacCartney and Manning (2007)](http://www-nlp.stanford.edu/~wcmac/downloads/fracas.xml):
 
@@ -16,7 +16,14 @@ git checkout tags/fracas
 ./en/emnlp2015exp.sh en/semantic_templates_en_emnlp2015.yaml fracas.xml
 ```
 
-This script will:
+If you are using semantic tags in your templates, you can similarly do:
+
+```bash
+git checkout semtag-fracas
+./en/emnlp2015exp.sh en/semantic_templates_en_semtags_emnlp2015.yaml fracas.xml
+```
+
+The scripts will:
 
 1. Extract the plain text corresponding to the hypotheses and conclusions of all fracas problems. These hypotheses and conclusions are stored in a different file for each fracas problem, under the directory `fracas.xml_plain`. The gold entailment judgment is stored in files `fracas.xml_plain/*.answer`.
 2. Parse the hypotheses and conclusions using C&C parser, and save them under the directory `fracas.xml_parsed`.

diff --git a/en/install_semtagger.sh b/en/install_semtagger.sh
@@ -0,0 +1,10 @@
+#!/bin/bash
+#
+# Download semtagger from https://github.com/ginesam/semtagger
+
+semtagger_url="https://github.com/ginesam/semtagger.git"
+semtagger_dir="$(pwd)/semtagger"
+
+git clone "$semtagger_url" "$semtagger_dir"
+echo "$semtagger_dir" > en/semtagger_location.txt
+
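
Review note: `en/emnlp2015exp.sh` only runs the tagger when `en/semtagger_location.txt` exists and the directory it names contains `run.sh`. The sketch below gathers those two checks into one helper so the condition is easier to read and to test in isolation. The function name `resolve_semtagger_dir` is hypothetical and not part of this PR; the sketch assumes the location file holds a single path on its first line, as `en/install_semtagger.sh` writes it.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): print the semtagger directory
# if it is configured and usable, otherwise return a non-zero status.
resolve_semtagger_dir() {
  # Assumption: the location file contains one path on its first line.
  local location_file="${1:-en/semtagger_location.txt}"
  local dir

  # No location file means semtagger was never installed.
  [ -f "$location_file" ] || return 1
  dir="$(head -n 1 "$location_file")"

  # Mirror the pipeline's checks: a non-empty path and an existing run.sh.
  if [ -z "$dir" ] || [ ! -f "$dir/run.sh" ]; then
    return 1
  fi
  printf '%s\n' "$dir"
}

# Example use: tag only when the tool is actually available.
if semtagger_dir="$(resolve_semtagger_dir)"; then
  echo "semtagger found at: $semtagger_dir"
else
  echo "semtagger not configured; skipping semantic tagging"
fi
```

With a helper like this, the tagging block in the pipeline could sit behind a single condition, and a missing or stale location file would fall through to the untagged path, which matches the behaviour of the nested checks in the diff.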