align.sh breaks #3

Open
ButteredGroove opened this issue Jun 18, 2019 · 1 comment

@ButteredGroove

Hi, and thank you for the update!
I've been trying to finish the alignment steps on LDC2017T10, but have run into a bug:

(ve) ~/guo_lu/AMR-Parser-master/data$ ./align.sh
<LOTS OF OUTPUT>
3237
3238
3239
Traceback (most recent call last):
  File "preprocess/merge_file.py", line 80, in <module>
    node_tuple = node_list[index][counter]
IndexError: list index out of range
Traceback (most recent call last):
  File "preprocess/align2conll.py", line 5, in <module>
    reload(sys)
NameError: name 'reload' is not defined
Exception in thread "main" java.io.FileNotFoundException: train.txt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at BasicFileReader.readFile(BasicFileReader.java:30)
        at ConllFileIO.readConllFile(ConllFileIO.java:21)
        at AMROracleRunner.main(AMROracleRunner.java:129)
rm: cannot remove 'train.txt': No such file or directory
Traceback (most recent call last):
  File "preprocess/align2conll.py", line 5, in <module>
    reload(sys)
NameError: name 'reload' is not defined
Exception in thread "main" java.io.FileNotFoundException: dev.txt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at BasicFileReader.readFile(BasicFileReader.java:30)
        at ConllFileIO.readConllFile(ConllFileIO.java:21)
        at AMROracleRunner.main(AMROracleRunner.java:129)
rm: cannot remove 'dev.txt': No such file or directory
rm: cannot remove 'dev.txt.pb.lemmas': No such file or directory
Traceback (most recent call last):
  File "preprocess/align2conll.py", line 5, in <module>
    reload(sys)
NameError: name 'reload' is not defined
Exception in thread "main" java.io.FileNotFoundException: test.txt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at BasicFileReader.readFile(BasicFileReader.java:30)
        at ConllFileIO.readConllFile(ConllFileIO.java:21)
        at AMROracleRunner.main(AMROracleRunner.java:129)
rm: cannot remove 'test.txt': No such file or directory
rm: cannot remove 'test.txt.pb.lemmas': No such file or directory

The first failure appears to be the IndexError (list index out of range) at line 80 of preprocess/merge_file.py.
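
The traceback only says that `node_list[index][counter]` went out of range, not which entry was responsible. A minimal sketch of a more informative lookup follows. `lookup_node` is a hypothetical helper, and the nested-list layout of `node_list` is an assumption drawn from the traceback alone, not from the actual contents of merge_file.py.

```python
def lookup_node(node_list, index, counter):
    """Return node_list[index][counter], raising a descriptive IndexError.

    Hypothetical helper: the list-of-lists layout is assumed from the
    traceback. Negative indices are rejected here on purpose, so a
    wrapped-around lookup cannot hide an off-by-one error.
    """
    if not 0 <= index < len(node_list):
        raise IndexError(
            "sentence index %d out of range (have %d sentences)"
            % (index, len(node_list)))
    nodes = node_list[index]
    if not 0 <= counter < len(nodes):
        raise IndexError(
            "node counter %d out of range for sentence %d (have %d nodes)"
            % (counter, index, len(nodes)))
    return nodes[counter]
```

With a check like this, the error message would show whether the outer or the inner index is at fault, which may help narrow down the input sentence involved.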

In case it helps, my setup diary follows. You'll notice several situations where I had to handle bugs:

Diary: Set up and run Guo and Lu AMR parser

1. Set up base environment:
  * Ubuntu 16.04
  * Python 3.5.2
  * CUDA 8.0
  * Cudnn 6.0

2. Set locale:
export LANG=C.UTF-8
export LANGUAGE=C.UTF-8
export LC_ALL=C.UTF-8

3. Create directory to do work:
mkdir guo_lu
cd guo_lu

4. Create python virtual environment and activate it:
sudo apt-get install python3-venv
python3 -m venv ve
source ve/bin/activate
pip install --upgrade pip
pip install wheel

5. Build and install DyNet 2.0
# Based on manual install instructions from:
# https://dynet.readthedocs.io/en/latest/python.html#manual-installation
sudo apt-get install -y build-essential cmake mercurial git unzip
pip install Cython ordered-set numpy nltk
wget https://github.com/clab/dynet/archive/v2.0.zip
unzip v2.0.zip
cd dynet-2.0
hg clone https://bitbucket.org/eigen/eigen -r b2e267d
mkdir build
cd build
cmake .. -DEIGEN3_INCLUDE_DIR=../eigen -DPYTHON=`which python` -DBACKEND=cuda
make
cd python
python setup.py install

6. Install and set up JAMR
cd ~/guo_lu
sudo apt-get install -y openjdk-8-jre
wget https://github.com/jflanigan/jamr/archive/Semeval-2016.zip
unzip Semeval-2016.zip
cd jamr-Semeval-2016
./setup
. scripts/config.sh
./compile

7. Install AMR Parser
wget -nv https://github.com/Cartus/AMR-Parser/archive/master.zip
unzip master.zip

8. Install MGIZA++
sudo apt-get install libboost-all-dev
cd AMR-Parser-master/data
git clone https://github.com/moses-smt/mgiza.git
cd mgiza/mgizapp
cmake .
make
make install

9. NLTK tagger installation
python
import nltk
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
quit()

10. Prepare LDC2017T10
# Create links to LDC2017T10 in AMR-Parser-master/data subdirectories
cd ~/guo_lu/AMR-Parser-master/data/amr/data/amrs/split/train/
rm amr-release-1.0-training-bolt.txt
cp -as ~/abstract_meaning_representation_amr_2.0/data/amrs/split/training/* .
cd ../test
rm amr-release-1.0-test-bolt.txt
cp -as ~/abstract_meaning_representation_amr_2.0/data/amrs/split/test/* .
cd ../dev
rm amr-release-1.0-dev-bolt.txt
cp -as ~/abstract_meaning_representation_amr_2.0/data/amrs/split/dev/* .
cd ~/guo_lu/AMR-Parser-master/data

11. Run preprocessing script
./preprocess_17.sh

12. Run JAMR aligner
cd ~/guo_lu/jamr-Semeval-2016
. scripts/config.sh
scripts/ALIGN.sh < ~/guo_lu/AMR-Parser-master/data/amr/tmp_amr/train/amr.txt > ~/guo_lu/AMR-Parser-master/data/jamr_output/train.txt
# BUG !!!!!!!!
>> ### Tokenizing ###
>> panic: swash_fetch got swatch of unexpected bit width, slen=1024, needents=64 at /gpfs-volume/guo_lu/jamr-Semeval-2016/tools/cdec/corpus/support/quote-norm.pl line 149, <STDIN> line 1.
>> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
>>        at edu.cmu.lti.nlp.amr.CorpusTool$$anonfun$main$1.apply(CorpusTool.scala:48)
>>        at edu.cmu.lti.nlp.amr.CorpusTool$$anonfun$main$1.apply(CorpusTool.scala:43)
>>        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>        at edu.cmu.lti.nlp.amr.CorpusTool$.main(CorpusTool.scala:43)
>>        at edu.cmu.lti.nlp.amr.CorpusTool.main(CorpusTool.scala)

# Fix this bug in JAMR as per https://github.com/jflanigan/jamr/issues/17
sed -i_BAK '149 s/^/#/' tools/cdec/corpus/support/quote-norm.pl

# Now run aligner:
scripts/ALIGN.sh < ~/guo_lu/AMR-Parser-master/data/amr/tmp_amr/train/amr.txt > ~/guo_lu/AMR-Parser-master/data/jamr_output/train.txt
scripts/ALIGN.sh < ~/guo_lu/AMR-Parser-master/data/amr/tmp_amr/test/amr.txt > ~/guo_lu/AMR-Parser-master/data/jamr_output/test.txt
scripts/ALIGN.sh < ~/guo_lu/AMR-Parser-master/data/amr/tmp_amr/dev/amr.txt > ~/guo_lu/AMR-Parser-master/data/jamr_output/dev.txt

13. Run Hybrid Aligner
cd ~/guo_lu/AMR-Parser-master/data
./align.sh
# BUG !!!!!!!!
>> <snip>
>> cat: write error: Broken pipe
>>  File "./scripts//stem-4-letters.py", line 10
>>    print ' '.join(w if w.startswith(':') or (w.startswith('++') and w.endswith('++')) else w[:3] for w in line.strip().split())

# stem-4-letters.py uses the Python 2 print statement.  I'm not sure how many
# other scripts do the same.  It appears that the AMR parser code assumes
# python3 = Python 3.5.2 and python = Python 2.x, which means the parser has
# an undocumented requirement of Python 2.
# I decided to replicate that assumption in my environment by linking
# python to python2:
cd ~/guo_lu/ve/bin
rm pip python
ln -s /usr/bin/python2 python
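
An alternative to relinking python would be to port the failing line so that it runs under either interpreter. The sketch below is untested against the repository. It wraps the expression quoted in the traceback in a hypothetical function named `stem_line` and calls print with a single parenthesized argument, which behaves the same under Python 2 and Python 3.

```python
def stem_line(line):
    """Cut each token to its first 3 characters.

    Tokens starting with ':' and tokens wrapped in '++...++' are kept
    whole, exactly as in the expression quoted from stem-4-letters.py.
    """
    return ' '.join(
        w if w.startswith(':') or (w.startswith('++') and w.endswith('++'))
        else w[:3]
        for w in line.strip().split())


# A single parenthesized argument prints identically on Python 2 and 3.
print(stem_line(":ARG0 ++want++ running boy"))  # prints ":ARG0 ++want++ run boy"
```

How the original script reads its input is not visible in the traceback, so the surrounding loop is left out here.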

# Try again:
cd -
./align.sh
# BUG !!!!!!!!
./scripts//run_aligner.sh: 7: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//plain2snt: not found
./scripts//run_aligner.sh: 8: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//mkcls: not found
./scripts//run_aligner.sh: 9: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//mkcls: not found
./scripts//run_aligner.sh: 10: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//snt2cooc: not found
./scripts//run_aligner.sh: 14: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//plain2snt: not found
./scripts//run_aligner.sh: 15: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//mkcls: not found
./scripts//run_aligner.sh: 16: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//mkcls: not found
./scripts//run_aligner.sh: 17: ./scripts//run_aligner.sh: ../mgiza/mgizapp/bin//snt2cooc: not found

# Looks like a bad path.
# Fixed it by editing last two lines of:
# ~/guo_lu/AMR-Parser-master/data/align/addresses.keep
MGIZA_SCRIPT=~/guo_lu/AMR-Parser-master/mgiza/mgizapp/scripts
MGIZA_BIN=~/guo_lu/AMR-Parser-master/mgiza/mgizapp/bin
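
Before rerunning align.sh, it may be worth confirming that the directory assigned to MGIZA_BIN really contains the tools that run_aligner.sh reported as missing. A minimal sketch, where `missing_tools` is a hypothetical helper and the tool names are copied from the error messages above:

```python
import os


def missing_tools(bin_dir, names=("plain2snt", "mkcls", "snt2cooc")):
    """Return the names that are not executable regular files in bin_dir."""
    bin_dir = os.path.expanduser(bin_dir)  # expand a leading '~'
    missing = []
    for name in names:
        path = os.path.join(bin_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing
```

Calling `missing_tools("~/guo_lu/AMR-Parser-master/mgiza/mgizapp/bin")` should return an empty list if the path written to addresses.keep is right.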

# Try #3:
./align.sh
# BUG !!!!!!!!
Traceback (most recent call last):
  File "preprocess/align2conll.py", line 1, in <module>
    import nltk
ImportError: No module named nltk
Exception in thread "main" java.io.FileNotFoundException: train.txt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at BasicFileReader.readFile(BasicFileReader.java:30)
        at ConllFileIO.readConllFile(ConllFileIO.java:21)
        at AMROracleRunner.main(AMROracleRunner.java:129)

# align.sh calls align2conll.py using python, not python3.  The README
# specifically states to install nltk under Python 3, and the print calls in
# align2conll.py are Python 3 style, so here python needs to be python3.
# Maybe the python = Python 2 assumption isn't a good one?
#
# I decided to edit align.sh to keep the python = Python 2,
# python3 = Python 3 convention:
Line 32: python3 preprocess/align2conll.py hybrid_pr.txt train.txt
Line 40:     python3 preprocess/align2conll.py ${JAMR_DIR}/${SPLIT}.txt ${SPLIT}.txt
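
Since the scripts seem to mix Python 2 and Python 3 syntax, one rough way to sort them is to ask a Python 3 interpreter whether each file compiles at all. The sketch below assumes it is run under Python 3. `fails_to_compile` is a hypothetical helper, not part of the repository.

```python
def fails_to_compile(source, filename="<script>"):
    """Return True if the running interpreter rejects source at compile time."""
    try:
        compile(source, filename, "exec")
    except SyntaxError:
        return True
    return False


# Under Python 3, a Python 2 print statement is a syntax error.
print(fails_to_compile("print 'hello'\n"))
print(fails_to_compile("print('hello')\n"))
```

This only catches syntax-level differences such as the print statement. Runtime differences, for example a call to a builtin that exists only in Python 2, still compile cleanly and would show up only when the script is executed.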

# Try #4:
./align.sh
# BUG !!!!!!!!
<snip>
3238
3239
Traceback (most recent call last):
  File "preprocess/merge_file.py", line 80, in <module>
    node_tuple = node_list[index][counter]
IndexError: list index out of range
Traceback (most recent call last):
  File "preprocess/align2conll.py", line 5, in <module>
    reload(sys)
NameError: name 'reload' is not defined
Exception in thread "main" java.io.FileNotFoundException: train.txt (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at BasicFileReader.readFile(BasicFileReader.java:30)
        at ConllFileIO.readConllFile(ConllFileIO.java:21)
        at AMROracleRunner.main(AMROracleRunner.java:129)
rm: cannot remove 'train.txt': No such file or directory
<snip>
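
The NameError in this last attempt follows from running align2conll.py under python3: `reload` is a builtin only in Python 2. A minimal sketch of a version-tolerant guard is below. It assumes that the `reload(sys)` on line 5 is there to enable a following `sys.setdefaultencoding("utf-8")` call, which is the usual Python 2 idiom but is not visible in the traceback. `ensure_utf8_default` is a hypothetical name.

```python
import sys


def ensure_utf8_default():
    """Make UTF-8 the default string encoding and return the active default."""
    if sys.version_info[0] == 2:
        # Python 2 only: site.py removes sys.setdefaultencoding at startup,
        # and reloading sys brings it back.
        reload(sys)  # noqa: F821 (builtin on Python 2)
        sys.setdefaultencoding("utf-8")
    # Python 3 needs nothing here: its default encoding is already UTF-8.
    return sys.getdefaultencoding()


print(ensure_utf8_default())
```

This would not address the IndexError in merge_file.py, which occurs earlier and looks unrelated to the interpreter version.
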
@Cartus
Owner

Cartus commented Jun 21, 2019

I am trying to solve this issue, but it will take some time. I will update you once I fix the bugs.
