Zhilin123/personal_attributes: Extract and infer personal attributes from dialogue

This project seeks to extract and infer personal attributes from dialogue.

Dependencies

pip install -r requirements.txt

Python 3 is required.

A GPU with at least 16 GB of memory is highly recommended.
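To check a machine against these requirements before starting a long run, a small script like the one below could help. It is a sketch under assumptions: it expects PyTorch to be installed (we assume it arrives via requirements.txt or ParlAI, which the original does not state), and `gpu_total_memory_bytes` and `meets_gpu_recommendation` are hypothetical helpers. The slack factor exists because some cards sold as "16 GB" report slightly under 16e9 bytes.

```python
import sys
from typing import Optional


def gpu_total_memory_bytes(device: int = 0) -> Optional[int]:
    """Total memory of a CUDA device via PyTorch, or None if unavailable."""
    try:
        import torch  # assumption: PyTorch is installed in this environment
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory


def meets_gpu_recommendation(total_bytes: Optional[int],
                             min_gb: float = 16, slack: float = 0.95) -> bool:
    # Some "16 GB" cards report a little less than 16e9 bytes, hence the slack.
    return total_bytes is not None and total_bytes >= min_gb * 1e9 * slack


if __name__ == "__main__":
    print("Python 3:", sys.version_info.major >= 3)
    mem = gpu_total_memory_bytes()
    print("GPU memory (GB):", None if mem is None else round(mem / 1e9, 1))
    print("Meets the 16 GB recommendation:", meets_gpu_recommendation(mem))
```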

Dataset

The dataset has been repurposed from DialogNLI.

Please put the unzipped dnli folder at the same level as the src folder. The dnli folder should contain dialogue_nli/dialogue_nli_dev.jsonl, dialogue_nli/dialogue_nli_train.jsonl and dialogue_nli/dialogue_nli_test.jsonl.
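A quick way to confirm the layout described above is to check for the three expected files before training. This is a hypothetical helper, not part of the repository; `repo_parent` is assumed to be the directory that holds both src and dnli (for example `".."` when run from inside src).

```python
from pathlib import Path
from typing import List

EXPECTED_SPLITS = ("train", "dev", "test")


def missing_dnli_files(repo_parent: str) -> List[Path]:
    """Return the expected DialogNLI files that are absent under <repo_parent>/dnli."""
    base = Path(repo_parent) / "dnli" / "dialogue_nli"
    expected = [base / f"dialogue_nli_{split}.jsonl" for split in EXPECTED_SPLITS]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_dnli_files("..")  # assumes the script is run from inside src
    print("All dataset files found." if not missing else f"Missing: {missing}")
```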

Benefits to Open-domain Chit-chat (PersonaChat)

First, install ParlAI from source at the same level as src:

git clone https://github.com/facebookresearch/ParlAI.git ParlAI
cd ParlAI
python setup.py develop

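Create a parlai_internal folder from ParlAI's example_parlai_internal template and copy convai2-rev into parlai_internal/tasks (both are directories, so copy them recursively):

cp -r example_parlai_internal parlai_internal
mkdir -p parlai_internal/tasks
cp -r ../src/convai2-rev parlai_internal/tasks/convai2-rev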
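The training command refers to the copied task as `internal:convai2-rev:normalized`. As we understand ParlAI's loader convention, the `internal:` prefix points at the `parlai_internal` package, the next field names the task folder, and the last field names a teacher class in that folder's `agents.py` (snake_case converted to CamelCase plus `Teacher`). The function below is a simplified, hypothetical re-implementation of that mapping for illustration only, not ParlAI's own code; `parlai/core/loader.py` is the authoritative source.

```python
from typing import Tuple


def resolve_parlai_task(task: str) -> Tuple[str, str]:
    """Map a ParlAI task string to (agents module path, teacher class name).

    Simplified sketch of ParlAI's naming convention; the real loader handles
    more cases (e.g. 'fb:' prefixes, fully qualified class paths, options).
    """
    repo = "parlai"
    if task.startswith("internal:"):
        repo = "parlai_internal"  # local, non-public tasks live here
        task = task[len("internal:"):]
    parts = task.split(":")
    module = f"{repo}.tasks.{parts[0].lower()}.agents"
    if len(parts) > 1:
        # e.g. "normalized" -> "NormalizedTeacher", "my_teacher" -> "MyTeacherTeacher"
        words = parts[1].split("_")
        teacher = "".join(w[:1].upper() + w[1:] for w in words) + "Teacher"
    else:
        teacher = "DefaultTeacher"
    return module, teacher


print(resolve_parlai_task("internal:convai2-rev:normalized"))
```

If the copy step worked, the printed module path should correspond to parlai_internal/tasks/convai2-rev/agents.py inside the ParlAI checkout.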

Train a model using the new task

cd parlai
parlai train_model -mf <working_directory>/model -m transformer/generator \
-im zoo:blender/blender_90M/model -vp 15 -t internal:convai2-rev:normalized \
-bs 32 -ltim 60 --rank-candidates True \
--embedding-size 512 --n-layers 8 --ffn-size 2048 --dropout 0.1 --n-heads 16 \
--learn-positional-embeddings True --n-positions 512 --variant xlm --activation gelu \
--fp16 True --text-truncate 512 --label-truncate 128 --dict-tokenizer bpe --dict-lower True \
-lr 1e-06 --optimizer adamax --lr-scheduler reduceonplateau --gradient-clip 0.1 \
-veps 0.25 --betas 0.9,0.999 --update-freq 1 --attention-dropout 0.0 --relu-dropout 0.0 \
--skip-generation False -stim 6000 -vme 20000 -bs 16 -vmt hits@1 -vmm max \
--save-after-valid True
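In the training command, `-vmt hits@1 -vmm max` makes hits@1 the validation metric and tells ParlAI that larger values are better, so the best model is selected by hits@1. Here is a minimal sketch of the usual hits@1 definition (the share of examples whose top-ranked candidate is the gold response). It is an illustration, not ParlAI's implementation, and `hits_at_1` is a hypothetical name.

```python
from typing import Sequence


def hits_at_1(candidate_scores: Sequence[Sequence[float]],
              gold_indices: Sequence[int]) -> float:
    """Fraction of examples whose highest-scoring candidate is the gold one."""
    if len(candidate_scores) != len(gold_indices):
        raise ValueError("need exactly one gold index per example")
    if not gold_indices:
        return 0.0
    hits = 0
    for scores, gold in zip(candidate_scores, gold_indices):
        # Index of the top-scoring candidate (the first one wins ties).
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == gold)
    return hits / len(gold_indices)


# Two examples: the first ranks the gold candidate on top, the second does not.
print(hits_at_1([[0.1, 0.9, 0.0], [0.7, 0.2, 0.1]], [1, 2]))  # 0.5
```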

Training + Testing

The command below trains the entire GenRe model.

# <task> {extraction, inference}
bash train.sh <working_directory> <task> <random_seed>
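train.sh takes a working directory, a task and a random seed. To repeat a run over several seeds, a small Python driver like the one below could build and launch the commands. It is a hypothetical helper; in particular, giving each seed its own subdirectory of the working directory is our assumption, made so that runs do not overwrite each other's outputs, and may not match how train.sh lays out its files.

```python
import os
import shlex
import subprocess
from typing import Iterable, List

TASKS = ("extraction", "inference")  # the two values listed for <task>


def train_commands(working_dir: str, task: str, seeds: Iterable[int]) -> List[List[str]]:
    """One `bash train.sh <working_directory> <task> <random_seed>` call per seed."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")
    # Hypothetical layout: a separate <working_dir>/seed<N> directory per run.
    return [
        ["bash", "train.sh", os.path.join(working_dir, f"seed{seed}"), task, str(seed)]
        for seed in seeds
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))  # show the exact shell command being run
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run
```

For example, `run(train_commands("runs", "inference", [0, 1, 2]))` only prints the three commands; pass `dry_run=False` to execute them from the directory containing train.sh.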

Analysis

To show the distribution of dependency labels and POS tags in the Extraction dataset:

cd analysis

python -m spacy download en_core_web_trf

python preprocess_linguistic_analysis.py --debug_mode False \
--csv_filename data/eval_tokens_within-sentence.csv

# <interested_field> {dependency_labels, big_pos_tags}
python linguistic_analysis.py --interested_field <interested_field> \
--csv_filename data/eval_tokens_within-sentence_analysis.csv
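The analysis scripts' internals are not shown here. As a hedged illustration of what a label distribution means, the helper below turns a sequence of labels into relative frequencies, most frequent first. `label_distribution` is hypothetical, not a function from the repository. In spaCy, per-token labels are available as `Token.dep_` (dependency label) and `Token.pos_` (coarse universal POS tag); we assume `big_pos_tags` refers to coarse tags of this kind, which may not match the script exactly.

```python
from collections import Counter
from typing import Dict, Iterable


def label_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each label, ordered from most to least common."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.most_common()}


print(label_distribution(["nsubj", "dobj", "nsubj", "amod"]))
```

With the spaCy model downloaded above, something like `label_distribution(tok.dep_ for tok in spacy.load("en_core_web_trf")("I have two dogs."))` would give a per-sentence dependency-label distribution.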

To show how tail entities can be linked to sentences after various transformations in the Inference dataset:

# Call the ConceptNet API to obtain words linked by commonsense relations
python call_and_save_conceptnet_api.py --category_of_words sentence --dataset eval --field_of_interest related --subset all
python call_and_save_conceptnet_api.py --category_of_words sentence --dataset eval --field_of_interest connected --subset all
python call_and_save_conceptnet_api.py --category_of_words tail --dataset eval --field_of_interest related --subset all
python call_and_save_conceptnet_api.py --category_of_words tail --dataset eval --field_of_interest connected --subset all
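The exact requests made by call_and_save_conceptnet_api.py are not shown here. As a hedged sketch, assuming the `related` field corresponds to ConceptNet 5's public `/related` endpoint (our assumption; `connected` might instead correspond to the edges returned for `/c/en/<term>`, which this sketch does not cover), the code below builds the request URL and parses related English terms from the JSON response. `related_url`, `parse_related` and `fetch_related` are hypothetical helpers, and `fetch_related` needs network access to the public service.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

CONCEPTNET_API = "http://api.conceptnet.io"  # public ConceptNet 5 web API


def related_url(word: str, lang: str = "en") -> str:
    # ConceptNet terms are lowercase with underscores, e.g. /c/en/tea_kettle
    term = word.strip().lower().replace(" ", "_")
    return f"{CONCEPTNET_API}/related/c/{lang}/{urllib.parse.quote(term)}?filter=/c/{lang}"


def parse_related(payload: Dict, min_weight: float = 0.0) -> List[str]:
    """Extract readable terms from a /related response, keeping weight >= min_weight."""
    words = []
    for item in payload.get("related", []):
        if item.get("weight", 0.0) < min_weight:
            continue
        # "@id" looks like "/c/en/hot_drink"; the fourth path segment is the term
        words.append(item["@id"].split("/")[3].replace("_", " "))
    return words


def fetch_related(word: str, timeout: float = 10.0) -> List[str]:
    with urllib.request.urlopen(related_url(word), timeout=timeout) as resp:
        return parse_related(json.load(resp))
```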

# run analysis
python tail_entity_not_within_sentence_analysis.py --mode dataset_analysis
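tail_entity_not_within_sentence_analysis.py is not reproduced here. As a hedged illustration of checking whether a tail entity can be linked back to its sentence, the sketch below tries an exact token match first and then falls back to commonsense-related words (for example, terms returned by a ConceptNet query). The categories `exact`, `conceptnet` and `unlinked`, the helper names, and the simple regex tokenizer are our assumptions and do not reflect the script's actual transformations.

```python
import re
from typing import Iterable, List, Optional, Set


def _tokens(text: str) -> List[str]:
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def _contains(phrase: str, sentence_tokens: Set[str]) -> bool:
    # A (possibly multi-word) phrase counts as present if all its tokens occur.
    words = _tokens(phrase)
    return bool(words) and all(w in sentence_tokens for w in words)


def link_type(tail: str, sentence: str,
              related_words: Optional[Iterable[str]] = None) -> str:
    """Classify how a tail entity relates to a sentence (hypothetical categories)."""
    sentence_tokens = set(_tokens(sentence))
    if _contains(tail, sentence_tokens):
        return "exact"
    if related_words and any(_contains(w, sentence_tokens) for w in related_words):
        return "conceptnet"
    return "unlinked"


print(link_type("pet", "I have a dog.", related_words=["dog", "cat"]))  # conceptnet
```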
