- Automate semantic tagging of domestic thesis sentences by predicting the rhetorical category of each sentence.
- A hierarchical embedding structure and multiple loss functions are used to represent the meaning of the rhetorical categories.
The dataset contains 155,740 pairs of thesis sentences and tags. The semantic tags form a two-level hierarchy: semantic structure classification (upper level) and detailed semantic classification (lower level).
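As a rough illustration of the two-level tag structure, the sketch below maps each detailed tag to its parent structure tag and encodes the full path as a multi-hot vector. The tag names are hypothetical placeholders, not the dataset's actual labels, and the encoding is an assumption rather than the repository's format.

```python
# Hypothetical two-level hierarchy: structure tag -> detailed tags.
# These names are placeholders for illustration, not the real label set.
HIERARCHY = {
    "structure_A": ["detail_A1", "detail_A2"],
    "structure_B": ["detail_B1"],
}

# Invert the mapping so each detailed tag points to its parent.
PARENT = {
    child: parent
    for parent, children in HIERARCHY.items()
    for child in children
}

# Fixed tag order: structure tags first, then detailed tags.
ALL_TAGS = list(HIERARCHY) + [c for cs in HIERARCHY.values() for c in cs]


def label_path(detailed_tag):
    """Return [structure tag, detailed tag] for a detailed tag."""
    return [PARENT[detailed_tag], detailed_tag]


def to_multi_hot(detailed_tag):
    """Encode the full path as a multi-hot vector over ALL_TAGS."""
    path = set(label_path(detailed_tag))
    return [1 if tag in path else 0 for tag in ALL_TAGS]
```

Encoding the parent together with the child keeps both levels visible to a classifier that scores every tag independently.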
- Built text representations of thesis sentences using KorSciBert and a GCN.
- Built label embeddings to extract the semantic representation of each label.
- Combined multiple loss functions to reflect the hierarchical properties through label semantic distance.
  - Classification loss: predicts labels using only the text representation.
  - Joint embedding loss: minimizes the distance between the text semantics and the target label semantics within the same embedding space.
  - Matching loss: pushes the text semantics away from the semantics of incorrect labels.
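The three loss terms can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the squared Euclidean distance, the fixed `margin`, the hinge form of the matching loss, and the weights `w_joint` and `w_match` are all hypothetical choices.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def classification_loss(logits, targets):
    """Mean binary cross-entropy computed from text-only logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def joint_embedding_loss(text_vec, target_label_vec):
    """Pull the text toward its target label in the shared space."""
    return squared_distance(text_vec, target_label_vec)


def matching_loss(text_vec, target_label_vec, wrong_label_vec, margin=1.0):
    """Hinge loss: an incorrect label should be at least `margin`
    farther from the text than the target label is."""
    pos = squared_distance(text_vec, target_label_vec)
    neg = squared_distance(text_vec, wrong_label_vec)
    return max(0.0, margin + pos - neg)


def combined_loss(logits, targets, text_vec, target_label_vec,
                  wrong_label_vec, w_joint=1.0, w_match=1.0):
    """Weighted sum of the three terms (weights are assumptions)."""
    return (
        classification_loss(logits, targets)
        + w_joint * joint_embedding_loss(text_vec, target_label_vec)
        + w_match * matching_loss(text_vec, target_label_vec, wrong_label_vec)
    )
```

The matching term drops to zero once an incorrect label is far enough from the text, so only labels that are still too close contribute to the gradient.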
/root/workspace
├── data
│   ├── csv
│   │   ├── train.csv
│   │   ├── dec.csv
│   │   ├── test.csv
│   │   └── label_desc.csv
│   ├── hierar
│   │   ├── hierar_prob.json
│   │   ├── hierar.txt
│   │   ├── label.dict
│   │   ├── label_i2v.pickle
│   │   └── label_v2i.pickle
│   └── make_df.py
│
├── src
│   ├── models
│   │   ├── pretrained_model
│   │   │   └── korscibert
│   │   │       ├── bert_config_kisti.json
│   │   │       ├── pytorch_model.bin
│   │   │       ├── tokenization_kisti.py
│   │   │       └── vocab_kisti.txt
│   │   │
│   │   ├── structure_model
│   │   │   ├── graphcnn.py
│   │   │   ├── structure_encoder.py
│   │   │   └── tree.py
│   │   │
│   │   ├── matching_network.py
│   │   ├── model.py
│   │   └── text_feature_propagation.py
│   │
│   ├── utils
│   │   ├── configure.py
│   │   ├── evaluation_modules.py
│   │   ├── hierarchy_tree_stastistic.py
│   │   ├── train_modules.py
│   │   └── utils.py
│   │
│   ├── config.json
│   ├── dataloader.py
│   ├── main.py
│   └── trainer.py
│
└── sen_cls.yaml
- Create Environment & Install Libraries
conda env create -f sen_cls.yaml
conda activate sen_cls
pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
- Training
python main.py --do_train=True --exp_num='exp'
- Test
python main.py --do_test=True --exp_num='exp0'
- Predict
python main.py --do_predict=True --exp_num='0'
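The commands above pass booleans as `--flag=True`. How `main.py` parses them is not shown here. The sketch below is one way such flags could be handled with `argparse`, assuming a hypothetical `str2bool` helper and a hypothetical default for `--exp_num`. An explicit converter matters because `type=bool` would treat the string `'False'` as true.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False'-style strings to a real boolean."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    """Hypothetical parser mirroring the flags used in the commands."""
    parser = argparse.ArgumentParser(description="Sentence classification")
    for flag in ("--do_train", "--do_test", "--do_predict"):
        parser.add_argument(flag, type=str2bool, default=False)
    # The default value is an assumption made for this sketch.
    parser.add_argument("--exp_num", type=str, default="exp")
    return parser


# Equivalent to: python main.py --do_test=True --exp_num='exp0'
args = build_parser().parse_args(["--do_test=True", "--exp_num=exp0"])
```

With this converter, `--do_train=False` disables training instead of silently enabling it.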