Cross-Domain Data Augmentation with Domain-Adaptive Language Modeling for Aspect-Based Sentiment Analysis
The code for our ACL 2023 paper (https://aclanthology.org/2023.acl-long.81/):
Jianfei Yu, Qiankun Zhao, Rui Xia. "Cross-Domain Data Augmentation with Domain-Adaptive Language Modeling for Aspect-Based Sentiment Analysis"
The training data comes from four domains: Restaurant (R), Laptop (L), Service (S), and Devices (D).
The in-domain corpus (used for training BERT-E) comes from Yelp and Amazon reviews.
Click here to get BERT-E (BERT-Extended); the extraction code is by0i. (Please specify the directory where BERT is stored in modelconfig.py.)
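A minimal sketch of how the BERT-E path might be registered (the dictionary name and paths below are placeholders; use whatever names modelconfig.py actually defines):

```python
# modelconfig.py -- illustrative only; adapt the keys and paths to the real file.
# Map each pretrained model name used by the scripts to its local directory.
MODEL_ARCHIVE_MAP = {
    'bert-base-uncased': '/path/to/bert-base-uncased/',
    'bert-extended': '/path/to/BERT-E/',  # directory holding the downloaded BERT-E checkpoint
}
```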
1. To assign pseudo labels to the unlabeled data in the target domain, run the following command (an illustrative sketch of this step is shown after the command):
bash pseudo_label.sh
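The script above handles this end to end; as a rough illustration of the pseudo-labeling idea only (the file names, paths, and the token-classification head below are assumptions, not the exact code in this repo):

```python
# Illustrative pseudo-labeling loop: tag target-domain unlabeled sentences
# with a model trained on the labeled source domain.
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

tokenizer = BertTokenizerFast.from_pretrained('/path/to/BERT-E/')
model = BertForTokenClassification.from_pretrained('/path/to/source_trained_absa_model/')
model.eval()
id2label = model.config.id2label  # e.g. BIO aspect tags with sentiment

with open('target_unlabeled.txt') as fin, open('target_pseudo_labeled.txt', 'w') as fout:
    for line in fin:
        words = line.strip().split()
        if not words:
            continue
        enc = tokenizer(words, is_split_into_words=True, return_tensors='pt', truncation=True)
        with torch.no_grad():
            pred_ids = model(**enc).logits[0].argmax(-1).tolist()
        # Keep the prediction of the first sub-token of each word.
        tags, seen = [], set()
        for idx, wid in enumerate(enc.word_ids()):
            if wid is not None and wid not in seen:
                seen.add(wid)
                tags.append(id2label[pred_ids[idx]])
        fout.write(' '.join(f'{w}={t}' for w, t in zip(words, tags)) + '\n')
```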
2. Train a domain-adaptive language model (DALM), generate target-domain labeled data, and finally use the generated data for the main task. We use LSTM and GPT-2 as the decoder in language modeling, respectively.
2.1 To train the GPT-2-based DALM for data generation and evaluation, run the following command (an illustrative sketch of the generation step follows):
bash GPT2.sh
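For intuition, here is a minimal sketch of what sampling data from a GPT-2 DALM can look like (the checkpoint path, the `<target>` control token, and the decoding settings are assumptions; GPT2.sh performs the actual training and generation):

```python
# Illustrative sampling step from a fine-tuned GPT-2 DALM.
# The DALM is trained on linearized sequences that interleave labels and words,
# so sampled sequences are new target-domain sentences with inline aspect/sentiment tags.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained('/path/to/gpt2_dalm_checkpoint/')
model = GPT2LMHeadModel.from_pretrained('/path/to/gpt2_dalm_checkpoint/')
model.eval()

prompt = tokenizer('<target>', return_tensors='pt')  # hypothetical domain-control prefix
outputs = model.generate(
    **prompt,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    max_length=64,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```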
2.2 To train the LSTM-based DALM for data generation and evaluation, run the following command (a minimal model sketch is shown after the command):
bash LSTM.sh
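As a rough reference, a DAGA-style LSTM language model boils down to a standard next-token predictor over the linearized word/tag sequences (the hyperparameters below are illustrative, not the paper's):

```python
# Minimal LSTM language model for linearized labeled sentences.
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512, num_layers=1, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, input_ids, hidden=None):
        # input_ids: (batch, seq_len) token ids of a linearized word/tag sequence
        emb = self.drop(self.embed(input_ids))
        out, hidden = self.lstm(emb, hidden)
        return self.proj(self.drop(out)), hidden  # next-token logits at every position

# Training uses standard cross-entropy between logits[:, :-1] and input_ids[:, 1:];
# generation samples one token at a time from the softmax over the logits.
```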
- Some code in the LSTM-based language modeling is based on the code of DAGA, many thanks!