This repository contains the code used for dependency parsing and information about how to obtain the dataset presented in the work:
Reconstructing the house from the ad: Structured prediction on real estate classifieds
The dataset includes 2,318 manually annotated property advertisements from a real estate company.
If you use any part of the code or the dataset, please cite:
@InProceedings{E17-2044,
author = "Bekoulis, Giannis
and Deleu, Johannes
and Demeester, Thomas
and Develder, Chris",
title = "Reconstructing the house from the ad: Structured prediction on real estate classifieds",
booktitle = "Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers",
year = "2017",
publisher = "Association for Computational Linguistics",
pages = "274--279",
location = "Valencia, Spain",
url = "http://aclweb.org/anthology/E17-2044"
}
and
@article{BEKOULIS2018100,
title = "An attentive neural architecture for joint segmentation and parsing and its application to real estate ads",
journal = "Expert Systems with Applications",
volume = "102",
pages = "100--112",
year = "2018",
issn = "0957-4174",
doi = "https://doi.org/10.1016/j.eswa.2018.02.031",
url = "http://www.sciencedirect.com/science/article/pii/S0957417418301192",
author = "Giannis Bekoulis and Johannes Deleu and Thomas Demeester and Chris Develder"
}
The code is written for Python 2.7. Some of the Python packages needed to run these files are listed below; they are best installed using pip.
- scikit-learn (machine learning)
- pandas (data manipulation)
- pandas_confusion (performance measures)
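Before running the code, it can help to confirm that these packages are importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the import names are assumptions (scikit-learn is imported as `sklearn`; `pandas_confusion` is assumed to match its package name).

```python
import importlib

# Assumed import names for the packages listed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED_MODULES = ["sklearn", "pandas", "pandas_confusion"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling `missing_modules(REQUIRED_MODULES)` returns an empty list when everything is installed; any names it returns are candidates for installation with pip.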
The repository contains the four models that we have developed for dependency parsing: Threshold, Edmond, Structured Prediction via the Matrix-Tree Theorem (MTT), and Transition-based. To run them, execute run_script.py, which serves as the main entry point.
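To give some intuition for the Matrix-Tree Theorem named above, here is a small illustrative sketch. It is not the repository's implementation: the function names, the `weights[head][modifier]` layout, and the choice of a single fixed root at index 0 are all assumptions made for this example. The theorem states that the sum, over all spanning trees rooted at a given node, of the product of edge weights equals the determinant of the graph Laplacian with the root's row and column removed. This sum is the normalization constant needed when training a model that scores whole trees.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def tree_partition_function(weights, root=0):
    """Sum over spanning trees rooted at `root` of the product of edge weights.

    weights[h][m] is the (non-negative) weight of the edge head h -> modifier m.
    Self-loops (h == m) are ignored.
    """
    n = len(weights)
    laplacian = [[0] * n for _ in range(n)]
    for h in range(n):
        for m in range(n):
            if h != m:
                laplacian[m][m] += weights[h][m]  # total incoming weight of m
                laplacian[h][m] -= weights[h][m]
    # Remove the root's row and column, then take the determinant.
    keep = [i for i in range(n) if i != root]
    minor = [[laplacian[i][j] for j in keep] for i in keep]
    return determinant(minor)
```

As a sanity check, with three nodes and all edge weights equal to 1, `tree_partition_function` counts the trees rooted at node 0, which can be enumerated by hand: 0→1 and 0→2; 0→1 and 1→2; 0→2 and 2→1.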
To obtain the anonymized dataset, fill in and sign this form, then send it via email to [email protected]. Follow the instructions and we will get back to you as soon as possible with information on how to download the anonymized dataset.