Comparison and direct label data for Airbnb listing price estimation.

xycforgithub/Airbnb-Comparison

Crowdsourcing Comparisons and Labels for Airbnb Listing Price Estimation

This repository contains the crowdsourced data for Airbnb listing price comparison as described in:

Yichong Xu, Sivaraman Balakrishnan, Aarti Singh and Artur Dubrawski.
Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information
arXiv preprint arXiv:1806.03286, 2018
arXiv version
Conference version in ICML 2018

Please cite the above paper if you use this data.

We collected comparisons and direct labels for estimating Airbnb listing prices in Seattle, Washington, US. We record the crowd workers' answers as well as their response times. The Airbnb listing data is from Kaggle.

Data description

raw_data.json contains the raw data, including:
> features: Textual and numerical features for each listing.
> labels: For each listing, we collect 5 labels (training set) or 10 labels (test set) from crowd workers.
> comparisons: We additionally collect comparisons between 1,895 pairs of listings, with two comparisons per pair. The pair entry contains the indices of the listings in each pair, and the data entry contains the comparisons collected for that pair. All comparisons are on the training set.
> num_train_data: Number of training points (we have 389 training points and 97 test points).
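
As a rough illustration of how these fields might be used, the sketch below averages the crowd labels per listing and splits them into training and test parts. It assumes that labels is a list with one list of numeric worker answers per listing, and that the first num_train_data listings form the training set; neither assumption is confirmed by the description above, so check raw_data.json before relying on it. The helper names load_raw and split_mean_labels and the toy values are made up for this example.

```python
import json
from statistics import mean


def load_raw(path="raw_data.json"):
    """Read the raw JSON file into a Python dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def split_mean_labels(raw):
    """Average the worker labels per listing and split into train/test.

    Assumption: raw["labels"][i] is a list of numeric answers for listing i,
    and the first raw["num_train_data"] listings are the training set.
    """
    n_train = int(raw["num_train_data"])
    means = [mean(worker_labels) for worker_labels in raw["labels"]]
    return means[:n_train], means[n_train:]


# Tiny made-up stand-in for the real file, only to show the call pattern.
toy = {
    "num_train_data": 2,
    "labels": [[100, 120], [80, 90], [150, 150, 160, 140]],
}
train_y, test_y = split_mean_labels(toy)
print(train_y, test_y)
```

With the real file, the call would be split_mean_labels(load_raw("raw_data.json")), provided the layout matches the assumptions above.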

generate_data.py featurizes the raw data into a numerical matrix that can be used by subsequent algorithms. We use 16 features in total (55 after expanding categorical features into binary ones), as described in our paper and the Python script. For convenience, we include the processed result in vectorized_data.json.
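
To make the expansion of categorical features into binary ones concrete, here is a minimal one-hot encoding sketch. It is not the code in generate_data.py; the function one_hot_expand, the feature names bedrooms and room_type, and the toy rows are hypothetical and only illustrate how one categorical column becomes several 0/1 columns.

```python
def one_hot_expand(rows, categorical_keys):
    """Turn dict-shaped rows into a numeric matrix.

    Numeric features are copied as floats; each categorical feature is
    replaced by one 0/1 column per distinct value seen in the data.
    """
    # Distinct values per categorical feature, sorted for a stable column order.
    categories = {k: sorted({row[k] for row in rows}) for k in categorical_keys}
    numeric_keys = [k for k in rows[0] if k not in categorical_keys]

    columns = list(numeric_keys) + [
        f"{k}={c}" for k in categorical_keys for c in categories[k]
    ]

    matrix = []
    for row in rows:
        vec = [float(row[k]) for k in numeric_keys]
        for k in categorical_keys:
            vec.extend(1.0 if row[k] == c else 0.0 for c in categories[k])
        matrix.append(vec)
    return columns, matrix


# Made-up listings; the real features live in raw_data.json.
toy_rows = [
    {"bedrooms": 1, "room_type": "Private room"},
    {"bedrooms": 2, "room_type": "Entire home/apt"},
]
columns, matrix = one_hot_expand(toy_rows, ["room_type"])
print(columns)
print(matrix)
```

Here one numeric and one categorical feature become three columns, in the same way that the 16 original features grow to 55 once every categorical feature is expanded.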

by Yichong Xu

[email protected]
