This project explores various factors influencing the ratings of Airbnb listings by analyzing host characteristics. It was completed as part of the "Data Preparation and Workflow Management" course in the Marketing Analytics Master's program at Tilburg University.
For more details about the course, refer to the course contents.
To evaluate the results, navigate to the src folder and run the provided scripts. The full process is automated through the makefile.
The primary objective of this course is to equip students with the skills necessary to automate research projects, facilitating reproducibility of results and enabling easy adjustments to cater to different research interests. For instance, researchers can leverage the provided tools and methodologies to explore alternative effects within the same dataset to uncover new relationships between variables.
Guest satisfaction during their stay is influenced by three crucial factors: amenities, location, and the behavior of hosts (Source: Journal of Hospitality Management). This project aims to delve deeper into the ways hosts influence review scores, focusing specifically on their perceived trustworthiness online. Key variables associated with this analysis include the host's profile picture and identity verification.
Furthermore, the project seeks to explore potential differences across countries, with particular emphasis on France and Greece. While data for four cities within each country is available, this study concentrates on analyzing country-level trends. By incorporating this variable, insights into cross-cultural variations in customer judgments can be gained.
To what extent do (i) the presence of a profile picture and (ii) the identity verification of the host affect the review score (1-5) on Airbnb, and do these effects differ between France and Greece?
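One way to formalise this question is a linear model with interaction terms. This is only a sketch: the source does not state the specification used in regression_analysis.R, and the variable names pic, verified and Greece (a dummy that equals 1 for Greek listings and 0 for French ones) are assumptions made for illustration.

```latex
\text{rating}_i = \beta_0
  + \beta_1\,\text{pic}_i
  + \beta_2\,\text{verified}_i
  + \beta_3\,\text{Greece}_i
  + \beta_4\,(\text{pic}_i \times \text{Greece}_i)
  + \beta_5\,(\text{verified}_i \times \text{Greece}_i)
  + \varepsilon_i
```

Under this assumed specification, the coefficients \beta_1 and \beta_2 describe the effects in France, while \beta_4 and \beta_5 capture how those effects differ in Greece.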
The repository serves as a comprehensive research workflow, which can be outlined as follows:
- Data exploration
- Data preparation
- Analysis
- Evaluation and deployment
Initially, these steps were conducted manually; they were later optimized and automated. The dataset used for analysis is publicly available, so the analysis can be replicated once the necessary software is installed.
The primary data source of the project is the listings.csv.gz files on Inside Airbnb.
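The repository's webscraper_urls.py collects the URLs of these files. The sketch below illustrates the general idea and is not the repository's implementation: it swaps in the standard library's html.parser for requests and BeautifulSoup so that it runs without installs or network access. The sample HTML, the example.org host, and the assumption that the country name appears as a path segment in each URL are all made up for illustration.

```python
from html.parser import HTMLParser


class ListingLinkParser(HTMLParser):
    """Collect href values of <a> tags that point to a listings.csv.gz file."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.endswith("listings.csv.gz"):
            self.urls.append(href)


def extract_listing_urls(html, countries=("france", "greece")):
    """Return listings.csv.gz links whose path contains one of the countries."""
    parser = ListingLinkParser()
    parser.feed(html)
    return [
        url for url in parser.urls
        if any(f"/{country}/" in url.lower() for country in countries)
    ]


# Made-up HTML standing in for a download page (assumption, not real data).
SAMPLE_HTML = """
<a href="https://example.org/france/paris/2024-01-01/data/listings.csv.gz">listings</a>
<a href="https://example.org/france/paris/2024-01-01/data/reviews.csv.gz">reviews</a>
<a href="https://example.org/greece/athens/2024-01-01/data/listings.csv.gz">listings</a>
<a href="https://example.org/spain/madrid/2024-01-01/data/listings.csv.gz">listings</a>
"""

urls = extract_listing_urls(SAMPLE_HTML)
for url in urls:
    print(url)
```

Filtering on the file name first and on the country second keeps the two concerns separate, so the country list can be changed without touching the parser.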
The variables of interest to this project have been listed in the table below:
Variable | Description |
---|---|
host_has_profile_pic | Whether the host has a profile picture (TRUE vs FALSE) |
host_identity_verified | Whether the host has been verified (TRUE vs FALSE) |
host_location | The location of the listing, specified by city |
review_scores_rating | The star rating left by customers on the listing |
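As a minimal sketch of how these four variables could be pulled out of a compressed listings file, the example below reads a gzip-compressed CSV with the Python standard library only. It assumes that the raw file encodes booleans as t/f (TRUE/FALSE is also accepted) and that missing ratings are empty strings; the in-memory sample data is invented, and the project itself performs this step in R.

```python
import csv
import gzip
import io

COLUMNS = [
    "host_has_profile_pic",
    "host_identity_verified",
    "host_location",
    "review_scores_rating",
]


def to_bool(value):
    """Map 't'/'f' or 'TRUE'/'FALSE' strings to booleans; anything else is None."""
    text = value.strip().lower()
    if text in ("t", "true"):
        return True
    if text in ("f", "false"):
        return False
    return None


def to_float(value):
    """Convert a rating string to float; empty or malformed values become None."""
    try:
        return float(value)
    except ValueError:
        return None


def read_selected(handle):
    """Read CSV rows from a text handle and keep only the variables of interest."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "host_has_profile_pic": to_bool(record["host_has_profile_pic"]),
            "host_identity_verified": to_bool(record["host_identity_verified"]),
            "host_location": record["host_location"],
            "review_scores_rating": to_float(record["review_scores_rating"]),
        })
    return rows


# Invented sample standing in for a listings.csv.gz file (assumption).
SAMPLE_CSV = (
    "id,host_has_profile_pic,host_identity_verified,host_location,review_scores_rating\n"
    '1,t,f,"Paris, France",4.8\n'
    '2,f,t,"Athens, Greece",\n'
)
buffer = io.BytesIO(gzip.compress(SAMPLE_CSV.encode("utf-8")))

with gzip.open(buffer, mode="rt", encoding="utf-8", newline="") as handle:
    rows = read_selected(handle)

for row in rows:
    print(row)
```

To read a real file, the buffer can be replaced by a path to a downloaded listings.csv.gz.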
As the unit of analysis in this project is the host and the country in which the listing is situated, the host_location variable has been generalized to country level. Consequently, a new variable named Country_Dataset has been created, assigning each listing to its corresponding country. All variables will be accessible in RStudio after running the france_greece_dataset.R file, allowing for further exploration and potential research opportunities.
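The country assignment can be sketched as follows, assuming that each city's listings are loaded separately and then tagged with a country before being combined. The project does this in france_greece_dataset.R; the Python version, the city names in CITY_TO_COUNTRY and the helper names add_country and combine are illustrative assumptions, not taken from the repository.

```python
# Illustrative mapping; the actual cities used by the project are not listed here.
CITY_TO_COUNTRY = {
    "Paris": "France",
    "Lyon": "France",
    "Athens": "Greece",
    "Thessaloniki": "Greece",
}


def add_country(rows, city):
    """Return copies of the rows with a Country_Dataset value derived from the city."""
    country = CITY_TO_COUNTRY[city]
    return [{**row, "Country_Dataset": country} for row in rows]


def combine(rows_by_city):
    """Tag each city's rows with its country and stack them into one list."""
    combined = []
    for city, rows in rows_by_city.items():
        combined.extend(add_country(rows, city))
    return combined


# Invented rows standing in for per-city listing data (assumption).
rows_by_city = {
    "Paris": [{"review_scores_rating": 4.8}],
    "Athens": [{"review_scores_rating": 4.6}, {"review_scores_rating": 4.9}],
}

dataset = combine(rows_by_city)
for row in dataset:
    print(row)
```

Looking the country up from a fixed mapping, rather than parsing free-text location strings, keeps the assignment deterministic and makes an unknown city fail loudly with a KeyError.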
The insights generated from this project hold particular relevance for (potential) Airbnb hosts, emphasizing the importance of a profile picture and obtaining a verified account. By demonstrating the potential impact of these factors on their ratings, hosts may be motivated to prioritize these aspects to enhance their listing's performance.
Additionally, stakeholders such as Airbnb and similar hosting companies could gain advantages from the findings of this project. Insights from the analysis could inform platform enhancements or policy adjustments aimed at improving user experience and overall satisfaction.
- src
  - clean_dataset.R
  - data_exploration.R
  - france_greece_dataset.R
  - regression_analysis.R
  - visualization.R
  - webscraper_urls.py
- .gitignore
- LICENSE
- makefile
- README.md
Before executing the scripts, certain software programs and packages need to be installed. Instructions for downloading these are available on http://tilburgsciencehub.com/.
The required programs are Python, R & RStudio, Git and Make.
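Before running the pipeline, it can help to confirm that these programs are reachable from the command line. The sketch below is one way to do that with the standard library; the executable names it searches for (python3 or python, Rscript, git, make) are assumptions about a typical installation and may differ on your system.

```python
import shutil

# Program name -> executable names to look for on the PATH (assumed names).
REQUIRED_TOOLS = {
    "Python": ("python3", "python"),
    "R": ("Rscript",),
    "Git": ("git",),
    "Make": ("make",),
}


def find_tools(required=REQUIRED_TOOLS):
    """Return a dict mapping each program to the first executable found, or None."""
    found = {}
    for name, candidates in required.items():
        paths = (shutil.which(candidate) for candidate in candidates)
        found[name] = next((path for path in paths if path), None)
    return found


for name, path in find_tools().items():
    status = path if path else "NOT FOUND on PATH"
    print(f"{name}: {status}")
```

RStudio is not checked here, because the pipeline itself only needs the command-line tools.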
The additional required packages are:
Python:
pip install requests
pip install beautifulsoup4
pip install pandas
The os module is part of the Python standard library and does not need to be installed.
R:
install.packages("readr")
install.packages("dplyr")
install.packages("tidyverse")
install.packages("data.table")
install.packages("ggplot2")
install.packages("knitr")
install.packages("kableExtra")
install.packages("tinytex")
install.packages("pandoc")
After cloning the repository, running the makefile generates multiple folders and files. The output is written to the new data and gen folders. The data folder is the starting point: it is populated with a list of URLs once webscraper_urls.py has run. france_greece_dataset.R writes our selected datasets (the French and Greek cities) to the gen/data_preperation folder. This is also where project_dataset.csv is created, which combines all of these files. data_exploration.Rmd explores this data using an R Markdown script. It generates a PDF with the exploratory data analysis in the src folder, which the makefile later moves to the gen folder. clean_dataset.R narrows the file down to the variables selected for this project. Then, regression_analysis.R analyses the relationship and visualization.Rmd reports the outcomes. The plots are written to gen/analysis/output. As mentioned before, these steps have been automated with the makefile. Follow the steps below to generate all the data automatically:
- Create a (local) repository where the cloned files will be stored
- On Windows, open your new folder in the file explorer and type CMD into the address bar. This opens the command line in that folder. On other systems, open a terminal in the folder.
- Now you are ready to clone The_impact_of_Airbnb_hosts_on_guest_reviews from GitHub. Type this into the command line:
git clone https://github.com/course-dprep/The_impact_of_Airbnb_hosts_on_guest_reviews
- The repository is now downloaded locally and you are ready to run the scripts. As this is all automated, all you have to do is move into the cloned folder on the command line and execute:
make
This will take a couple of minutes, as the whole research pipeline is executed.
- Now you can look at the results in the newly created gen folder.
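To confirm that the pipeline produced its output, a small check along the following lines can be run from the repository root. It is a sketch only: the two paths are the ones named in this README, the helper name missing_outputs is hypothetical, and the makefile may create further files that are not checked here.

```python
from pathlib import Path

# Output locations named in this README; other generated files are not checked.
EXPECTED_OUTPUTS = [
    "gen/data_preperation/project_dataset.csv",
    "gen/analysis/output",
]


def missing_outputs(repo_root="."):
    """Return the expected output paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [path for path in EXPECTED_OUTPUTS if not (root / path).exists()]


missing = missing_outputs(".")
if missing:
    print("Missing outputs:", ", ".join(missing))
else:
    print("All expected outputs are present.")
```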
To clean the output, type this into the command line:
make clean
To preview the commands that make would run, without executing them, type:
make -n
- Julian Peters, e-mail: [email protected]
- Rolf Jens, e-mail: [email protected]
- Marc van Tergouw, e-mail: [email protected]
- Eline van Lopik, e-mail: [email protected]
- Hannes Datta, e-mail: [email protected]