
In this project, we try to predict the booking destination of a newly onboarded user on the Airbnb platform, in order to provide a personalized experience and better forecast demand.


derinben/Airbnb-New-User-Booking


Airbnb-New-User-Booking

Summary

In this project, we aim to predict the booking destination of a newly onboarded user on the Airbnb platform. We evaluate the model with NDCG (normalized discounted cumulative gain), a ranking metric that rewards placing the user's actual destination near the top of the predicted list. With these predictions, Airbnb could potentially:

  • Provide personalized experience
  • Better forecast demand
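As a rough illustration of how NDCG scores a ranked list of predicted destinations, here is a minimal sketch. It assumes each user has exactly one true destination (relevance 1, all other countries 0) and a cutoff of k = 5; the function names and the example country codes are hypothetical and not taken from the project's code.

```python
import math

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the first k relevance scores."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )

def ndcg_at_k(predicted, actual, k=5):
    """NDCG for one user, assuming a single true destination.

    `predicted` is a list of unique country codes ordered from most to
    least likely. With one relevant item, the ideal DCG is 1.0 (the true
    country ranked first), so normalizing leaves the DCG unchanged.
    """
    relevances = [1 if country == actual else 0 for country in predicted[:k]]
    ideal_dcg = 1.0
    return dcg_at_k(relevances, k) / ideal_dcg

# True destination ranked first scores 1.0; ranked second scores about 0.63.
print(ndcg_at_k(["US", "FR", "IT"], "US"))
print(ndcg_at_k(["US", "FR", "IT"], "FR"))
```

The score for a dataset would then be the mean of the per-user values, so a model is rewarded both for including the right country and for ranking it high.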

Dataset

The problem statement and dataset are derived from Kaggle's Airbnb New User Bookings challenge. The predictor variables include information about the users (user_id, age, gender, etc.) and preliminary session data (actions, action types, session time, etc.). Our objective is to predict the country (the dependent variable) that the user is most likely to visit. Note that only a limited set of users have associated session data; we therefore merge the datasets with an inner join and proceed to modelling with about 5.5 million session observations for close to 73k users.
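The inner join described above might look like the following sketch in pandas. The tiny data frames and their column names (`id`, `user_id`, `action`, `secs_elapsed`, `country_destination`) are illustrative assumptions, not the project's actual loading code.

```python
import pandas as pd

# Hypothetical user table: one row per user.
users = pd.DataFrame({
    "id": ["u1", "u2", "u3"],
    "age": [34, 27, 45],
    "country_destination": ["US", "IT", "FR"],
})

# Hypothetical session table: many rows per user, and not every user appears.
sessions = pd.DataFrame({
    "user_id": ["u1", "u1", "u3", "u4"],
    "action": ["search", "show", "search", "show"],
    "secs_elapsed": [120.0, 45.0, 300.0, 10.0],
})

# An inner join keeps only session rows whose user also exists in `users`,
# and drops users that have no session rows at all.
merged = sessions.merge(users, left_on="user_id", right_on="id", how="inner")

print(len(merged), "session rows for", merged["user_id"].nunique(), "users")
```

Here `u2` (no sessions) and `u4` (no user record) both drop out, which is the same effect that reduces the full dataset to the users with session data.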

Methodology

To produce the results, we perform data preparation, data preprocessing, feature engineering, model building, evaluation and hyperparameter tuning. Some EDA was done to understand the dataset, which can be found here.

Challenges with this dataset include its imbalance and the limited information available.

A peek into the feature engineering and selection that improved the results to a great extent:

Features that indicated higher likelihood of even making a booking in the first place

Features that indicated lower likelihood of making a booking

Modelling

The models we try out are as follows:

Multinomial logistic regression - uses the softmax function with L2 regularization, extending logistic regression to classify a target variable with more than two categories.
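A minimal sketch of a softmax (multinomial) logistic regression with L2 regularization, using scikit-learn on synthetic binary features, is shown below. The data, the three stand-in classes and the value of `C` are assumptions for illustration; this is not the project's tuned model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: 300 users, 6 binary features, 3 destination classes.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 6)).astype(float)
y = rng.integers(0, 3, size=300)

# scikit-learn applies an L2 penalty by default; C is the inverse of the
# regularization strength. With the default lbfgs solver, a multiclass
# target is fitted with a single softmax model.
model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X, y)

# Each row is a probability distribution over the classes.
proba = model.predict_proba(X)

# Rank classes from most to least likely per user, as a ranking metric needs.
ranked = np.argsort(-proba, axis=1)
print(proba.shape, ranked[0])
```

Sorting the class probabilities gives the ordered list of destinations per user that a ranking metric such as NDCG is computed on.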

Bernoulli Naive Bayes - well suited to discrete data with binary features, which was the case after we completed feature engineering.
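To show why binary features suit this model, here is a small sketch using scikit-learn's `BernoulliNB` on synthetic 0/1 indicators. The feature matrix, labels and smoothing value are assumptions for illustration only.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in data: each column is a 0/1 indicator, such as a
# one-hot encoded category or an "action seen in session" flag.
rng = np.random.default_rng(1)
X_binary = rng.integers(0, 2, size=(200, 8))
y_labels = rng.integers(0, 3, size=200)

# alpha is the additive (Laplace) smoothing parameter.
nb_model = BernoulliNB(alpha=1.0)
nb_model.fit(X_binary, y_labels)

# Per-class probabilities for each user, one row per user.
nb_proba = nb_model.predict_proba(X_binary)
print(nb_proba.shape)
```

Bernoulli Naive Bayes models each feature as present or absent within a class, so the absence of a feature also counts as evidence.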

Decision Trees - highly predictive due to their capability of mapping non-linear relationships well. Results are also easily interpretable within the business context.

XGBoost - allows us to leverage its regularization (both L1 and L2), sparsity awareness (robust learning from missing values) and built-in cross-validation.

Of these, XGBoost gave the best performance, with an NDCG score of 88.323.

For further improvements

  • Airbnb can consider data on detailed user demographics, as well as sessions' data (e.g., session time and data, search queries, etc.)
  • Work with relevant stakeholders to further refine feature selection.
  • We can consider novelty as a metric for recommending new travel destinations to users.
