Machine Learning | Academic Project
Airbnb NYC Price Prediction
An end-to-end machine learning project to predict Airbnb listing prices in New York City. The pipeline covers data ingestion, cleaning, feature engineering (geospatial, textual, categorical), model training with LightGBM, hyperparameter tuning via Optuna, and a final evaluation on a held-out test set.
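As a rough illustration of the feature engineering stage, the sketch below builds the three feature families named above (geospatial, textual, categorical) with scikit-learn. It is not the project's code: the toy listings, the variable names, the cluster count, and the choice to one-hot encode the cluster ID are all assumptions made for the example.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder

# Toy listings; every value here is illustrative, not project data.
coords = np.array([
    [40.7128, -74.0060],
    [40.7130, -74.0055],
    [40.6782, -73.9442],
    [40.6790, -73.9450],
])
descriptions = [
    "Cozy studio near downtown",
    "Bright loft in downtown Manhattan",
    "Spacious room in Brooklyn brownstone",
    "Quiet Brooklyn apartment with garden",
]
room_types = np.array([
    ["Entire home/apt"],
    ["Entire home/apt"],
    ["Private room"],
    ["Private room"],
])

# Geospatial: assign each listing to a location cluster (k=2 is an assumption).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(coords).reshape(-1, 1)

# Textual: TF-IDF weights over the listing descriptions.
tfidf = TfidfVectorizer(stop_words="english")
text_features = tfidf.fit_transform(descriptions)

# Categorical: one-hot encode the cluster ID and the room type together.
categorical = np.hstack([cluster_ids.astype(str), room_types])
encoder = OneHotEncoder(handle_unknown="ignore")
cat_features = encoder.fit_transform(categorical)

# Combine everything into one sparse feature matrix for a downstream model.
features = hstack([text_features, cat_features]).tocsr()
print(features.shape)
```

Keeping the matrix sparse avoids materialising the mostly zero TF-IDF columns, and gradient-boosting libraries such as LightGBM accept SciPy sparse input directly.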
Key Highlights
- Engineered features for 48,000+ listings: geospatial clustering (K-Means), TF-IDF on listing descriptions, and 30+ derived features.
- LightGBM model with Optuna hyperparameter search achieved R² = 0.553 (55.3% of price variance explained) and RMSE = $102.93.
- Compared 6 models (Linear Regression, Ridge, Random Forest, XGBoost, LightGBM, CatBoost).
- SHAP value analysis to interpret feature importance: neighbourhood and room type were the top predictors.
- Fully reproducible pipeline in Jupyter notebooks with modular preprocessing classes.
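For readers unfamiliar with the two reported metrics, the sketch below shows how RMSE and R² are defined. RMSE is in the same units as the target (dollars here), and an R² of 0.553 means the model explains 55.3% of the variance in price. The prices are made-up values for illustration, and the function names are hypothetical rather than taken from the project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative nightly prices in dollars (not project data).
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print(f"RMSE = ${rmse(actual, predicted):.2f}")  # RMSE = $10.00
print(f"R^2  = {r_squared(actual, predicted):.3f}")  # R^2  = 0.968
```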
Tech Stack
Python | LightGBM | Scikit-Learn | Pandas | K-Means