July 30, 2023
Introduction
This project is my entry in the Kaggle competition House Prices: Advanced Regression Techniques, which asks participants to predict house sale prices in Ames, Iowa, from various features of the houses. The dataset is the one provided by the competition.
I built an ensemble model that combines diverse regression algorithms to improve predictive performance.
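As a rough illustration of what combining diverse regressors can look like, here is a minimal sketch of an averaging ensemble. It is an assumption, not the exact setup used here: it uses only scikit-learn models (`Ridge`, `RandomForestRegressor`, `GradientBoostingRegressor`) and synthetic data from `make_regression` instead of the Ames dataset, so it runs without any extra downloads or libraries.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical; not the Ames housing dataset).
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Averaging ensemble over three different model families.
# With no weights given, VotingRegressor returns the plain mean
# of the individual models' predictions.
ensemble = VotingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=1.0)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
    ]
)
ensemble.fit(X_train, y_train)
ensemble_pred = ensemble.predict(X_valid)
```

The same pattern should extend to other scikit-learn-compatible regressors, such as those from xgboost, lightgbm, or catboost, by adding them to the `estimators` list.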
Achievement
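- Score on the test dataset after submission: 0.12069 (RMSLE, the competition's metric: root mean squared error between the logarithms of the predicted and actual sale prices)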
- Rank: 271/4237 (Top 6% as of July 29, 2023)
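For context, the competition leaderboard scores submissions by the root mean squared error between the logarithm of the predicted price and the logarithm of the actual price. A minimal sketch of that calculation follows; the function name `rmsle` and the example prices are made up for illustration.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared error between log(actual) and log(predicted) prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))


# Made-up prices, only to show the call.
actual = [200000.0, 150000.0, 320000.0]
predicted = [210000.0, 140000.0, 300000.0]
score = rmsle(actual, predicted)
```

Because the error is taken on log prices, a 10% miss on a cheap house counts about as much as a 10% miss on an expensive one.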
Datasets
- 81 features covering various characteristics of the houses (e.g., number of bedrooms, garage area)
- Target variable is the sale price of each house
- 1,460 rows in the training set and 1,459 rows in the test set
- During data cleaning, I removed five features and transformed some others to improve model performance.
To prepare the data for modeling, I cleaned it and imputed the missing values.
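The exact imputation rules are not listed here, so the following is only a sketch of one common approach, under stated assumptions: numeric columns are filled with their median, and categorical columns are filled with a placeholder category. The small frame and the `"None"` placeholder are illustrative; `LotFrontage` and `GarageType` are used only as example column names.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with missing values (illustrative only).
df = pd.DataFrame(
    {
        "LotFrontage": [65.0, np.nan, 80.0, 70.0],
        "GarageType": ["Attchd", None, "Detchd", "Attchd"],
    }
)

# Split columns by type so each group gets its own fill rule.
numeric_cols = df.select_dtypes(include="number").columns
categorical_cols = df.columns.difference(numeric_cols)

cleaned = df.copy()
# Numeric columns: fill gaps with the column median (an assumed choice).
cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())
# Categorical columns: treat a missing value as its own "None" category.
cleaned[categorical_cols] = cleaned[categorical_cols].fillna("None")
```

The median is less sensitive to outliers than the mean, and a placeholder category keeps the fact that a value was missing, which can itself be informative (for example, a house with no garage).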
Language and libraries
Language: Python
Libraries:
- Data Manipulation: pandas, numpy, scipy, itables, pandas_profiling, missingno
- Data Visualization: matplotlib, seaborn
- Machine Learning: sklearn, xgboost, lightgbm, catboost