April 16, 2023
Introduction
This project is my entry for the Kaggle Spaceship Titanic competition, where the task is to predict which passengers were transported to an alternate dimension. The data comes from https://www.kaggle.com/competitions/spaceship-titanic
I blended the predictions of two algorithms, LightGBM and CatBoost, with a voting classifier to maximize prediction accuracy.
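As a rough illustration of what blending two classifiers by voting means, here is a minimal sketch that assumes soft voting, that is, averaging each model's predicted probability and thresholding the result. The post does not say whether hard or soft voting was used, and the function name `soft_vote`, the equal weights, and the probability values are all hypothetical. scikit-learn's `VotingClassifier` with `voting="soft"` performs the same kind of averaging internally.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5, threshold=0.5):
    """Blend two models' P(Transported=True) per passenger.

    prob_a, prob_b: equal-length lists of probabilities from two models.
    weight_a: weight given to the first model (assumption: equal weights).
    Returns the blended probabilities and the resulting boolean labels.
    """
    weight_b = 1.0 - weight_a
    blended = [weight_a * pa + weight_b * pb for pa, pb in zip(prob_a, prob_b)]
    # Predict True only when the blended probability exceeds the threshold.
    labels = [p > threshold for p in blended]
    return blended, labels


# Toy probabilities, made up for illustration (not real model output).
lgbm_prob = [0.9, 0.4, 0.2]
catboost_prob = [0.7, 0.8, 0.1]

blended, labels = soft_vote(lgbm_prob, catboost_prob)
print(labels)
```

The second passenger shows why blending can help: one model leans toward False (0.4) and the other toward True (0.8), and the average settles the disagreement.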
Achievement
- Classification accuracy on the test dataset after submission was 0.80897
- Rank: 133/2504 (top 5.3% as of April 17, 2023)
Datasets
- 12 variables for each passenger, plus a target column (Transported, the y variable) indicating whether the passenger was successfully transported. The training set has 8,693 rows and the test set has 4,277 rows.

- After data cleaning, I kept all rows and imputed missing values.
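Keeping every row while imputing missing values can be sketched as below. This is only one common approach (median for numeric columns, most frequent value for the rest) and is an assumption; the post does not state which imputation strategy was actually used. The helper name `impute_simple` and the toy values are hypothetical; `Age` and `HomePlanet` are columns from the competition data.

```python
import numpy as np
import pandas as pd


def impute_simple(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with missing values filled, dropping no rows."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().sum() == 0:
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numeric column: fill with the column median.
            out[col] = out[col].fillna(out[col].median())
        else:
            # Categorical column: fill with the most frequent value.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


# Toy frame, made up for illustration.
toy = pd.DataFrame({
    "Age": [24.0, np.nan, 58.0, 33.0],
    "HomePlanet": ["Earth", "Europa", None, "Earth"],
})

filled = impute_simple(toy)
print(filled)
```

In a real workflow the fill values would normally be computed on the training set and then reused for the test set, so that both are imputed consistently.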
Language and libraries
Language: Python
Libraries:
- Data Manipulation: pandas, numpy
- Data Visualization: matplotlib, seaborn, missingno (imported as msno)
- Machine Learning: sklearn, xgboost, lightgbm, catboost
Data Preprocessing 1
- Insights:
- Missing values are scattered across the rows in no particular order