March 27, 2023

Introduction

The goal of the project is to predict lung cancer mortality rate in the United States. Note that it is to predict the probability of lung cancer rate in each county instead of prediction for each person. This project is not a classification but a regression problem.

Datasets

Language and libraries

Language: Python

Libraries :

Data Preprocessing

Convert FIPS column to correct format

Federal Information Processing Standard or FIPS is a categorical variable. It is a code with five digits. The left two digits show the state and the three right digits show the county code. (e.g. 02013)

In raw data, it is numeric and not five-digit numbers. So, it is required to convert it to five-digit values. (e.g. 2013 -> 02013)

Merge the population data to the main dataframe

Deal with missing value in explanatory/ dependent variable