Census-Income with LightGBM and Optuna

This project fits LightGBM models on the census income data. We also calculate the feature importances with SHAP (SHapley Additive exPlanations).

It is not intended to achieve the best possible results, but to serve as a demo of how LightGBM, Optuna and HPOflow work together. HPOflow is optional and can be removed if desired.

This work can be understood as a template for other projects.

File Description

The scripts and notebooks should be executed in this order.

preprocess.ipynb: download, explore and preprocess the data
simple_train.py: do hyperparameter search with Optuna
optuna_vis.ipynb: print and visualize the Optuna results
save_train.py: fit LightGBM on the full dataset with the best hyperparameter set; this extends simple_train.py with the option to store the model trained with the best hyperparameter set, so the two scripts share a lot of redundant code
shap_values.ipynb: calculate and visualize SHAP values / feature importance
optuna.db: intentionally committed to git so that the results can be visualized directly with optuna_vis.ipynb

Usage

create and activate a new Python environment (for example with conda)
install the dependencies: pip install -r requirements.txt
execute preprocess.ipynb to load and preprocess the data
start the hyperparameter optimization with python simple_train.py
wait a few minutes
execute optuna_vis.ipynb to view the results (this can be done in parallel while the optimization is still running)
also look at the graphics in the plots directory

Licensing

Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.

telekom/census-income-lightgbm
