This project uses the census income data and fits LightGBM models on it. We also calculate the feature importances with SHAP (SHapley Additive exPlanations).
It is not intended to achieve the best possible results, but rather serves as a demo of how LightGBM, Optuna and HPOflow work together. HPOflow is optional and can be removed if desired.
This work can be understood as a template for other projects.
The scripts and notebooks should be executed in this order:
- preprocess.ipynb: download, explore and preprocess the data
- simple_train.py: do hyperparameter search with Optuna
- optuna_vis.ipynb: print and visualize Optuna results
- save_train.py: fit LightGBM on full dataset with best hyperparameter set - this is an extension of simple_train.py and adds the option to store the model with the best hyperparameter set - therefore there is a lot of redundancy with simple_train.py
- shap_values.ipynb: calculate and visualize SHAP values / feature importance
- optuna.db: this was intentionally placed in git to be able to visualize the results directly using optuna_vis.ipynb
- create and activate a new Python environment (for example with conda)
- install the dependencies: pip install -r requirements.txt
- execute preprocess.ipynb to load and preprocess the data
- start the hyperparameter optimization with python simple_train.py
- wait a few minutes
- execute optuna_vis.ipynb to view the results (this can be done in parallel while the optimization is still running)
- also look at the graphics in the plots directory
Copyright (c) 2022 Philip May, Deutsche Telekom AG
Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.