The following instructions will allow you to:
- clone the repository
- clone the submodules used by the repository
- create a dedicated conda environment and install the requirements
git clone
git submodule update --init --recursive --remote
conda create -c conda-forge -n diverse_molgen rdkit
conda activate diverse_molgen
pip install -r requirements.txt
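After `conda activate diverse_molgen`, a quick check can confirm that the shell is inside the intended environment and that key packages resolve. The helper below is a hypothetical sketch, not part of the repository. `check_environment` is an invented name. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation. Of the packages it checks, only `rdkit` is known to be required from the commands above. Other module names, for example those in `requirements.txt`, are yours to supply.

```python
import importlib.util
import os


def check_environment(modules, expected_env="diverse_molgen"):
    """Report the active conda env and which top-level modules cannot be found.

    Hypothetical helper: `modules` is a list of import names to probe;
    only "rdkit" is known to be required from the setup steps.
    """
    # `conda activate <name>` exports CONDA_DEFAULT_ENV; it is None outside conda.
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    # find_spec returns None when a top-level module is not installed.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return {
        "active_env": active_env,
        "env_ok": active_env == expected_env,
        "missing": missing,
    }


if __name__ == "__main__":
    print(check_environment(["rdkit"]))
```

If this is run outside the activated environment, it reports `env_ok: False`. That is a common reason for `import rdkit` failing.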
The following Python scripts run molecular generation in different settings:
- Diverse molecule generation on the DRD2 dataset
python --nruns 10 --dataset drd2
- Diverse molecule generation on the EGFR dataset
python --nruns 10 --dataset egfr
- Memory RL reimplementation on the DRD2 dataset
python --nruns 10 --dataset drd2 --use_memory_rl True
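The three settings above differ only in their command-line arguments, so a single loop can drive all of them. This is a sketch under stated assumptions. The script name is not given above, so `generate.py` is a hypothetical placeholder, and `build_commands` and `run_all` are invented helpers. The sketch uses `sys.executable` so the runs use the Python of the activated environment. By default it does a dry run that only prints the commands. `--use_memory_rl True` is passed through as the string `"True"`. How the script parses that string is not shown here.

```python
import subprocess
import sys

# The three settings listed above, as extra command-line arguments.
SETTINGS = {
    "drd2": ["--nruns", "10", "--dataset", "drd2"],
    "egfr": ["--nruns", "10", "--dataset", "egfr"],
    "drd2_memory_rl": ["--nruns", "10", "--dataset", "drd2", "--use_memory_rl", "True"],
}


def build_commands(script):
    """Return one argv list per setting for the given generation script.

    `script` is a placeholder: pass the path of the repository's
    generation script yourself.
    """
    # sys.executable is the Python of the currently activated environment.
    return {name: [sys.executable, script, *args] for name, args in SETTINGS.items()}


def run_all(script, dry_run=True):
    """Print (dry run) or execute each setting in turn, stopping on failure."""
    for name, argv in build_commands(script).items():
        print(f"[{name}]", " ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if a run exits non-zero.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # "generate.py" is a hypothetical name; replace it with the actual script.
    run_all("generate.py", dry_run=True)
```

With `dry_run=False`, the runs execute one after another. `check=True` stops the loop at the first failing setting instead of silently continuing.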
The EGFR and DRD2 datasets were extracted from the ExCAPE-DB database (Sun, J.; Jeliazkova, N.; Chupakhin, V.; Golib-Dzib, J.-F.; Engkvist, O.; Carlsson, L.; Wegner, J.; Ceulemans, H.; Georgiev, I.; Jeliazkov, V., et al. ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics. J. Cheminf. 2017, 9, 1–9.)
NB: this step can be skipped by using the already generated trajectories in the results and results_memory_RL folders.
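The generation step can only be skipped if the saved trajectories are actually present. The sketch below uses a hypothetical helper named `saved_results`. It assumes the `results` and `results_memory_RL` folders sit at the repository root. It only lists the files they contain and makes no assumption about the trajectory file format.

```python
from pathlib import Path

# Folder names taken from the note above; assumed to live at the repository root.
RESULT_DIRS = ("results", "results_memory_RL")


def saved_results(repo_root="."):
    """Map each results folder to the sorted list of files it contains.

    A missing folder maps to None, signalling that the generation step
    has to be run rather than skipped.
    """
    root = Path(repo_root)
    found = {}
    for name in RESULT_DIRS:
        folder = root / name
        if folder.is_dir():
            # rglob("*") walks subfolders; keep files only, relative to the folder.
            found[name] = sorted(
                str(p.relative_to(folder)) for p in folder.rglob("*") if p.is_file()
            )
        else:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, files in saved_results().items():
        status = "missing" if files is None else f"{len(files)} file(s)"
        print(f"{name}: {status}")
```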
The figures from the paper can be reproduced by running all the notebooks at the root of the repository:
to reproduce Figure 4.
to reproduce Figure 8.
to reproduce Figures 5, 7a, 9a, 11a, 11b, 11c and 14.
to reproduce Figures 6, 7b, 9b and 11d.
to reproduce Figure 12.
to reproduce Figure 13.
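One way to run the notebooks non-interactively is Jupyter's `nbconvert` with `--execute`. This is an assumed workflow, not one prescribed above. It requires `nbconvert` to be installed in the environment and treats the root-level notebooks as independent of one another. It also uses `--inplace`, which overwrites the outputs currently saved in each notebook. `notebook_commands` and `run_notebooks` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# Assumed nbconvert invocation: execute every cell and save outputs in place.
NBCONVERT = ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace"]


def notebook_commands(repo_root="."):
    """Return one nbconvert argv per notebook found at the repository root.

    glob (not rglob) keeps the scan to the root folder; notebook names are
    kept relative so the commands can be run with cwd=repo_root.
    """
    notebooks = sorted(Path(repo_root).glob("*.ipynb"))
    return [[*NBCONVERT, nb.name] for nb in notebooks]


def run_notebooks(repo_root=".", dry_run=True):
    """Print (dry run) or execute each notebook, stopping at the first failure."""
    for argv in notebook_commands(repo_root):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True, cwd=repo_root)


if __name__ == "__main__":
    run_notebooks(".", dry_run=True)
```

Commented-out cells stay comments under `--execute`, so the long-running experiments are not triggered by this loop.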
NB: some cells in egfr.ipynb and drd2.ipynb are commented out because they take a very long time to run; their results are already saved in the robustness_experiments folder. Uncomment these cells if you want to rerun them and regenerate the results stored in robustness_experiments from scratch.
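You can locate the commented-out cells before uncommenting them by scanning the notebook JSON directly. This is a heuristic sketch with a hypothetical helper name, `commented_cells`. It flags code cells in which every non-blank line starts with `#`, so it may also flag ordinary comment-only cells. Indices are 0-based positions in the notebook's full cell list, markdown cells included. The sketch relies only on the standard `.ipynb` JSON layout (`cells`, `cell_type`, `source`).

```python
import json
from pathlib import Path


def commented_cells(notebook_path):
    """Return indices of code cells whose non-blank lines all start with '#'.

    Code cells have "cell_type" == "code", and "source" is either a
    string or a list of line strings.
    """
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    hits = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        # Empty cells are skipped; a cell counts only if every line is a comment.
        if lines and all(ln.startswith("#") for ln in lines):
            hits.append(i)
    return hits


if __name__ == "__main__":
    for name in ("egfr.ipynb", "drd2.ipynb"):
        if Path(name).exists():
            print(name, commented_cells(name))
```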