This is my Insight project for training and testing the machine learning models behind the Precision Chemotherapy Recommender. The code for the web app is in the dash folder and is also available at github.com/syao13/precisionChemoDash.git
Data used for this project can be downloaded from the Genomics of Drug Sensitivity in Cancer Project:
wget "ftp://ftp.sanger.ac.uk/pub4/cancerrxgene/releases/release-7.0/sanger1018_brainarray_ensemblgene_rma.txt.gz"
wget "ftp://ftp.sanger.ac.uk/pub4/cancerrxgene/releases/release-7.0/Cell_Lines_Details.xlsx"
wget "ftp://ftp.sanger.ac.uk/pub4/cancerrxgene/releases/release-7.0/v17.3_fitted_dose_response.xlsx"
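Once the files are downloaded, it can help to look at the first few rows of the expression matrix before loading it in full. The sketch below uses only the Python standard library and assumes the gzipped file is tab-delimited with a single header row; that layout is an assumption and is not confirmed by this README. The function name `peek_expression_matrix` is hypothetical.

```python
import csv
import gzip
from itertools import islice


def peek_expression_matrix(path, n_rows=3):
    """Return the header and the first n_rows data rows of a gzipped file.

    Assumes a tab-delimited layout with one header row (an assumption,
    not something stated in the README).
    """
    # Mode "rt" decompresses on the fly and yields text lines.
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # islice stops after n_rows without reading the rest of the file.
        rows = list(islice(reader, n_rows))
    return header, rows
```

Calling `peek_expression_matrix("sanger1018_brainarray_ensemblgene_rma.txt.gz")` from the download directory returns the column names and a few rows as lists of strings, which is enough to confirm the delimiter and column layout.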
This project depends on the following Python libraries:
- pandas for calculating aggregated results.
- numpy and scipy for mathematical calculations.
- docopt_ for a better command-line interface.
- jsonpickle for formatted and reusable output.
- sklearn_ for machine learning models.
- multiprocessing_ (part of the Python standard library) for parallelizing the model training process.
- matplotlib and seaborn for visualization.
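As a rough illustration of how multiprocessing can parallelize a parameter sweep, the sketch below evaluates a small grid of parameter sets in separate worker processes. It is not the code in train_models.py: the `evaluate` function, the parameter names, and the toy score are all placeholders standing in for fitting and scoring a real model.

```python
from itertools import product
from multiprocessing import Pool


def evaluate(params):
    """Placeholder for fitting and scoring one model; returns (params, score)."""
    n_estimators, max_depth = params
    # Toy score standing in for a cross-validated metric (NOT a real model).
    score = 1.0 - 1.0 / (n_estimators * max_depth)
    return params, score


def build_grid(n_estimators_values, max_depth_values):
    """Return every (n_estimators, max_depth) combination as a list of tuples."""
    return list(product(n_estimators_values, max_depth_values))


if __name__ == "__main__":
    grid = build_grid([10, 50], [2, 4])
    # Each parameter set is handed to one of the worker processes.
    with Pool(processes=2) as pool:
        results = pool.map(evaluate, grid)
    # Keep the parameter set with the highest score.
    best_params, best_score = max(results, key=lambda item: item[1])
    print(best_params, best_score)
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by re-importing the main module, it keeps the pool from being created again inside each worker.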
To install dependencies manually:
pip3 install pandas
pip3 install numpy
pip3 install scipy
pip3 install jsonpickle
pip3 install scikit-learn
pip3 install docopt
pip3 install matplotlib
pip3 install seaborn
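After installing, a quick way to confirm that everything is importable is to look up each module without actually importing it. This is a minimal sketch using `importlib.util.find_spec` from the standard library; the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative. Note that the list holds import names, so scikit-learn appears as `sklearn`.

```python
from importlib.util import find_spec

# Import names, not PyPI names: scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "pandas", "numpy", "scipy", "docopt",
    "jsonpickle", "sklearn", "matplotlib", "seaborn",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```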
'eda.ipynb' contains code for exploratory data analysis.
To train and test different model parameters:
python3 train_models.py trained_models.txt
To build models with the proper parameters:
python3 build_models.py built_models.txt