Files:

To find the closest molecules, first generate their embeddings with the 'physio_featurization.py' file.
Then run the algorithm with the 'closer_finder.py' file as follows:

python closer_finder.py <start> <end> <bool_physio>

where:

<start> is the index of the first molecule to process in the dataset 'embd_neighbours_only_100K_rmse_original_smiles.csv'

<end> is the index of the last molecule to process in the dataset

<bool_physio> selects the embedding to test: pass True to find the closest molecules with Physiochemical Featurization (this requires the output of the 'physio_featurization.py' file); otherwise Circular Fingerprint Featurization is used, which requires the output file produced by the 'cf_featurization.py' file
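
The three positional arguments above could be parsed along the following lines. This is only a sketch and not the actual code of 'closer_finder.py': the helper names `parse_bool` and `parse_args` are hypothetical, and the set of accepted spellings for the boolean is an assumption. It is worth noting because `bool("False")` evaluates to `True` in Python, so the string has to be converted explicitly.

```python
import argparse


def parse_bool(text: str) -> bool:
    """Convert a command-line string such as 'True' or 'False' to a bool."""
    value = text.strip().lower()
    if value in {"true", "1", "yes"}:
        return True
    if value in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {text!r}")


def parse_args(argv=None):
    """Parse <start> <end> <bool_physio> (hypothetical helper, for illustration)."""
    parser = argparse.ArgumentParser(description="Find the closest molecules.")
    parser.add_argument("start", type=int, help="index of the first molecule")
    parser.add_argument("end", type=int, help="index of the last molecule")
    parser.add_argument("bool_physio", type=parse_bool,
                        help="True for physiochemical, False for circular fingerprint")
    return parser.parse_args(argv)


args = parse_args(["10000", "15000", "True"])
print(args.start, args.end, args.bool_physio)
```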

After several hours the algorithm generates an output file named according to the inputs. For example, running 'python closer_finder.py 10000 15000 True' produces the file 'PhysiochemicalFeaturization10000_to_15000.csv', which contains the 10 closest molecules for each molecule in that range.
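
To illustrate what "10 closest molecules" can mean for a slice of the dataset, here is a minimal brute-force sketch. It is not the repository's implementation: the function name `k_closest`, the use of Euclidean distance, and the half-open range `[start, end)` are all assumptions. It also builds the full distance matrix in memory, so a real run over 100K molecules would need to process the queries in smaller batches.

```python
import numpy as np


def k_closest(embeddings, start, end, k=10):
    """For each row in [start, end), return the indices of its k closest other rows.

    Assumes Euclidean distance between embedding vectors (an assumption,
    the repository may use a different metric).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    queries = embeddings[start:end]
    # Pairwise squared distances, shape (end - start, number of molecules).
    diffs = queries[:, None, :] - embeddings[None, :, :]
    dists = np.einsum("qnd,qnd->qn", diffs, diffs)
    # A molecule must not be reported as its own neighbour.
    dists[np.arange(end - start), np.arange(start, end)] = np.inf
    return np.argsort(dists, axis=1, kind="stable")[:, :k]


toy = [[0, 0], [1, 0], [0, 3], [5, 5]]
print(k_closest(toy, 0, 2, k=2))
```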
For faster results, the algorithm was run in 20 threads (from 0 to 5000, from 5000 to 10000, etc.), and the 'combining.py' file combines their outputs into a single file, 'Physiochemical_Featurizing.csv'.
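
The split into 20 runs follows from dividing the 100000 molecules into chunks of 5000. A small sketch of how the ranges and the matching commands could be generated is shown below; the helper `chunk_ranges` is hypothetical, and the output file names are only extrapolated from the 'PhysiochemicalFeaturization10000_to_15000.csv' example.

```python
def chunk_ranges(total, chunk_size):
    """Split the range [0, total) into consecutive (start, end) pairs."""
    return [(start, min(start + chunk_size, total))
            for start in range(0, total, chunk_size)]


ranges = chunk_ranges(100_000, 5_000)

# One command per chunk, each of which can be launched separately.
commands = [f"python closer_finder.py {start} {end} True" for start, end in ranges]

# Output names extrapolated from the example file name (an assumption).
outputs = [f"PhysiochemicalFeaturization{start}_to_{end}.csv" for start, end in ranges]

print(len(ranges))
print(commands[0])
print(outputs[-1])
```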
The 'similarity.py' file computes the similarity score: the total number of correctly predicted closest molecules (out of 10 per molecule) divided by the total number of predictions (= 100000 * 10). To run, it needs the original 'embd_neighbours_only_100K_rmse_original_smiles.csv' file as well as 'Physiochemical_Featurizing.csv'.
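
The score described above can be sketched as follows. This is an illustration, not the contents of 'similarity.py': the function name `similarity_score` is hypothetical, and it assumes that a prediction counts as correct when it appears anywhere in the reference list for the same molecule, regardless of rank.

```python
def similarity_score(true_neighbours, predicted_neighbours):
    """Fraction of predicted neighbours that appear in the reference lists.

    Both arguments hold one list of neighbour identifiers per molecule,
    in the same molecule order.
    """
    if len(true_neighbours) != len(predicted_neighbours):
        raise ValueError("both inputs must cover the same molecules")
    correct = 0
    total = 0
    for truth, predicted in zip(true_neighbours, predicted_neighbours):
        # Order within the neighbour list is ignored (an assumption).
        correct += len(set(truth) & set(predicted))
        total += len(predicted)
    return correct / total


reference = [[1, 2, 3], [4, 5, 6]]
predicted = [[1, 2, 9], [6, 7, 8]]
print(similarity_score(reference, predicted))
```

With 100000 molecules and 10 predictions each, `total` equals the 100000 * 10 denominator mentioned above.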

README.md
cf_featurization.py
closer_finder.py
combining.py
comparing.py
physio_featurization.py

sofiyagarkot/nearest_molecules