About molecular property prediction.
All molecular properties are predicted from fragment-based contributions.
We developed an original method for splitting a molecule into a set of
linear or non-linear fragments of different length and representation levels
and counting the number of occurrences of each chemical pattern found.
A Partial Least Squares (PLS) regression model was built and optimized
for a particular property using a leave-50%-out cross-validation calculation.
The method is robust and fast (about 5K compounds per second).
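The fragment-counting step described above can be illustrated with a minimal sketch. This is not Molsoft's implementation: it handles only linear fragments, describes atoms by element symbol alone (a single "representation level"), and ignores bond orders. The function name `count_linear_fragments` and its parameters are hypothetical, chosen for illustration.

```python
from collections import Counter


def count_linear_fragments(atoms, bonds, max_len=3):
    """Count linear atom paths of 1..max_len atoms in a molecular graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) atom-index pairs (bond orders are ignored here)
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    counts = Counter()

    def extend(path):
        symbols = [atoms[k] for k in path]
        # A path and its reverse are the same fragment: use one canonical key.
        key = "-".join(min(symbols, symbols[::-1]))
        counts[key] += 1
        if len(path) < max_len:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])

    # Each path of two or more atoms is walked once from each end, so halve it.
    return {k: (v if "-" not in k else v // 2) for k, v in counts.items()}


# Heavy atoms of ethanol: C-C-O
print(count_linear_fragments(["C", "C", "O"], [(0, 1), (1, 2)]))
# {'C': 2, 'C-C': 1, 'C-C-O': 1, 'C-O': 1, 'O': 1}
```

The resulting counts form one row of the descriptor matrix on which a regression model such as PLS would then be fitted.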
LogP (octanol/water partition coefficient)
13K compounds from the PHYSPROP database were used to find a PLS regression model.
The best model was found with Q2=0.92 and Rmse=0.56.
LogS (water solubility)
5K compounds from the PHYSPROP database were used to find a PLS regression model.
The best model was found with Q2=0.82 and Rmse=0.87.
Molecular Polar Surface Area (PSA) and Volume
PSA is defined as the sum of the surface areas of oxygen and nitrogen atoms and their attached hydrogens.
6K compounds from the WDI database were used to find a PLS regression model.
The best model was found with Q2=0.99 and Rmse=1.56.
Drug-likeness score
Predicts an overall drug-likeness score using Molsoft's chemical fingerprints.
The training set for this model consisted of:
- 5K marketed drugs from WDI (positives)
- 10K carefully selected non-drug compounds (negatives)
Definitions:
- Rmse - root-mean-square error of the cross-validated predictions
- Q2 - cross-validated squared correlation coefficient of predictions vs. training values
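The two statistics defined above can be computed from held-out predictions, as in the sketch below. It is an illustration under stated assumptions, not the procedure actually used: the leave-50%-out scheme is approximated by repeated random half splits, and a one-variable least-squares line (`fit_line`) stands in for the PLS model. All function names and the `rounds` and `seed` parameters are hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def squared_correlation(observed, predicted):
    """Squared Pearson correlation coefficient (Q2 when fed held-out predictions)."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def leave_half_out_predictions(x, y, fit, rounds=10, seed=0):
    """Hold out a random half, fit on the other half, predict the held-out half."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    observed, predicted = [], []
    for _ in range(rounds):
        rng.shuffle(indices)
        half = len(indices) // 2
        test, train = indices[:half], indices[half:]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            observed.append(y[i])
            predicted.append(model(x[i]))
    return observed, predicted


def fit_line(xs, ys):
    """One-variable least squares; a simple stand-in for the PLS model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda v: slope * v + intercept


# Toy data: one descriptor and a noiseless linear property.
xs = [float(i) for i in range(10)]
ys = [2.0 * v + 1.0 for v in xs]
obs, pred = leave_half_out_predictions(xs, ys, fit_line)
print("Q2 =", squared_correlation(obs, pred), "Rmse =", rmse(obs, pred))
```

Because every prediction is made for a compound that was excluded from fitting, both statistics estimate performance on unseen compounds rather than goodness of fit to the training data.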