
About molecular property prediction


All molecular properties are predicted from fragment-based contributions. We developed an original method that splits a molecule into a set of linear or non-linear fragments of different lengths and levels of representation and counts the number of occurrences of each chemical pattern found. A Partial Least Squares (PLS) regression model was then built and optimized for each property using leave-50%-out cross-validation. The method is robust and fast, processing about 5,000 compounds per second.
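As a rough illustration of fragment counting, the sketch below enumerates only linear fragments (simple paths of up to `max_len` atoms) in a hydrogen-suppressed molecular graph whose atoms are labeled by element symbol. The function `count_linear_fragments` and its input format are hypothetical; Molsoft's actual method also uses non-linear fragments and several representation levels, which are not reproduced here.

```python
from collections import Counter


def count_linear_fragments(atoms, bonds, max_len=3):
    """Count linear (path) fragments of 1..max_len atoms.

    atoms: list of element symbols, indexed by atom number (hypothetical format).
    bonds: list of undirected (i, j) atom-index pairs.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    counts = Counter()

    def walk(path):
        # Each undirected path is reached from both ends; count it only once,
        # from the end with the smaller atom index.
        if len(path) == 1 or path[0] < path[-1]:
            labels = [atoms[k] for k in path]
            # A fragment and its reverse are the same pattern: keep the smaller label.
            counts["-".join(min(labels, labels[::-1]))] += 1
        if len(path) < max_len:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:  # simple paths only, no revisiting atoms
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return counts
```

For ethanol (heavy atoms C, C, O with bonds (0, 1) and (1, 2)), this yields two C, one O, one C-C, one C-O and one C-C-O. Stacking such count vectors over a fixed fragment vocabulary gives the feature matrix for the regression.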

LogP (octanol/water partition coefficient)

13K compounds from the PHYSPROP database were used to build a PLS regression model. The best model achieved Q2=0.92 and Rmse=0.56.
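A minimal sketch of the regression and validation step follows, under two assumptions: the PLS model has a single latent component (real models typically tune the number of components), and leave-50%-out cross-validation is taken to mean repeated random 50/50 splits in which each half is predicted by a model trained on the other half, with held-out predictions averaged per compound. `fit_pls1` and `leave_half_out_predictions` are hypothetical names.

```python
import random


def fit_pls1(X, y):
    """Fit a one-component PLS1 model; returns a predict(x) function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector w is proportional to Xc^T yc (covariance of each column with y).
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0:
        return lambda x: y_mean  # no signal: fall back to the training mean
    w = [v / norm for v in w]

    # Latent scores t = Xc w, then regress y on t.
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    tt = sum(v * v for v in t)
    q = sum(a * b for a, b in zip(t, yc)) / tt if tt else 0.0

    def predict(x):
        score = sum((x[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict


def leave_half_out_predictions(X, y, repeats=10, seed=0):
    """Average held-out predictions over repeated random 50/50 splits."""
    rng = random.Random(seed)
    n = len(X)
    sums = [0.0] * n
    counts = [0] * n
    idx = list(range(n))
    for _ in range(repeats):
        rng.shuffle(idx)
        half_a, half_b = idx[: n // 2], idx[n // 2 :]
        # Train on one half, predict the other, then swap roles.
        for train, test in ((half_a, half_b), (half_b, half_a)):
            model = fit_pls1([X[i] for i in train], [y[i] for i in train])
            for i in test:
                sums[i] += model(X[i])
                counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]
```

With `X` holding fragment counts and `y` the measured LogP values, the returned held-out predictions are the ones from which Q2 and Rmse would be computed.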

LogS (water solubility)

5K compounds from the PHYSPROP database were used to build a PLS regression model. The best model achieved Q2=0.82 and Rmse=0.87.

Molecular Polar Surface Area (PSA) and Volume

PSA is defined as the sum of the surface areas of oxygen and nitrogen atoms and their attached hydrogens. 6K compounds from the WDI database were used to build a PLS regression model. The best model achieved Q2=0.99 and Rmse=1.56.
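The PSA definition can be written directly as a sum. In this sketch, per-atom surface areas are assumed to be precomputed inputs (the numbers in the example are made up for illustration), and `polar_surface_area` is a hypothetical helper; the PSA values reported on this page come from the PLS model, not from this calculation.

```python
def polar_surface_area(atoms, bonds):
    """Sum the surface areas of O and N atoms and of H atoms bonded to them.

    atoms: list of (element, surface_area) tuples; areas are assumed precomputed.
    bonds: list of undirected (i, j) atom-index pairs.
    """
    polar = {"O", "N"}
    # Oxygens and nitrogens contribute their full surface area.
    total = sum(area for element, area in atoms if element in polar)
    # Hydrogens contribute only when bonded to an oxygen or nitrogen.
    for i, j in bonds:
        for h, heavy in ((i, j), (j, i)):
            if atoms[h][0] == "H" and atoms[heavy][0] in polar:
                total += atoms[h][1]
    return total
```

For example, with made-up areas for a C-O-H fragment, `polar_surface_area([("C", 10.0), ("O", 17.0), ("H", 3.0), ("H", 2.0)], [(0, 1), (1, 2), (0, 3)])` returns 20.0: the oxygen plus its hydrogen, while the carbon and the hydrogen on carbon are excluded.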

Drug-likeness score

Predicts an overall drug-likeness score using Molsoft's chemical fingerprints. The training set for this model consisted of:

  • 5K marketed drugs from WDI (positives)
  • 10K carefully selected non-drug compounds (negatives)
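The page does not say which learning method turns the fingerprints into a score. As a stand-in only, the sketch below uses a naive-Bayes-style per-bit log-odds score with Laplace smoothing, trained on fingerprints represented as sets of on-bit indices; positive scores lean toward the marketed-drug class. The function names and the fingerprint representation are assumptions, not Molsoft's API or method.

```python
import math
from collections import Counter


def train_bit_log_odds(positives, negatives):
    """Per-bit log-odds weights from fingerprints given as sets of on-bits."""
    # Each fingerprint is a set, so a bit is counted at most once per compound.
    pos_counts = Counter(bit for fp in positives for bit in fp)
    neg_counts = Counter(bit for fp in negatives for bit in fp)
    n_pos, n_neg = len(positives), len(negatives)
    weights = {}
    for bit in set(pos_counts) | set(neg_counts):
        # Laplace smoothing (+1 / +2) keeps weights finite for bits seen in one class only.
        p_pos = (pos_counts[bit] + 1) / (n_pos + 2)
        p_neg = (neg_counts[bit] + 1) / (n_neg + 2)
        weights[bit] = math.log(p_pos / p_neg)
    return weights


def drug_likeness_score(fingerprint, weights):
    """Sum the weights of the on-bits; bits never seen in training contribute 0."""
    return sum(weights.get(bit, 0.0) for bit in fingerprint)
```

In this toy form, a bit found equally often in drugs and non-drugs gets weight 0, so only class-discriminating patterns move the score.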

Definitions:

  • Rmse - root-mean-square error of the cross-validated predictions
  • Q2 - cross-validated squared correlation coefficient of predictions vs. training values
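The two definitions translate directly into code. This sketch follows the page's definition of Q2 as the squared correlation between cross-validated predictions and training values; note that Q2 is also commonly defined as 1 minus PRESS divided by the total sum of squares, which can give different values. The function names are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between predictions and observed values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def q2_squared_correlation(predicted, observed):
    """Squared Pearson correlation of cross-validated predictions vs. observed values."""
    n = len(observed)
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)
```

For observed values [1, 2, 3, 4] and cross-validated predictions [1.5, 1.5, 3.5, 3.5], Rmse is 0.5 and Q2 is 0.8.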


Copyright © 2017 Molsoft LLC.
All rights reserved.