Statistical model based on the analysis of frequencies of ECFP-4 fragments in available chemical libraries from different vendors.
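The synthesizability score is derived as follows. Each ECFP-4 fragment with a frequency below 10000 in the vendor libraries contributes a penalty P[i] = 4 - Log10(frequency); more common fragments contribute no penalty. The individual penalties are combined into a total penalty, Penalty = Sqrt( Sum( P[i]*P[i] ) ), which is then transformed into a score, Score = 3./(3.+Penalty). The score therefore lies between 0 and 1: a molecule built only from common fragments scores 1, and the score falls toward 0 as rare fragments accumulate.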
- Training set: ~20M compounds available from vendors
- Descriptors: ECFP-4
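The penalty and score formulas above can be illustrated with a minimal Python sketch. It assumes the caller already has one frequency value per ECFP-4 fragment present in the molecule. Extracting the fragments (for example from a radius-2 Morgan fingerprint) and looking up their frequencies in the vendor libraries is outside its scope. The function names `fragment_penalty` and `synthesizability_score` are hypothetical. The description does not say how fragments with frequency 0 are handled, so this sketch rejects them rather than guessing.

```python
import math
from typing import Iterable

# Threshold from the description: only fragments seen fewer than 10000 times are penalized.
RARE_FREQUENCY_THRESHOLD = 10_000


def fragment_penalty(frequency: int) -> float:
    """Hypothetical helper: per-fragment penalty P[i] = 4 - log10(frequency)."""
    if frequency < 1:
        # Assumption: handling of unseen fragments is not specified in the
        # description, so this sketch refuses to guess.
        raise ValueError("frequency must be at least 1")
    if frequency >= RARE_FREQUENCY_THRESHOLD:
        # Common fragments contribute no penalty.
        return 0.0
    return 4.0 - math.log10(frequency)


def synthesizability_score(frequencies: Iterable[int]) -> float:
    """Hypothetical helper: Score = 3 / (3 + sqrt(sum(P[i]^2))), in (0, 1]."""
    penalty = math.sqrt(sum(fragment_penalty(f) ** 2 for f in frequencies))
    return 3.0 / (3.0 + penalty)


# One common fragment (no penalty) and one rare fragment seen 10 times:
# P = 4 - 1 = 3, Penalty = sqrt(9) = 3, Score = 3 / 6 = 0.5.
print(synthesizability_score([50_000, 10]))  # 0.5
```

The constant 3 in the score transform sets the penalty at which the score reaches 0.5. In this sketch a single fragment seen only 10 times is exactly enough to halve the score.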