Molsoft has developed new technology and proprietary algorithms for molecular
modeling, with applications to protein and small-molecule structure prediction;
docking and structure-based drug design; molecular visualization and animation;
bioinformatics; cheminformatics; and intranet development.
Importance of protein homology modeling and structure prediction
With 200 bacterial genomes almost completed and the human, mouse, yeast and other
genomes essentially completed, the main obstacle to rational drug design is
insufficient structural information in the Protein Data Bank (PDB). Only 0.1-1%
of proteins have had their three-dimensional structure determined, and the growth
rates of sequence and structure entries are dramatically different: the PDB doubles
only every three years, while the sequence banks double in size every 17
months. In addition, each structure undergoes many essential rearrangements, large
and small, upon binding to other proteins or chemical substrates. While ab
initio protein structure prediction at a reasonable accuracy is still beyond
reach, the good news is that partial structure prediction can already help to
answer numerous questions in biology and rational drug design. Homology
modeling and structure prediction technology starts from the 0.1-1% of proteins
with known structures and builds usable structural models for up to 30-50% of
all proteins. The models can be used for decision support in drug discovery,
e.g. prioritizing targets by 'druggability', for docking and virtual ligand
screening, for directing chemistry in lead optimization, for directing protein
functional studies via mutagenesis, and as search models for molecular replacement.
The molecular environment is implemented as modules of the ICM program.
ICM stands for Internal Coordinate Mechanics and includes hundreds of
algorithms unified by a common scripting language and graphical user interface.
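The widening gap between sequence and structure databases follows directly from the two doubling times quoted above. The Python sketch below works through that arithmetic, assuming pure exponential growth with a 36-month doubling time for the PDB and 17 months for the sequence banks. The function names are illustrative and not part of ICM.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """Factor by which a database grows over `months` under exponential growth."""
    return 2.0 ** (months / doubling_time_months)


def coverage_ratio_change(months: float,
                          pdb_doubling: float = 36.0,
                          seq_doubling: float = 17.0) -> float:
    """Factor by which the structures-to-sequences ratio changes over `months`.

    The default doubling times are the figures quoted in the text.
    """
    return growth_factor(months, pdb_doubling) / growth_factor(months, seq_doubling)


if __name__ == "__main__":
    for years in (1, 3, 5):
        change = coverage_ratio_change(12 * years)
        print(f"after {years} yr the structure/sequence ratio is multiplied by {change:.2f}")
```

Under these assumptions the fraction of sequences with an experimental structure falls by more than half every three years, which is the gap that homology modeling is meant to close.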
Homology Modeling
For about 30% of all protein sequences a good structural model can be built,
and for another 20% a partial model can be built. Molsoft developed proprietary
technologies for:
- template finding: sensitive sequence search (or threading) to
identify one or several structural templates for further homology modeling
using full alignments with zero-end-gaps (ZEGA) and empirical structural
statistical significance [Abagyan, Batalov J.Mol.Biol. 1997]
- accurate threading, or sequence-structure alignment, using the ICM alignSS
algorithm, which optimizes the sequence-structure match using residue
accessibilities, secondary structures and functional sites of the template and
sequence, plus the predicted secondary structure of the query sequence.
- fast homology model building and database loop searches with the build model
function. This algorithm builds a full model with all the loops in seconds.
Each loop is searched for in the full PDB database and selected on the basis of its
interaction energy with the loop environment.
- loop prediction through local global optimization
- model refinement using ICM global optimization algorithm
- local reliability prediction: to assign a reliability value to each residue in
the model, we developed algorithms that use statistical potentials or full
residue energies after refinement, plus the local properties of the
alignments.
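The zero-end-gap idea behind the template search in the list above can be illustrated with a small dynamic-programming sketch in Python. It is a simplification under stated assumptions, not the ICM implementation: it uses a hypothetical match/mismatch score and a linear gap penalty instead of a substitution matrix, and it omits the statistical significance step. The only feature it is meant to show is that gaps at either end of either sequence cost nothing, while internal gaps are penalized.

```python
def zega_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score of `a` and `b` with free (zero-penalty) end gaps.

    Scoring parameters are illustrative assumptions, not ICM defaults.
    """
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at zero: leading end gaps are free.
    prev = [0] * (m + 1)
    best_last_col = prev[m]
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                         prev[j] + gap,       # internal gap in b
                         cur[j - 1] + gap)    # internal gap in a
        best_last_col = max(best_last_col, cur[m])
        prev = cur
    # Trailing end gaps are free: take the best score anywhere in the
    # last row or the last column of the matrix.
    return max(best_last_col, max(prev))


if __name__ == "__main__":
    # A short domain embedded in a longer sequence is not penalized
    # for the unaligned flanks.
    print(zega_score("TTTACGTTT", "ACG"))
```

Because the flanks are free, a short domain sequence can be compared with a long multi-domain template without the length difference dominating the score.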
The ICM homology modeling algorithms have been successfully used in modeling
competitions [e.g. car95, hom97 ], benchmarks [ ras97 ], and in many research
projects [ sch01, nor01, tom00, sch00, kel00, gan00, car98, pat98,
sri98, yud97, yui97, mat97, etc.]
Global energy optimization
The core technology used in most of our structure prediction algorithms is
global free energy optimization in a subset of internal coordinates that
describes intra- or intermolecular geometry. For structure prediction and
large-scale conformational sampling ICM employs a family of new global optimization
techniques such as Biased Probability Monte Carlo (Abagyan and Totrov, 1994),
the pseudo-Brownian docking algorithm (Abagyan et al., 1994) and local
deformation loop movements (Abagyan and Mazur, 1989).
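The general shape of a Monte Carlo search over internal coordinates can be sketched in Python as follows. This is a generic Monte Carlo minimization loop (random change of one torsion, local minimization, Metropolis acceptance) on a hypothetical toy energy, not the ICM code. In particular the random move here is drawn uniformly, whereas Biased Probability Monte Carlo draws it from biased distributions. The energy function, the crude minimizer and all parameter values are assumptions made for illustration.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical rugged energy over torsion angles (radians); not an ICM force field."""
    return sum(math.cos(3 * a) + 0.5 * math.cos(a - 1.0) for a in angles)


def local_minimize(energy, angles, step=0.1, iters=200):
    """Crude coordinate descent: try +/- step on each angle, halve the step when stuck."""
    x = list(angles)
    e = energy(x)
    for _ in range(iters):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                e_trial = energy(trial)
                if e_trial < e:
                    x, e, improved = trial, e_trial, True
                    break
        if not improved:
            step *= 0.5
            if step < 1e-4:
                break
    return x, e


def monte_carlo_minimization(energy, n_angles, n_steps=300, kT=0.6, seed=0):
    """Random move, local minimization, then Metropolis acceptance; returns the best minimum seen."""
    rng = random.Random(seed)
    current = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    current, e_current = local_minimize(energy, current)
    best, e_best = current, e_current
    for _ in range(n_steps):
        trial = current[:]
        i = rng.randrange(n_angles)
        # Uniform draw for simplicity; a biased-probability scheme would
        # sample this value from a non-uniform distribution instead.
        trial[i] = rng.uniform(-math.pi, math.pi)
        trial, e_trial = local_minimize(energy, trial)
        delta = e_trial - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


if __name__ == "__main__":
    angles, e = monte_carlo_minimization(toy_energy, n_angles=3)
    print([round(a, 2) for a in angles], round(e, 3))
```

Minimizing after every random move means the Metropolis test compares local minima rather than raw random conformations, which is what lets this kind of search cross barriers on a rugged energy surface.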
Receptor structure based prioritization of protein targets
The icmPocketFinder procedure identifies the substrate binding pockets in 98%
of all the cases (tested on over 10,000 pockets). This procedure is based on
calculating the drug-binding density field and contouring it at a certain
level. In 2001 [ tar01 ] we published a fast procedure for accurate electrostatic
calculation using the boundary element algorithm. A combination of
"pocket-density" with other physical properties such as electrostatic
potential, hydrophobicity and hydrogen bonding is used to evaluate whether a particular
protein target or protein-protein interface is "druggable" and to prioritize the
targets. We developed a special procedure to improve the pocket models by
co-optimization of flexible pockets with some of the known ligands.
Accurate fully flexible compound docking to receptor pockets
We developed a fast and accurate algorithm for docking a continuously flexible
ligand to a receptor pocket. In a benchmark study on 11
different receptors, the ICM flexible docking algorithm correctly docked 93% of
all ligand-receptor pairs. There are two versions of the algorithm: one with the
receptor represented by a series of grid potentials, and one with both ligand and
receptor represented as flexible explicit molecules. ICM docking has been
used extensively in many research and drug design projects.
Virtual ligand docking and screening of millions of compounds
A particularly fast implementation of the flexible docking algorithm is used to
screen millions of compounds from vendor databases or in-house libraries. Our
technology can index and convert to 3D any chemical database in .sdf,
.mol or .mol2 format, then dock all the molecules and score them by estimated
binding affinity. The main purpose of this procedure is to separate binders from
non-binders and eliminate at least 99% of compounds which do not fit the pocket
and do not need to be experimentally tested. We have several different scoring
functions, including a score based on the potential of mean force. Consensus
scoring reduces the number of false positives. Molsoft-ICM docking and
virtual ligand screening have been tested in benchmarks, competitions and, most
importantly, in several experimental lead discovery projects, including the
discovery of novel RAR agonists [ sch01 ], antagonists [ sch00 ], RNA binders
[ fil02 ], FGFR tyrosine kinase inhibitors, thyroid hormone receptor antagonists,
and PTP1B inhibitors.
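One simple way to combine several scoring functions into a consensus filter is sketched below in Python. It is a hedged illustration rather than the Molsoft scoring scheme: it assumes each scoring function returns a number where lower means stronger predicted binding, and it keeps only the compounds that rank in the top fraction under every function. The function name, the data layout and the scoring-function names are hypothetical.

```python
def consensus_hits(scores_by_function, top_fraction=0.01):
    """Return compounds ranked in the top fraction by every scoring function.

    scores_by_function maps a scoring-function name to a dict of
    {compound_id: score}; lower score = better (an assumption of this sketch).
    """
    selected = None
    for scores in scores_by_function.values():
        ranked = sorted(scores, key=scores.get)           # best score first
        n_keep = max(1, int(len(ranked) * top_fraction))  # always keep at least one
        top = set(ranked[:n_keep])
        selected = top if selected is None else selected & top
    return selected or set()


if __name__ == "__main__":
    scores = {
        "grid_score": {"c1": -30.0, "c2": -10.0, "c3": -25.0, "c4": -5.0},
        "pmf_score": {"c1": -8.0, "c2": -9.0, "c3": -7.5, "c4": -1.0},
    }
    # Only c1 is in the better half under both hypothetical functions.
    print(consensus_hits(scores, top_fraction=0.5))
```

Taking the intersection is the strictest form of consensus. It trades some true binders for fewer false positives, which suits a screen whose goal is to discard the vast majority of a library before experimental testing.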
Global optimization of compound geometries
In addition to an internal coordinate force field, the Molsoft-ICM platform supports
global optimization and analysis of small-molecule geometries by
performing free geometry optimization in Cartesian space using the MMFF94 force
field, including fully automated atom type assignment. The conformational
generation procedure accumulates a non-redundant set of representative
molecular geometries.
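Accumulating a non-redundant set of conformers can be pictured as a greedy filter, sketched here in Python. The sketch assumes each candidate is described by its energy and a vector of torsion angles in degrees, and that two conformers are redundant when their angular RMS difference is below a threshold. The text does not say how ICM compares geometries, so the distance measure, the threshold and the function names are assumptions.

```python
import math


def angular_rmsd(t1, t2):
    """RMS difference between two torsion vectors in degrees, respecting 360-degree periodicity."""
    total = 0.0
    for a, b in zip(t1, t2):
        d = (a - b + 180.0) % 360.0 - 180.0  # wrap the difference into [-180, 180)
        total += d * d
    return math.sqrt(total / len(t1))


def accumulate_conformers(candidates, threshold=30.0):
    """Keep the lowest-energy representative of each group of similar conformers.

    candidates: iterable of (energy, torsions) pairs.
    """
    kept = []
    for energy, torsions in sorted(candidates, key=lambda c: c[0]):
        # Accept a conformer only if it differs from everything kept so far.
        if all(angular_rmsd(torsions, other) > threshold for _, other in kept):
            kept.append((energy, torsions))
    return kept


if __name__ == "__main__":
    candidates = [
        (1.0, [60.0, 180.0]),
        (0.5, [62.0, 178.0]),
        (2.0, [-60.0, 180.0]),
        (2.5, [-58.0, -179.0]),
    ]
    for energy, torsions in accumulate_conformers(candidates):
        print(energy, torsions)
```

Processing candidates in order of increasing energy ensures that each retained geometry is the lowest-energy member of its neighbourhood.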
Molsoft-ICM scripting language and molecular environment
Molsoft has developed more than just several focused applications: we designed and
developed a whole computational environment for bioinformatics,
cheminformatics, protein modeling, protein design, docking and screening. The
environment is tied together by a common scripting language for molecules,
numbers, strings, vectors, matrices, tables, sequences, alignments, profiles
and maps. This environment also covers molecular graphics and production of molecular
animations.