How to predict 3D structure of a peptide from its sequence

The following script searches for the lowest-energy conformation. It uses the Biased Probability Monte Carlo (BPMC) procedure to generate new conformations and a full-atom energy function with solvation electrostatics, surface and entropy contributions. Start three or more independent simulations and let them run to convergence. Two features indicate convergence. First, the plot of the best energy achieved should stay flat for a sufficiently long time. To check this, store the output in f1.ou and run the following macro:

plotBestEnergies "f1" 100. "append display"

Second, the lowest-energy conformations from different simulations should be close to each other, e.g.

# peptide "pep.se" ; runs: "f1" and "f2" 
  build "pep" 
  display 
  read conf "f1" 0 
  show stack 
  read conf "f2" 0 
  show stack
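
Both convergence criteria can also be checked numerically once the best-energy trace and the final torsion angles have been exported from ICM. The Python sketch below is only an illustration and is not part of ICM: the function names, the default tolerances and the idea of passing energies and phi/psi angles as plain lists are assumptions made for this example.

```python
import math

def energy_is_flat(best_energies, window, tol=0.1):
    """True if the best energy changed by less than tol over the last `window` entries.

    best_energies: best-energy-so-far values, one per reporting interval (assumed input).
    """
    if len(best_energies) < window:
        return False
    tail = best_energies[-window:]
    return (max(tail) - min(tail)) < tol

def angular_rmsd(angles_a, angles_b):
    """RMS deviation (degrees) between two torsion-angle lists, with periodic wrap-around."""
    if len(angles_a) != len(angles_b):
        raise ValueError("angle lists must have the same length")
    total = 0.0
    for a, b in zip(angles_a, angles_b):
        diff = (a - b + 180.0) % 360.0 - 180.0  # map the difference into [-180, 180)
        total += diff * diff
    return math.sqrt(total / len(angles_a))

def runs_converged(traces, best_angles, window=1000, energy_tol=0.1, angle_tol=15.0):
    """Apply both criteria to several runs: flat energy traces and similar best conformations."""
    if not all(energy_is_flat(trace, window, energy_tol) for trace in traces):
        return False
    reference = best_angles[0]
    return all(angular_rmsd(reference, other) <= angle_tol for other in best_angles[1:])
```

The angle tolerance of 15 degrees is an arbitrary choice for the sketch; pick a threshold that matches how you compare conformations in your own runs.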

Watching trajectory files f1.mov and f2.mov may also be useful. (See also How to evaluate helicity of a peptide from the BPMC simulation and How to calculate an ensemble average). Now, the script:

# Example folding script. Use as directed. 
 read libraries 
 build "pep16"       # your peptide sequence is in pep16.se file. 
 rename a_*. "f2"    # specifies current name.  
                     # Several runs (f2,f3, etc.) are recommended 
 nvar = Nof( v_//* ) # number of variables 
 
 nProc=4             # if you are using parallel version.  
 
 mncallsMC    = nvar*50000  # maximal number of energy evaluations 
 mncalls      = 170+nvar*3  # maximal n_of minimization calls after 
                            # each random change 
 temperature  = 600   # optimal temperature for the simulation 
 tolGrad      = 0.01  # exit minimization when gradient is < 0.01 
 mcBell       = 1.0   # the default width of the MC probability distributions 
 mnconf       = 40    # maximal n_of low-energy conformations saved  
                      # in the stack (f2.cnf file) 
 mnvisits     = 25    # if stuck for >= 25 times, push it out 
 mnreject     = 10     
 mnhighEnergy = 30     
 l_bpmc       = yes   # use biased probability  
 electroMethod = "MIMEL" 
 surfaceMethod = "constant tension" 
 set terms "vw,14,hb,el,to,sf,en" 
                      # ECEPP/2 energy + solvation + entropy (see icm.hdt file) 
 
 fix v_//?vt*         # exclude irrelevant virtual variables specifying  
                      # absolute molecular position 
 set vrestraint a_/*  # load preferred backbone and side-chain angle zones 
                      # for the  biased probability MC 
 randomize v_//!omg 180.0  # create random starting conformation 
 vicinity = 15.0       
 compare v_//phi,psi  # use these variables to compare structure 
 montecarlo trajectory # run it and record a trajectory file.  
                      # watch the movie later by:  
                      # read trajectory "f2"; display ribbon 
                      # display trajectory "f2" 4. 8. 
                      # analyze the best conf. in the stack by: 
                      # build "pep16"; read stack; show stack all 
                      # load conf 1 
 quit

How to perform local flexible docking of two protein molecules using the grid potentials

This is a so-called "local docking procedure", which docks all orientations of the protein ligand to one chosen orientation of the protein receptor. The "global docking procedure" is somewhat different.
You may follow the menu items in Docking.Protein-protein or run the docking scripts directly. To illustrate the principal commands and functions, we will also go through a series of shell commands that perform a docking procedure. We will use the following steps from the shell to dock the proteins chymotrypsin (5cha) and APPI (1aap). The real structure of the complex is known (1ca0), which lets us test the validity of the method. This procedure has been tested on a dataset of 24 known protein-protein complexes (Fernández-Recio, Totrov, Abagyan, 2002).
The procedure includes the following steps:

Create two ICM objects, one for each protein, with the convertObject macro.
Specify project parameters in a special table.
Orient molecules, choose the docking box and make potentials.
Dock the protein ligand into the potentials.
Refine the solutions.

How to perform an explicit flexible docking of two simplified protein molecules

This procedure is relatively old and was used previously to explicitly dock two proteins starting from simplified objects. The best solutions are refined in all-atom representation. Currently we prefer docking into grid (see above).

Create ICM-objects of the two proteins you want to dock.
Use macro makeSimpleDockObj to create two simplified objects.
Combine two simplified objects into one and prepare it for docking simulation using _makeComplex script. During the execution you will be prompted for orientation of the first molecule, which should face the second one with the expected epitope.
Run the docking simulation using _dock2mol script. To ensure the completeness of the search, run 3-4 simulations in parallel and compare the resulting stacks: the top 5-7 conformations should be the same except for 1-2. Combine the stacks using the "read stack append" command and then filter the combined stack with
```
vicinity = 4. 
 compare static a_2//ca 
 compress stack
```
Prepare .var files with optimized surface sidechain conformations for individual proteins by running _surfSideChainOpt script.
Run _makeFullAtom script to create full atom models from the simplified conformations accumulated in the stack.
Run _refineComplex script on each of the full atom models.
The complex with the best energy after the optimization is (hopefully) the answer.
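
The stack filtering in the docking step above (vicinity = 4., compare static a_2//ca, compress stack) can be pictured as a greedy removal of near-duplicate conformations. The Python sketch below illustrates that idea only. It assumes that the lowest-energy member of each group of similar conformations is kept and that conformations are compared by RMSD without superposition; ICM's actual algorithm may differ, and the function names and data layout are hypothetical.

```python
import math

def static_rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points, without superposition."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def compress_stack(conformations, vicinity, distance=static_rmsd):
    """Keep a conformation only if no better-energy one lies within `vicinity`.

    conformations: list of (energy, coords) tuples (assumed layout for this sketch).
    """
    kept = []
    # Visit conformations from lowest to highest energy.
    for energy, coords in sorted(conformations, key=lambda item: item[0]):
        if all(distance(coords, other) > vicinity for _, other in kept):
            kept.append((energy, coords))
    return kept

# Toy stack: each "conformation" is a single point.
stack = [
    (-5.0, [(0.0, 0.0, 0.0)]),
    (-7.0, [(1.0, 0.0, 0.0)]),
    (-3.0, [(10.0, 0.0, 0.0)]),
]
filtered = compress_stack(stack, vicinity=4.0)
print([energy for energy, _ in filtered])
```

In the toy stack the conformation at energy -5.0 lies within 4 units of the better one at -7.0 and is dropped, while the distant one at -3.0 is kept.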

How to build a model by homology

Have an alignment and a pdb file with the template handy, say "sx.ali" and "x.brk". If you have a homology module key, you can use the build model command and refine the model with the refineModel macro. The build model command builds a complete model and searches for matching loops in all pdb files. You can also run the build model command from the GUI (menu Homology). AlignSS is a good shell function for making a sequence-structure alignment: it incorporates solvent accessibility and secondary structure into the alignment procedure. Alternatively, allow the build model command to perform the alignment on the fly.
In the absence of the Homology module, use the following macros/scripts:

homodel macro: fast interactive model building.
_homFast: fast model building (substitutes nonidentical side-chains and assigns the most likely rotamer).
_homModel: more rigorous model building for one polypeptide chain; side-chains are optimally placed, and loops are automatically recognized and simulated.
_homMult: the same as the above script, but for a multichain protein molecule, e.g. an immunoglobulin molecule. Requires a separate file for each alignment.

How to create a table with the residue properties?

Sometimes you want to turn the output of a show a_/* command for a residue selection into a proper table. To create such an ICM table, build the columns separately and add them to the table. For example, for a residue selection res with n residues:

read pdb "1crn"
align number # to have numbers from 1 to n
show surface area mute # compute surface areas
res = a_/10:20 # residue range of interest
n = Nof(res)  # the number of residues.
add column t Sarray(n, Name(Obj(res))[1]),Trim(Label(res),all),Area(res),Area(res)/Area(res type)

The add column command creates a table with four columns: object name, residue label, surface area and relative residue accessibility. The sample output below shows such a table for residues 12 to 20 of the object 1mui:

#>T t
#>-A-----------B-----------C-----------D----------
   1mui        T12         85.264893   0.560953
   1mui        I13         2.073181    0.010687
   1mui        K14         102.661064  0.479725
   1mui        I15         2.916692    0.015034
   1mui        G16         44.870205   0.50416
   1mui        G17         66.557358   0.747835
   1mui        Q18         67.372437   0.354592
   1mui        L19         141.619446  0.71525
   1mui        K20         49.295151   0.230351
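
The relative accessibility in column D is the residue surface area (column C) divided by a reference area for that residue type, as in Area(res)/Area(res type). The Python sketch below reproduces that arithmetic outside ICM. The reference areas in it are not taken from ICM documentation: they were back-calculated from the sample table (column C divided by column D, rounded to integers) and cover only the residue types shown there.

```python
# Reference areas in square Angstroms, inferred from the sample table above
# (column C / column D). These are assumptions for illustration, not official values.
REFERENCE_AREA = {
    "T": 152.0,
    "I": 194.0,
    "K": 214.0,
    "G": 89.0,
    "Q": 190.0,
    "L": 198.0,
}

def relative_accessibility(label, area):
    """Return area / reference area; `label` is a one-letter residue label such as 'T12'."""
    residue_type = label[0]
    return area / REFERENCE_AREA[residue_type]

rows = [("T12", 85.264893), ("I13", 2.073181), ("G16", 44.870205)]
for label, area in rows:
    print(label, round(relative_accessibility(label, area), 6))
```

Values near 0 indicate a buried residue (I13 here), while values above roughly 0.5 indicate a largely exposed one.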

A number of other properties that can be calculated for residue selections can be added to this table in the same way.

You can also append the rows of another table, e.g. a table tt built in the same way from a different pdb entry, to the table t:

add t tt # appends the rows of tt to the table t

Ligand overlays

How to superimpose multiple compounds with similar activity?

Use the chemSuperBG macro that is designed to take one molecule as a template and flexibly overlay in an optimal way other chemicals from a chemical table to the template.

Method. The chemical table can contain a 0D, 2D or 3D representation of each compound. The compounds will be optimally superimposed onto one or several templates. The flex-overlay tool converts each compound on the fly to a flexible 3D form, optimizes it and docks it to the average property representation of the template compounds.

Seven grids ( m_g1,m_g2,.. ) for different atom types will be generated from the 3D template. These grids use a Gaussian expansion of the properties, averaged over the superimposed template molecules. Each ligand is then globally optimized with respect to both its internal energy and its fit to the grid maps.

The result will be saved as a 3D .sdf file.

From ICM command line the syntax is the following:

 chemSuperBG <ms_template(s)> <chem_table> r_effort l_Sample_Rings

From the operating system you need to run the _chemSuper script with the following arguments:

 _chemSuper templates.mol chem_table.sdf superimposed_output.sdf [effort=1.] [-r]

The -r option means that the rings will be considered as flexible.

From GUI: Select rows in your chemical table, click on the mol-column and select Chemistry/Chemical Template Superposition

Frequently asked questions on cheminformatics and compound property prediction.

How to read compounds one by one from a Molcart table?

This script reads the molecules one by one and converts each of them to 3D:

connect molcart "myhost"//"user"//"pass"//"dbase" 
# use Name(molcart connect) to see if you are connected.
for i=1,Nof("asgsynth")  # name of a vendor table
  query molcart "select * from asgsynth where molid="+i name="t"
  if Nof(t)!=1 | Smiles(Parray(t.mol[1] mol))[1] == "" continue
  read mol input=t.mol[1]
  convert2Dto3D a_ yes yes no no  # or anything else. this is a macro
  delete a_*. # clean up
endfor

How to restore font sizes and other GUI defaults on a Mac

Situation: you are stuck with a large font size in the ICM workspace, or with some other bad GUI preference, and cannot restore the defaults.

Solution:

quit ICM

run this Mac command in Terminal:

open /Library/Preferences/com.molsoft.plist

If it opens a GUI window with access to the ICM configuration variables, find the bad parameters and change them.
If the parameter list file is shown as text and you cannot edit it, just delete it from your Mac directory.
```
rm /Library/Preferences/com.molsoft.plist
```

How to export/dump a molcart table to an sdf file?

Read the molcart page for a general set of ICM-molcart commands. Follow these steps:

Connect to Molcart (usually you are connected automatically; if not, click on Molcart in the GUI and enter host, user, pass and table).
Find the table of interest (you will see it in the interface) and write it out:

write molcart table="asgsynth" "tmp2.sdf"

See: molcart

How to select a diverse subset from a chemical table?

Use the make tree command. The group command will then select unique molecules and one will be able to add the columns needed in the cntrs table. Example:

read table mol "drugs.sdf" name="t"
make tree full t matrix column={"mol"} split="cl"
K = 100 # select 100 centers
I_out = Split( t.cluster K )  # split into K clusters
I_out = Index( t.cluster center r_out )   # uses r_out from the above
# I_out now contains the indexes of the K cluster centers
t1_K = t[I_out]   # your subset
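
The idea behind the example, one representative per cluster, can be illustrated outside ICM with a short Python sketch. It assumes that cluster labels and a pairwise distance matrix are already available and picks the medoid of each cluster (the member with the smallest total distance to the other members). This is a swapped-in stand-in for illustration; how ICM defines a cluster center may differ, and all names below are hypothetical.

```python
def cluster_centers(labels, dist):
    """Return one medoid index per cluster, ordered by cluster label.

    labels: cluster label for each item.
    dist:   square matrix (list of lists) of pairwise distances between items.
    """
    members = {}
    for index, label in enumerate(labels):
        members.setdefault(label, []).append(index)

    centers = []
    for label in sorted(members):
        indexes = members[label]
        # The medoid minimizes the summed distance to all members of its cluster.
        medoid = min(indexes, key=lambda i: sum(dist[i][j] for j in indexes))
        centers.append(medoid)
    return centers

# Toy data: five "compounds" placed on a line, already split into two clusters.
points = [0.0, 1.0, 5.0, 10.0, 11.0]
labels = [0, 0, 0, 1, 1]
dist = [[abs(a - b) for b in points] for a in points]

subset = cluster_centers(labels, dist)
print(subset)
```

With real compounds the distance matrix would come from a chemical similarity measure, and the returned indexes play the role of I_out in the ICM example.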