Copyright © 2005, Molsoft LLC Nov 17 2008
[ Search prosite | Search protein fragment | Binding site analysis | Search protein topology | Pdb sequence generation | Pdb merge | Search sstructure database | Search pdb headers ]
Use macro searchSeqProsite. For example:
read pdb "2dhf"
make sequence a_1.1 # sequence of a PDB structure
show sequence
find prosite 2dhf_a # 2dhf_a is the sequence of the protein
See also find prosite, find pattern and read prosite.
First, make sure that you have a library of representative ICM objects. The string variable s_qsearchDir should contain the path of this directory relative to the s_dataDir directory. The library may be created and updated with the provided _mkQsearchLib script. Use the qsearch or iqsearch macros: load the object and type qsearch or iqsearch followed by the arguments. You will be prompted for any missing arguments. To understand the meaning of the arguments, see the find pdb command. Examples:
read object s_icmhome+"crn"
call s_icmhome+"_qsearch"
# no graphics, just the list of solutions
# qsearch a_/2:6,14:18
# interactive
iqsearch a_1crn./2:6,14:18 "xxxxx------xxxxxx" "*" "*" .7
ICM provides three algorithms (A, B, and C) that can identify pockets:
In the following example we find an almost closed pocket which cannot be identified with icmCavityFinder.
read pdb "1fm6" # read the 'a' chain of RXR
delete a_!1,9 # keep the RXR and its ligand only
make map potential a_1 Box( a_ 1. ) 1. # grid size 1 A
make grob m_atoms exact 0.1 solid
split g_atoms
cool a_
display g_atoms2 reverse
If you have problems identifying pockets, change the grid size or the threshold level for make grob m_atoms, or try converting the object to the ICM type (the conversion adds hydrogens and makes the object denser).
Use macro searchObjSegment, for example:
read object s_icmhome+"crn"
searchObjSegment a_1.1 30 3.
# or
read pdb "1pxt"
delete a_!1
convert
searchObjSegment a_1.1 24 6.
You may need to adjust the seed fragment length and the RMSD parameters for a cleaner list. The database foldbank.seg is provided and may be recompiled, customized and updated with the supplied _mkSegmentLib script. See also segment, find segment, write segment, foldbank.seg, How to extract a diverse set of PDB entries, and How to compile a database of protein secondary structures and their folds.
The following script is a skeleton of the provided script _mkUniqPdbSeqs which is somewhat more automated.
l_commands=no
errorAction="none" # if something goes wrong do not
# interrupt the loop
s_pdbDir = "/data/pdb/" # make sure you have correct path
pdbDirStyle = 4 #
read sarray s_pdbDir+"/derived_data/index/source.idx"
# you need a list of all pdb-entries
# (4 char. code per line will do)
source = Tolower(Trim(Field(source,1)))
n=Nof(source)
for i=1,n
read pdb sequence resolution source[i]
# append resolution to the chain name (like 9lyz_a19)
endfor
group sequence "*" uniqSeqs unique 0.1 delete
# cutoff inter-sequence
# distance 0.1 (dissimilar by more than 10%)
#
# Other possibilities
#
# group sequence uniqSeqs unique 5 # if two seqs differ by more
# # than 5 mutations
# group sequence uniqSeqs unique # throw away only identical
# # sequences
#
write sequence s_inxDir + "/pdb1.seq"
# actual sequences for searches
write Name(uniqSeqs) "chainList"
# list of protein chains if you need it
quit
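The redundancy filtering requested by group sequence ... unique 0.1 can be pictured with a small standalone Python sketch. This is not ICM code and not ICM's algorithm: the distance function here is a stand-in built on Python's difflib, the greedy keep-first strategy is an assumption, and the chain names and sequences are hypothetical toy data. It only illustrates the idea of keeping chains that differ from every already kept chain by more than the cutoff.

```python
from difflib import SequenceMatcher


def distance(a, b):
    """Stand-in distance: 1 minus difflib's similarity ratio (0.0 means identical)."""
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def unique_subset(seqs, cutoff=0.1):
    """Greedily keep names whose sequence is farther than `cutoff` from every kept one."""
    kept = []
    for name, seq in seqs.items():
        if all(distance(seq, seqs[k]) > cutoff for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy chains, not real PDB sequences.
chains = {
    "aaaa_a": "MKTAYIAKQRQISFVKSHFSRQ",
    "bbbb_a": "MKTAYIAKQRQISFVKSHFSRQ",   # identical to aaaa_a
    "cccc_a": "GSHMLEDPVVLQRRDWENPGVTQ",  # unrelated to aaaa_a
    "dddd_a": "MKTAYIAKQRWISFVKSHFSRQ",   # point mutant of aaaa_a
}
print(unique_subset(chains))
```

With a cutoff of 0.1 the exact duplicate and the point mutant are dropped, because each is closer than 10% to a chain that is already kept. A greedy pass depends on input order, so in practice one would probably sort the chains first (for example by resolution) so that the preferred representative comes first.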
The simplest way to merge two pdb files is to read them as separate objects and then use the move a_1. a_2. command. Example:
read pdb "1crn"
read pdb "1d48"
move a_2. a_1. # merges objects
write pdb a_1. "both" # saves both files in pdb format
write object a_1. # saves merged object in compact binary form
Before or after merging, the objects can also be edited: you can translate them to a new position, rename chains, change residue numbers, etc. Example:
read pdb "1d48"
delete a_w*
delete a_2 # delete the second chain
read pdb "1crn"
delete a_/33:99 # delete a C-term. part of crambin
move a_1. a_2. # merge the remains
write object a_
If you want to re-engineer a polypeptide chain of a protein using two pdb files, e.g. to transplant one part of a protein to another and restore the bonding connectivity, you may use the modify command:
read pdb "1crn" # one pdb
read pdb "1cbn" # similar protein
modify a_1./20:25 a_2./20:25 # transplants a loop from the 2nd object to the 1st one
write pdb a_1. "combo"
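For readers who want to see what merging amounts to at the file level, here is a minimal Python sketch that concatenates the coordinate records of two PDB-format texts. It is not what the ICM move command does internally, and it restores no bonding; it only keeps ATOM/HETATM lines, relabels the chain of the second file, and renumbers atom serials, relying on the fixed column layout of the PDB format (serial in columns 7-11, chain ID in column 22). The function name merge_pdb and the toy records with zeroed coordinates are hypothetical.

```python
def merge_pdb(text_a, text_b, chain_b="B"):
    """Concatenate coordinate records of two PDB texts, relabeling text_b's chain."""
    out = []
    serial = 0
    for text, new_chain in ((text_a, None), (text_b, chain_b)):
        for line in text.splitlines():
            if not line.startswith(("ATOM", "HETATM")):
                continue  # drop headers, TER, END and other non-coordinate records
            line = line.ljust(80)
            serial += 1
            chain = line[21] if new_chain is None else new_chain
            # columns 7-11 hold the atom serial, column 22 the chain identifier
            out.append(line[:6] + f"{serial:5d}" + line[11:21] + chain + line[22:])
    out.append("END")
    return "\n".join(s.rstrip() for s in out) + "\n"


# Hypothetical toy records; coordinates are placeholders, not real structures.
PDB_A = (
    "HEADER    TOY A\n"
    "ATOM      1  N   THR A   1       0.000   0.000   0.000  1.00  0.00\n"
    "ATOM      2  CA  THR A   1       1.000   0.000   0.000  1.00  0.00\n"
    "END\n"
)
PDB_B = (
    "ATOM      1  N   GLY A   1       5.000   5.000   5.000  1.00  0.00\n"
    "END\n"
)
print(merge_pdb(PDB_A, PDB_B, chain_b="B"))
```

Relabeling the second chain avoids two chains sharing the same identifier in the merged file, which is the text-level counterpart of the chain renaming mentioned above.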
The following script uses the previously compiled list of unique pdb chains and creates two files: foldbank.db, containing sequences, resolutions, and the deposited and automatically assigned secondary structures of the nonredundant set, and foldbank.seg, containing quantitative topology descriptions of the folds. The GAP (which stands for Gly-Ala-Pro) library builds only the backbones needed by the secondary structure prediction algorithm, which speeds up the PDB-to-ICM conversion. The foldbank.db file is in the ICM database format, so you can create an ICM table shell-object from it. This lets you sort entries and perform searches to create subsets.
l_commands =no
l_info =no
l_confirm =no
errorAction="none"
segMinLength =3
mncalls =300
s_icmhome ="./"
s_reslib ="icmGAP" # Gly-Ala-Pro residue library
read library
# ...getting the representative list of chains...
read sequences s_pdbDir+"/derived_data/pdb_seqres.txt"
#make sure to have _mkUniqPdbSeqs executed recently
li=Name(sequence)
delete sequences
#...you may modify the method or create your own list...
if (Error) quit
unix mv foldbank.db foldbank.db.OLD
unix mv foldbank.seg foldbank.seg.OLD
for i=1,Nof(li)
lii=Tolower(li[i])
read pdb lii[1:4]+"."+lii[6]+"/"
delete !Mol(a_*/A) # delete HET-molecules
convert
er=r_out
rz=Resolution(a_1.)
if(rz < 0.01)rz=9.99
sx=Sstructure(a_*)
assign sstructure
# uncomment the following line, if you'd like
# to save GAP objects. requires GAP subdirectory
# write object "GAP/"+lii[1:4]+lii[6]
sprintf "# %d\nNA %s.%s\nRZ %.2f\nER %.3f\nSE %s\nSX %s\nSS %s\n" \
i lii[1:4] lii[6] rz er String(Sequence(a_*)) sx Sstructure(a_*)
write append s_out "foldbank.db"
assign sstructure segment
rename a_2. lii[1:4] # restore the original pdb-name
write append segment "foldbank.seg"
delete a_*.
endfor
quit
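The record layout written by the sprintf line above (a "# index" line followed by NA, RZ, ER, SE, SX and SS lines) is simple enough to read outside ICM as well. The following Python sketch parses such records and selects a subset by resolution. It is only an illustration of the layout: the function name parse_foldbank is hypothetical, and the two sample records, including their sequences and secondary structure strings, are invented placeholders rather than real foldbank.db content.

```python
def parse_foldbank(text):
    """Split foldbank.db-style text into records; each starts with a '# <index>' line."""
    records = []
    current = None
    for line in text.splitlines():
        if line.startswith("# "):
            current = {"index": int(line[2:])}
            records.append(current)
        elif current is not None and len(line) >= 2:
            tag, _, value = line.partition(" ")  # two-letter tag, then the value
            current[tag] = value
    for rec in records:
        for key in ("RZ", "ER"):  # numeric fields written with %.2f and %.3f
            if key in rec:
                rec[key] = float(rec[key])
    return records


# Invented placeholder records that follow the sprintf layout of the script.
SAMPLE = """# 1
NA 1abc.a
RZ 1.80
ER 0.250
SE ACDEFGHIK
SX _HHHHHH__
SS _HHHHH___
# 2
NA 2xyz.b
RZ 9.99
ER 0.400
SE LMNPQRSTV
SX __EEEE___
SS __EEEEE__
"""

records = parse_foldbank(SAMPLE)
# The script stores 9.99 when no resolution is available, so a cutoff also drops those.
good = [r["NA"] for r in sorted(records, key=lambda r: r["RZ"]) if r["RZ"] < 2.5]
print(good)
```

Because the script substitutes 9.99 for entries without a usable resolution, any resolution cutoff below that value also removes those entries from the subset.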
There is a PDB.tab file which contains one-line header descriptions of all the entries. You have three ways of searching it: