The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains 6826 drug entries, including 1431 FDA-approved small molecule drugs, 133 FDA-approved biotech (protein/peptide) drugs, 83 nutraceuticals and 5211 experimental drugs. Additionally, 4435 non-redundant protein (i.e. drug target/enzyme/transporter/carrier) sequences are linked to these drug entries. Each DrugCard entry contains more than 150 data fields, with half of the information devoted to drug/chemical data and the other half to drug target or protein data.
The most complete drug information (target, transporter, carrier, and enzyme information) is provided in XML format. Chemical structures are provided separately in SDF format.
The following example will demonstrate how to deal with such data in ICM.
- Read the XML data directly from the website
    read xml "http://www.drugbank.ca/system/downloads/current/drugbank.xml.zip" name="drugbank"
The command above creates a collection object "drugbank".
- Examine the content
    icm/def> Name( drugbank )
    #>S string_array
    drugs
This shows that the collection contains a single root node called "drugs".
- Going further gives the following:
    icm/def> Name( drugbank["drugs"] )
    #>S string_array
    drug
    partners
    xmlns
    xmlns:xs
    xs:schemaLocation
    icm/def> Type( drugbank["drugs","drug"] )
    array
    icm/def> Type( drugbank["drugs","partners"] )
    collection
    icm/def> Name( drugbank["drugs","partners"] )
    #>S string_array
    partner
    icm/def> Type( drugbank["drugs","partners","partner"] )
    array
This means that drugbank["drugs","drug"] is an array in which each entry contains the information about a particular drug. In addition, there is another array, drugbank["drugs","partners","partner"], which contains additional information about targets.
- Examine individual entries
    drugbank["drugs","drug"][1]
    drugbank["drugs","drug"][2]
    drugbank["drugs","partners","partner"][1]
    drugbank["drugs","partners","partner"][2]
The default output format for displaying a collection is JSON, which gives you nicely formatted, easy-to-read text. Looking at the output, it is easy to find the fields of interest.
WARNING: do not try to print the entire array to the terminal window. It will take a very long time and you will most likely need to kill the window.
- Fetching individual fields
Let's create a table with a single column containing an array with drug cards.
    add column drugs drugbank["drugs","drug"]
Hint: in the GUI you can resize all rows simultaneously by holding the CTRL key while resizing an individual row.
A single field can be extracted by providing a dot-separated path to it. Note that field names containing non-alphanumeric characters must be quoted.
- A.drugbank - OK
- A.'drugbank-id' - must be quoted
    # extracts drugbank-id into a separate column
    add column drugs function="A.'drugbank-id'[1]['$']" name="drugbank_id"
    # extracts name into a separate column
    add column drugs function="A.name" name="name"
- Fetching multi-value fields
Multiple properties will be extracted as an array for each drug entry.
    # display targets information for the second entry
    drugs.A[2]["targets","target"]
    # extract array of partner IDs for each drug into a separate column
    add column drugs function= "A.targets.target.partner" name="partner_id"
    Type( drugs.partner_id[2] ) # array
This way of extracting multiple properties has one problem. For entries with only one property, the result is not an array but an individual value (check, for example, Type( drugs.partner_id[1] )). This prevents unified access to the column later on. In such cases it is recommended to use the ':' operation instead of '.'. The result of this operation is always an array (even for single entries).
    delete drugs.partner_id
    add column drugs function="A.targets.target:partner" name="partner_id" # will create an array for all entries.
    Type( drugs.partner_id[1] ) # array (even for single entries)
- Querying XML fields
Let's say you want to extract the value of a property whose name starts with "logP". This can be done in much the same way as the ICM table filtering operations. The only difference is that a colon ':' (instead of a dot) must be used to separate the field name.
The general filtering syntax:
    <field1>.<field2>:<queryField> <op> <value>
The following operations are supported in array filtering: ==, !=, >, <, >=, <=, ~, !~
Example:
    # query and extract logP property
    add column drugs function="(A.'experimental-properties'.property:kind ~ '^logP').value[1]" name="logP"
Note that some entries contain text information ('0.61 [HANSCH,C ET AL. (1995)]'), so the result column will not be automatically converted to rarray. You can convert it explicitly:
    # empty or 'bad' entries will be marked as 'ND'
    add column drugs Rarray( drugs.logP ) name="logPNum"
    delete drugs.logP
Another example extracts the RxList links:
    add column drugs \
      function="(A.'external-links'.'external-link':resource == 'RxList')[1].url"\
      name ="rxlist"
- Joining with information from drugbank["drugs","partners","partner"]
For each drug entry we have a list of partner IDs that refer to entries in the drugbank["drugs","partners","partner"] array. To join them, we need to add this array to another table and extract the fields that will be used in the join.
    # create a table and put the partner entries there
    add column partners drugbank["drugs","partners","partner"]
    # extract the ID column which will be used to join with drugs.partner_id
    add column partners function= "A.id" name="id"
    # extract uniprot-id from the "external-identifiers" array using query functions
    add column partners \
      function= '(A."external-identifiers"."external-identifier":resource ~ "UniProtKB")."identifier"[1]' \
      name = "uniprot_id"
Finally, we need to join drugs.partner_id with partners.id.
    join drugs.partner_id partners.id column ="drugs.*,partners.uniprot_id" name="drugs"
Note that since drugs.partner_id contains multiple entries for each row, the resulting drugs.uniprot_id will also contain multiple entries for each row. You can set a special format with the set format command to execute a special action when a particular UniProt entry is clicked.
    # load sequence
    set format drugs.uniprot_id \
      "<!--icmscript name=\"1\"\nread sequence swiss \"http://www.uniprot.org/uniprot/%1.txt\"\n--><a href=#_>%1</a>"
    # or simply go to the website
    set format drugs.uniprot_id "<a href=http://www.expasy.org/uniprot/%1>%1</a>"
- Joining with chemical structures
The final step is to add the chemical structure information.
    # read SDF from the website
    read table mol "http://www.drugbank.ca/system/downloads/current/structures/all.sdf.zip" name="drugs_chem"
    # join 'mol' column
    join drugs.drugbank_id drugs_chem.DRUGBANK_ID column="drugs.*,drugs_chem.mol" name="drugs"
A little more rearrangement and your table is ready to be exported to an SDF file.
    move drugs.mol 1 # move structure column to the first position
    delete drugs.A # delete drug-card information
    delete drugs.partner_id # delete partner id information
    write table mol drugs "mydrugs.sdf"
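For readers who want to sanity-check the logic outside ICM, the two ideas this walkthrough relies on can be modelled in a few lines of Python: multi-value fields should always come back as an array, and joining a multi-value ID column yields a multi-value result. This is only an illustrative sketch, not ICM code and not how ICM implements it. The records, IDs and helper names (`as_list`, `partner_ids`, `join_uniprot`) are invented for the example.

```python
# Invented miniature records; real DrugBank entries are far larger.
drugs = [
    # one target: the nested value is a bare dict, not a list
    {"drugbank_id": "DRUG_A", "targets": {"target": {"partner": "54"}}},
    # two targets: the nested value is a list of dicts
    {"drugbank_id": "DRUG_B",
     "targets": {"target": [{"partner": "54"}, {"partner": "77"}]}},
]
partners = [
    {"id": "54", "uniprot_id": "UNIPROT_X"},
    {"id": "77", "uniprot_id": "UNIPROT_Y"},
]


def as_list(value):
    """Always return a list, mirroring the 'always an array' idea of ':'."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def partner_ids(drug):
    """Collect partner IDs for one drug as a list, even for a single target."""
    targets = as_list(drug.get("targets", {}).get("target"))
    return [t["partner"] for t in targets]


# Lookup table playing the role of the 'partners' table in the join.
uniprot_by_partner = {p["id"]: p["uniprot_id"] for p in partners}


def join_uniprot(drug):
    """Map each partner ID of a drug to its UniProt ID (multi-value result)."""
    return [uniprot_by_partner[pid]
            for pid in partner_ids(drug)
            if pid in uniprot_by_partner]


rows = [
    {"drugbank_id": d["drugbank_id"],
     "partner_id": partner_ids(d),
     "uniprot_id": join_uniprot(d)}
    for d in drugs
]

for row in rows:
    print(row)
```

Because `as_list` normalises the single-target case, every row has list-valued `partner_id` and `uniprot_id` fields, which is the uniform access the ':' operation is recommended for above.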
See also: collection, read xml