ICM Manual v.3.8
by Ruben Abagyan, Eugene Raush and Max Totrov
Copyright © 2018, Molsoft LLC
Jan 4 2018

Regular expressions (regexp)

[ Regexp syntax ]

Functions supporting regular expressions:

See regexp syntax.

ICM regular expression syntax

[ Simple expressions | Shortcuts | Regexp back references | Greedy matching ]

Simple expressions

  • . any character except a newline (to match any character at all, use (.|\n) or put (?n) at the beginning of the expression)
  • ^ the beginning of the line
  • $ the end of the line
  • [abc] any character from the list
  • [^abc] any character NOT in the list
  • [a-z] a range, e.g. [0-9] or [0-9A-Z]
  • \c backslash suppresses special meaning of a character
  • \\ backslash itself
  • (string) enclose a simple expression in parentheses to write repetitions, back-references, or field=number expressions in the Split, Match and Replace functions.
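
The simple expressions above can be tried out in any Perl-style regexp engine. The sketch below uses Python's re module as a stand-in, which is an assumption made for illustration and not the ICM shell itself; in particular, Python needs the re.MULTILINE flag before '^' and '$' anchor at every line, and the sample strings are invented.

```python
import re

text = "first line\nsecond line"

# '.' matches any character except a newline, so '.*' stops at the line end.
dot_match = re.match(r".*", text).group(0)

# With re.MULTILINE, '^' and '$' anchor at the beginning and end of each line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)

# A character list, a negated list and a range.
vowels = re.findall(r"[aeiou]", "regexp")
non_digits = re.findall(r"[^0-9]", "a1b2")
hex_digits = re.findall(r"[0-9A-F]", "x1Fz")

# A backslash suppresses the special meaning of '.'; '\\' is the backslash itself.
literal_dot = re.findall(r"\.", "a.b")
backslash = re.findall(r"\\", "a\\b")

print(dot_match, line_starts, line_ends, vowels, non_digits, hex_digits)
```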

Inline modifiers of regular expressions:

  • (?i) ignore case until the end of the same enclosing group, e.g. 'aBc' ~ '(?i)abc'; 'a((?i)bc)d' matches 'aBCd', 'abcd' and 'aBcd', but not 'Abcd' or 'abcD'
  • (?-i) match case-sensitively until the end of the same enclosing group, e.g. 'a(?i)bc(?-i)d' matches 'aBCd', but not 'Abcd' or 'abcD'
  • (?n) from this point on the dot '.' also matches the newline character, e.g. "1bc\nd2" ~ '(?n)1.*2'
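
The effect of the modifiers can be checked with a short sketch in Python's re module. This is an illustration in another engine, and its spelling differs from the ICM syntax above: Python expects a global flag such as (?i) at the start of the pattern, writes a group-scoped region as (?i:...), and uses (?s) where the list above uses (?n). The test strings are the ones from the list.

```python
import re

# Whole-pattern case-insensitive match: 'aBc' ~ '(?i)abc'.
whole = re.fullmatch(r"(?i)abc", "aBc") is not None

# Only 'bc' is case-insensitive; 'a' and 'd' stay case-sensitive.
scoped = re.compile(r"a(?i:bc)d")
candidates = ["aBCd", "abcd", "aBcd", "Abcd", "abcD"]
accepted = [s for s in candidates if scoped.fullmatch(s)]

# Without the flag the dot stops at the newline; with (?s) it crosses it.
without_flag = re.search(r"1.*2", "1bc\nd2")
with_flag = re.search(r"(?s)1.*2", "1bc\nd2")

print(whole, accepted, without_flag, with_flag.group(0))
```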


Shortcuts

  • \d matches a digit ( '[0-9]' ). '\d+' matches one or more digits.
  • \D matches a NON-digit. '\D+' matches the text between numbers
  • \w matches a character in a word ( [a-zA-Z_] ). '\w+' matches a word
  • \W matches a NON-word character. '\W+' matches the interword space
  • \s matches a whitespace character, or a separator ( [ \r\t\n\f] )
  • \S matches a non-separator symbol
  • \b matches a word boundary, i.e. a boundary between \w and \W symbols. For example, '\bedge\b' matches inside 'the edge' and does not match inside 'the hedge'
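
A minimal sketch of the shortcuts, again using Python's re module as an assumed stand-in for the ICM engine. One difference to keep in mind: Python's \w also accepts digits, whereas the list above gives [a-zA-Z_], so the examples avoid words that contain digits. The sample strings are invented.

```python
import re

sample = "width 12, height 7"

numbers = re.findall(r"\d+", sample)   # runs of digits
between = re.findall(r"\D+", sample)   # everything between the numbers

words = re.findall(r"\w+", "the edge of the hedge")
gaps = re.findall(r"\W+", "one, two")

# \s+ treats any run of blanks, tabs and newlines as one separator.
fields = re.split(r"\s+", "a \t b\nc")

# \b requires a word boundary on both sides of 'edge'.
in_edge = re.search(r"\bedge\b", "the edge") is not None
in_hedge = re.search(r"\bedge\b", "the hedge") is not None

print(numbers, between, words, gaps, fields, in_edge, in_hedge)
```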

Repetitions and back-references

(a and b are simple regular expressions, e.g. a DNA base [ACTG], or ([hp]anky.*)):

  • a? - nothing or a single occurrence of a
  • a* - nothing or any number of repetitions of a
  • a+ - one or more repetitions of a
  • a{n,m} - matches a from n to m times
  • a|b - matches a or b
  • ab - matches a followed by b
  • (a)\1 - \1 is a back-reference: matches a, then matches exactly the same string. Back-references can go from \1 to \9.
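
The repetition operators and the back-reference can be illustrated with the DNA base example mentioned above. The sketch uses Python's re module as an assumed stand-in for the ICM engine, and the sequences and sentences are made up for the demonstration.

```python
import re

# a? : 'C' is optional between 'A' and 'G'.
optional = [s for s in ["AG", "ACG", "ACCG"] if re.fullmatch(r"AC?G", s)]

# a* accepts the empty string, a+ does not.
star_empty = re.fullmatch(r"[ACTG]*", "") is not None
plus_empty = re.fullmatch(r"[ACTG]+", "") is not None

# a{n,m} : between 3 and 5 bases.
bounded = [s for s in ["AC", "ACT", "ACTGA", "ACTGAC"]
           if re.fullmatch(r"[ACTG]{3,5}", s)]

# a|b : either alternative.
either = re.findall(r"hanky|panky", "hanky panky")

# (a)\1 : a base followed by exactly the same base.
doubled = re.findall(r"([ACTG])\1", "AACGTTGA")

# The back-reference repeats the matched string, not the pattern.
repeated_word = re.search(r"(\w+) \1", "it is is fine").group(0)

print(optional, star_empty, plus_empty, bounded, either, doubled, repeated_word)
```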

A problem with the POSIX repetitions

Imagine that you want to match the text between two tags, e.g. <i>one</i>, in a text that contains two items of the same kind (<i>one</i> and <i>two</i>). Unfortunately, we cannot simply use <i>.*</i> to match <i>one</i>, because the POSIX standard looks for the MAXIMAL LENGTH match: the expression matches everything from the first <i> to the last </i>, i.e. the whole of '<i>one</i> and <i>two</i>'.
A straightforward solution to this problem is to define the word between the tags more strictly, by saying that the italicized word should not contain the '<' symbol.

ICM followed Perl in using the question mark (?) after the repetition symbol to enforce the minimal match. The minimal match expressions will look like this (a is a simple regular expression, like a character or a string in parentheses ):

  • a?? - nothing or a single occurrence of a, preferring nothing
  • a*? - nothing or any number of repetitions of a, as few as possible (e.g. Match(s,'tag(.*?)endtag':n))
  • a+? - one or more repetitions of a, as few as possible

Applied to the text '<i>one</i> and <i>two</i>':

  • '<i>.*</i>' - matches the whole text, so '.*' covers 'one</i> and <i>two'
  • '<i>[^<]*</i>' - explicitly prohibits a tag inside; matches only the first item, '<i>one</i>'
  • '<i>.*?</i>' - the '*?' expression enforces the smallest match, '<i>one</i>'
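
The difference between the maximal and the minimal match can be reproduced with the three patterns above. The sketch below runs them in Python's re module, which is an assumption made for illustration: it is a Perl-style engine and not a POSIX one, but its greedy '.*' gives the same longest match on this text.

```python
import re

text = "<i>one</i> and <i>two</i>"

# Greedy: '.*' runs to the last '</i>' in the text.
greedy = re.search(r"<i>.*</i>", text).group(0)

# Excluding '<' keeps the match inside the first pair of tags.
no_tag_inside = re.search(r"<i>[^<]*</i>", text).group(0)

# Minimal: '.*?' stops at the first '</i>'.
minimal = re.search(r"<i>.*?</i>", text).group(0)

# With the minimal form, every tagged item can be collected separately.
items = re.findall(r"<i>(.*?)</i>", text)

# The other minimal operators prefer the shortest possible match as well.
lazy_optional = re.match(r"a??", "aaa").group(0)
lazy_star = re.match(r"a*?", "aaa").group(0)
lazy_plus = re.match(r"a+?", "aaa").group(0)

print(greedy, no_tag_inside, minimal, items)
```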


Copyright© 1989-2018, Molsoft, LLC - All Rights Reserved. This document contains proprietary and confidential information of Molsoft, LLC. The content of this document may not be disclosed to third parties, copied or duplicated in any form, in whole or in part, without the prior written permission from Molsoft, LLC.