The Hierarchical Editing Language for Macromolecules was designed to create a single notation that can encode the structure of complex biomolecules including diverse polymers, non-natural monomers and complex attachment points.
HELM was first conceived at Pfizer in the summer of 2008 to support the Pfizer oligonucleotide therapeutic unit, and molecules were first registered into the Pfizer corporate database using HELM in December 2008.
How it works
HELM contains multiple levels of information:
Monomers - the atom/bond representation of the building blocks
Simple polymers - a linear sequence of monomers of the same type
Complex polymers - combinations of simple polymers, hydrogen bond information and annotations
This flexibility lets you define molecules such as an oligonucleotide connected to a small molecule (SMCC) that is in turn connected to a peptide.
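The levels described above can be illustrated with a minimal Python sketch that pulls the simple polymers and their monomers out of a HELM string. It assumes the common layout in which `$` separates the sections of the string, the first section lists simple polymers separated by `|`, and monomers inside the braces are separated by `.`. The function name `split_simple_polymers` and the example string are hypothetical and chosen only for illustration; this is not the open source toolkit's parser.

```python
import re

# Assumed shape of a simple polymer: a type name, a number, then monomers in braces.
SIMPLE_POLYMER = re.compile(r"^([A-Z]+)(\d+)\{(.*)\}$")


def split_simple_polymers(helm: str) -> dict:
    """Map each simple polymer ID to its list of monomer tokens.

    Naive on purpose: it breaks on monomers whose bracketed names or
    inline SMILES contain '$', '|' or '.'.
    """
    # Assumption: '$' separates sections and the first one lists the simple polymers.
    first_section = helm.split("$", 1)[0]
    polymers = {}
    for chunk in first_section.split("|"):
        match = SIMPLE_POLYMER.match(chunk)
        if match is None:
            raise ValueError(f"not a simple polymer: {chunk!r}")
        polymer_type, number, body = match.groups()
        polymers[polymer_type + number] = body.split(".")
    return polymers


# Illustrative string: one peptide and one RNA simple polymer, no connections.
example = "PEPTIDE1{A.G.C}|RNA1{R(A)P.R(G)P}$$$$"
print(split_simple_polymers(example))
```

A real implementation needs a proper tokenizer, because bracketed monomer names and inline definitions can contain the separator characters that this sketch splits on.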
There are two major releases of HELM.
Covers different types of macromolecule: DNA/RNA, PEPTIDE and CHEM polymer types
Allows non-natural monomers: Monomers are defined by the HELM author, so there is no limit on the monomers that can be included in your molecule.
xHELM allows you to ‘bundle’ all the monomer information with your molecule definition into a single package that can be used to transfer information outside your organisation.
Adds the ability to define
The HELM 2.04 specification is available below.
There are minor changes in the 2.04 update. Specifically:
Added links to the monomer JSON schema.
Added location of xml schema for xHELM.
Clarified mandatory use of ? for connection points with unknown monomers like *, X and N.
Clarified rules for the use of in-line HELM particularly around connection point definition.
Minor revisions to aid clarity.
HELM Test set
The team have compiled a set of around 150 structures that illustrate the full range of structures that can be encoded in HELM. This is included as a resource for anyone who wants to implement their own HELM tools rather than use the open source toolkit.
HELM Examples from ChEMBL
Roger Sayle kindly created this file of 1000 (inline) HELM strings from ChEMBL 21. It includes examples of multiple peptide chains and nucleic acid sequences, but doesn't cover every corner of the HELM v2 (or v1.1) specification. It was created using the monomerlib2 monomer set.
Notepad++ highlighting definition file
If you create HELM manually it can be helpful to use an editor that will highlight different parts of the string.
To use this language definition file, go to [Language] -> [Define your language...], click [Import...] and load the xml file. After closing the dialog, restart Notepad++. You can then assign the language to your currently open file via [Language] -> [HELM2].