HELM Notation

 

The Hierarchical Editing Language for Macromolecules was designed to create a single notation that can encode the structure of complex biomolecules including diverse polymers, non-natural monomers and complex attachment points.

HELM was first conceived at Pfizer in the summer of 2008 to support the Pfizer oligonucleotide therapeutic unit, and molecules were first registered into the Pfizer corporate database using HELM in December 2008.

HELM: A Hierarchical Notation Language for Complex Biomolecule Structure Representation. Tianhong Zhang, Hongli Li, Hualin Xi, Robert V. Stanton, and Sergio H. Rotstein. Journal of Chemical Information and Modeling 2012, 52 (10), 2796-2806. DOI: 10.1021/ci3001925

 

How it works

HELM contains multiple levels of information:

  • Monomers - the atom/bond representation of the building blocks

  • Simple polymers - a linear sequence of monomers of the same type

  • Complex polymers - combinations of simple polymers, hydrogen bond information and annotations
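As a rough illustration of these levels, the sketch below pulls a HELM string apart with plain string operations. It rests on assumptions that are not stated on this page: that a HELM string has five fields separated by `$` (simple polymers, connections, a third field that holds hydrogen bonds in HELM1 and polymer groups in HELM 2, annotations, and a version tag such as `V2.0` that is empty in HELM1), that simple polymers are separated by `|`, and that monomers within a polymer are separated by `.`. The function names and the example string are hypothetical, and the naive splitting will fail on in-line SMILES or quoted annotations that contain `$`, `|` or `.`. It is not a substitute for the open source toolkit.

```python
import re

# Polymer id such as PEPTIDE1, RNA2 or CHEM1, followed by the monomer list in braces.
SIMPLE_POLYMER = re.compile(r"^([A-Z]+\d+)\{(.*)\}$")

# Assumed field order; "third_field" is hydrogen bonds (HELM1) or polymer groups (HELM 2).
FIELD_NAMES = ("simple_polymers", "connections", "third_field", "annotations", "version")


def split_sections(helm):
    """Naively split a HELM string into its five '$'-separated fields."""
    parts = helm.split("$")
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


def parse_simple_polymers(section):
    """Map each polymer id to its list of monomers (or monomer groups for RNA)."""
    polymers = {}
    for chunk in section.split("|"):
        match = SIMPLE_POLYMER.match(chunk)
        if match is None:
            raise ValueError(f"not a simple polymer: {chunk!r}")
        polymer_id, body = match.groups()
        polymers[polymer_id] = body.split(".")
    return polymers


def parse_connections(section):
    """Return (source, source_point, target, target_point) tuples; empty field gives []."""
    connections = []
    for chunk in filter(None, section.split("|")):
        source, target, points = chunk.split(",")
        source_point, target_point = points.split("-")
        connections.append((source, source_point, target, target_point))
    return connections


if __name__ == "__main__":
    # Hypothetical example: a four-residue peptide with a bond between the two cysteines.
    example = "PEPTIDE1{A.C.G.C}$PEPTIDE1,PEPTIDE1,2:R3-4:R3$$$V2.0"
    fields = split_sections(example)
    print(parse_simple_polymers(fields["simple_polymers"]))
    print(parse_connections(fields["connections"]))
```

The monomer symbols themselves (`A`, `C`, `G`) only become atoms and bonds once they are looked up in a monomer library, which is the lowest of the three levels and is deliberately left out of this sketch.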


This flexibility allows you to define molecules such as an oligonucleotide connected to a small molecule (SMCC), which is in turn connected to a peptide.

There are two major releases of HELM: HELM1 and HELM 2.

HELM1

  • Covers different types of macromolecule: DNA/RNA, PEPTIDE and CHEM polymer types.

  • Allows non-natural monomers: monomers are defined by the HELM author, so there is no limit on the monomers that can be included in your molecule.

  • Is portable: xHELM allows you to ‘bundle’ all the monomer information with your molecule definition into a single package that can be used to transfer information outside your organisation.

HELM 2

Adds the ability to define:

  • Unknown sections of or entire polymers

  • Repeating units

  • Annotations

  • Unknown connection points

  • Multiple possible connection points

  • Probabilities of different monomers or polymers

  • Mixtures



Specification

The HELM 2.04 specification is available below.

There are minor changes in the 2.04 update. Specifically:

  • Added links to the monomer JSON schema.

  • Added location of xml schema for xHELM.

  • Clarified mandatory use of ? for connection points with unknown monomers like *, X and N.

  • Clarified rules for the use of in-line HELM particularly around connection point definition.

  • Minor revisions to aid clarity.



HELM Test set

The team have compiled a set of around 150 structures that illustrate the full range of structures that can be encoded in HELM. It is included as a resource for anyone who wants to implement their own HELM tools rather than use the open source toolkit.

 

HELM Examples from ChEMBL

Roger Sayle kindly created this file of 1000 (inline) HELM strings from ChEMBL 21. It includes examples of multiple peptide chains and nucleic acid sequences, but doesn't cover all of the corners of the HELM v2 (or v1.1) specification. It was created using the monomerlib2 monomer set.



Notepad++ highlighting definition file

If you create HELM manually it can be helpful to use an editor that will highlight different parts of the string.

HELM Syntax Highlighting2.xml

To use this language definition file, go to [Language] -> [Define your language...], click [Import...] and load the xml file. After closing the dialog, restart Notepad++. You can then assign the language to your currently open file via [Language] -> [HELM2].