It is easy to be preoccupied with the day-to-day tasks of a project, particularly when development work is progressing at full pace. However, it is interesting to look more widely and consider the impact our work has had on the scientific community.
HELM was released in 2013 with a single user, Pfizer (who invented it), but ChemAxon followed shortly afterwards, along with a steady stream of organisations representing a wide section of the informatics community. We have also gained recognition from regulators, who endorsed HELM as an acceptable format in ISO 11238.
The list of HELM users is now very healthy, and we appreciate our enthusiastic and engaged community. Here are some of the groups who are using HELM.
Novartis makes extensive use of HELM for nucleotide registration and analysis. The open-source HELM tools are integrated with the internal informatics landscape.
Yohann Potier said, “HELM allows Novartis to accurately describe its chemically-modified constructs using an industry standard for registration.”
As the originators of the HELM standard, Pfizer has based their entire macromolecular registration infrastructure on HELM and its associated biomolecule toolkit.
Sergio Rotstein said, “While the enablement of biomolecular registration was already of great value to Pfizer, the establishment of HELM as an industry standard provided even greater value by facilitating cross-company interoperability and biomolecular data exchange, a very desirable outcome in our increasingly collaborative industry.”
Starting with HELM, Roche has developed the HELM Antibody Editor (HAbE), in particular to enable convenient handling of complex antibodies in innovative formats for their analysis, visualization, manipulation and registration.
Most recent is the implementation of HELM2 at Roche to describe, register and manage therapeutic oligonucleotides and their derivatives. This was facilitated by the improved monomer handling and support for ambiguous nucleotides within the HELM2 toolkit.
Merck has been slowly adopting the HELM notation across our Discovery Chemistry Modalities organization, focusing first on simple linear peptides and oligonucleotides. Using the Pistoia HELM editor for creation, editing and registration of monomers and chemical modifiers, our Modalities chemists can now work confidently with their monomers across multiple environments, including our biopolymer registration system, our BioviaDraw platform and our tools within Insight for Excel. In 2018 we anticipate incorporating complex, macrocyclic biopolymers into the HELM-supported workflows, along with peptide metabolite identification support and antibody-peptide conjugates. All of this is facilitated by easy-to-use tools that use HELM notation as a foundation.
Internal registration systems and tools are all based on HELM.
GSK is using the Pistoia Alliance’s Hierarchical Editing Language for Macromolecules (HELM) notation to represent therapeutic large molecules in its bio-registration system, facilitated by the deployment of Dassault Systèmes BIOVIA’s Biological Registration solution. GSK scientists at sites around the world will use the system.
“There is a gap in the space that HELM now covers where there weren’t really any alternatives. It was desirable for GSK to be on a standard rather than create our own notation, and to partner with the Pistoia Alliance and other companies to develop that standard.” Leah O’Brien, Business Consultant, GSK.
We are grateful to all funders, including the above plus BMS.
Scientific software providers
ACD/Labs supports HELM across our portfolio, from the ACD/Labs Biosequence tool (ACD/ChemSketch) to our array of software for analytical chemistry, ACD/Spectrus, and the Luminata CMC project decision support application, to the recently announced Katalyst D2D for high-throughput and parallel experimentation. Notations for biomolecules ranging from small peptides to whole antibodies can be imported from HELM and xHELM files and exported in HELM file format. “HELM notation is the cornerstone of our expanded molecular characterization efforts, from small molecule to biologics,” says CEO Daria Thorp.
Applications use Biovia’s proprietary SCSR (Self-Contained Sequence Representation, an extension of the V3000 molfile) format, but there is extensive ability to import, export and convert to and from HELM. Pipeline Pilot Chemistry Collection contains importers and exporters, HELM readers and writers including xHELM, and components to interconvert between macromolecules represented by HELM, full chemistry and SCSR. HELM support is available in Insight, the Draw and Pipette sketchers, biological registration and the chemistry cartridge.
Certara’s D360 application for scientific informatics supports discovery research scientists through self-service data access, integrated analysis and data visualization capabilities, combined with collaboration tools to improve data-driven decision making. D360 has been deployed in support of small-molecule research and research into other modalities such as antibodies, ADCs, oligonucleotides, and peptides. D360 supports the use of HELM notation for searching, filtering and rendering other modalities. More advanced functionality, such as sequence formatting, alignment and clustering based on HELM representation, allows users to determine and exploit sequence-activity relationships.
Biomolecule Toolkit and the macromolecule sketcher BioEddie natively support the HELM standard. The tools provide capabilities for managing a centralized monomer library, registering and performing uniqueness checks of macromolecules, generating a HELM notation from small molecule representations and sequences, and representing modalities with partially or fully unknown chemical structures.
Roland Knispel, Project Lead for Biologics Informatics at ChemAxon, said, “Our HELM-based tools are helping our customers to manage chemically modified sequence-based modalities. A single environment for various types of modalities, improved data quality and utilizing an industry-wide standard for data exchange are the key benefits reported back to us by our users. By market demand our platform is now being integrated into solutions provided by IDBS and other partners.”
Dotmatics has adopted the Pistoia Alliance’s HELM notation as part of its biologics discovery suite. Dotmatics’ biological registration system, Bioregister, integrates the HELM Web Editor to create and edit sequences in HELM notation. Bioregister also generates, stores, and reads/writes HELM notation. This allows users to work with their HELM-defined entities within the Dotmatics Suite and also to exchange HELM-format data with other HELM-compliant systems. Additionally, Dotmatics’ analysis and visualization application, Vortex, reads HELM notation and from this generates oligomeric representations, allowing advanced analytical techniques to be applied directly to HELM-represented entities.
IDBS leverages ChemAxon’s Biomolecule Toolkit and BioEddie in its E-WorkBook suite and therefore includes HELM support.
Paul Gouldson, Vice President Strategic Solutions said, “IDBS has been supporting open standards in EWB since its inception. We use and develop integrations to open source tools and have supported examples with AniML, HELM, ADF, SVG and HTML tools.”
Next Move software:
The HELM format is supported by Sugar&Splice both for reading and writing peptides and nucleic acids, thus enabling conversion of all-atom structures from SMILES (for example) through HELM and back to SMILES. This support includes inline HELM (allowing structural data to be roundtripped even when monomers are missing from the HELM database), xHELM and partial support for HELM 2.0 ambiguity codes. The NCBI uses Sugar&Splice to generate HELM strings for all biopolymer entries in PubChem.
RDKit includes the ability to convert DNA, RNA and peptides to and from HELM and a large number of other notations including: FASTA, PDBBlock and standard sequence notation.
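To give a feel for what a sequence-to-HELM conversion involves in the simplest case, here is a minimal sketch in plain Python. It is not RDKit's implementation and does not use any HELM toolkit: the function names `sequence_to_helm` and `helm_to_sequence` are hypothetical, and the sketch assumes a single unmodified linear peptide made only of the 20 standard one-letter amino acid codes, with no connections, annotations or non-natural monomers.

```python
import re

# Assumption: only the 20 standard one-letter amino acid codes are handled.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

# Matches a single simple peptide polymer with empty trailing sections,
# optionally followed by the HELM 2.0 version suffix.
_SIMPLE_PEPTIDE = re.compile(r"^PEPTIDE1\{([A-Z](?:\.[A-Z])*)\}\$\$\$\$(?:V2\.0)?$")


def sequence_to_helm(sequence: str) -> str:
    """Write a plain one-letter peptide sequence as a simple HELM string."""
    residues = list(sequence.strip().upper())
    if not residues or any(r not in STANDARD_RESIDUES for r in residues):
        raise ValueError(f"not a plain standard peptide sequence: {sequence!r}")
    # Monomers are separated by dots inside the braces; the four '$' signs
    # delimit the remaining (here empty) sections of the notation.
    return "PEPTIDE1{" + ".".join(residues) + "}$$$$"


def helm_to_sequence(helm: str) -> str:
    """Recover the one-letter sequence from a simple peptide HELM string."""
    match = _SIMPLE_PEPTIDE.match(helm.strip())
    if match is None:
        raise ValueError(f"unsupported HELM string for this sketch: {helm!r}")
    return match.group(1).replace(".", "")


if __name__ == "__main__":
    helm = sequence_to_helm("ACDE")
    print(helm)
    print(helm_to_sequence(helm))
```

Anything beyond this case, such as modified residues, cyclisation, multiple chains or ambiguity, needs a proper HELM parser and monomer library, which is what the toolkits described in this article provide.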
PerkinElmer integrated the Pistoia Alliance HELM standard into ChemDraw®, enabling chemists and biologists to easily describe complex molecular structures, rapidly create biopolymeric structures, and share their information in an industry-standard, publication-ready format. “We look forward to continuing to work with HELM as a standard to better serve the research community by providing modern tools that foster collaboration and enable faster discoveries,” says Pierre Morieux, ChemDraw Global Marketing Manager at PerkinElmer.
Scilligence supports HELM monomer management and sketching in their wide range of informatics solutions including Scilligence ELN and RegMol/BioAssay. Scilligence is the developer of the HELM Web Editor which works to further standardize HELM notation and facilitate the exchange of information between researchers.
Having been involved in HELM tool development from the earliest days, quattro research provides solutions for registration of biomolecules based on HELM notation. With a focus on antibodies and ADCs (antibody-drug conjugates), we have developed the HELM Antibody Editor (HAbE) together with Roche. The xHELM format for data exchange, the ambiguity support of the HELM2 toolkit and a monomer service are additional Pistoia-hosted projects developed and maintained by quattro research, in addition to our internal development and research. Many of these tools are now open source and hosted on GitHub, available to all who are looking into adopting HELM as a standard.
ChEMBL is one of the largest public drug discovery databases, containing information about approved drugs, clinical candidates and lead optimization data, including 1.7 million distinct compounds and more than 11,000 targets.
“A large proportion of new drugs are now biotherapeutics, but many could not be adequately represented by our traditional sequence or structure formats. HELM gives us a great solution, allowing us to accurately describe drugs such as modified peptides and antibodies,” says Anna Gaulton, Senior Data Integration Officer for the EMBL-EBI Chemogenomics Team.
The ChEMBL database team have worked with the HELM project on methods to fragment peptide structures and define new monomers. These have been used to convert more than 20,000 natural and modified peptide structures to HELM and create a publicly available library of more than 2800 peptide monomers.
PubChem is an open chemistry database at the National Institutes of Health (NIH), which provides information on chemical structures, identifiers, chemical and physical properties, biological activities, patents, health, safety, toxicity data, and many others to several million scientists worldwide.
PubChem contains over 500,000 structures represented in the HELM notation. Many of these are complex: for example, only 65% of the HELM peptides are made up exclusively of amino acids.