
...

If you are interested, please get in touch via info@OpenHELM.org.


HELM Grammar 


HELM currently does not have a well-defined grammar. The syntax is defined and a grammar is implied by it, but the grammar itself should be specified formally in a grammar file, such as a .g4 file for the open-source ANTLR parser generator. This would allow HELM parsers to be generated for other languages such as Python, C# and JavaScript. It would also give the HELM syntax a documented structure as it evolves over time.


The desired outcome is to add a HELM grammar to the ANTLR community grammar repository: https://github.com/antlr/grammars-v4
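To illustrate what a formal grammar would pin down, here is a hand-written recursive-descent parser for a small, hypothetical subset of HELM. It accepts only PEPTIDE and CHEM polymers, single-letter or bracketed monomer symbols, and it keeps the connection, group and annotation sections as raw strings. It is not the official HELM grammar; the rules in the docstring and the names `PolymerParser`, `parse_helm` and `HelmSyntaxError` are assumptions made for this example. An ANTLR .g4 grammar would state the same rules declaratively, and ANTLR would generate the parser (one method per parser rule, much like the `parse_*` methods here).

```python
"""Minimal recursive-descent parser for a hypothetical subset of HELM.

Assumed, simplified grammar (EBNF-like; not the official HELM grammar):
    helm       ::= polymers "$" connections "$" groups "$" annotations "$" version
    polymers   ::= polymer ("|" polymer)*
    polymer    ::= polymer_id "{" monomer ("." monomer)* "}"
    polymer_id ::= ("PEPTIDE" | "CHEM") nonzero_digit digit*
    monomer    ::= letter | "[" symbol "]"
"""
import re
from dataclasses import dataclass


@dataclass
class Polymer:
    polymer_id: str       # e.g. "PEPTIDE1"
    monomers: list[str]   # monomer symbols with surrounding brackets removed


class HelmSyntaxError(ValueError):
    pass


class PolymerParser:
    """Parses only the simple-polymer section (the text before the first '$')."""

    ID_RE = re.compile(r"(PEPTIDE|CHEM)[1-9][0-9]*")

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def error(self, msg: str) -> HelmSyntaxError:
        return HelmSyntaxError(f"{msg} at position {self.pos}")

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, char: str) -> None:
        if self.peek() != char:
            raise self.error(f"expected {char!r}")
        self.pos += 1

    def parse_polymers(self) -> list[Polymer]:
        polymers = [self.parse_polymer()]
        while self.peek() == "|":
            self.pos += 1
            polymers.append(self.parse_polymer())
        if self.pos != len(self.text):
            raise self.error("unexpected trailing input")
        return polymers

    def parse_polymer(self) -> Polymer:
        match = self.ID_RE.match(self.text, self.pos)
        if match is None:
            raise self.error("expected polymer id")
        self.pos = match.end()
        self.expect("{")
        monomers = [self.parse_monomer()]
        while self.peek() == ".":
            self.pos += 1
            monomers.append(self.parse_monomer())
        self.expect("}")
        return Polymer(match.group(0), monomers)

    def parse_monomer(self) -> str:
        if self.peek() == "[":
            end = self.text.find("]", self.pos)
            if end <= self.pos + 1:  # "]" missing (-1) or "[]" with empty symbol
                raise self.error("empty or unterminated [...] monomer")
            symbol = self.text[self.pos + 1:end]
            self.pos = end + 1
            return symbol
        if self.peek().isalpha():
            self.pos += 1
            return self.text[self.pos - 1]
        raise self.error("expected monomer")


def parse_helm(helm: str) -> tuple[list[Polymer], list[str]]:
    """Return the parsed polymers plus the remaining sections as raw strings."""
    sections = helm.split("$")
    if len(sections) < 5:
        raise HelmSyntaxError("expected five '$'-separated sections")
    return PolymerParser(sections[0]).parse_polymers(), sections[1:]
```

For example, `parse_helm("PEPTIDE1{A.[Aib].C}|CHEM1{[SMCC]}$$$$V2.0")` yields two polymers, the first with monomers `["A", "Aib", "C"]`; the monomer symbols are illustrative. A malformed string such as `PEPTIDE1{A..C}$$$$V2.0` raises `HelmSyntaxError` with the offending position.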


Fragmentation


HELM represents biomolecules as polymeric structures composed of monomeric building blocks. Typically, these building blocks are collected in a list or database that contains the full chemical graph of each building block along with additional data specifying its context in a biomolecule. Such a monomer dictionary is the foundation for representing biomolecules at the atomic level.
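As a minimal sketch of such a monomer dictionary, assuming each entry is keyed by its polymer type (its context) together with its symbol, the classes below store a chemical graph as a SMILES string plus its attachment points. The field names, the class names `Monomer` and `MonomerDictionary`, and the use of atom-map numbers `[*:1]`, `[*:2]` to mark attachment points R1 and R2 are assumptions for illustration, not the format of any existing HELM monomer library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monomer:
    symbol: str                          # symbol used in HELM strings, e.g. "A"
    polymer_type: str                    # context, e.g. "PEPTIDE", "RNA", "CHEM"
    smiles: str                          # chemical graph; [*:n] marks attachment point Rn
    attachment_points: tuple[str, ...]   # e.g. ("R1", "R2")
    name: str = ""


class MonomerDictionary:
    """Hypothetical in-memory dictionary keyed by (polymer_type, symbol)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], Monomer] = {}

    def add(self, monomer: Monomer) -> None:
        key = (monomer.polymer_type, monomer.symbol)
        if key in self._entries:
            raise ValueError(f"duplicate monomer {key}")
        self._entries[key] = monomer

    def lookup(self, polymer_type: str, symbol: str) -> Monomer:
        return self._entries[(polymer_type, symbol)]

    def __len__(self) -> int:
        return len(self._entries)
```

Keying on the context matters because the same symbol can denote different building blocks: in this sketch, "A" can be stored once as a PEPTIDE monomer (alanine) and once as an RNA monomer (adenine) without a clash, while adding the same (type, symbol) pair twice is rejected.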

...

- Automate, as far as possible, the process of establishing the initial monomer dictionary and the associated file format conversion

 

Canonicalization

 


Some use cases, e.g. the registration of biomolecules, require the ability to identify distinct biomolecules and filter out duplicates in order to build a database without redundancies. Typically, a single biomolecule can be expressed by more than one HELM representation. Based on the principles HELM applies to represent a biomolecule, the challenge of generating a canonical representation can be broken down into three main objectives:

...

- Canonicalization of the chemical structures within the monomer dictionary: the chemical representation of every monomer in the monomer dictionary must be unique within its given context, i.e. alternative tautomers, different representations of aromatic rings, ionic forms of certain functional groups and other variants considered chemically equivalent must not be kept as separate entries in the monomer dictionary.
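For the monomer dictionary, this objective amounts to a uniqueness check: within one context, no two entries may share a canonical structure. The sketch below, built around a hypothetical `find_redundant_monomers` function, groups entries by (polymer type, canonical key) and reports every group with more than one symbol. Computing the canonical key is the hard chemical part and is deliberately left as an injected `canonicalize` function; in practice it might be built on a toolkit such as RDKit (canonical SMILES from `Chem.MolToSmiles`, plus tautomer and charge standardization from its `rdMolStandardize` module), but that wiring is an assumption and is not shown.

```python
from collections import defaultdict
from typing import Callable, Iterable


def find_redundant_monomers(
    entries: Iterable[tuple[str, str, str]],
    canonicalize: Callable[[str], str],
) -> dict[tuple[str, str], list[str]]:
    """Report monomer symbols whose structures coincide after canonicalization.

    entries: (polymer_type, symbol, structure) triples.
    canonicalize: maps a structure to a canonical key (assumed to be supplied
        by a cheminformatics toolkit; not implemented here).
    Returns {(polymer_type, canonical_key): sorted symbols} for groups of size > 1.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for polymer_type, symbol, structure in entries:
        # Equivalence is only checked within the same context (polymer type).
        groups[(polymer_type, canonicalize(structure))].append(symbol)
    return {key: sorted(syms) for key, syms in groups.items() if len(syms) > 1}
```

With a toy canonicalizer, e.g. a lookup table mapping `"OC(=O)C"` and `"CC(=O)[O-]"` to `"CC(=O)O"`, two CHEM entries holding those structures are reported as one redundant group, while an entry with the same structure under PEPTIDE is left alone because its context differs.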


The HELM toolkit provides some canonicalization functionality, but it is limited in scope.
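As a narrow illustration of why one biomolecule can have several HELM strings, the same two unconnected peptides can be written as `PEPTIDE1{A.G}|PEPTIDE2{C.G}` or as `PEPTIDE1{C.G}|PEPTIDE2{A.G}`. The hypothetical function `canonicalize_unconnected` below (not the HELM toolkit's method) sorts the polymers by type and monomer sequence text and renumbers their IDs. It assumes a HELM string with five `$`-separated sections and refuses any string whose connection, group or annotation sections are non-empty, because renumbering would then also require rewriting the references in those sections. It does not address equivalent monomer spellings or any other objective.

```python
import re

# One simple polymer: type, numeric suffix, and the monomer list in braces.
_POLYMER_RE = re.compile(r"([A-Z]+)([1-9][0-9]*)\{([^{}]*)\}")


def canonicalize_unconnected(helm: str) -> str:
    """Order polymers and renumber their ids (hypothetical, narrow scope)."""
    sections = helm.split("$")
    if len(sections) < 5 or any(sections[1:4]):
        raise ValueError("only unconnected, unannotated HELM strings are supported")
    polymers = []
    for chunk in sections[0].split("|"):
        match = _POLYMER_RE.fullmatch(chunk)
        if match is None:
            raise ValueError(f"unsupported polymer notation: {chunk!r}")
        polymers.append((match.group(1), match.group(3)))  # drop the old number
    polymers.sort()  # by polymer type, then lexicographically by monomer text
    counters: dict[str, int] = {}
    parts = []
    for polymer_type, body in polymers:
        counters[polymer_type] = counters.get(polymer_type, 0) + 1
        parts.append(f"{polymer_type}{counters[polymer_type]}{{{body}}}")
    return "|".join(parts) + "$" + "$".join(sections[1:])
```

Both spellings above map to `PEPTIDE1{A.G}|PEPTIDE2{C.G}$$$$V2.0` when the version section is `V2.0`. Sorting by raw monomer text is an arbitrary but deterministic choice; any total order works as long as every tool applies the same one.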


...