Facts Found Log

  1. It is possible to extract existing annotations from PubChem and to upload updated or improved annotations back into PubChem. CDD has gone through the upload process once and may have saved documentation.
  2. Both PubChem and ChEMBL welcome improved annotations in principle.
  3. It is possible to add ontologies or ontology terms to the CDD BioAssay Express tool
  4. The online CDD BioAssay Express tool is not the same as the version stored in the public GitHub repository.
  5. Assay descriptions in the corresponding ChEMBL field vary in depth and completeness. For that reason, we cannot rely on them when we conduct QC of our own annotations. On the other hand, this also means that our annotations are likely to deliver more value.
  6. Papers cited in ChEMBL may not contain suitable assay descriptions; instead, one may have to chase multiple references, and access to those papers can be very costly.
  7. Multiple assay entries in PubChem have verbose descriptions (in the “Description” field) that are plain text and have never been parsed. These assays represent the lowest-hanging fruit.
  8. Published academic papers contain errors in assay descriptions, and these errors propagate from paper to paper.
  9. When assay panels are cited in peer-reviewed literature, links to vendor assay panels are often dead, because vendors go out of business or merge.
  10. Primary assays are more interesting than secondary ones, because selectivity assays may be less rigorous than the primary assay. Hence, focus on publications that report primary target assays, and for selectivity assays, consider first and foremost those run at multiple concentrations rather than just one.
  11. The PubChem assays that were imported from ChEMBL have irrelevant text in the text box (the abstract of the paper, which does not describe the assay), making it hard to use NLP and models to make automated predictions. As a result, ChEMBL-sourced PubChem assays are hard to use without first reading the papers and pasting in the assay descriptions. Other PubChem assays tend to have the actual assay description, rather than the paper abstract, in the PubChem assay description or protocol field. These often have more than a few compound results uploaded as well. Examples: https://pubchem.ncbi.nlm.nih.gov/bioassay/624392 and https://pubchem.ncbi.nlm.nih.gov/bioassay/1117277#section=Description
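
Facts 1, 7 and 11 all depend on pulling the free-text description and protocol of an assay out of PubChem. A minimal sketch of one way to do that through PubChem's PUG REST service is shown below. The function names are hypothetical, and the JSON key layout (PC_AssayContainer, assay, descr) reflects our understanding of the response format and should be checked against a live response before relying on it. This is not the process CDD used.

```python
import json
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def description_url(aid: int) -> str:
    """Build the PUG REST URL for the description record of one assay (AID)."""
    return f"{PUG_REST}/assay/aid/{aid}/description/JSON"


def extract_text(payload: dict) -> dict:
    """Pull the free-text fields out of a parsed description response.

    Assumption: the response nests the record under
    PC_AssayContainer[0]["assay"]["descr"], with "description" and
    "protocol" stored as lists of text lines.
    """
    descr = payload["PC_AssayContainer"][0]["assay"]["descr"]
    return {
        "name": descr.get("name", ""),
        "description": "\n".join(descr.get("description", [])),
        "protocol": "\n".join(descr.get("protocol", [])),
    }


def fetch_text(aid: int, timeout: float = 30.0) -> dict:
    """Download one assay description and return its free-text fields."""
    with urlopen(description_url(aid), timeout=timeout) as response:
        return extract_text(json.load(response))
```

Calling fetch_text(624392) or fetch_text(1117277) would retrieve the text for the two example assays above. Comparing the returned description and protocol fields across assays is one way to spot entries that carry a paper abstract instead of an assay description before any NLP step.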