Date

Attendees

...

Apologies

Discussion items

Time
Item
Who
Notes

Action items

Output
17:00 - 19:00 (120 mins)
         

>Data sharing and standards - experiences with ChEMBL.

>Assay table.

>Introduction by Anna Gaulton and Anne Hersey (EMBL-EBI)

>Populate assay table:

-Agree a set of terms and definitions for the assays (based on the required pre-work)

-Agree common purpose for the assay (if applicable)

-Identify a set of common and uncommon assays (and any gaps)

>Share knowledge, experiences and tips with team.

>Correlate and establish common purpose for the assays identified

Minutes

Data sharing and standards - experiences with ChEMBL

Attached file: ChEMBL_standards.pdf

>ChEMBL extracts pharmacology data from the scientific literature into one place to build tools and help drug discovery. Increasingly, data are also being deposited from various sources. See ‘other data sources’ and the examples of SAR data from GSK and AstraZeneca.

>Reported data are very heterogeneous, so ChEMBL has developed a process to standardise data types. This helps with flagging erroneous or out-of-range data and duplicates.
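As an illustration only, the kind of flagging described above could be sketched as follows. The field names, the `flag_records` helper and the plausible range are assumptions made for this example and are not taken from the ChEMBL process itself.

```python
from collections import Counter

# Hypothetical plausible range for a standardised value in nM.
# This is an assumption for the sketch, not a ChEMBL rule.
PLAUSIBLE_NM = (1e-3, 1e8)


def flag_records(records):
    """Return (record, flags) pairs for two simple quality checks."""
    # A record is treated as a possible duplicate when compound, assay,
    # activity type and standardised value all match another record.
    keys = [(r["compound"], r["assay"], r["type"], r["value_nm"]) for r in records]
    counts = Counter(keys)
    low, high = PLAUSIBLE_NM

    flagged = []
    for record, key in zip(records, keys):
        flags = []
        if not (low <= record["value_nm"] <= high):
            flags.append("out_of_range")
        if counts[key] > 1:
            flags.append("possible_duplicate")
        flagged.append((record, flags))
    return flagged
```

Flagged records are kept rather than dropped, so a curator can decide later whether a value is a genuine error.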

>Cross-industry MIABE paper highlights what information to capture when recording bioactivity data: compounds, assays and activity measurements

>It is possible to annotate assays better retrospectively so that more detailed queries can be run, e.g. by using ontologies to annotate assay descriptors.

>There is an existing place to capture additional Assay Parameters and Activity Properties beyond the standard fields: free format (not controlled vocabulary), which is curated later. The new schema allows for more flexible data deposition.

>Important to define minimum information requirements early – harder to go back later, especially where data comes from scientific literature.
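One way to picture the combination of minimum required fields plus free-format extras is the sketch below. The class name, field names and the choice of required fields are hypothetical and were not agreed in the meeting; they only illustrate the idea.

```python
from dataclasses import dataclass, field

# Hypothetical minimum information set (assumption for this sketch).
REQUIRED = ("compound_id", "assay_id", "activity_type", "value", "units")


@dataclass
class ActivityRecord:
    compound_id: str
    assay_id: str
    activity_type: str
    value: float
    units: str
    # Free-format extras (no controlled vocabulary), to be curated later.
    assay_parameters: dict = field(default_factory=dict)
    activity_properties: dict = field(default_factory=dict)


def missing_fields(raw, required=REQUIRED):
    """List the required fields that are absent or empty in a raw submission."""
    return [name for name in required if raw.get(name) in (None, "")]
```

A submission can be checked with `missing_fields` before an `ActivityRecord` is built, so gaps are caught at deposition time rather than discovered later.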

>Difficult to predict all possible use cases upfront so need to be pragmatic.

>Important to standardise data and have controlled vocabularies but this can be done retrospectively to some extent. Other databases at EBI require new terms to be created but this can slow down the process of data submission.

>More flexible model allows for more information to be captured easily but standardisation is harder. Capturing all fields of interest from the start is ideal but may not always be possible.

>There is some bioactivity data in ChEMBL on peptides but not on Abs. There is some information on approved Ab drugs and clinical candidates but no bioactivity data. In theory having PK data for Abs in ChEMBL could fit quite well with the current data model.

>Better to focus on capturing all data from the start rather than spending too much time establishing a controlled vocabulary.

>Recommendation is to define preferred units upfront. Unit conversion can happen retrospectively, depending on the type of assay: e.g. nM to mM is easy to convert, but conversion to molarity depends on the salt form used, which may not be standard.
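The difference between the two kinds of conversion can be sketched as below, assuming nM as the preferred unit. The function names are hypothetical, and the molecular weights in the usage note are made-up numbers for illustration.

```python
# Scale factors from molar units to the (assumed) preferred unit, nM.
TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}


def to_nanomolar(value, units):
    """Convert between molar units: a fixed scale factor is all that is needed."""
    return value * TO_NM[units]


def mass_conc_to_nanomolar(value_ug_per_ml, mol_weight_g_per_mol):
    """Convert a mass concentration to nM.

    1 ug/mL = 1e-3 g/L; dividing by g/mol gives mol/L; multiplying by 1e9
    gives nM, so the overall factor is 1e6 / molecular weight.
    The molecular weight must be that of the form actually weighed
    (e.g. a salt rather than the free base).
    """
    return value_ug_per_ml * 1e6 / mol_weight_g_per_mol
```

For example, `mass_conc_to_nanomolar(1.0, 500.0)` and `mass_conc_to_nanomolar(1.0, 536.5)` give different molarities for the same weighed amount, which is why the salt form has to be recorded for this conversion to be done retrospectively.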


Assay table

Assays

>Units and parameters are important: when going to the literature we will need to see whether the reported values agree with what we have.

>DSC consumes more protein than DSF. DSC non-equivalent to DSF.

>aSEC: only small bed volume columns are used (1.5-20 mL). aSEC and SEC are equivalent; the difference is whether the sample is stressed (SEC AS) or non-stressed (aSEC). Treat SEC AS as a separate technique.

>All assays, except DSF, are run on both stressed and unstressed samples.

>Distinguish between properties of the molecule vs. Accelerated stability/storage

>Analytical size = readout for SEC AS typically (capturing degradation rate)

>Many assay results will reflect the process used to produce the molecule in addition to the molecular properties themselves. Have we decided to compare across processes and focus only on trying to isolate the impact of the molecule? e.g. Hydrophobicity and Tm parameters (related to molecule), aggregates based on SEC (related to process)

>Suggestion is to focus on parameters/attributes which are less process dependent (by contrast, glycan content by LCMS is process dependent).

>Include Chemical stability (most relevant) with LCMS.

>Aggregation readout is process independent once some purification has happened. Aggregation is molecule dependent so need to isolate process component out of it.

>DSF, aSEC and HIC assays are techniques to directly assess intrinsic properties of the molecule whereas other assays are techniques for quantifying changes or degradation of molecule upon stressing.

>Some confusion caused by trying to capture both in same table. Ways to distinguish:

-2 tables: 1 for intrinsic parameters and 1 for changes upon stress, or

-1 table: add a column with the measured parameter/question answered, or

-1 table: under the AS experiment, list all techniques with potential readouts that inform on the stability of the molecule, which is related to developability. Intrinsic properties are potentially related to developability.

-1 table: sort by 'developability parameter' (as in the original table) but include AS, under which many techniques will be included.
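The single-table option with an extra column can be sketched as follows. The intrinsic set follows the grouping noted above (DSF, aSEC and HIC); the column name `category` and the helper names are assumptions for illustration.

```python
# Techniques noted above as directly assessing intrinsic properties.
INTRINSIC = {"DSF", "aSEC", "HIC"}


def categorise(technique):
    """Label a technique by the question it answers (hypothetical labels)."""
    if technique in INTRINSIC:
        return "intrinsic property"
    return "change upon stress"


def add_category_column(rows):
    """Return copies of the assay table rows with a 'category' column added."""
    return [{**row, "category": categorise(row["technique"])} for row in rows]


def split_tables(rows):
    """Derive the two-table layout from the single annotated table."""
    annotated = add_category_column(rows)
    intrinsic = [r for r in annotated if r["category"] == "intrinsic property"]
    stress = [r for r in annotated if r["category"] == "change upon stress"]
    return intrinsic, stress
```

Because the two-table layout can be derived from the annotated single table, keeping one table with the extra column does not rule out the other option later.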

>PK is intrinsic property of the molecule.


Questions and Discussion points

>Is information captured enough to do cross-comparison?

>Which assays are used to measure stressed/unstressed conditions?

>Which assay readouts (measure of, units) are influenced by expression process?

>How important is it to isolate the process component out of the readout (e.g. for aggregation)? Suggestion is that we can still compare data irrespective of the process used to produce the molecule. We can also focus on comparing Early selection phase data only (which will be most abundant) in terms of ranking rather than absolute values. A build on this would be to define and capture the DD phase in which the data was generated and compare only data from the same phase.
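Comparing by ranking within a single phase could look like the sketch below, which uses a Spearman rank correlation as one possible measure of agreement; the meeting did not choose a specific statistic. The record layout (`molecule`, `phase`, `value`) is hypothetical, and the simple formula used assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for equal-length lists without ties (n >= 2)."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def rank_agreement(records_a, records_b, phase):
    """Compare two data sets on shared molecules measured in the same phase."""
    a = {r["molecule"]: r["value"] for r in records_a if r["phase"] == phase}
    b = {r["molecule"]: r["value"] for r in records_b if r["phase"] == phase}
    shared = sorted(a.keys() & b.keys())
    return spearman([a[m] for m in shared], [b[m] for m in shared])
```

A result close to 1 means the two sources order the shared molecules in the same way even if their absolute values differ.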

>What are the risks and implications of the above?

>How do we want to deal with 'other' assays which are used for the same or similar purposes but where we cannot compare absolute values?