Date

04 Sep 2018

Attendees

Apologies

Discussion items

Time	Item	Who	Output
17:00 - 18:00 (60 mins)	Proposed sharing model and internal stakeholder package	All	Agree on a framework for the sharing model

Minutes

Proposed sharing model and stakeholder package

>The team is on board with the data sharing model presented (see presentation: https://drive.google.com/open?id=15UgGvzJwIRaY5RqT_njHn6P9qb127XEl).

>Sharing of sequences, 3D structures, and sequence and structure descriptors is in scope and critical to the success of the initiative. At this stage, it is difficult to say with confidence how important descriptors will be for building predictive models.

>Discussion on sharing sequence data:

- Sharing sequence information is seen as one of the main challenges and may be decided case by case for each project or molecule. Some organisations may not be able to share sequences alongside Tier 2 ('dead' projects) and Tier 3 ('live' projects) data.

- We will need to determine whether sequences from early screening campaigns should be included (include this as a question to stakeholders).

- Security (encryption) of sequence data may become central to sharing sequences alongside Tier 2 and Tier 3 data. One possibility is to share sequence information that has been 'abstracted' to some degree using principal component analysis (PCA).

- Sequence (and structure) descriptors can be shared in place of the sequences (and structures) themselves alongside Tier 2 and Tier 3 data, but we will need to agree on a common set of sequence and structure descriptors to ensure comparability across organisations. Examples will be provided as part of the proposal.
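To make the PCA 'abstraction' idea discussed above more concrete, here is a minimal illustrative sketch, not the agreed approach (the real examples are an open action for Abhi). It assumes sequences are pre-aligned to equal length and use only the 20 standard amino acids; the helper names one_hot and pca_abstract and the example sequence fragments are hypothetical and made up for illustration. Each sequence is one-hot encoded, the matrix is mean-centred, and only the scores on the top principal components are kept. Note that this is a lossy linear projection, not encryption: keeping more components retains more of the original sequence information.

```python
import numpy as np

# Standard 20 amino-acid alphabet (assumption: no gaps or non-standard residues).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> np.ndarray:
    """Hypothetical helper: flatten a sequence into a 20 * len(seq) one-hot vector."""
    vec = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        vec[pos, AA_INDEX[aa]] = 1.0
    return vec.ravel()


def pca_abstract(seqs: list[str], n_components: int) -> np.ndarray:
    """Hypothetical helper: project aligned sequences onto their top principal components.

    Returns an (n_sequences, n_components) matrix of PCA scores that could be
    shared instead of the raw sequences.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    X = np.stack([one_hot(s) for s in seqs])
    X_centred = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(X_centred, full_matrices=False)
    return X_centred @ vt[:n_components].T


# Usage with made-up, equal-length sequence fragments.
seqs = ["EVQLVESGGG", "QVQLQESGPG", "EVQLLESGGG", "QVQLVQSGAE"]
scores = pca_abstract(seqs, n_components=2)
print(scores.shape)  # (4, 2)
```

The number of retained components is the main lever here: fewer components give a coarser, harder-to-reverse summary, at the cost of less information for the predictive models.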

>The general sense is that, at best, this project will deliver a set of predictive models that help infer properties of new candidates. It is unlikely that these models can be used in other ways, e.g. for new library design.

>The internal stakeholder package needs to include a ballpark estimate of the costs (at least for 2018-2019).

Actions

All organisation representatives - provide an approximate number of molecules for each Tier (irrespective of whether information on them is likely to be shared).

Richard - redraw the 'Tier' slide, giving more detail on which data is included in each Tier and which activities and objectives are associated with each Tier. The bottom Tier should include any published information, e.g. developability index.

Bryan - provide structure descriptor examples.

Abhi - provide sequence descriptor examples and how PCA could be used to 'abstract' sequences.

Carmen - obtain a ballpark figure for the database build and a sustainability plan from, e.g., Lhasa.

All - contribute to the assay table on the wiki, the internal stakeholder document and the presentation (DOWNLOAD CURRENT VERSION AND MODIFY).