Infrastructure, training and culture for clinical FAIR data

1 Infrastructure
2 Education and training
3 Data-centric culture and the FAIR mindset

Infrastructure

Effective FAIR data management requires sufficient investment in suitable infrastructure and a support team matched to the size and ambition of the organisation. Several major pharmaceutical companies have implemented such FAIR-enabling infrastructure in recent years, as the examples below show.

The EDISON platform and FAIRification of clinical trial data at Roche

Roche has built the EDISON platform to enable prospective FAIRification of data at the point of entry to Roche. It harmonises, automates and integrates highly heterogeneous and complex processes across multiple departments, and builds data standards and quality checks into the data models for clinical and non-clinical data. EDISON is built as an ecosystem of self-contained microservices to ensure maximum performance, scalability and low maintenance. Its current scope is clinical non-CRF data, but the platform is scalable and flexible enough to cover a large variety of data models, both clinical and non-clinical. The EDISON platform is described in more detail as a use case in the Pistoia Alliance FAIR Toolkit: https://fairtoolkit.pistoiaalliance.org/use-cases/prospective-fairification-of-data-on-the-edison-platform-roche/

Roche launched a series of clinical use cases in 2017, using smaller data sets to answer specific scientific questions prioritised by a scientific steering committee. This approach identified the issues and challenges associated with FAIRifying legacy data. It also gave a deeper understanding of what is needed to improve the EDISON platform for processing clinical trial data. The FAIRification of clinical trial data is described in more detail as a use case in the Pistoia Alliance FAIR Toolkit: https://fairtoolkit.pistoiaalliance.org/use-cases/fairification-of-clinical-trial-data-roche/

Corporate Linked Data (COLID) at Bayer

Bayer has developed and open-sourced a platform named Corporate Linked Data (COLID). It is a technical solution designed for corporate environments that provides a metadata repository for corporate assets based on semantic models. COLID assigns URIs as persistent and globally unique identifiers to any resource. The incorporated network proxy ensures that these URIs are resolvable and can be used to access those assets directly. By consistently following the Linked Data principles, the COLID data model uses RDF and provides its content to consumers through a SPARQL endpoint. The model was developed based on learnings from open standards such as DCAT and PROV-O. Being both a management system for resolvable identifiers and an asset catalogue, COLID is the core service for realising Linked Data in corporate environments and therefore an essential cornerstone for implementing FAIR data management. Comprehensive documentation for COLID, including a quick start section, can be found on GitHub: COLID Tech Docs.
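The combination of persistent identifiers, a resolving proxy and a SPARQL endpoint can be illustrated with a minimal sketch. This is not COLID's actual API: the `IdentifierRegistry` class, the `sparql_get_url` helper and all example.org addresses are hypothetical, and the registry is a toy in-memory stand-in for a real identifier service. The query only assumes the standard DCAT and Dublin Core Terms namespaces and the SPARQL protocol convention of passing the query in a `query` parameter.

```python
import uuid
from urllib.parse import urlencode

# Hypothetical namespace for minted identifiers (not COLID's real one).
BASE = "https://pid.example.org/"


class IdentifierRegistry:
    """Toy in-memory registry: mints URIs and resolves them to asset locations."""

    def __init__(self, base=BASE):
        self.base = base
        self._targets = {}

    def register(self, target_url, local_id=None):
        """Mint a persistent URI for an asset and remember where the asset lives."""
        local_id = local_id or str(uuid.uuid4())
        uri = self.base + local_id
        if uri in self._targets:
            raise ValueError(f"identifier already in use: {uri}")
        self._targets[uri] = target_url
        return uri

    def relocate(self, uri, new_target_url):
        """The asset moved: the URI stays the same, only its target changes."""
        if uri not in self._targets:
            raise KeyError(uri)
        self._targets[uri] = new_target_url

    def resolve(self, uri):
        """Return the current location, as a proxy would via an HTTP redirect."""
        return self._targets[uri]


def sparql_get_url(endpoint, query):
    """Build a SPARQL-protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


registry = IdentifierRegistry()
uri = registry.register("https://storage.example.org/trials/a.csv", local_id="dataset-001")
registry.relocate(uri, "https://archive.example.org/trials/a.csv")
print(uri, "->", registry.resolve(uri))

# List catalogued datasets and their titles using DCAT and Dublin Core Terms.
query = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE { ?dataset a dcat:Dataset ; dct:title ?title . }
"""
print(sparql_get_url("https://sparql.example.org/query", query))
```

The point of the sketch is the separation of concerns: consumers only ever see the minted URI, so an asset can move between storage systems without breaking any reference to it.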

Identifier policy at AstraZeneca

AstraZeneca has implemented a Uniform Resource Identifier (URI) policy to construct a FAIR infrastructure across the global enterprise. This corporate policy describes how URIs need to be constructed to facilitate cross-enterprise Findability, Interoperability and Reuse of digital objects. The benefits of adoption are greatest in information domains where data must be used across multiple sources and where the information architecture of those sources may be outside the organisation's control. The business areas taking advantage of this approach include clinical studies, translational medicine and competitive intelligence. The identifier policy is described in more detail as a use case in the Pistoia Alliance FAIR Toolkit: https://fairtoolkit.pistoiaalliance.org/use-cases/adoption-and-impact-of-an-identifier-policy-astrazeneca/
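As an illustration of what a URI policy can look like in practice, the sketch below builds and checks identifiers against a simple pattern. The pattern itself (`https://id.example.com/{domain}/{resource_type}/{local_id}`), the host name and the function names are assumptions made for this example only; they are not taken from the AstraZeneca policy.

```python
import re
from urllib.parse import quote, urlsplit

# Hypothetical policy: https://id.example.com/{domain}/{resource_type}/{local_id}
SCHEME = "https"
HOST = "id.example.com"
# Domain and resource type must be lowercase letters, digits or hyphens.
SEGMENT = re.compile(r"^[a-z0-9][a-z0-9-]*$")


def build_uri(domain, resource_type, local_id):
    """Construct a policy-conformant URI, rejecting malformed segments."""
    for name, value in (("domain", domain), ("resource_type", resource_type)):
        if not SEGMENT.match(value):
            raise ValueError(f"{name} must match {SEGMENT.pattern}: {value!r}")
    if not local_id:
        raise ValueError("local_id must not be empty")
    # Percent-encode the local identifier so the result is always a valid URI.
    return f"{SCHEME}://{HOST}/{domain}/{resource_type}/{quote(local_id, safe='')}"


def conforms(uri):
    """Check whether an existing URI follows the hypothetical policy."""
    parts = urlsplit(uri)
    segments = parts.path.strip("/").split("/")
    return bool(
        parts.scheme == SCHEME
        and parts.netloc == HOST
        and len(segments) == 3
        and all(SEGMENT.match(s) for s in segments[:2])
        and segments[2] != ""
        and not parts.query
        and not parts.fragment
    )


study_uri = build_uri("clinical", "study", "STUDY 001")
print(study_uri, conforms(study_uri))
print(conforms("http://id.example.com/clinical/study/STUDY-001"))
```

Centralising construction and validation in two small functions means every team produces identifiers in the same shape, which is what makes data from independently managed sources joinable later.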

Education and training

Education and training are critically important for the successful implementation of FAIR data management for clinical and non-clinical data, including in pharmaceutical and biotechnology enterprises. Educational and training material on FAIR data management is readily available in the scientific literature and as web resources.

An important aspect of education about FAIR implementation is understanding the process of data stewardship: a coordinated set of activities that ensures data and associated metadata are made sufficiently FAIR in a sustainable manner. FAIR data management is therefore an ongoing, iterative process that is continually refined and tailored to the needs of a particular business application or use case, as described in the Data stewardship handbook (HANDS, https://www.health-ri.nl/data-stewardship-handbook-hands). Although HANDS was written primarily for academic institutions, most aspects can be transferred to meet the needs of any industrial organisation. More details on the process of data stewardship can be found in the Pistoia Alliance FAIR Toolkit: https://fairtoolkit.pistoiaalliance.org/methods/data-stewardship/

A highly effective approach for training on FAIR data management is the “Bring Your Own Data” (BYOD) or datathon workshop. These workshops typically take place over two or three days, so that participants can learn the practicalities of improving the FAIRness of their data and the benefits gained by making the data FAIR. They are a lightweight, effective and enjoyable way to collaborate across scientific teams in multinational organisations. The BYOD or datathon method summarised here is based on that described in detail by the Dutch Techcentre for Life Sciences (DTL).

Data owners, domain experts (usually biologists or chemists), and FAIR data experts jointly work on specific data sets at a workshop. At the start, data owners present the data they wish to make FAIR. The data experts have extensive knowledge about FAIR data formats and principles, and support the data owners in choosing the optimal profile for making the data FAIR. In addition, they make sure that FAIR linked data is produced in the end. Domain experts can assist the data owners and data experts to solve intellectually challenging data modelling issues and to demonstrate the added value of FAIR data in answering specific research questions. More information on BYOD and datathon workshops can be found in the Pistoia Alliance FAIR Toolkit: https://fairtoolkit.pistoiaalliance.org/methods/byod-datathon-workshops/
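To make "FAIR linked data" concrete, the following minimal sketch shows the kind of transformation a workshop might produce: rows of a small table are turned into RDF triples in N-Triples syntax. The column names, the example.org base URI and the mapping of columns to Dublin Core Terms predicates are assumptions chosen for illustration; in a real workshop the data and FAIR experts would agree the vocabulary together, and a dedicated RDF library would normally be used.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical mapping from table columns to Dublin Core Terms predicates.
PREDICATES = {
    "title": "http://purl.org/dc/terms/title",
    "identifier": "http://purl.org/dc/terms/identifier",
}
# Hypothetical base URI for the datasets being described.
BASE = "https://data.example.org/dataset/"


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text):
    """Turn each CSV row into one triple per mapped, non-empty column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The row identifier becomes part of the subject URI.
        subject = BASE + quote(row["identifier"], safe="")
        for column, predicate in PREDICATES.items():
            value = (row.get(column) or "").strip()
            if value:
                triples.append(
                    f'<{subject}> <{predicate}> "{escape_literal(value)}" .'
                )
    return triples


SAMPLE = "identifier,title\nDS1,Pilot biomarker study\nDS2,Follow-up cohort\n"
for triple in rows_to_ntriples(SAMPLE):
    print(triple)
```

Each row gets a globally unique subject URI and each column is expressed with a shared, published predicate, which is what lets the resulting data be linked to and queried alongside other sources.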

Data-centric culture and the FAIR mindset

The trend in recent years has been to build a data-centric culture powered by FAIR data, in which stakeholders see data as a valuable corporate asset. This contrasts with the traditional application-centric approach, which regards data as a secondary commodity and so contributes to the limited reusability of valuable clinical data, typically buried in data silos or graveyards.

The introduction section of this guide described how implementing the FAIR principles realises much more value from clinical data over a much longer period, because the data are much more likely to be reusable (figure 1). This shift towards recognising and realising much greater value from data and metadata, placed at the centre of a business organisation, defines a data-centric culture.

To support this seismic shift to a data-centric culture, the most progressive and successful pharmaceutical and biotechnology companies now recognise the importance of FAIR implementation as a change in data management culture. This journey is long and arduous, and is therefore best undertaken iteratively.

Although making data FAIR is important, it is not an end in itself but rather a powerful enabler. It supports digital transformation to increase business productivity, and it is vital for success with data-hungry technologies such as machine learning and artificial intelligence.