AI and ML technologies cannot be used without large volumes of high-quality data. While many projects that produce disease-specific data sets exist in both public and private R&D, the same is not true for healthy controls. Typically, control data are gathered only in conjunction with data for the disease cases. The absolute number of healthy control cases is low, and the coverage (by organ, tissue, and assay technology) may be insufficient. The quality of such control samples may also be in question, for instance when tumor margin tissue is labeled as a healthy control. The reason for this situation is the lack of incentive to invest in healthy control samples in the absence of investment in disease-specific R&D. At the same time, similarities between pharmaceutical research programs may result in duplicated effort in generating healthy control data.
We propose a system in which project participants may submit their assay results for healthy control samples, which would usually be kept proprietary, to a data bank, and obtain from the same data bank such results submitted by other participants. The submitted data should comply with existing regulations and standards, such as HIPAA and others (e.g., MIAME for gene expression data, or similar standards for other data types), but the details of the R&D program for which the data were generated and the identity of the submitting organization may be obfuscated. The latter provision is meant to shield the in-house research programs of the participating organizations from competitive intelligence probes. The data bank may house entire data sets or links to data sets stored in other public resources.
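
The submit-and-retrieve flow described above can be illustrated with a minimal Python sketch. It is not a specification of the proposed system: the names `ControlRecord`, `DataBank`, `submit`, and `query`, the metadata fields, and the use of a keyed hash to replace the organization name with a pseudonym are all assumptions made for illustration. The sketch also assumes that each record carries either the data set itself or a link to it, and it does not address HIPAA de-identification or MIAME validation.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ControlRecord:
    """Metadata for one healthy-control data set (all fields are hypothetical)."""
    organ: str
    tissue: str
    assay: str
    submitter_token: str              # pseudonym, never the organization name
    payload: Optional[bytes] = None   # the data set itself, or ...
    link: Optional[str] = None        # ... a link to another public resource


@dataclass
class DataBank:
    secret: bytes                     # assumed to be known to the bank only
    records: list = field(default_factory=list)

    def _pseudonym(self, organization: str) -> str:
        # Keyed hash: stable per organization, not reversible without the secret.
        digest = hmac.new(self.secret, organization.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]

    def submit(self, organization, organ, tissue, assay, payload=None, link=None):
        # Assumption of this sketch: exactly one of payload or link is given.
        if (payload is None) == (link is None):
            raise ValueError("provide exactly one of payload or link")
        # Details of the originating R&D program are never passed in or stored.
        record = ControlRecord(organ, tissue, assay,
                               self._pseudonym(organization), payload, link)
        self.records.append(record)
        return record

    def query(self, organ=None, tissue=None, assay=None):
        # Return records matching every criterion that was specified.
        wanted = {"organ": organ, "tissue": tissue, "assay": assay}
        return [
            r for r in self.records
            if all(v is None or getattr(r, k) == v for k, v in wanted.items())
        ]
```

A participant would call `bank.submit("Org A", "liver", "hepatocyte", "RNA-seq", payload=...)` and later retrieve other participants' controls with `bank.query(organ="liver")`. The returned records expose only the pseudonymous token, so the submitting organization is not visible to other participants.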