DS10: How should the applicability domain of the model be expressed?

Applicability Domain (AD) analysis is the traditional means of assessing the confidence of a prediction of a particular property for a compound. It rests on the idea that similar compounds have similar properties: if a test compound is sufficiently similar to the compounds used to train a Machine Learning (ML) model, the model's prediction should be reliable. While the general concept of the Applicability Domain is widespread, there is little consensus on how to perform and report the analysis.

Traditionally, to determine whether a test compound falls within the AD, a metric quantifying the distance between the test compound and the set of compounds used to train the ML model is calculated; if this metric is below a given threshold, the test compound is said to fall within the AD (a minimal sketch of this check is given after the list below). This approach suffers from a number of problems that ultimately result in dubious AD analyses:

  1. There is no consensus on which distance metric to use.
  2. The similarity of a compound is estimated with respect to the whole training set, i.e., it is a global comparison, whereas the reliability of a prediction often depends on the similarity to the closest neighbors.
  3. Defining the molecular features used for the similarity calculation can be challenging. Molecules are typically represented with hundreds or thousands of features, but only a small subset of these carries predictive information; a similarity analysis performed on the full feature set is therefore unlikely to be informative.
  4. In some of these distance-based approaches a threshold must be set, which brings additional hurdles.
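As an illustration only, the following sketch implements the classical global distance-to-training-set check using RDKit Morgan fingerprints and Tanimoto distance; the fingerprint parameters and the 0.75 distance threshold are arbitrary assumptions, not recommended values.

```python
# Sketch of a global, distance-based AD check (assumes RDKit is installed).
# Fingerprint type, radius, bit length, and threshold are illustrative choices.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def morgan_fp(smiles, radius=2, n_bits=2048):
    """Morgan fingerprint (bit vector) for one SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)

def in_applicability_domain(test_smiles, train_smiles, distance_threshold=0.75):
    """Declare the compound inside the AD when its mean Tanimoto distance
    to the *whole* training set is below the (assumed) threshold."""
    test_fp = morgan_fp(test_smiles)
    train_fps = [morgan_fp(s) for s in train_smiles]
    similarities = DataStructs.BulkTanimotoSimilarity(test_fp, train_fps)
    mean_distance = 1.0 - sum(similarities) / len(similarities)
    return mean_distance < distance_threshold, mean_distance
```

Note that this sketch exhibits exactly the weaknesses listed above: it compares against the whole training set, uses one fixed metric, all fingerprint bits, and an arbitrary threshold.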

An alternative is to measure the distance to the closest neighbors rather than to the whole training set; this has been used in practice, but it still suffers from the threshold and metric problems. More modern AD approaches instead rely on the confidence reported by the ML method itself. Such confidence estimates can implicitly account for the more informative descriptors and have been shown to be more informative (i.e., confidence correlates better with prediction error). One drawback is that they are ML-method dependent and, in some cases, harder to interpret. A sketch of one such confidence estimate is given below.
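As a hedged illustration of a confidence-based alternative, the sketch below uses the spread of the per-tree predictions of a scikit-learn random forest as a confidence proxy; the random forest, its settings, and the placeholder arrays X_train, y_train, X_test are assumptions for the example, not part of the original text.

```python
# Sketch: confidence from the disagreement among trees of a random forest.
# Any ML method that reports its own uncertainty could be substituted here.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def predict_with_confidence(X_train, y_train, X_test):
    """Return predictions and a per-compound confidence proxy
    (negative standard deviation across trees: higher means more confident)."""
    model = RandomForestRegressor(n_estimators=500, random_state=0)
    model.fit(X_train, y_train)
    per_tree = np.stack([tree.predict(X_test) for tree in model.estimators_])
    prediction = per_tree.mean(axis=0)
    confidence = -per_tree.std(axis=0)   # low spread across trees -> high confidence
    return prediction, confidence
```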


Reporting AD:

  • Report the prediction error on test compounds as a function of the distance metric or confidence measure used (as sketched below).
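One possible way to produce such a report, shown only as a sketch: bin the test compounds by their confidence (or distance) value and tabulate the mean absolute error per bin; the quantile binning scheme, number of bins, and variable names are illustrative assumptions. If the confidence measure is informative, the error should decrease as confidence increases (or as distance decreases).

```python
# Sketch: error as a function of confidence (or distance), reported per bin.
import numpy as np

def error_vs_confidence(y_true, y_pred, confidence, n_bins=5):
    """Mean absolute error per confidence (or distance) quantile bin."""
    y_true, y_pred, confidence = map(np.asarray, (y_true, y_pred, confidence))
    edges = np.quantile(confidence, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.clip(np.digitize(confidence, edges[1:-1]), 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            mae = float(np.abs(y_true[mask] - y_pred[mask]).mean())
            rows.append((b, int(mask.sum()), mae))  # (bin, n_compounds, MAE)
    return rows
```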