How to build AI models that people will trust

AI models are becoming increasingly commonplace in the life sciences field, particularly in drug discovery. But these models are complex, and building them can be a difficult task.

In our recent webinar, Richard Lewis (Director of Data Science, Novartis) talked us through the challenges of building AI models, how to overcome them, and the lessons that have been learnt in the process. To hear Richard’s talk in full, plus the other talks on ‘Harnessing the Power of Big Data in Drug Discovery with AI’, please click the following link.

How to build an AI model – the basics

Richard quotes Brandon Allgood, stating that building a successful AI model requires three things: an understanding of the domain that you’re modelling, a good grasp of data science and knowledge of software engineering. When one or more of these is missing, model building will fail.

Let’s start with domain knowledge. In order for a model to be successful, you must understand the source of the observed data, what is being measured, the limits on the measurements, the experimental error and the dynamic range of your measurements. Richard highlights this with an example from a paper he recently published that uses QSAR and machine learning methods to model how compounds partition between water and octanol (a measurement commonly made in the early stages of drug discovery).
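
To make this concrete, here is a minimal sketch of what a QSAR-style regression of a partition measurement can look like. The descriptors, data and model choice below are synthetic placeholders for illustration and are not taken from Richard’s paper.

```python
# Minimal QSAR-style sketch: predict a water/octanol partition value (logP-like)
# from numeric molecular descriptors. The descriptor matrix and targets are
# synthetic placeholders, not real measurements.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                                 # 20 hypothetical molecular descriptors
y = 1.5 * X[:, 0] - X[:, 3] + rng.normal(scale=0.3, size=500)  # synthetic "logP" values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print("Test MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```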

Figure 1: Image describing the fundamentals of domain knowledge and an example figure from a recent paper. Screenshot taken directly from Front Line Genomics webinar.

Richard displays a figure from the paper in which there is a significant anomaly in one of the measurements (green bar chart in Figure 1). He stresses that this is likely because the sample has reached the limits of the equipment, and so it is not a true reading. This is known as a ‘stripe’, and if taken at face value it can unbalance the dataset and add bias to the experiment. It is a stark example of how failing to understand your domain can lead to failed model building.
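
One simple safeguard, sketched below, is to flag measurements that sit at an assumed assay limit before any modelling. The column names and the limit value are illustrative assumptions, not details from the webinar.

```python
# Sketch: flag measurements sitting at an assumed assay limit (a "stripe") so
# they can be treated as censored values rather than true readings.
# Column names and the limit value are illustrative assumptions.
import pandas as pd

ASSAY_UPPER_LIMIT = 4.0  # hypothetical upper limit of quantification

df = pd.DataFrame({
    "compound_id": ["C1", "C2", "C3", "C4"],
    "measured_value": [1.2, 4.0, 2.7, 4.0],
})

df["censored"] = df["measured_value"] >= ASSAY_UPPER_LIMIT
print(df)
# Censored rows can then be excluded, down-weighted or modelled explicitly,
# rather than being taken at face value and biasing the dataset.
```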

Is model building too easy?

You understand your domain, and now you need to build your model. But what about the data science? Richard posits that when it comes to model building, you should spend only 20% of the time on the build and 80% of the time curating the data. There are many important facets to data preparation: how to handle duplicate data, whether there is enough data, how the data are described, and so on. Often, people do not appreciate how easy it is to build a crude model, but these basic models do not always take such questions about the data into account.
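
As a rough illustration of that 80%, a curation step might look something like the sketch below; the example data, column names and cut-off are hypothetical.

```python
# Sketch of a curation step: drop exact duplicates, average replicate
# measurements per compound, and check whether enough data remain.
import pandas as pd

raw = pd.DataFrame({
    "compound_id": ["C1", "C1", "C2", "C2", "C3"],
    "measured_value": [1.2, 1.2, 3.4, 3.6, 0.8],   # the first C1 row is an exact duplicate
})

deduped = raw.drop_duplicates()                    # remove exact duplicate rows

# Collapse replicate measurements to one value per compound
curated = (deduped.groupby("compound_id", as_index=False)
                  .agg(mean_value=("measured_value", "mean"),
                       n_replicates=("measured_value", "size")))

print(f"{len(curated)} unique compounds after curation")
if len(curated) < 200:                             # arbitrary illustrative cut-off
    print("Warning: possibly too little data to build a reliable model")
```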

Figure 2: Image describing the use of learning curves in model building. Screenshot taken directly from Front Line Genomics webinar.

So, how much data do you need to build a model? This is where learning curves come in. You can see in Figure 2 that as you increase the amount of data, the model gets better, eventually reaching the ‘grey zone’, which is used to define whether the model is good enough to inform decisions. Without enough data, there is a risk of overfitting. And although Richard states that basic model building is ‘easy’, he highlights a number of challenges, including figuring out suitable control experiments.
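
A hedged sketch of how such a learning curve can be generated with scikit-learn is shown below; the data are synthetic, and in practice the ‘good enough’ judgement would depend on the experimental error of the assay.

```python
# Sketch: use scikit-learn's learning_curve to see whether model quality is
# still improving as more data are added. X and y are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 15))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=600)

sizes, train_scores, val_scores = learning_curve(
    RandomForestRegressor(n_estimators=100, random_state=0),
    X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5, scoring="r2")

for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{int(n):4d} training samples -> mean CV R^2 = {score:.2f}")
# If the validation score is still climbing at the largest size, more data would
# likely help; a curve that has flattened well below the target suggests other problems.
```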

Richard: ‘Model building is still a black art.’

Where does software engineering come into all of this? There are a number of questions to answer.

Can your model be simply deployed? Can it be used on a platform that most people use? Is it scalable, and is it versionable? Can updates be easily deployed? Is it easily maintainable by others?

All of these questions are important for increasing accessibility of the model, even if the model itself works well. If you want people to use it, you have to have a good grasp of software engineering to ensure that these questions are addressed.
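
As one small, hedged example of what ‘versionable and maintainable’ can mean in practice, a trained model can be persisted together with explicit metadata. The file name, version scheme and fields below are assumptions made for illustration.

```python
# Sketch: save a model alongside version metadata so deployments and updates
# are traceable. Paths, fields and the version scheme are illustrative.
from datetime import date

import joblib
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(random_state=0)    # in practice, an already-fitted model
artifact = {
    "model": model,
    "version": "1.2.0",                          # bumped on every retraining
    "trained_on": str(date.today()),
    "descriptor_set": "hypothetical_descriptors_v3",
}
joblib.dump(artifact, "solubility_model_v1.2.0.joblib")

# Anyone deploying the model can later check exactly what they are loading:
loaded = joblib.load("solubility_model_v1.2.0.joblib")
print(loaded["version"], loaded["descriptor_set"])
```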

A question of sociology

Even with domain knowledge, good data science and decent software engineering, at what point does someone trust a model to make decisions?

A survey at Novartis showed that the model had to be right 90% of the time for the average person to trust it. To reach this high threshold, answers are generally returned in three classes. For example, you can say that there is 90% certainty that something falls above a certain threshold (such as drug solubility), 90% certainty that it falls below another threshold, or that it is inconclusive. If something falls into the latter class, this indicates that the model shouldn’t be used to obtain the answer and the physical experiment should be carried out. This satisfies the goal of 90% certainty, but leads to doing more experiments.
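
A minimal sketch of such a three-class scheme is shown below, assuming the model returns a prediction together with an uncertainty estimate. The thresholds, units and the normal-interval approximation are illustrative assumptions, not the actual Novartis implementation.

```python
# Sketch of a three-class answer: "above", "below" or "inconclusive", returned
# only when the model is sufficiently certain. Thresholds and the uncertainty
# treatment (a rough one-sided 90% normal bound) are illustrative assumptions.
def classify(prediction: float, uncertainty: float,
             low: float = 10.0, high: float = 100.0) -> str:
    lower_bound = prediction - 1.28 * uncertainty   # ~90% one-sided lower bound
    upper_bound = prediction + 1.28 * uncertainty   # ~90% one-sided upper bound
    if lower_bound > high:
        return "above threshold (e.g. soluble enough)"
    if upper_bound < low:
        return "below threshold (e.g. too insoluble)"
    return "inconclusive - run the physical experiment"

print(classify(prediction=150.0, uncertainty=20.0))  # confidently above the upper threshold
print(classify(prediction=55.0, uncertainty=30.0))   # inconclusive
```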

Richard also highlights that there should be support from leaders to encourage the use of models and that trust should be built by only publishing good, accessible models. In other words, using the model has to be easier and more reliable than doing the physical experiments for someone to consider using it.

Richard: ‘Untrusted models will not drive decisions. Models that have failed before are trusted less.’

Q&A Highlights

Q: How important is it to standardize software engineering to ensure models are robust and ‘trusted’? What efforts are there to ensure standardization?

A: That’s a difficult question. There is an initiative to try to bring models that are available outside of companies into some standard form, but it is still very early days. So, I think they need to include what we do in house, e.g., a review panel. How are you standardizing the data? Is your experiment column-based? Are you doing normalization? Are you shifting the data? What descriptors are being used to describe the objects you’re measuring? I talk about chemical structures, and there are probably more than 1,000, maybe even 5,000 ways you can describe chemical structures. Then you’ve got the way in which you build the statistical model. Do you use graph neural networks? Do you use support vector machines? How do you do that and how do you make it so that everybody can use that model? So, I think we’re starting to do standardization, but we are still not there.
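
As a small illustration of one of the points above (fixing the normalization and the model choice in a single reproducible object), a preprocessing-plus-model pipeline might look like the sketch below; the descriptors and data are placeholders, not a proposed standard.

```python
# Sketch: bundle the normalization step and the learner into one scikit-learn
# Pipeline so the same preprocessing always travels with the model.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                 # hypothetical chemical descriptors
y = X[:, 0] + rng.normal(scale=0.2, size=300)

model = Pipeline([
    ("scale", StandardScaler()),               # the normalization/shifting step
    ("svm", SVR(kernel="rbf", C=1.0)),         # one of the model choices mentioned
])
model.fit(X, y)
print(model.predict(X[:3]))
```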

Q: With regards to the 90% correct threshold, how common is it for models to reach this? And in the future, do you predict that this threshold would rise?

A: It’s actually very difficult for models to reach that. In the example I gave about solubility, we might have to push thresholds to ridiculously high levels. So, rather than saying it’s going to be less soluble than 10, or more soluble than 100, we might have to say it’s less soluble than one, or it’s going to be more soluble than 10,000. So, you then get smaller and smaller classes that are successful, and you get a bigger and bigger inconclusive class.

If they want that 90%, the users also have to accept, more often than they might like, that the model will say, ‘I just don’t know.’ If you want to take the threshold down from 90% to, say, 80%, then you’ll get fewer compounds or fewer objects predicted as inconclusive. So, there’s a trade-off; the more accurate you want your model to be, the more often the model will say, ‘I don’t know.’ And you have to play that off against how much the assay costs to run. If it is a very expensive assay, you should perhaps be more prepared to get wrong answers. If it’s a very cheap assay, just run it.

So again, that’s one of the things we found when I was building models for proteins and predicting protein mutations. Sometimes, if the assay was very, very expensive, you would believe the model more because that was the only way you were going to get a prioritized list. If the assay was very cheap to run, then you would expect the model to be better because you could run the assay faster. So, if you want to get more accurate, you have to run your learning curve and see how much data you need and how far off you are from actually getting near to experimental error.
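
To illustrate the trade-off Richard describes, the sketch below counts how often a toy model would have to answer ‘inconclusive’ as the required confidence rises; all numbers are synthetic assumptions.

```python
# Sketch: as the required confidence rises, more predictions fall into the
# "inconclusive" class. Predictions, uncertainty and thresholds are synthetic.
import numpy as np

rng = np.random.default_rng(0)
predictions = rng.normal(loc=50.0, scale=40.0, size=1000)  # hypothetical model outputs
uncertainty = 25.0                                         # assumed model uncertainty
low, high = 10.0, 100.0                                    # assumed decision thresholds

for confidence, z in [(0.80, 0.84), (0.90, 1.28), (0.95, 1.64)]:
    above = predictions - z * uncertainty > high
    below = predictions + z * uncertainty < low
    inconclusive = ~(above | below)
    print(f"{confidence:.0%} required confidence -> {inconclusive.mean():.0%} inconclusive")
```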

Q: As machine learning theories and techniques advance, what are some applications in pharma/biotech you look forward to?

A: I’d like to model almost everything that we can get. At the moment we are connecting biochemical assays to models; the next stage will be connecting cellular assays. Eventually, we would love to be able to connect clinical observations, which are much more sparse and noisy, to models. It’s not there yet, but we are heading there. Every company is investing a lot to try and understand what happens in the clinic, how we can model that, and how we can connect that upstream to the models we’re using in the lab. For example, for side effects or off-target effects, we might have some clinical data, which we can then, the next time we run the project, use in some form to drive decision making in discovery.

