If conventional drug discovery is akin to feeling around in the dark, artificial intelligence (AI) is portrayed as a high-tech flashlight, an essential tool to light the way. Unbridled enthusiasm for AI drug discovery is driving massive investment in AI-drug startups and enticing established pharmaceutical companies to recruit AI talent and seek partnerships with AI-drug outfits. In this moment of rapid technological change, it is a challenge for an established pharmaceutical company to implement an AI strategy that delivers timely insights. To delay action on AI is to cede leadership in drug discovery and, potentially, to relinquish competitiveness in a therapeutic area. While every organization is different, our company is taking a hybrid approach, building core AI capabilities internally while partnering to gain access to specific AI technologies. We draw upon deep expertise in biology and chemistry, especially in computational biology and computational chemistry, to define AI goals and sanity-check AI outputs. From target discovery to clinical trials, AI has the potential to increase hit rates and prevent program-ending surprises. Given the stakes, it is no wonder there is a rush to gear up with AI.
At its essence, AI is about learning from available data to make better decisions. In drug development, better decisions often mean a higher hit rate. If an AI target selection tool could correctly eliminate half of the candidate targets from the start, that translates into roughly half the work, cost, and time. Wouldn’t you like to move twice as fast? However, if the AI tool mistakenly eliminates many good targets along the way, it may be worse than using no tool at all. Clearly, accuracy counts.
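The trade-off can be made concrete with a little arithmetic. The sketch below is purely illustrative: the pool of 1,000 candidate targets, the 20 genuinely good targets among them, and the filter characteristics are hypothetical numbers chosen for the example, not measurements from any real tool.

```python
def screen_outcome(n_targets, n_good, frac_eliminated, frac_good_lost):
    """Return (targets left to test, good targets kept, hit rate) after filtering.

    All inputs are hypothetical: frac_eliminated is the share of the whole
    pool the tool discards, frac_good_lost is the share of truly good
    targets that are discarded in error.
    """
    remaining = n_targets * (1 - frac_eliminated)
    good_kept = n_good * (1 - frac_good_lost)
    hit_rate = good_kept / remaining
    return remaining, good_kept, hit_rate


# No tool: test everything.
baseline = screen_outcome(1000, 20, frac_eliminated=0.0, frac_good_lost=0.0)
# Accurate tool: halves the pool and keeps every good target.
careful = screen_outcome(1000, 20, frac_eliminated=0.5, frac_good_lost=0.0)
# Inaccurate tool: halves the pool but throws away 75% of the good targets.
sloppy = screen_outcome(1000, 20, frac_eliminated=0.5, frac_good_lost=0.75)

for name, result in [("baseline", baseline), ("careful", careful), ("sloppy", sloppy)]:
    remaining, good_kept, hit_rate = result
    print(f"{name}: test {remaining:.0f} targets, keep {good_kept:.0f} good, hit rate {hit_rate:.1%}")
```

Under these assumed numbers, the accurate filter doubles the hit rate from 2% to 4%. The inaccurate one halves it to 1% and leaves only 5 of the 20 good targets in play, which is the sense in which a poor tool can be worse than none.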
A big impediment to AI adoption in drug discovery is lack of transparency. While working in secret is commonplace in the pharmaceutical industry, it is not an efficient way to sort out the AI systems that generate bona fide hits from those that generate noise. Another challenge is the lack of the measurements of predictive accuracy that are commonplace in machine learning outside of drug discovery. Drug discovery is often concerned with one or a few objects of interest – the hits – which strain a frequentist definition of accuracy. Furthermore, designing a drug is a multiparameter optimization problem. If an AI system optimizes one stage of the pipeline, it can be to the detriment of other stages, but that may not come to light until years later.
It is worth pointing out that arguably the most significant AI advancement relevant to drug discovery, the AlphaFold system for predicting protein 3D structure from protein sequence, was not developed in the pharmaceutical industry. Progress in the “protein folding problem” followed a well-traveled path in the machine learning community: a publicly available training set (the Protein Data Bank crystal structures), a neutral-party assessment on blinded predictions (the Critical Assessment of Protein Structure Prediction, CASP), and a forum for open exchange of ideas and methods (competitions, conferences, and papers). Could such an open dialog around protein-ligand binding or therapeutic antibodies be possible?
Given the challenges of AI adoption in the pharmaceutical industry, we decided that a combination of internal AI development and external partnerships is the best way to bring AI into our drug discovery process. As an example, we focus on target identification here. AI methods for target identification can use large-scale data integration to reveal hidden connections between targets, diseases, drugs, and other entities. A myriad of data types can be represented in a knowledge graph, a data structure amenable to exploring connections both “by hand” and by machine learning. Knowledge graphs for target identification can be populated with transcriptomic measurements, canonical pathways, data-driven modules, protein-protein interactions, protein-ligand interactions, genetic interactions, gene sets, ontologies, literature-derived relationships, and other data. In short, there are a lot of options for how to represent the problem.
Some potential partners specialize in this mode of target identification and can deliver hits that can be experimentally validated. The “pay for hits” arrangement has the advantage of instant gratification, but as discussed, it can be totally opaque: a black box algorithm wrapped in a black box business agreement. Engaging effectively with an AI partner requires people on the receiving end who can cut through marketing hype, are trained in machine learning, and can rigorously assess hits using quantifiable metrics.
Internal AI activity does not need to replicate partners’ AI systems, nor would that necessarily be possible given the degrees of freedom in how to represent the problems. Rather, internal AI development within the pharmaceutical enterprise can be focused on developing our own target feature spaces. Internally, we can measure machine learning accuracy on various supervised and semi-supervised tasks to know how much confidence to put in our target representations.
Through this lens we can assess the relevance of externally generated hits to our disease biology. How surprising is this hit? If your computational tools cannot shed light on that question, you need to start building a new flashlight.
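To make the knowledge-graph idea discussed above more tangible, here is a minimal sketch, assuming a toy graph stored as plain Python dictionaries. Every entity and relation name (`GENE_A`, `DISEASE_X`, `member_of`, and so on) is hypothetical, and ranking by shared neighbors is just one simple heuristic chosen for illustration. It is not a description of our own pipeline or of any partner's system.

```python
from collections import defaultdict


def build_graph(triples):
    """Build an undirected adjacency map from (source, relation, target) triples."""
    graph = defaultdict(dict)
    for source, relation, target in triples:
        graph[source][target] = relation
        graph[target][source] = relation
    return graph


def shared_neighbors(graph, a, b):
    """Return the set of nodes directly connected to both a and b."""
    return set(graph[a]) & set(graph[b])


# Hypothetical facts drawn from different data types (pathways,
# protein-protein interactions, genetics, protein-ligand binding).
triples = [
    ("GENE_A", "member_of", "PATHWAY_P"),
    ("GENE_B", "member_of", "PATHWAY_P"),
    ("GENE_B", "interacts_with", "GENE_C"),
    ("DISEASE_X", "dysregulates", "PATHWAY_P"),
    ("DISEASE_X", "associated_with", "GENE_C"),
    ("DRUG_D", "binds", "GENE_A"),
]
graph = build_graph(triples)

# Score each candidate target by how many neighbors it shares with the disease.
candidates = ["GENE_A", "GENE_B"]
scores = {gene: len(shared_neighbors(graph, gene, "DISEASE_X")) for gene in candidates}
ranked = sorted(candidates, key=lambda gene: scores[gene], reverse=True)

for gene in ranked:
    print(gene, scores[gene], sorted(shared_neighbors(graph, gene, "DISEASE_X")))
```

In this toy graph, `GENE_B` ranks above `GENE_A` because it connects to the disease through both the shared pathway and an interaction partner. Even a crude internal score like this gives a baseline against which an externally supplied hit can be judged as expected or surprising.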