AI models still reach limits in drug discovery

May 08, 2026

Developing new medicines is a slow and costly process. For some years now, artificial intelligence (AI) has been considered a promising tool for accelerating drug discovery by identifying drug candidates faster and more efficiently. However, a new study led by Dr. Janani Durairaj and Prof. Torsten Schwede at the Biozentrum of the University of Basel shows that current AI models still have clear limits: they struggle to predict how entirely new molecules bind to proteins.

Prediction of a protein-ligand interaction.

When Google DeepMind’s AlphaFold system stunned the scientific community by predicting the three-dimensional structures of proteins with accuracy often rivaling that of experimental methods, it raised high hopes: Could artificial intelligence (AI) soon also predict which small molecules bind to a protein and how these interactions influence protein function? Such information is essential for designing new drugs, so AI that could provide it reliably would greatly accelerate drug discovery.

In a recent study published in “Nature Structural & Molecular Biology,” Dr. Janani Durairaj and Prof. Torsten Schwede from the Biozentrum, University of Basel, temper expectations and euphoria around AI-assisted drug design. “For molecules with familiar patterns, today’s AI models can indeed make reliable predictions about how they interact with proteins,” says Durairaj. “But they still struggle with entirely new molecules or even known molecules binding in new ways.” This limitation poses a major challenge for drug discovery, where the goal is rarely to reproduce what is already known, but rather to innovate.

AI models put to the test

Until now, it has been unclear whether modern AI models actually learn how small molecules – known as ligands – interact with proteins, or whether they merely “recognize” and reproduce examples from their training data. To find out, the researchers introduced a new, large-scale benchmark dataset called Runs N’ Poses. It contains detailed structural information on more than 2,600 high-resolution, experimentally determined protein–ligand complexes that were released after the training cutoff of current AI systems. Because the models cannot have seen these structures during training, the team could test how well four of the leading AI systems cope with molecules and binding modes that are new to them.

“The results were sobering,” says Durairaj. “The models succeed when faced with ligands similar to what has been learned, in other words, when their chemical structure and interaction is familiar to the AI from the training data. But for truly unknown binding modes that the AI has never seen before, prediction quality decreases significantly.” Current AI systems therefore appear to rely on memorizing known examples from their training data and interpolating between them, rather than on understanding the biophysical principles that govern protein–ligand interactions.

With their open-source Runs N’ Poses benchmark dataset, the researchers provide a more realistic basis for evaluating and training future AI models. “AI has enormous potential for drug development,” says Schwede. “But we need to be honest about what the technology can and cannot yet achieve. Only with realistic benchmarks can we train AI systems that genuinely help us discover new drugs.”

AI models rated too optimistically

The study also shows that earlier evaluations of AI models for drug discovery were often overly optimistic. Many benchmarks test AI models on problems that closely resemble their training data. That is like praising someone for passing an exam whose questions are identical to the practice exercises. The researchers therefore advocate more diverse training data and more realistic evaluations. Innovation is essential in drug development – especially when existing medications are no longer effective, as with antibiotic-resistant bacteria or some cancers, or when no treatment exists at all, as is the case for some rare diseases.

Improving evaluation methods for AI predictions

In a second study, Torsten Schwede’s team addresses another critical challenge: how to reliably assess AI predictions in an era of increasingly large and complex models. Trained on ever-growing datasets, modern AI systems generate millions of predicted structures of proteins, macromolecular complexes and protein–ligand interactions, while the experimentally determined reference structures used to check them are becoming ever larger. This makes it increasingly difficult to evaluate how reliable these predictions are.

In “Nature Methods”, the researchers identify limitations of existing structure comparison methods and present a fully automated benchmarking suite with more robust approaches for assessing the reliability of AI-generated predictions. Such tools will be essential for using AI in biomedical research and drug discovery.

Publications:
Peter Škrinjar, Jérôme Eberhardt, Gabriel Studer, Gerardo Tauriello, Torsten Schwede, Janani Durairaj. Evaluating generalization in protein-ligand cofolding methods. Nature Structural & Molecular Biology, published online 8 May 2026

Gabriel Studer, Xavier Robin, Stefan Bienert, Janani Durairaj, Peter Škrinjar, Gerardo Tauriello, Andrew Waterhouse, Torsten Schwede. Comparing macromolecular complexes - a fully automated benchmarking suite. Nature Methods, published online 22 December 2025

Contact: Communications, Katrin Bühler
