AI in drug discovery: a regulatory tightrope walk

Wednesday 3 December 2025

Julio Copo
Arochi & Lindner, Mexico City
jcopo@arochilindner.com

Introduction

The landscape of drug discovery is undergoing a seismic shift, catalysed by the rapid integration of artificial intelligence (AI). No longer confined to theoretical promise, AI is now actively reshaping how pharmaceutical companies identify potential drug targets, design novel molecules and predict clinical trial outcomes. By leveraging vast and complex datasets – from genomic sequences to real-world patient data – AI systems can uncover patterns and insights that would be impossible or prohibitively time-consuming for human researchers to detect. This technological leap offers the potential to drastically reduce the time, cost and attrition rates associated with traditional drug development pipelines, accelerating the delivery of life-saving therapies to patients worldwide.

However, this transformation is not without its challenges. The very attributes that make AI so powerful – its computational complexity, autonomous learning capabilities and often opaque decision-making processes – pose significant hurdles for regulatory oversight. Traditional frameworks for drug approval, which rely on transparent, reproducible and well-understood mechanisms of action, are being stretched to accommodate technologies that defy conventional explanation. Regulators must now grapple with questions of accountability, explainability and safety in a landscape where the ‘developer’ may be an algorithm and the rationale behind a drug’s design may be buried in layers of neural network abstraction.

Navigating this evolving terrain demands more than incremental updates to existing policies – it requires a fundamental rethink of how innovation is evaluated, validated and monitored. The challenge is to strike a delicate balance: enabling the full potential of AI to revolutionise drug discovery while ensuring that patient safety, scientific integrity and public trust remain uncompromised.

The ‘black box’ problem: explainability vs performance

At the heart of the regulatory debate surrounding AI in drug discovery lies the ‘black box’ dilemma. Advanced AI models – particularly those built on deep learning architectures – are capable of processing millions of chemical structures, biological interactions and clinical variables to generate highly accurate predictions. These models can identify promising drug candidates, anticipate adverse effects and optimise molecular design in ways that far exceed human capabilities. However, their decision-making processes are often opaque, lacking a clear, interpretable rationale that regulators and scientists can scrutinise.

For regulatory agencies such as the United States Food and Drug Administration (FDA) and the European Medicines Agency (EMA), this opacity presents both philosophical and operational challenges. Their core mandate is to ensure that any approved therapy is both safe and effective, a standard traditionally grounded in mechanistic understanding and empirical evidence. If a pharmaceutical company submits a drug candidate selected by an AI model, but cannot explain the molecular rationale behind the selection – why the AI chose that compound over others, or how it predicts efficacy and safety – then the regulatory burden of proof becomes difficult to satisfy.

This tension between performance and explainability is not merely academic; it has real-world implications for patient safety, legal accountability and scientific reproducibility. While AI may uncover novel therapeutic pathways, regulators must ask: can we trust a model whose logic we cannot fully understand? Is predictive accuracy alone sufficient, or must the underlying reasoning be transparent and verifiable?

In response, the scientific community is actively developing interpretability tools to ‘open the black box’. Techniques such as attention mechanisms, feature attribution maps, and model-agnostic explanations (eg, LIME, SHAP) aim to provide insights into how AI models arrive at their conclusions. However, these tools often fall short of delivering comprehensive, mechanistic explanations that satisfy regulatory standards. They may highlight correlations or influential features, but they rarely offer causal clarity or biological plausibility.
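
To make the point concrete, the sketch below applies one such tool, SHAP, to a hypothetical compound-activity model built on synthetic data. The descriptor names, the data and the model itself are illustrative assumptions rather than a real discovery pipeline; the sketch simply shows the kind of feature-level attribution an explainability report could attach.

# Illustrative sketch: post-hoc feature attribution for a hypothetical
# compound-activity model. Descriptors and data are synthetic placeholders.
# Requires: numpy, pandas, scikit-learn, shap.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical molecular descriptors for 500 candidate compounds.
X = pd.DataFrame({
    "mol_weight": rng.normal(350, 60, 500),
    "logP": rng.normal(2.5, 1.0, 500),
    "h_bond_donors": rng.integers(0, 6, 500),
    "tpsa": rng.normal(80, 25, 500),
})
# Synthetic 'activity' label standing in for assay results.
y = 0.8 * X["logP"] - 0.01 * X["mol_weight"] + rng.normal(0, 0.5, 500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# SHAP assigns each descriptor a contribution to each individual prediction,
# the kind of artefact an explainability report could include.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Mean absolute contribution per descriptor: a global importance summary.
importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False))

Notably, the attributions indicate which inputs influenced a prediction, not why those inputs matter biologically, which is precisely the gap described above.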

Moreover, the challenge is compounded by the fact that explainability itself is a spectrum. Some models may offer partial transparency, while others remain entirely inscrutable. Regulators must therefore grapple with a pivotal question: what constitutes a ‘necessary and sufficient’ level of explainability for regulatory approval? Should different thresholds apply depending on the risk profile of the AI’s application – eg, molecule selection versus clinical trial prediction?

Establishing clear, risk-based standards for AI explainability will be essential to unlocking regulatory pathways for AI-designed drugs. This may involve tiered frameworks where low-risk applications require minimal interpretability, while high-stakes decisions demand rigorous, multi-layered justification. It may also require new forms of documentation, such as algorithmic audit trails, model validation protocols, and explainability reports tailored to regulatory review.
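
As a hedged illustration of what a tiered framework might look like inside a sponsor's own tooling, the sketch below maps risk tiers to the documentation expected at each level. The tier names and artefact lists are invented for this example and do not reflect any agency's actual schema.

# Illustrative sketch only: one way a sponsor might encode a tiered
# explainability policy internally. Tiers and artefact names are assumptions.
from enum import Enum


class RiskTier(Enum):
    LOW = "low"        # eg, laboratory scheduling or internal triage tools
    MEDIUM = "medium"  # eg, early-stage compound screening
    HIGH = "high"      # eg, molecule selection feeding a regulatory submission


# Documentation expected at each tier; higher risk, more layers of justification.
REQUIRED_ARTEFACTS = {
    RiskTier.LOW: ["model description", "summary performance metrics"],
    RiskTier.MEDIUM: ["model description", "summary performance metrics",
                      "feature attribution report", "model validation protocol"],
    RiskTier.HIGH: ["model description", "summary performance metrics",
                    "feature attribution report", "model validation protocol",
                    "algorithmic audit trail", "explainability report",
                    "independent scientific review"],
}


def artefacts_for(tier: RiskTier) -> list[str]:
    """Return the documentation package expected for a given risk tier."""
    return REQUIRED_ARTEFACTS[tier]


print(artefacts_for(RiskTier.HIGH))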

Ultimately, the goal is not to constrain innovation, but to ensure that AI-driven drug discovery remains accountable, transparent, and aligned with the ethical and scientific principles that underpin modern medicine.

Data integrity and bias: the foundation of trust

While AI’s computational power is transformative, its reliability is fundamentally bound to the quality of the data it consumes. AI models are not inherently intelligent – they learn from patterns in historical data. If that data is biased, incomplete or poorly curated, the model will not only reflect those flaws but may amplify them, leading to skewed predictions and potentially harmful outcomes in drug design and development.

In the pharmaceutical and biomedical domains, this issue is particularly acute. Historically, clinical trials and genomic studies have disproportionately represented certain populations – most notably individuals of European descent – while underrepresenting others, including people of African, Asian, Indigenous and Latin American backgrounds. This lack of diversity in foundational datasets means that AI models trained on them may inadvertently design drugs that are less effective, or even unsafe, for underrepresented groups.

For example, an AI system trained predominantly on data from Caucasian populations might prioritise molecular features that correlate with efficacy in that group, while overlooking genetic variations or metabolic pathways more prevalent in other populations. This could result in ‘precision medicine’ that is only precise for a narrow demographic, undermining the goal of equitable healthcare and exacerbating existing disparities.

Regulators are increasingly aware of this risk and are moving toward codifying data integrity and representativeness as regulatory requirements – not just scientific best practices. Agencies like the FDA and EMA are beginning to demand that drug sponsors demonstrate the provenance, accuracy and diversity of their training datasets. This includes documenting how data was collected, curated and validated, as well as ensuring that it reflects the demographic and biological diversity of the populations the drug is intended to serve.

Moreover, the concept of ‘algorithmic fairness’ is gaining traction. Regulators and ethicists are exploring frameworks to assess whether AI models produce equitable outcomes across different subgroups. This may involve stress-testing models against synthetic or real-world datasets representing diverse populations and implementing bias mitigation strategies during model development.
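
The sketch below illustrates a basic form of such a stress test: a model's discrimination performance is reported separately for each ancestry group in a held-out evaluation set. The data, group labels and metric are synthetic placeholders; a genuine assessment would rely on curated, representative cohorts.

# Illustrative sketch: stress-testing a model's accuracy across demographic
# subgroups. Subgroup labels and data are synthetic placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000

# Synthetic features plus an ancestry label used only for evaluation, not training.
X = pd.DataFrame(rng.normal(size=(n, 5)), columns=[f"feat_{i}" for i in range(5)])
ancestry = rng.choice(["group_a", "group_b", "group_c"], size=n, p=[0.7, 0.2, 0.1])
y = (X["feat_0"] + rng.normal(0, 1, n) > 0).astype(int)

X_tr, X_te, y_tr, y_te, _, anc_te = train_test_split(
    X, y, ancestry, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
scores = model.predict_proba(X_te)[:, 1]

# Report discrimination performance per subgroup.
for group in np.unique(anc_te):
    mask = anc_te == group
    print(group, round(roc_auc_score(y_te[mask], scores[mask]), 3))

A materially lower score for any subgroup would flag the kind of bias that mitigation strategies, or further data collection, would need to address before the model is allowed to inform decisions.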

Ultimately, the integrity of data is not just a technical concern – it is a moral and regulatory imperative. Without robust, representative and auditable datasets, the promise of AI in drug discovery risks becoming a driver of inequality rather than a tool for universal progress. Ensuring that AI-designed therapies are safe and effective for all requires a foundational commitment to data quality, transparency and inclusivity.

The challenge of continuous learning: the need for ‘living’ approval

One of the most disruptive characteristics of modern AI systems is their capacity for continuous learning – the ability to evolve by ingesting new data and refining their internal parameters over time. Unlike traditional software tools, which remain static once deployed, many AI models are designed to adapt dynamically, improving their predictive accuracy and responsiveness as they encounter new information.

While this feature is a cornerstone of AI’s potential, it introduces a profound regulatory paradox. The conventional drug approval process is built around the concept of a fixed product: a molecule, a manufacturing process or a clinical protocol that remains stable throughout its lifecycle. Regulators assess and approve these products based on rigorous, point-in-time evaluations. But what happens when the ‘product’ is an evolving algorithm?

This dynamic nature raises critical questions for regulatory oversight:

  • When does an AI model’s evolution constitute a material change?
  • Should every update – no matter how minor – trigger a new regulatory submission?
  • How can regulators ensure ongoing safety and efficacy without stifling innovation?

Requiring full re-evaluation for every model update would be prohibitively expensive and slow, potentially undermining the very benefits that continuous learning offers. On the other hand, allowing AI models to evolve unchecked could introduce new risks – such as degraded performance, emergent biases, or unintended shifts in decision-making logic.

To address this, regulators are exploring the concept of a ‘living’ approval framework. This approach envisions a more flexible, life cycle-based model of oversight, where AI systems are approved not as static entities but as dynamic tools governed by predefined boundaries and monitoring protocols.

Key components of this framework may include:

  • Change control protocols: Sponsors would define acceptable types of model updates – such as retraining with new data or minor architectural tweaks – and establish thresholds for what constitutes a ‘significant’ change requiring re-review.
  • Performance monitoring: Continuous validation mechanisms would track the model’s outputs over time, ensuring that predictive accuracy, safety and fairness remain within acceptable limits.
  • Audit trails and versioning: AI models would be version-controlled, with detailed logs of changes, training data updates and performance metrics to support traceability and accountability.
  • Risk-based triggers: Only changes that materially affect the model’s behaviour or its impact on patient safety would prompt a full regulatory reassessment.

This ‘living’ approval paradigm represents a fundamental shift in regulatory thinking – from static compliance to dynamic stewardship. It acknowledges that AI is not a one-time submission but an ongoing relationship between developers, regulators and the technology itself.
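
To make these components concrete, the sketch below shows one way a sponsor might wire versioning, performance tracking and a risk-based trigger together in its own tooling. The fields, the AUC metric and the thresholds are illustrative assumptions, not values defined by any regulator.

# Illustrative sketch: a minimal, sponsor-side record of model versions with a
# risk-based trigger for regulatory re-review. Fields and thresholds are
# assumptions for illustration, not agency-mandated values.
from dataclasses import dataclass
from datetime import date


@dataclass
class ModelVersion:
    version: str
    released: date
    change_description: str
    validation_auc: float          # performance on a frozen validation set
    architecture_changed: bool


# Pre-defined change control threshold for what counts as a 'significant' drop.
AUC_DROP_THRESHOLD = 0.02


def requires_re_review(previous: ModelVersion, candidate: ModelVersion) -> bool:
    """Flag updates that materially affect behaviour or safety-relevant performance."""
    performance_drop = previous.validation_auc - candidate.validation_auc
    return candidate.architecture_changed or performance_drop > AUC_DROP_THRESHOLD


v1 = ModelVersion("1.0", date(2025, 1, 15), "initial release", 0.91, False)
v2 = ModelVersion("1.1", date(2025, 6, 1), "retrained on new assay data", 0.90, False)
v3 = ModelVersion("2.0", date(2025, 9, 1), "new model architecture", 0.93, True)

print(requires_re_review(v1, v2))  # False: minor retraining within thresholds
print(requires_re_review(v2, v3))  # True: architectural change triggers re-review

In practice, the thresholds and the definition of a ‘significant’ change would themselves be agreed in the change control protocol.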

If implemented effectively, such frameworks could unlock the full potential of adaptive AI in drug discovery, allowing models to learn and improve in real time while maintaining rigorous safeguards for public health.

Paving a path forward: risk-based frameworks and global collaboration

Addressing the regulatory challenges posed by AI in drug discovery is not about slowing innovation – it is about building a resilient, forward-thinking framework that safeguards patient safety while enabling technological progress. The complexity and novelty of AI demand a departure from rigid, one-size-fits-all regulatory models toward more nuanced, adaptive approaches. This transformation hinges on deep collaboration between AI developers, pharmaceutical companies and global regulatory bodies, particularly the FDA in the US and the EMA in Europe.

The future of AI regulation is already taking shape around risk-based assessment frameworks. These frameworks recognise that not all AI applications carry the same level of risk. For instance, an AI tool used to optimise laboratory scheduling poses minimal risk to patient safety and may require only light-touch oversight. In contrast, an AI system that designs a novel molecule or makes real-time decisions in manufacturing could directly impact product quality and patient outcomes – necessitating rigorous validation and full regulatory pre-approval.

This proportional approach is exemplified by the FDA’s proposed seven-step risk-based credibility assessment, which evaluates AI models based on their intended use, influence on regulatory decisions and potential consequences of failure. It marks a shift from scrutinising the algorithm itself to assessing the credibility and reliability of its outputs.

Key initiatives shaping the path forward

Risk-based governance

The FDA’s draft guidance encourages sponsors to define the AI model’s Context of Use – a clear description of how and where the model will be applied – and to assess its risk accordingly. This allows regulators to tailor their scrutiny based on the model’s role in the drug development process. For example, a model used in early-stage compound screening may face different requirements from one used to support clinical trial design or regulatory submissions.
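
As a rough illustration only, loosely echoing the draft guidance’s attention to how much a model influences a decision and how consequential that decision is, a sponsor might score each Context of Use along two axes and derive a risk level from the result. The scales and cut-offs below are purely illustrative assumptions.

# Illustrative sketch: deriving a risk level from a model's Context of Use.
# The two axes loosely mirror the notions of model influence and decision
# consequence; the scales and cut-offs are purely illustrative.
from dataclasses import dataclass


@dataclass
class ContextOfUse:
    description: str
    model_influence: int       # 1 = one input among many, 3 = primary evidence
    decision_consequence: int  # 1 = low patient impact, 3 = direct safety impact


def risk_level(cou: ContextOfUse) -> str:
    score = cou.model_influence * cou.decision_consequence
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


screening = ContextOfUse("early-stage compound screening", 1, 1)
dosing = ContextOfUse("supports dose selection in a clinical trial", 3, 3)

print(risk_level(screening))  # low  -> lighter-touch oversight
print(risk_level(dosing))     # high -> rigorous validation and documentation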

Life cycle management

To address the challenge of continuous learning, agencies are promoting life cycle maintenance plans. These plans require sponsors to implement pre-defined monitoring protocols, change control mechanisms and performance benchmarks. The goal is to ensure that AI models remain reliable and safe throughout the drug’s development and manufacturing lifecycle. Importantly, this approach avoids unnecessary resubmissions for minor, low-risk updates, preserving agility without compromising oversight.
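
One element of such a plan might look like the sketch below: a periodic check of a deployed model’s outputs against pre-defined performance benchmarks, with any breach routed into the change control process. The metric, thresholds and cadence are assumptions chosen for illustration.

# Illustrative sketch: a periodic check of a deployed model against pre-defined
# performance benchmarks, of the kind a life cycle maintenance plan might specify.
# Metric names, thresholds and the monthly cadence are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score

# Acceptance benchmarks agreed at approval time.
BENCHMARKS = {"auc_min": 0.85, "positive_rate_max_shift": 0.10}


def monthly_check(y_true, y_scores, baseline_positive_rate: float) -> dict:
    """Compare current performance and output distribution with benchmarks."""
    auc = roc_auc_score(y_true, y_scores)
    positive_rate = float(np.mean(np.asarray(y_scores) >= 0.5))
    rate_shift = abs(positive_rate - baseline_positive_rate)
    return {
        "auc": round(auc, 3),
        "auc_ok": auc >= BENCHMARKS["auc_min"],
        "rate_shift": round(rate_shift, 3),
        "rate_ok": rate_shift <= BENCHMARKS["positive_rate_max_shift"],
    }


# Hypothetical monthly batch of labelled outcomes and model scores.
rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, 300)
y_scores = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, 300), 0, 1)

report = monthly_check(y_true, y_scores, baseline_positive_rate=0.45)
print(report)  # any failing flag would trigger the change control protocol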

Global alignment

Regulatory harmonisation is essential to prevent fragmentation and delays in global drug development. Agencies like the EMA and FDA are actively working to align their frameworks, terminology and expectations. The EMA’s workplan, aligned with the European Union AI Act, emphasises a human-centred approach, formal risk analysis and ethical AI deployment. This global coordination is vital for multinational pharmaceutical companies seeking to bring AI-designed therapies to market across jurisdictions.

The overarching goal is to establish flexible, scalable standards for data governance, model validation, and continuous monitoring – standards that can evolve alongside the technology itself. This includes:

  • transparent documentation of AI model development, training data, and performance metrics;
  • auditable workflows that allow regulators to trace decisions and verify outcomes; and
  • ethical safeguards to ensure fairness, accountability, and respect for patient diversity.

By tackling these regulatory hurdles head-on, the industry can unlock the full potential of AI – not just as a tool for efficiency, but as a catalyst for innovation, equity and global health impact. If successful, this collaborative effort will usher in a new era of drug discovery, where therapies are developed faster, tailored more precisely and delivered more safely to the patients who need them most.