Translational research

Can you predict a drug's clinical outcome from its in vivo data?

Over 90% of drugs fail, often too late. How do I read my in vivo results against the human clinical record to call likely failures earlier?

More than 90% of drugs fail, often after the most expensive studies. In a custom build with Gubra, we made an API that augments in vivo results with clinical evidence - comparing an internal compound's in vivo effect to compounds that reached the clinic to predict its likely outcome. On a 37-drug retrospective set it correctly flagged 84.21% of failures.

By PharosBio · Updated June 15, 2026

Who this is for: Preclinical and translational scientists who need their in vivo results interpreted against the human clinical record before committing further in vivo or clinical resources.

In collaboration with Gubra

84.21% - of failing drugs correctly flagged (specificity)
37 drugs - retrospective test across 10 diseases, 1,000s of records
days → minutes - study interpretation, augmented with clinical evidence
custom build - with Gubra; API also a skill in Hydra

Why in vivo results are hard to translate

More than 90% of drugs fail, and the expensive failures happen late. The warning signs are usually present earlier - but cell, animal and human data are not directly comparable, so they are hard to read together. (A mouse liver is removed and weighed; a human liver is CT-scanned for volume - different methods, different units.)

The bottleneck a partner team actually named was not speed but comparability and reproducibility: methods, units and normalisation differ between studies, so pulling a specific measurement out and lining it up against the clinical record takes a well-defined schema and strict endpoint criteria. Done by hand it is slow, and easy to skip under deadline.
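As a rough illustration of what "a well-defined schema and strict endpoint criteria" could mean in practice, the Python sketch below defines a measurement record and a normalisation step. The field names, the `ENDPOINT_UNITS` table and its conversion factors are assumptions made for this example; they are not the PharosBio or Gubra schema.

```python
from dataclasses import dataclass

# Hypothetical endpoint criteria: the units each endpoint accepts and the
# factor that converts each accepted unit to one canonical unit.
ENDPOINT_UNITS = {
    "liver_weight": {"canonical": "g", "factors": {"g": 1.0, "mg": 0.001}},
    "body_weight": {"canonical": "g", "factors": {"g": 1.0, "kg": 1000.0}},
}


@dataclass(frozen=True)
class Measurement:
    compound: str
    species: str   # e.g. "mouse" or "human"
    endpoint: str  # must be a key of ENDPOINT_UNITS
    value: float
    unit: str


def normalise(m: Measurement) -> Measurement:
    """Return the measurement in its endpoint's canonical unit, or raise."""
    spec = ENDPOINT_UNITS.get(m.endpoint)
    if spec is None:
        raise ValueError(f"unknown endpoint: {m.endpoint!r}")
    factor = spec["factors"].get(m.unit)
    if factor is None:
        # Strict criteria: refuse units that cannot be converted,
        # rather than silently comparing unlike quantities.
        raise ValueError(f"unit {m.unit!r} not allowed for {m.endpoint!r}")
    return Measurement(m.compound, m.species, m.endpoint,
                       m.value * factor, spec["canonical"])


mouse = normalise(Measurement("compound-A", "mouse", "liver_weight", 1500.0, "mg"))
```

Under these assumed criteria, a mouse liver weight in milligrams is converted to grams, while a human liver volume in mL recorded under `liver_weight` is rejected with a `ValueError` instead of being lined up against it.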

What teams in this space search for

Why don't in vivo / preclinical results translate to the clinic?
How do I predict a compound's clinical outcome from in vivo data?
How do I compare my in vivo result to drugs that reached the clinic?

Custom project

How we built it

Built on the PharosBio in-vivo augmentation API · Custom build · API also a Hydra skill

A custom build with Gubra: an API that augments in vivo results with clinical evidence. It extracts measurements from clinical trials, PubMed and in-house data, scores compound similarity by target and mechanism using LLMs, and benchmarks an internal compound's in vivo effect against compounds that reached the clinic to predict a likely outcome.

How the pipeline works

The method is comparison against a reference. With data extracted from clinical trials, PubMed and a company's in-house records, the API compares and scores compound similarity (target, mechanism) using LLMs, then lines an internal compound's in vivo result up against published in vivo results from drugs that went on to the clinic.
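The similarity step is done with LLMs in the actual build. The sketch below swaps in a much simpler stand-in, a Jaccard overlap of target and mechanism labels, only to show the shape of the step: score each clinical-stage reference against the internal compound, then rank. The function names, the dictionary layout and the equal weighting are assumptions for this example.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two label sets: 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(internal: dict, reference: dict, w_target: float = 0.5) -> float:
    """Weighted mix of target overlap and mechanism overlap (weight assumed)."""
    t = jaccard(set(internal["targets"]), set(reference["targets"]))
    m = jaccard(set(internal["mechanisms"]), set(reference["mechanisms"]))
    return w_target * t + (1.0 - w_target) * m


def rank_references(internal: dict, references: list) -> list:
    """Return (name, score) pairs, most similar reference first."""
    scored = [(r["name"], similarity(internal, r)) for r in references]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Both drugs below are GLP-1 receptor agonists; the second reference is made up.
internal = {"name": "dulaglutide", "targets": ["GLP1R"], "mechanisms": ["receptor agonist"]}
references = [
    {"name": "hypothetical-ref", "targets": ["SGLT2"], "mechanisms": ["inhibitor"]},
    {"name": "semaglutide", "targets": ["GLP1R"], "mechanisms": ["receptor agonist"]},
]
ranking = rank_references(internal, references)
```

A label-overlap score like this only works when targets and mechanisms are already written in a controlled vocabulary, which is one reason a language model is a plausible choice for free-text sources.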

If the internal compound's effect size matches or beats a compound that succeeded in the clinic, it is flagged as likely efficacious; the workflow can be augmented with toxicity, indications, model relevance and PK/PD simulation. It was built and tested with Gubra, a biotech serving 15 of the top 20 pharma worldwide.
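The "matches or beats" rule can be sketched in a few lines of Python. This is a minimal sketch, assuming effects are measured on the same endpoint in the same normalised unit and that a larger value means a stronger effect; the class, the function name, the labels returned and the numbers are illustrative, not the API's interface or real study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    name: str
    in_vivo_effect: float      # same endpoint and unit as the internal result
    succeeded_in_clinic: bool


def predict_outcome(internal_effect: float, references: list) -> str:
    """Flag 'likely efficacious' if the internal effect matches or beats any
    reference that succeeded in the clinic; otherwise 'likely failure'."""
    successes = [r for r in references if r.succeeded_in_clinic]
    if not successes:
        return "no comparable reference"
    if any(internal_effect >= r.in_vivo_effect for r in successes):
        return "likely efficacious"
    return "likely failure"


# Made-up effect sizes, for illustration only.
refs = [
    Reference("reference-A", in_vivo_effect=12.0, succeeded_in_clinic=True),
    Reference("reference-B", in_vivo_effect=20.0, succeeded_in_clinic=False),
]
strong = predict_outcome(14.5, refs)
weak = predict_outcome(8.0, refs)
```

Here `strong` is flagged as likely efficacious because 14.5 beats the successful reference's 12.0, and `weak` is flagged as a likely failure. References that failed in the clinic are ignored by this simple rule.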

What it found

On a diverse retrospective set of 37 drugs across 10 diseases, the tool reached 84.21% specificity (failing drugs correctly flagged), 33.33% sensitivity (true positives captured) and 59.46% overall accuracy - stronger at flagging likely failures than at catching every winner, which is exactly where late failure costs the most.
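The three percentages are consistent with a single confusion matrix on 37 drugs. The counts below are inferred from the reported figures and are not stated in the source: 19 drugs that failed, of which 16 were flagged, and 18 drugs that succeeded, of which 6 were captured.

```python
# Counts inferred from the reported percentages; not stated in the source.
true_negatives = 16   # failed drugs correctly flagged as likely failures
false_positives = 3   # failed drugs predicted to succeed
true_positives = 6    # successful drugs correctly predicted to succeed
false_negatives = 12  # successful drugs wrongly flagged as likely failures

total = true_negatives + false_positives + true_positives + false_negatives

specificity = true_negatives / (true_negatives + false_positives)
sensitivity = true_positives / (true_positives + false_negatives)
accuracy = (true_negatives + true_positives) / total

print(total)                 # 37
print(f"{specificity:.2%}")  # 84.21%
print(f"{sensitivity:.2%}")  # 33.33%
print(f"{accuracy:.2%}")     # 59.46%
```

Read this way, a "likely failure" flag was wrong for 12 of the 18 drugs that went on to succeed, which is the cost side of the high specificity.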

Relying on PubMed and clinical trials alone missed some compound-performance details, which contributed to the lower sensitivity - a direct consequence of the comparability problem. Validation is being expanded to ~11,000 more studies.

What we learned

The biggest practical hurdle is comparability and reproducibility, not speed: extracting a specific measurement requires a well-defined schema and strict endpoint criteria. Solving that is most of the work - and most of the value.

Each run also builds a growing knowledge graph of drug-disease interactions that can answer later portfolio and indication questions. This was a custom build with Gubra on the PharosBio API - in-house testing is complete and Gubra deployment is pending - and the same API is available as a skill in Hydra.
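One way to picture a knowledge graph that grows with each run is a small in-memory structure keyed by drug and disease. This is a sketch under assumptions: the class, its method names and the evidence fields are hypothetical, and a production system would more likely use a graph database.

```python
from collections import defaultdict


class DrugDiseaseGraph:
    """Tiny in-memory graph: drug -> disease -> list of evidence records."""

    def __init__(self) -> None:
        self._edges = defaultdict(lambda: defaultdict(list))

    def add_run(self, drug: str, disease: str, evidence: dict) -> None:
        """Record the evidence produced by one analysis run."""
        self._edges[drug][disease].append(evidence)

    def diseases_for(self, drug: str) -> list:
        """Indication question: which diseases has this drug been linked to?"""
        return sorted(self._edges.get(drug, {}))

    def drugs_for(self, disease: str) -> list:
        """Portfolio question: which drugs have evidence in this disease?"""
        return sorted(d for d, by_disease in self._edges.items() if disease in by_disease)


graph = DrugDiseaseGraph()
graph.add_run("compound-A", "obesity", {"prediction": "likely efficacious"})
graph.add_run("compound-A", "type 2 diabetes", {"prediction": "likely failure"})
graph.add_run("compound-B", "obesity", {"prediction": "likely failure"})
```

After these three runs, `graph.diseases_for("compound-A")` lists both indications and `graph.drugs_for("obesity")` lists both compounds, without re-running any analysis.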

Figure: a published reference (Semaglutide), with its in vivo results and clinical results, shown beside an internal compound (Dulaglutide), with its in vivo results and a predicted clinical outcome.

Internal in vivo results are compared to published in vivo results from compounds that went on to the clinic. If an internal compound’s effect size matches or beats a successful one, it is flagged as likely efficacious - and the workflow can be augmented with toxicity, indication, model relevance, and PK/PD simulation.

Clinical-outcome prediction: an internal compound’s in vivo result is benchmarked against published compounds that reached the clinic, yielding a predicted clinical outcome.

Retrospective performance

Tested on a diverse retrospective set of real drug outcomes.

37 drugs · 10 diseases · 1,000s of records · retrospective ground-truth set

84.21% specificity - failing drugs correctly flagged
33.33% sensitivity - true positives captured
59.46% overall accuracy

The tool is stronger at flagging likely failures than at capturing every winner - useful precisely where late failure is most expensive. Validation is being expanded to ~11,000 more studies.

One API, many workflows

This was a custom build for a partner, powered by the PharosBio in-vivo augmentation API - which fetches clinical endpoints and scores drug similarity by target and mechanism of action. The same API can be repointed at:

Augmenting in vivo results with clinical evidence
Clinical-outcome prediction
Toxicity & PK/PD context
A growing drug-disease knowledge graph

It is also available as a skill inside Hydra.

What you get

In vivo results interpreted against the human clinical record - days to minutes
84.21% of failing drugs correctly flagged on a 37-drug retrospective set
Likely failures surfaced early, where late failure is most expensive
A growing drug-disease knowledge graph for future portfolio questions

Data sources used

Clinical trials (endpoints & outcomes)
PubMed (published in vivo & clinical results)
Company in-house in vivo data
Compound similarity by target & mechanism (LLM-scored)

Figures reflect analyses PharosBio ran on public datasets and public benchmarks. Named competitors, collaborators, and logos are withheld at this stage; the methods and results shown are real and repointable to your own target.

Sources & methods

Built and tested with Gubra (biotech serving 15/20 top pharma worldwide)
Retrospective set: 37 drugs across 10 diseases; validation expanding to ~11,000 more studies
Preclinical-to-clinical attrition: ~90% failure (industry)

Frequently asked questions

Does this replace in vivo experiments?

No - it interprets them. The API reads your in vivo results against the clinical record to predict a likely outcome, so the studies you commit to are the ones most likely to translate.

Why is sensitivity lower than specificity?

Because cell, animal and human data are not directly comparable, and PubMed and clinical trials alone can miss compound-performance details. The tool is tuned to be reliable at flagging likely failures (84.21% specificity); capturing every true positive needs more and better-normalised data, which the expanded validation addresses.

Was this built with Hydra?

No - it was a custom build with Gubra on the PharosBio in-vivo augmentation API. That API is also available as a skill in Hydra, so the same in-vivo-to-clinical workflow can be run there.

Run this analysis on your question

Hydra plans, executes, and validates, so you reach a defensible answer in hours, not weeks.
