Imaging AI Only Matters When It Works in the Real World

DeepHealth’s massive community-screening study shows why lab metrics alone no longer decide which imaging AI tools get adopted

Large, multi-site real-world studies are becoming the deciding factor in how hospitals buy imaging AI. Despite broad agreement that real-world evidence matters, few vendors have tested their systems at scale across heterogeneous, multi-site screening programs, where performance often differs from what lab studies report.

For years, vendors relied on retrospective test sets, reader studies and controlled research environments. Health systems are now demanding evidence from routine clinical practice across diverse patient populations and equipment.

DeepHealth’s recent ASSURE study, which the company reports involved about 579,000 women across 109 community imaging sites, is a prominent example of this shift. The company’s broader RSNA 2025 launch, spanning lung, thyroid, prostate, neuro and operational AI, shows it is positioning real-world evidence as a platform-level strategy.

DeepHealth reports that ASSURE evaluated an AI-supported workflow in community screening practice. According to the company, the AI-augmented “Safeguard Review” workflow increased cancer detection rate by 21.6% compared with standard 3D mammography and improved positive predictive value by 15%, while recall rates stayed within American College of Radiology guideline limits.

The company also reports consistent gains across breast-density groups and racial subgroups, including a 22.7% detection increase for women with dense breasts. “There has never been a similar study of this size in the United States, much less one with such a diverse patient population, that examines the patient impact and efficacy of AI-assisted breast cancer screening,” said Kees Wesdorp, President and CEO of DeepHealth, in the company’s announcement.

Multi-site evidence now outweighs lab performance

Retrospective studies often fail to reflect how AI performs across varied scanners, acquisition techniques, radiologists and patient populations. ASSURE’s results, as reported by DeepHealth, illustrate the gap between reader-study performance and outcomes in heterogeneous community environments.

Lunit’s recent large multicenter prospective mammography study shows a similar pattern: the company reported a 13.8% increase in cancer detection in single-reader screening, along with a 36% reduction in reading time during deployments. “By reducing reading time by 36% while improving cancer detection rates and screening efficiency, we are proving that AI is not just a tool but a critical partner,” said Lunit CEO Brandon Suh in the company’s published statement.

ScreenPoint Medical also relies on external validation to distinguish its Transpara system. Its breast AI was selected for the PRISM randomized controlled trial in the United States, funded by PCORI, which will evaluate AI performance in routine screening workflows. “This landmark trial… attests to the potential of AI to help radiologists shape a healthier future,” said ScreenPoint CEO Pieter Kroese in the company’s announcement.

Device manufacturers are also leaning on real-world evidence. Hologic publishes performance and workflow data linked to its 3DQuorum imaging solution. GE HealthCare emphasizes regulatory-cleared, device-embedded AI such as its FDA-authorized Pristina deep-learning reconstruction for mammography. “AI allows for the iterative refinement of treatments based on real-world data,” said GE HealthCare President and CEO Peter Arduini in a recent statement. Siemens Healthineers has expanded multi-modality AI collaborations, including ultrasound initiatives with DeepHealth in select markets.

DeepHealth’s RSNA 2025 launch shows the same shift beyond breast imaging. The company introduced AI suites for lung, thyroid, neuro and prostate imaging, each tied to workflow or measurement claims.

For thyroid ultrasound, DeepHealth reports that radiologists accepted AI-generated nodule measurements and characterizations without correction in more than 94% of 4,070 nodules and that scan-slot time could be reduced by up to 30% in deployment settings. For its TechLive platform, which the company states has 510(k) clearance, DeepHealth reports a 42% reduction in MR-room closures in internal pilot deployments across more than 400 connected scanners. These are vendor-reported results and have not yet undergone independent evaluation.

How AI tools are being judged in practice

Health systems evaluating imaging AI are placing greater emphasis on evidence generated in multi-site deployments. Larger studies allow performance to be examined across breast-density, racial and age subgroups, and prospective designs are beginning to outweigh retrospective analyses. Independent trials have become a point of comparison. The PRISM study in the United States and the TRANSFORM trial in the United Kingdom, for which DeepHealth reports its prostate suite was selected, are examples of how external oversight is shaping vendor credibility.

Operational measures are entering the evaluation process. Metrics such as reading time, recall patterns, scan-slot duration and throughput are being considered alongside detection performance. ASSURE’s “Safeguard Review” workflow, which created second-read triggers for high-suspicion cases, shows how AI can alter case routing within routine practice. Similar questions apply to thyroid and neuro tools that automate measurements and tracking.

As these evaluation methods spread, vendors are increasingly judged on performance in real clinical settings rather than on isolated model accuracy.

Mukundan Sivaraj
Mukundan covers enterprise AI and the AI startup ecosystem for AIM Media House. Reach out to him at mukundan.sivaraj@aimmediahouse.com or Signal at mukundan.42.