In early 2025, Tahoe Therapeutics released Tahoe-100M, a dataset of more than 100 million data points detailing how cancer cells respond to over 1,000 molecules. Within months, it had been downloaded nearly 100,000 times by AI labs and research institutions. The nonprofit Arc Institute used it to build an open-source virtual cell model that doubled the predictive accuracy of earlier systems. For the San Francisco-based startup, the uptake validated its goal of digitally replicating the complexity of living cells to improve the odds of developing effective drugs.
Tahoe was founded in 2022 by chief executive Nima Alidoust, chief scientific officer Johnny Yu, and established University of California scientists Hani Goodarzi and Kevan Shokat. All four have backgrounds spanning single-cell genomics, machine learning, and drug discovery. The company’s premise is that while AI has advanced in protein modeling, attempts to simulate an entire cell have been held back by a shortage of large, high-quality datasets.
Creating Massive, High-Resolution Cellular Datasets
Tahoe’s primary offering is Mosaic, a platform that organizes patient-derived cancer cells into “cell villages” to reduce variability and data quality issues. The cells are exposed to small molecules, many of which form the basis of medicines, and their reactions are measured using single-cell RNA sequencing. This method, known as scRNA-seq, produces more granular data than older bulk transcriptomics techniques, which average gene expression across many cells.
Tahoe says Mosaic can run thousands of experiments in parallel, generating datasets large enough to train AI models that can predict cell behavior. “Building Tahoe-100M required us to invent new ways to generate single-cell data,” Alidoust said. “Now, we’re applying that superpower to go 10x further, [and] using these massive datasets to bring about the GPT moment for AI models of human cells, translating insights to clinical readouts, and developing new medicines with much lower clinical failure rates”.
The company plans to expand from 100 million to one billion single-cell datapoints, mapping how tens of thousands of drug molecules interact with human biology. That expansion would create what Tahoe calls a “foundational dataset” for training virtual cell models: AI systems pre-trained to simulate gene function in varied biological contexts.
Strategic Investment Fuels Tahoe’s Expansion Plans
Tahoe has raised $42 million to date, including a $30 million round in August led by Amplify Partners with participation from Databricks Ventures, General Catalyst, Mubadala Ventures, and others. The financing values the company at $120 million.
Investors say the company is targeting one of the most persistent bottlenecks in drug development: the low rate at which early discoveries translate into clinical success. “While structural models have accelerated molecular design, they rarely translate to clinical success: a problem that remains one of the biggest challenges in drug development,” said Sunil Dhaliwal, general partner at Amplify Partners. “Tahoe Therapeutics is uniquely positioned to move the industry past this bottleneck by generating massive drug-patient datasets and training high-dimensional, cell-based AI models”.
The company’s business model blends internal drug discovery with selective partnerships. Tahoe intends to choose a single pharmaceutical or AI company to access the expanded dataset and co-develop medicines. That arrangement would combine Tahoe’s data with a partner’s clinical or modeling expertise, with the goal of advancing at least one drug candidate for a major cancer subtype into clinical trials.
Differentiating Through Whole-Cell AI Modeling
Tahoe’s focus on gigascale perturbative single-cell datasets sets it apart from competitors working primarily on structural or protein-level models. By capturing how whole cells, and the genes within them, respond to different chemical compounds, Tahoe aims to create training material for biological AI models analogous to the text datasets that fueled advances in natural language processing.
The strategy has already produced external results. Beyond the Arc Institute’s virtual cell model, early work with Tahoe-100M has yielded novel therapeutic candidates and drug targets across multiple disease areas.
The company is betting that the combination of scale, precision, and targeted collaboration will position it to influence both AI-driven biology and pharmaceutical R&D. According to Alidoust, “This next phase is about developing new medicines with much lower clinical failure rates”.








