By all accounts, Databricks has a data problem, and so does everyone else.
“Everybody has some data, and has an idea of what they want to do,” said Jonathan Frankle, chief AI scientist at Databricks’ Mosaic AI. “Nobody shows up with nice, clean fine-tuning data that you can stick into a prompt or an [application programming interface].” That gap between ambition and usable input has become the defining bottleneck in enterprise AI.
Dirty data, not model size or GPU availability, is what holds most organizations back from realizing the promise of generative AI. And Databricks, a company that has built its reputation on infrastructure for training large models, now faces the same challenge its customers do: the data is too messy.
To close that gap, Databr
Databricks Says Clean AI Data Doesn’t Exist and Acquires Fennel
- By Anshika Mathews
Machine learning models are only as good as the data they learn from.
