Datacurve, a Y Combinator graduate, has raised $15 million in a Series A funding round led by Mark Goldberg’s Chemistry. Employees from DeepMind, Vercel, Anthropic, and OpenAI also participated in the round. The company previously raised a $2.7 million seed round backed by Balaji Srinivasan and other early-stage investors.
The Series A funding is intended to strengthen Datacurve’s operational systems for sourcing high-quality datasets. The company focuses on maintaining structured workflows for contributor engagement, task assignment, and data verification.
Co-founder Serena Ge said, “We treat this as a consumer product, not a data labeling operation. We spend a lot of time thinking about: How can we optimize it so that the people we want are interested and get onto our platform?”
Datacurve recruits software engineers for tasks requiring specialized technical knowledge. Contributors are selected based on skill, and tasks are structured to ensure that the resulting datasets meet defined quality standards. The company emphasizes workflow consistency, making sure that all stages, from task assignment to final submission, follow a repeatable process.
Bounty System Engages Contributors
Datacurve uses a bounty-based system to attract participants and pay them for completing defined tasks. Contributors have received more than $1 million in bounties in total. The system pairs clear task instructions and verification with compensation, with the aim of keeping data accurate and contributors engaged.
The platform's initial focus is software engineering data. Contributors complete tasks such as analyzing code, debugging, and evaluating software outputs. Each submission is reviewed for correctness before it is added to training datasets.
Ge explained that the platform balances contributor engagement and task complexity to maintain high standards. “We spend a lot of time thinking about the experience of contributors. The goal is to make it clear what is expected, provide fair rewards, and maintain consistent quality,” she said.
The company’s workflow includes multiple checkpoints, such as automated verification and human review, to ensure that datasets meet required standards. Structured workflows allow Datacurve to manage high volumes of contributions efficiently.
Software Engineering Data and Industry Context
Datacurve operates in a market where post-training datasets are critical for AI applications. Scale AI, founded by Alexandr Wang, has historically provided large-scale datasets across sectors such as autonomous systems, logistics, and finance. The need for domain-specific, accurate data has grown as AI models require complex and structured datasets to perform effectively.
Datacurve focuses on software engineering data as its core area. The company tracks contributions, manages bounty distribution, and verifies each submission through defined quality controls. These systems ensure that all datasets meet technical and procedural requirements.
While software engineering data remains the primary focus, Datacurve has built infrastructure that could be applied to other areas, including finance, marketing, and medicine. Its current operations center on refining internal workflows, maintaining dataset integrity, and improving the efficiency of contributor management.
The Series A funding is being used to build technical infrastructure, improve verification tools, and standardize internal processes for handling post-training data. Datacurve’s leadership has stated that the company’s near-term focus is on strengthening these operational foundations to support consistent, large-scale data collection.