Protege

Senior Machine Learning Researcher / Principal Scientist

Posted 12 Days Ago
Remote
Hiring Remotely in USA
Senior level
Lead the evaluation and optimization of large-scale datasets for AI training, ensuring data quality and collaborating with research teams.
The summary above was generated by AI

Company Overview

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. The process today is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI's data problem is a generational opportunity. The company that succeeds will be one of the largest in AI — and in tech.

Role Overview

Data is the foundation of AI performance, and we believe model quality starts with data quality. You’ll be at the heart of shaping how we curate, assess, and prepare the training data that powers real-world AI systems.

We’re seeking a Senior Member of the Core Data Team / Principal Scientist to lead the evaluation and optimization of large-scale datasets used to train state-of-the-art AI models. In this role, you’ll help define what "high-quality data" means in practice, using statistical, computational, and ML-driven methods to ensure our data is diverse, representative, and high-impact. You’ll work closely with research and engineering teams to improve model performance through better data. This is an ideal role for someone with a PhD in machine learning, CS, or a related applied field who is passionate about the role of data in AI training and excited to advance Protege’s mission to become the ubiquitous platform for AI training data.

Key Responsibilities

  • Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets

  • Develop frameworks to assess data diversity, duplication, and informativeness. Design statistical approaches to de-risk training datasets.

  • Collaborate with model training teams to identify data bottlenecks and optimize dataset performance, with an emphasis on the ability to work with both large foundation-model developers and smaller startups.

  • Provide leadership on data quality strategy and shape internal best practices

  • Evaluate external datasets for integration, focusing on scalability, quality, and relevance to model performance. Help build data scorecards.

  • Contribute to research and development of tools that automate data preprocessing and validation

About You

  • PhD, or a Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field

  • Strong understanding of AI model training pipelines, including pre-processing and evaluation

  • Experience working with large, unstructured datasets, especially text

  • Background in statistical analysis, bias detection, and data validation

  • Able to identify high-impact problems and drive independent solutions

Bonus if you have these attributes

  • Experience with synthetic data generation or augmentation strategies

  • Publications or open-source contributions in data-centric AI or related areas

  • Experience developing evaluation frameworks or performance metrics for training data

  • Cross-functional collaboration with product, infrastructure, or partnership teams

Top Skills

Data Validation
Machine Learning
Statistical Analysis
