About This Role
Location: San Francisco preferred, remote considered
Compensation: Paid internship

About Us
Preference Model is building automated ML research engineering. Existing frontier models are brittle when applied to real-world ML tasks. The present bottleneck is the lack of high-quality RL training environments. Our first step is to build RL environments that reflect real-world complexity, with diverse tasks and robust reward functions. Our founding team has previous experience on Anthropic’s data tea…
Posted on May 7, 2026 · Apply by June 6, 2026