Gantry is building product testing and analytics for LLM-powered applications. We’re developing the most reliable, trustworthy way to evaluate LLM apps, and workflows to integrate those evaluations into the product development process. You can think of it like unit testing + Mixpanel for AI app builders.

Gantry was founded by Josh Tobin, former OpenAI researcher and co-founder of The Full Stack, and Vicki Cheung, former founding engineer at OpenAI and Compute team lead at Lyft. At Gantry, we believe AI practitioners should have access to tools that feel as magical as the ones they develop for their end-users.

As an Applied AI Researcher, you will work with product teams to identify opportunities for AI-powered product experiences, then prototype the algorithms behind them and work with the engineering team to implement them. Some examples of the kinds of problems you can expect to work on include:

* Automated evaluation of large language models
* Model-augmented error analysis/failure mode detection
* Fine-tuning large language models for specific use cases

This is an interdisciplinary role that requires skill in developing statistical algorithms, training models, and lightweight prototyping and implementation.