Patronus AI

San Francisco, California · 7,449 followers

Powerful AI Evaluation and Security

About us

Patronus AI is the leading AI evaluation and optimization company. Our research-backed product enables AI engineers to optimize their agents, access powerful evaluation models, and automatically detect LLM system performance issues across 50+ failure modes. Leading technology companies and enterprises like AngelList, Etsy, and Pearson use Patronus AI to ship top-tier AI products. Founded by machine learning experts from Meta, Patronus AI is on a mission to accelerate the world's adoption of generative AI. We are backed by Notable Capital, Lightspeed Venture Partners, Stanford University, Datadog, Gokul Rajaram, and leading software and AI executives.

Website
https://patronus.ai
Industry
Technology, Information and Internet
Company size
11-50 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2023


Updates

  • We’re excited to support Meta and Hugging Face's OpenEnv launch today! OpenEnv provides an open-source framework for building and interacting with agentic execution environments, allowing researchers and developers to create isolated, secure, deployable, and usable environments. Lately at Patronus, we’ve been working on RL environments for coding agents, and we were glad to contribute real-world-inspired tools and tasks to OpenEnv to help train and steer AGI. We began with a Gitea-based git server environment: git servers are foundational to collaboration and version control in software workflows, so they felt like the perfect way to get started with OpenEnv. Our git server environment supports:
    * Fast iteration across runs, with sub-second resets for RL training loops
    * A shared server with isolated per-run workspaces
    * Environment variables and custom Gitea configuration
    A minimal sketch of this interaction pattern follows below. We look forward to seeing what everyone builds with OpenEnv! GitHub: https://lnkd.in/gydP9Gaw HuggingFace: https://lnkd.in/gq59MZTe
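
As a rough illustration of the interaction pattern described above, here is a minimal Gym-style sketch of a git server environment with fast resets and per-run workspaces. The class and method names (GiteaEnv, reset, step) are illustrative assumptions, not the actual OpenEnv API; see the linked GitHub and Hugging Face pages for the real interface.

```python
# Hypothetical Gym-style sketch of a git server environment. Names and
# semantics here are illustrative assumptions, not the OpenEnv API.

class GiteaEnv:
    """Toy stand-in for a Gitea-backed git server environment."""

    def __init__(self, base_url: str, workspace: str):
        self.base_url = base_url    # shared Gitea server for all runs
        self.workspace = workspace  # isolated per-run workspace

    def reset(self) -> dict:
        # Fast reset: discard workspace state and hand back a fresh task.
        return {"repos": [], "task": "create a repo and push an initial commit"}

    def step(self, action: str) -> tuple[dict, float, bool]:
        # Apply one git action (init/commit/push); return (obs, reward, done).
        done = action == "push"
        return {"last_action": action}, (1.0 if done else 0.0), done

env = GiteaEnv(base_url="http://localhost:3000", workspace="run-0")
obs = env.reset()
for action in ["init", "commit", "push"]:
    obs, reward, done = env.step(action)
    if done:
        break
```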

  • Our CTO, Rebecca Qian, spoke at the PyTorch Measuring Intelligence Summit 2025 yesterday! She was on the Beyond the Leaderboard: Practical Intelligence in the Wild panel with Jeremy Howard (fast.ai) and Haifeng Xu (University of Chicago/ProphetArena), moderated by Shishir Patil (Meta). The group discussed the limitations of public benchmarks and explored how real-world tasks, such as code generation, enterprise analytics, and scientific discovery, can guide evaluation priorities and methodology. We’re excited to continue pushing the boundaries in this space with novel agent evaluation benchmarks and the development of dynamic, feedback-driven training environments. Thank you to Joseph Spisak for organizing the conference, and to the other speakers at the summit. We enjoyed hearing from Vivienne Zhang, Noam Brown, Aakanksha Chowdhery, Ph.D., Yifan Mai, and Anastasios Angelopoulos!

  • At Patronus AI, we're excited to publish a new article with tutorials and examples for AI Guardrails. 🚀 In this article, you will learn why AI guardrails matter for the reliable and ethical use of large language models across industries, along with the components, strategies, and tools involved in developing and deploying them; a small sketch of one such component follows below. Read the article at: https://lnkd.in/eKUC_a_V All based on the latest AI research produced by the Patronus AI Team and the broader research community. #AI #NLP #LLM
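
To make the idea concrete, here is a minimal sketch of one common guardrail component: an output validator that screens model responses before they reach users. The patterns and blocking policy are illustrative assumptions, not the article's implementation.

```python
import re

# Minimal sketch of an output-validation guardrail: block responses that
# contain obvious PII before returning them. Patterns are illustrative.

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]

def guarded(response: str) -> str:
    for pattern in PII_PATTERNS:
        if pattern.search(response):
            return "[Response withheld: output failed a PII guardrail check.]"
    return response

print(guarded("Contact me at jane@example.com"))   # blocked
print(guarded("The capital of France is Paris."))  # passes
```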

  • Introducing MEMTRACK, a new benchmark designed to evaluate long-term memory and state tracking in multi-platform agent environments. 🎉 Human memory lets us achieve complex objectives by taking in, storing, and applying saved information, so we wanted to evaluate how LLMs perform when given access to memory tools. We found that although LLMs are effective at general tool calling, they struggle to use memory tools properly, leading to continued underperformance on long-context reasoning and follow-ups. This makes agent memory an exciting space for unlocking performance gains; a sketch of the kind of memory tools involved follows below. The team looks forward to presenting this paper at the NeurIPS SEA workshop in December! arXiv Paper: https://lnkd.in/gbvFMXa4 Blog: https://lnkd.in/gQXmNACS
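
For intuition, here is a toy sketch of the store/search style of memory tools a benchmark like this might expose to an agent. The tool names and semantics are illustrative assumptions, not MEMTRACK's actual tool schema.

```python
# Toy store/search memory tools an agent could be asked to call. Real
# benchmarks and agents would use richer schemas and retrieval methods.

memory: dict[str, str] = {}

def memory_store(key: str, value: str) -> str:
    memory[key] = value
    return f"stored {key}"

def memory_search(query: str) -> list[str]:
    # Naive substring match over keys; real systems might use embeddings.
    return [v for k, v in memory.items() if query.lower() in k.lower()]

memory_store("jira:PROJ-42", "Ticket: migrate CI to new runners, owner=dana")
memory_store("slack:2024-06-01", "Dana said the migration slipped to Q3")
print(memory_search("jira"))
```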

  • At Patronus AI, we're excited to publish a new article with tutorials and examples for AI Agent Tools. 🚀 In this article, you will learn how AI agent tools let models interact with external systems, access real-time data from third-party services, and take automated actions. You will learn state-of-the-art best practices for invoking tools in agentic workflows, designing guardrails, applying reinforcement learning, and evaluating the functionality and effectiveness of tool-powered AI agents; a minimal tool-dispatch sketch follows below. Read the article at: https://lnkd.in/g8j5Drzm All based on the latest AI research produced by the Patronus AI Team and the broader research community. #AI #NLP #LLM
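
As a rough illustration of tool invocation in an agentic workflow, here is a generic sketch in which the model emits a JSON tool call and the runtime validates and dispatches it, with a small guardrail against unknown tools. The schema and dispatcher are illustrative assumptions, not a particular framework's API.

```python
import json

# Generic agent-loop tool dispatch: validate a model-emitted JSON tool call
# and route it to the matching function. Names here are illustrative.

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub for a real third-party API call

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    call = json.loads(tool_call_json)
    name, args = call["name"], call.get("arguments", {})
    if name not in TOOLS:  # guardrail: refuse tools outside the allowlist
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**args)

# Pretend the LLM produced this tool call:
print(dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}'))
```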

  • At Patronus AI, we're excited to publish a new article on the best practices for Advanced Prompt Engineering. 🚀 In this article, you will learn about advanced prompt engineering techniques that can maximize the potential of large language models, including self-ask decomposition, step-back prompting, contextual priming, and more; a short step-back prompting sketch follows below. Read the article at: https://lnkd.in/guhqp7_g All based on the latest AI research produced by the Patronus AI Team and the broader research community. #AI #NLP #LLM
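
As a taste of one technique the article names, here is a small sketch of step-back prompting: ask the model for the general principle first, then condition the final answer on it. The template wording and the stand-in principle are illustrative assumptions, not the article's exact prompts.

```python
# Step-back prompting sketch: two-stage prompting where an abstract
# "step-back" question precedes the concrete answer. Templates are
# illustrative, not taken from the article.

STEP_BACK = (
    "Question: {question}\n"
    "Before answering, state the general principle this question depends on."
)

FINAL = (
    "Principle: {principle}\n"
    "Using that principle, answer the original question: {question}"
)

question = "Why does a helium balloon rise in air?"
step_back_prompt = STEP_BACK.format(question=question)
# principle = llm(step_back_prompt)  # call your model of choice here
principle = "Less dense objects float in a denser surrounding fluid."
final_prompt = FINAL.format(principle=principle, question=question)
print(final_prompt)
```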

  • At Patronus AI, we're excited to publish a new article on the best practices for LLM Observability. 🚀 In this article, you will learn how LLM observability empowers engineering teams by capturing and analyzing key aspects of LLM-based applications (prompts, responses, latency, costs, hallucinations, and chain trace data) to optimize performance, accuracy, and reliability. The article also covers tools and best practices for adopting LLM observability in your AI application environment; a minimal instrumentation sketch follows below. All based on the latest AI research produced by the Patronus AI Team and the broader research community. Read the article at: https://lnkd.in/gPFD3JUN #AI #NLP #LLM
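
To illustrate the instrumentation pattern, here is a minimal sketch that wraps each model call and records prompt, response, latency, and a rough cost estimate as a structured event. The decorator, field names, and cost heuristic are illustrative assumptions, not a specific observability tool's schema.

```python
import json
import time

# Minimal LLM observability sketch: wrap each model call and emit a
# structured event with prompt, response, latency, and approximate cost.

def observe(llm_call):
    def wrapped(prompt: str) -> str:
        start = time.perf_counter()
        response = llm_call(prompt)
        event = {
            "prompt": prompt,
            "response": response,
            "latency_ms": round((time.perf_counter() - start) * 1000, 2),
            "approx_cost_usd": len(prompt.split()) * 1e-6,  # crude word proxy
        }
        print(json.dumps(event))  # in production, ship to your trace store
        return response
    return wrapped

@observe
def fake_llm(prompt: str) -> str:
    return "stub response"  # stand-in for a real model call

fake_llm("Summarize last week's incident report.")
```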

  • Welcome Joshua Weimer to the team! 🎉 Josh joins Patronus AI as a Forward Deployed Engineer. Previously, he worked in the GovTech space, where he supported agencies including the Department of Defense, Department of Justice, Office of Personnel Management, and Department of the Interior. “Generative AI is still in its early stages of adoption, and as use becomes more widespread, the need for task- and user-specific alignment will only grow more important. Patronus offers a rare chance to be at the forefront of this space, building solutions that keep our users ahead of the curve.” Read more about Josh’s journey to Patronus AI: https://lnkd.in/gSPjjyCM


  • Evaluators are at the heart of the Patronus AI platform, and customers across industries have found them helpful in evaluating context and answer relevance, detecting hallucinations, and analyzing multimodal content! If you’re new to evaluators, this blog post gives a quick overview of what our platform offers; if you’ve already been using evaluators, it might help you find new additions for your evaluation workflows. :) A generic evaluator sketch follows below. Read the post here: https://lnkd.in/ge9FTiPE
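
For readers new to the concept, here is a generic LLM-as-judge sketch of the kind of check an evaluator performs: scoring whether an answer is grounded in retrieved context. The judge prompt, verdict format, and function names are illustrative assumptions, not the Patronus platform's evaluator API.

```python
# Generic LLM-as-judge groundedness check. Prompt wording and PASS/FAIL
# scheme are illustrative, not the Patronus evaluator implementation.

JUDGE_PROMPT = """You are an evaluator. Given a context and an answer, reply
with PASS if every claim in the answer is supported by the context, otherwise
FAIL with a one-line reason.

Context: {context}
Answer: {answer}
Verdict:"""

def evaluate_groundedness(context: str, answer: str, llm) -> bool:
    # `llm` is any chat-completion function with signature llm(prompt) -> str.
    verdict = llm(JUDGE_PROMPT.format(context=context, answer=answer))
    return verdict.strip().upper().startswith("PASS")

# Usage, given retrieved documents and a model answer:
# ok = evaluate_groundedness(retrieved_docs, model_answer, llm)
```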

