If you have ever built an LLM-powered agent, you have probably encountered the silent confidence problem. The model is wrong, but it doesn’t know it’s wrong, so it acts. The user finds out later, often expensively. The pattern is so common that it has become the central reason most "agentic AI" deployments quietly die after the proof-of-concept.
There is a solution. It is older than LLMs, well-documented in the literature, and almost no production AI product actually implements it: have a second model review the first one’s decisions.
What a critic agent does
In CognitoHire’s LangGraph supervisor architecture, every consequential action the agent proposes, such as moving a candidate forward, sending an email, scoring a job description (JD) as bias-clean, or dropping a duplicate, passes through a separate critic LLM before it ships. The critic has access to:
- The original input that triggered the action
- The supervisor’s proposed action and reasoning
- The underlying data (candidate profile, JD, conversation history)
- A different prompt specifically designed to look for failure modes
The critic returns a confidence score (0–100) and structured reasoning. Above 85, the action proceeds. Below 65, it is queued for a human, with both the supervisor’s and the critic’s reasoning attached. Between 65 and 85, whether the action proceeds or is queued is configurable per customer.
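As a rough sketch of the routing rule above, and not CognitoHire’s actual code, the thresholds could be expressed like this in Python. The names `CriticVerdict`, `Route`, and `route_action` are hypothetical. So is the choice to treat scores of exactly 65 and 85 as part of the configurable middle band, and to default that band to human review.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PROCEED = "proceed"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class CriticVerdict:
    confidence: int  # 0-100, as returned by the critic
    reasoning: str   # the critic's structured reasoning, flattened to text here


def route_action(
    verdict: CriticVerdict,
    approve_above: int = 85,
    escalate_below: int = 65,
    middle_band: Route = Route.HUMAN_REVIEW,  # assumed default; set per customer
) -> Route:
    """Map a critic confidence score to a routing decision."""
    if not 0 <= verdict.confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if verdict.confidence > approve_above:
        return Route.PROCEED
    if verdict.confidence < escalate_below:
        return Route.HUMAN_REVIEW
    # Scores from escalate_below to approve_above (inclusive, by assumption)
    # follow the per-customer setting.
    return middle_band
```

A customer who is comfortable auto-approving mid-confidence actions would pass `middle_band=Route.PROCEED`. Everyone else keeps the conservative default.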
Why this isn’t just "two LLMs"
The naive version of this architecture — the same model second-guessing itself — doesn’t work. Models are biased toward agreeing with themselves. The trick is:
- Use a different model. The supervisor might be a small, fast model. The critic should come from a different model family, usually larger and slower but reasoning-tuned. Model diversity catches failure modes that homogeneous ensembles miss.
- Use a different prompt. The critic’s prompt should be designed to find problems, not to confirm correctness. "What could go wrong with this decision?" not "Is this decision good?"
- Make the critic skeptical by default. If the critic is just as confident as the supervisor, you haven’t actually added review — you’ve doubled the compute.
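The first two rules above can be enforced in configuration rather than left to convention. The following is a minimal sketch under stated assumptions: `ModelSpec`, `check_pairing`, `build_critic_prompt`, and the prompt wording are all hypothetical illustrations, not CognitoHire’s real prompt or any library’s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str    # hypothetical identifier, not a real model name
    family: str  # vendor or model lineage, used only for the diversity check


# Illustrative wording only: the prompt asks for failure modes, not for approval.
CRITIC_PROMPT = (
    "You are reviewing a decision made by another system.\n"
    "Proposed action: {action}\n"
    "Stated reasoning: {reasoning}\n"
    "What could go wrong with this decision? List concrete failure modes, "
    "then give a confidence score from 0 to 100 that the action is safe."
)


def check_pairing(supervisor: ModelSpec, critic: ModelSpec) -> None:
    """Reject configurations where the critic shares the supervisor's family."""
    if supervisor.family == critic.family:
        raise ValueError(
            f"critic {critic.name!r} is from the same family as the supervisor"
        )


def build_critic_prompt(action: str, reasoning: str) -> str:
    """Fill the adversarial template with the supervisor's proposal."""
    return CRITIC_PROMPT.format(action=action, reasoning=reasoning)
```

Running `check_pairing` at startup means a misconfigured deployment fails loudly instead of quietly reviewing itself.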
Adding a critic LLM is the most important architectural decision in production agentic AI. Most products skip it because it roughly doubles inference cost. Most products are also wrong.
What this looks like to the user
Inside the CognitoChat conversation panel, every supervisor decision is followed by a small critic block:
critic: reviewed. fit 88. eligibility ok. no bias flag. confidence 91. action approved.
Or, when the critic is uncertain:
critic: paused. employment-gap claim unverified. confidence 58. routing to HITL queue.
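Both sample lines share one shape: a status, a few findings, a confidence score, and an outcome. A hypothetical formatter for that shape might look like the sketch below. `render_critic_block` and its parameters are illustrative names, not part of CognitoChat.

```python
def render_critic_block(
    status: str, findings: list[str], confidence: int, outcome: str
) -> str:
    """Join the critic's verdict into the one-line block shown to the recruiter."""
    parts = [f"critic: {status}", *findings, f"confidence {confidence}", outcome]
    return ". ".join(parts) + "."


approved = render_critic_block(
    "reviewed", ["fit 88", "eligibility ok", "no bias flag"], 91, "action approved"
)
paused = render_critic_block(
    "paused", ["employment-gap claim unverified"], 58, "routing to HITL queue"
)
```

Here `approved` and `paused` reproduce the two lines quoted above.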
This isn’t a debugging panel for engineers. It’s the surface the recruiter sees. The transparency is the product. Everything else is just an LLM with a sales pitch.
Cost vs. trust
Yes, having a critic roughly doubles your inference cost per action, and more if the critic is the larger model. No, this is not a meaningful expense compared to the cost of being wrong about a hiring decision. The math is not close.
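The comparison can be written as a break-even check: the critic pays for itself when the error cost it is expected to prevent exceeds its added cost per action. This is a sketch under stated assumptions. The function names are hypothetical, and the numbers in the usage lines are placeholders chosen for illustration, not measurements.

```python
def expected_savings(
    error_rate: float, catch_rate: float, cost_per_error: float
) -> float:
    """Expected error cost avoided per action.

    error_rate: probability the supervisor's action is wrong
    catch_rate: probability the critic flags a wrong action
    cost_per_error: cost of one wrong action reaching the user
    """
    return error_rate * catch_rate * cost_per_error


def critic_pays_off(
    critic_cost_per_action: float,
    error_rate: float,
    catch_rate: float,
    cost_per_error: float,
) -> bool:
    """True when the expected savings exceed the critic's added cost."""
    savings = expected_savings(error_rate, catch_rate, cost_per_error)
    return savings > critic_cost_per_action


# Placeholder inputs, for illustration only.
worth_it = critic_pays_off(
    critic_cost_per_action=0.02,
    error_rate=0.05,
    catch_rate=0.5,
    cost_per_error=500.0,
)
```

With these placeholder inputs `worth_it` is `True`. The point of the sketch is the inequality, so substitute your own error rates and costs.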
If your AI vendor doesn’t have a critic agent in their architecture, ask them why. The answer will tell you everything you need to know about whether their platform is ready for production hiring decisions.