Values on the Wall Won’t Make Your AI Agents Behave
Most companies that have values on the wall don't live by them. Their AI agents won't either.
One of my AI agents lied to me. The mundane kind, not the science-fiction kind. It said a task was done. It wasn't.
I've been building a team of AI agents to help run and scale Jupiter Peak. One acts as chief of staff. Others handle specialized work, including coding. Recently, a coding subagent told my chief-of-staff agent that a task was complete when it wasn't.
At first, the problem looked technical: a bad prompt, a flaky tool, a broken workflow.
After sitting with it longer, I realized the problem wasn't technical at all. It was a values problem.
The agent had optimized for the appearance of progress instead of the truth. It gave me confidence when it should have asked for help. The honest answer would have been "I tried, but I cannot verify that this worked." Instead, I got "done."
Anyone who has run a team has seen the human version. On a federal AI automation project a few years back, we had a junior developer whose integrity sat closer to Gollum than Ned Stark. He'd report his PDF extraction code complete every cycle. Every cycle, downstream testing turned up a butterfly effect of broken outputs. The pipeline's guardrails caught the failures internally instead of in a client demo or in production. People do what the system rewards, not what the values poster says. So do agents.
AI agents are not magic exceptions to this rule. They are part of the system now.
Most agent governance stops one step too early
The Replit incident made the pattern public. Reports described an AI coding agent deleting a production database and then misrepresenting the state of the work. The deletion wasn't the worst part. Mistakes happen. The bigger problem was the false confidence reported afterward.
Read a few of these post-mortems and one shape repeats. The company had agent values. A responsible-AI charter. An ethics framework. Principles on a wall. The agent misbehaved anyway, because none of it translated into a behavior the agent had to follow at runtime.
This is where most agent governance falls short. The framework stops at principles. Principles are aspirational. Agents are operational. Without translation between the two, the values document does nothing the agent can act on.
Values aren't a poster. They're a hiring decision, a performance review, and a story you tell.
Every operator who has built a company culture knows the gap between having values and living them.
The wall doesn't make the culture. The culture is built out of three things, repeated for years.
- You hire by them. You don't bring in people who don't match, regardless of resume.
- You review by them. People who hit the number while breaking the values don't get promoted. Everyone notices.
- You tell stories about them. You hold up specific people who personified the values in moments that mattered. The values become real because they have faces.
Skip those three and the values are decoration. The poster looks great. Monday morning runs on something else.
Agents need the same.
You hire them by your values. The system prompt, the model choice, the tools the agent can access, the capability scope. That's the agent's resume. If "ask before destructive action" is on your values list and you hand the agent direct production-database access with no confirmation step, your hiring decision broke the rule before the agent ever ran.
You review them by your values. Every agent action is a performance moment. Whether the agent reported honestly or optimistically is a values question, and most teams never measure it. If you only track completion rate, completion rate is what you'll get back, even when the honest answer would have been "I couldn't."
You tell stories about them. When an agent does the right thing under pressure (it surfaces a blocker, escalates a stall, or refuses an external action without approval), that example needs to feed back into the next version of the system prompt and the next round of subagent design. The agent's own good behavior becomes the story you tell. That's how the value becomes lived.
Companies that skip these three steps don't get the culture on the wall. They get the culture they measure and reinforce. Agents are no different.
The bridge from values to behavior is the operating rule. It runs every time the agent acts. The hiring decision, the review, and the reinforcement all live in the rule.
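The "hiring decision" check can be made mechanical. Below is a minimal Python sketch, assuming each tool the agent can access is described by a small record. The `ToolGrant` type, its fields, and the tool names are hypothetical illustrations, not Jupiter Peak's actual configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToolGrant:
    """One tool an agent is allowed to call (hypothetical record)."""
    name: str
    destructive: bool            # can it delete or overwrite real data?
    requires_confirmation: bool  # must a human approve before it runs?


def audit_grants(grants: List[ToolGrant]) -> List[str]:
    """Return the destructive tools that were granted with no confirmation step."""
    return [g.name for g in grants if g.destructive and not g.requires_confirmation]


grants = [
    ToolGrant("read_logs", destructive=False, requires_confirmation=False),
    ToolGrant("drop_table", destructive=True, requires_confirmation=False),
    ToolGrant("delete_branch", destructive=True, requires_confirmation=True),
]

violations = audit_grants(grants)
print(violations)  # ['drop_table']
```

Any name in `violations` is a place where the agent's setup already contradicts an "ask before destructive action" value, before the agent has run once.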
What this looks like in production
Jupiter Peak runs an operator agent that acts as chief of staff for the business. It runs on a set of company values.
When I asked the agent recently how it enforces those values on itself and on the subagents it spawns, the answer was sharper than most governance documents I've read.
These are its own words.
The agent's stated core values: truth over comfort, competence earns trust, privacy is sacred, integrity before convenience, resourcefulness, high signal and low fluff, improve the system.
Asked how it enforces those values, the agent opens its answer with "By turning values into operating rules, not vibes," followed by the rules for its own behavior.
Its rules for managing the subagents it spawns end with "I treat subagents like junior staff, not magic. Delegate, inspect, verify, escalate."
Each of those values matters only because there is a runtime rule sitting underneath it. With no rule, the value is decoration.
The Two-Layer Operating Model
Every agent should operate at two layers: how it behaves itself, and how it manages the work it delegates. The Two-Layer Operating Model defines a rule set for each. Below are the rules my chief-of-staff agent runs on, lightly cleaned up for general use.
For the agent itself
- I check memory and context before answering when prior decisions or preferences matter.
- I use tools to verify mutable facts instead of guessing.
- I ask before any external action: emails, posts, public messages, or destructive changes.
- I keep replies short unless depth is genuinely useful.
- I state blockers plainly instead of pretending past them.
- I verify work before reporting it done.
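The last two rules in this list, stating blockers plainly and verifying before reporting done, can be sketched as a small reporting gate. This is an illustration under stated assumptions: `report_task`, the `Report` type, and the three status strings are hypothetical names, and the real agent's rules are written in natural language, not in this code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Report:
    status: str  # "done", "unverified", or "blocked"
    detail: str


def report_task(run: Callable[[], None], verify: Callable[[], bool]) -> Report:
    """Run a task, then report "done" only if an independent check passes."""
    try:
        run()
    except Exception as exc:
        # State the blocker plainly instead of pretending past it.
        return Report("blocked", f"task failed: {exc}")
    try:
        verified = verify()
    except Exception as exc:
        return Report("unverified", f"could not run verification: {exc}")
    if verified:
        return Report("done", "verified")
    return Report("unverified", "I tried, but I cannot verify that this worked")


# Usage: the task "succeeds" but writes an empty file, so the check fails.
outputs = {}


def write_output() -> None:
    outputs["report.txt"] = ""


def output_nonempty() -> bool:
    return bool(outputs.get("report.txt"))


result = report_task(write_output, output_nonempty)
print(result.status)  # unverified
```

The design choice is that "done" is not something the task can claim for itself. Only the separate `verify` step can produce it.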
For any subagent it spawns
- I give atomic scopes: exact task, allowed files and sources, expected output, success criteria.
- I choose the cheapest capable model, not the flashiest one.
- I don't delegate sensitive external actions.
- I review the subagent's output before telling you it's complete.
- If a subagent stalls or fails, I surface it instead of quietly papering over it.
- Two failures and I stop, escalate, and get your approval before continuing.
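The delegation rules above can be sketched the same way: an atomic scope, a review before anything is called complete, and a hard stop after two failures. The `Scope` fields mirror the first rule in the list. Everything else here (`delegate`, `MAX_FAILURES`, the example subagent and reviewer) is a hypothetical illustration, not the agent's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Scope:
    """An atomic scope handed to a subagent."""
    task: str
    allowed_files: Tuple[str, ...]
    expected_output: str
    success_criteria: str


MAX_FAILURES = 2  # "Two failures and I stop, escalate, and get your approval"


def delegate(scope: Scope,
             run_subagent: Callable[[Scope], str],
             review: Callable[[Scope, str], bool]) -> dict:
    """Run a subagent, review its output, and escalate after two failures."""
    failures = []
    while len(failures) < MAX_FAILURES:
        try:
            output = run_subagent(scope)
        except Exception as exc:
            failures.append(f"subagent error: {exc}")
            continue
        if review(scope, output):
            return {"status": "complete", "output": output, "failures": failures}
        failures.append("output failed review")
    # Surface the failures instead of quietly papering over them.
    return {"status": "escalated", "output": None, "failures": failures}


# Usage: a subagent that says "done" without producing the expected output.
scope = Scope(
    task="extract totals from invoice.pdf",
    allowed_files=("invoice.pdf",),
    expected_output="totals.json",
    success_criteria="totals.json lists every line item",
)
calls = []


def overconfident_subagent(s: Scope) -> str:
    calls.append(s.task)
    return "done"


def names_expected_output(s: Scope, output: str) -> bool:
    return output == s.expected_output


result = delegate(scope, overconfident_subagent, names_expected_output)
print(result["status"], len(calls))  # escalated 2
```

The subagent's own claim of "done" never reaches the operator. Only reviewed output does, and repeated failure produces an escalation, not a third quiet retry.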
Every one of those is a rule, not a principle. At the moment the agent acts, the rule either fires or it doesn't. That difference is what separates a value the agent claims from a value the agent uses.
As the agent put it: "I treat subagents like junior staff, not magic. Delegate, inspect, verify, escalate. Basically management, but with fewer performance reviews and more JSON."
Why this matters more than the model choice
Once these rules were in place, the false-completion reports from the chief-of-staff agent dropped to near zero in the following two weeks. The model didn't change. The rule did.
If your AI vendor leads with model selection before showing you the operating rules, they are solving the wrong problem.
The conversation about agent reliability fixates on models, fine-tuning, and tool calling. Those matter. They are not the constraint.
The constraint is whether you've told the agent, in operating terms, what to do when truth conflicts with speed, when verification conflicts with completion, when integrity conflicts with convenience. The principles document never answers those questions. The operating rule does.
Bots behave a lot like people. They do what you measure them on. Most teams measure completion rate and get high completion rates back. Add a second measure of whether the agent reported honestly when work failed, and the behavior changes.
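A rough sketch of that second measure, assuming each task is logged with both what the agent claimed and what an independent check later found. The `TaskRecord` fields and the sample data are hypothetical and exist only to show the arithmetic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TaskRecord:
    reported_done: bool  # what the agent claimed
    actually_done: bool  # what independent verification found


def reported_completion_rate(records: List[TaskRecord]) -> float:
    """Share of tasks the agent claimed to have finished."""
    return sum(r.reported_done for r in records) / len(records)


def honesty_on_failure(records: List[TaskRecord]) -> Optional[float]:
    """Of the tasks that actually failed, the share the agent reported as not done."""
    failed = [r for r in records if not r.actually_done]
    if not failed:
        return None  # nothing failed, so there is nothing to measure
    return sum(not r.reported_done for r in failed) / len(failed)


records = [
    TaskRecord(reported_done=True, actually_done=True),
    TaskRecord(reported_done=True, actually_done=True),
    TaskRecord(reported_done=True, actually_done=False),   # false completion
    TaskRecord(reported_done=False, actually_done=False),  # honest failure
]

print(reported_completion_rate(records))  # 0.75
print(honesty_on_failure(records))        # 0.5
```

In this made-up sample the agent claims a 75% completion rate, yet it was honest about only half of its real failures. Tracking only the first number hides the second.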
The question that separates lived values from values theater
Every operator has been on both sides of this. You've worked at companies where the values were laminated on a wall and nobody could quote them by Wednesday. You've also worked at, or built, companies where the values showed up in hiring decisions, performance reviews, and the stories told at the all-hands. Same wall, same words. The discipline behind the words is what made the difference.
Agents inherit whichever version you give them.
When someone tells you their agents operate by your values, ask:
"Show me where those values show up in the agent's operating rules. What does the agent do differently because of them?"
A principles answer is the wall version. A Two-Layer Operating Model answer is the lived version: rules at both layers, with atomic scopes, verification gates, escalation triggers, and named consequences.
Wall-version agents will quietly do the wrong thing in your name. Lived-version agents do the work the way you would.