
Your Agent's Behavior Is Code — Start Versioning It

The teams that figure this out early get a compounding advantage: skill quality improves through PRs the same way libraries do.

I spent a few hours this week updating skill files in a git repo. Not code files. Not config files. Skill files — markdown documents that tell an AI agent how to behave when working in that specific project.

When I pushed the commit, something clicked: this was a practice we’ve applied to code for 50 years but haven’t fully named yet for AI systems. Institutional knowledge about how to do work — usually locked in people’s heads, or in wikis nobody reads — is now versionable, diffable, and executable.

That changes things.


What Repo-Specific Agent Skills Are

When you work with an AI coding agent like Claude Code, it can load context from the project it’s working in. Most people know about CLAUDE.md — a file that tells the agent about project conventions, architecture, what to avoid. Think of it as the README the agent actually reads.
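
A minimal CLAUDE.md is just a handful of such conventions written down. The contents below are invented for illustration, not a template:

# Project Notes for the Agent

- TypeScript, strict mode. Don't introduce `any` without a comment explaining why.
- All database access goes through src/db/. Never write raw SQL in route handlers.
- Run `npm test` before proposing a commit; fix failures rather than skipping tests.
- Files under generated/ are build artifacts. Never edit them directly.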

What’s less understood is that you can also commit skills — structured instructions for specific workflows — directly into the repo under .claude/skills/. When anyone opens that project, the agent loads those skills automatically. The behavior travels with the codebase. In December 2025, Agent Skills became an open standard adopted across Claude Code, GitHub Copilot, Cursor, and other major platforms, so skills created in one tool are recognized by the others without duplication.

A skill isn’t a prompt. It’s closer to a process document — except it’s written to be consumed by an AI that will actually execute it, not a human who will skim it.

For example, a repo that publishes content might have a skill called content-audit — instructions for checking whether any page contains data that shouldn’t be published: internal metrics, unconfirmed claims, placeholders. Every time someone runs that check, the agent follows the same process. The process is in the repo. The process has a commit history.


The Problem This Solves

Most teams using AI agents today have an invisible problem: the agent’s behavior is inconsistent across people.

One developer uses the agent one way. Another uses it completely differently. When the new person joins, they figure out their own patterns from scratch. The institutional knowledge about how to use the agent well on this project lives in individual heads — not in the project itself.

This is the same problem documentation was supposed to solve, with the same failure mode: nobody writes it down, and when they do, nobody reads it.

Skills committed to the repo solve this differently. You don’t read a skill to learn how to work — you invoke it (“run content-audit on this branch”) and the agent does the work according to the team’s established process. The knowledge is executed, not consumed.


What Changes When Skills Are Version-Controlled

The process becomes auditable. If a skill is producing bad output, you can read it. If someone changes how it works, there’s a diff. You can review a skill change the same way you review a code change — which is appropriate, because it has the same kind of impact.

The agent’s behavior becomes part of the codebase contract. When someone takes a dependency on a process — “our pre-publish check runs the content-audit skill before any page merges” — that process is now as stable or unstable as any other part of the codebase. It can break. It can be fixed. It can be improved.

Skill quality compounds. The first version of a skill is usually rough. When it’s in a repo and people use it, they notice what’s missing. They file issues. They send PRs. The skill gets better over time, the same way a good library gets better. Compare this to the individual-prompt approach, where each person’s workflow is a series of freehand instructions they improvise every time.
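
A hypothetical diff makes this concrete. Suppose a page slipped through with an internal tool name in it, and someone sends a one-line PR against the checklist (the hunk below is invented for illustration):

--- a/.claude/skills/content-audit/SKILL.md
+++ b/.claude/skills/content-audit/SKILL.md
@@ -10,3 +10,4 @@ ## What to Check
 - [ ] No customer names without a public case study
 - [ ] No unconfirmed claims ("will ship", "planning to")
+- [ ] No internal URLs or tool names without explanation

Merge that line and every future audit runs the stronger check, for everyone, immediately.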

Onboarding gets faster. A new contributor doesn’t need to figure out the “right way” to do things from scratch. The skills in the repo embody the team’s current best understanding of how to do the work. They’re immediately available on first checkout.


What This Actually Looks Like

The directory structure is simple. In your repo root:

.claude/
  skills/
    content-audit/
      SKILL.md
    review-before-publish/
      SKILL.md
    data-quality-check/
      SKILL.md

Each skill is a markdown file. It describes what the skill does, when to use it, what checks to run, and how to report findings. The agent reads it when invoked.

A content-audit skill for a publishing repo might look like this:

---
name: content-audit
description: Use before publishing any page — checks for internal metrics,
  unconfirmed claims, placeholders, and PII that shouldn't be public.
---

# Content Audit

Run before any merge to the published branch.

## What to Check

- [ ] No internal metric values (revenue, headcount, quota)
- [ ] No placeholder text (TBD, TODO, FIXME, [INSERT])
- [ ] No customer names without a public case study
- [ ] No unconfirmed claims ("will ship", "planning to")
- [ ] No internal URLs or tool names without explanation

## How to Report

Flag each finding as: file path → line number → what was found → suggested fix.
Do not auto-fix — present findings and wait for author confirmation.

The process is in the repo. The process has a commit history. When someone finds a new failure mode, they file a PR against the skill — and everyone who uses it inherits the fix automatically.

The gitignore convention that works: ignore the contents of .claude/ (personal settings, session data, memory) but explicitly un-ignore skills/:

.claude/*
!.claude/skills/

The trailing * matters: git cannot re-include a file whose parent directory is excluded, so the rule has to ignore the directory’s contents rather than the directory itself.

This keeps personal configuration out of the repo while ensuring the team’s shared process artifacts are committed. What you commit is the shared process. What you don’t commit is your personal preferences.
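
You can confirm the rules behave as intended with git check-ignore, which prints the pattern that matched (the file names and the gitignore line number below are illustrative):

$ git check-ignore -v .claude/settings.local.json
.gitignore:12:.claude/*	.claude/settings.local.json

$ git check-ignore -v .claude/skills/content-audit/SKILL.md
(no output: the file is not ignored, so it can be committed)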


The Mental Model Shift

Most people think of AI agent customization as configuration — you tweak settings, write prompts, adjust parameters. That mental model positions you as the user of a product.

The more useful mental model: agent behavior in your project is code. It has the same properties as code — it can be right or wrong, clear or ambiguous, well-tested or brittle. It should be reviewed, versioned, and maintained with the same discipline.

Skills committed to a repo are the first concrete instantiation of this. They’re not prompts you type. They’re process artifacts that live in the project, travel with it, evolve with it, and can be held to the same standards as everything else in the codebase.

The teams that figure this out early will have a meaningful advantage — not because the AI is smarter, but because their process for using the AI is institutionalized rather than improvised. That’s not a new idea. It’s what separates good engineering organizations from ad hoc ones. It now applies to a new class of artifact.


What’s still unresolved: there’s no standard for skill quality yet. No linting, no testing framework, no shared conventions across projects. Research on “AI Skills as the Institutional Knowledge Primitive” (arXiv, Mar 2026) points toward composable, graph-traversable skill units — but how you know if a skill is doing its job well is still an open question. The skills I write for one repo have no relationship to the skills someone else writes for theirs. That’s probably the next thing to figure out.
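
Until those conventions exist, a team can enforce its own floor in CI. A minimal sketch in Python, assuming the .claude/skills/<name>/SKILL.md layout from above; the specific checks are house rules invented for this sketch, not a published standard:

# skill_lint.py: minimal structural checks for committed skill files.
# Assumes the .claude/skills/<name>/SKILL.md layout; the rules are
# one team's conventions, not a published standard.
import re
import sys
from pathlib import Path

SKILLS_DIR = Path(".claude/skills")

def lint_skill(skill_md: Path) -> list[str]:
    """Return a list of problems found in one SKILL.md file."""
    text = skill_md.read_text(encoding="utf-8")
    problems = []

    # Frontmatter must open the file: a --- ... --- block.
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return [f"{skill_md}: missing YAML frontmatter"]
    frontmatter = match.group(1)

    # Required fields, so the agent can discover and describe the skill.
    for field in ("name", "description"):
        if not re.search(rf"^{field}:", frontmatter, re.MULTILINE):
            problems.append(f"{skill_md}: frontmatter missing '{field}'")

    # The declared name should match the directory it lives in.
    name = re.search(r"^name:\s*(\S+)", frontmatter, re.MULTILINE)
    if name and name.group(1) != skill_md.parent.name:
        problems.append(
            f"{skill_md}: name '{name.group(1)}' does not match "
            f"directory '{skill_md.parent.name}'"
        )

    # A skill with no body gives the agent nothing to execute.
    if not text[match.end():].strip():
        problems.append(f"{skill_md}: empty body after frontmatter")

    return problems

def main() -> int:
    problems = []
    for skill_md in sorted(SKILLS_DIR.glob("*/SKILL.md")):
        problems.extend(lint_skill(skill_md))
    print("\n".join(problems) or "all skills pass")
    return 1 if problems else 0

if __name__ == "__main__":
    sys.exit(main())

Run it as a pre-merge check and a skill PR gets at least the structural scrutiny a code PR gets. Judging whether a skill is behaviorally good, meaning the agent following it produces the right output, remains the open question.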

