Claude Agent OS
A file-based orchestration layer that turns Claude Code into a persistent, context-aware, skill-driven operator. No vector stores. No custom models. No external databases.
The context window is the runtime. The filesystem is the database. Skills are the API layer. The decision log is the audit trail.
Executive Summary
The Claude Agent OS uses no external databases, no vector stores, and no custom model fine-tuning. Persistence, context injection, behavioral configuration, and skill routing are implemented via structured markdown files, a YAML-fronted skill registry, and Claude Code's native MCP integration layer.
The entire OS runs on a local filesystem with git for version control. All intelligence emerges from prompt construction — context files, behavioral rules, and skill instructions combine at session load time to produce a highly specialized agent without any model modifications.
Intentionally low-infrastructure. No ops. No pipelines. No infra bill. The complexity budget is spent on content quality and workflow design, not tooling.
System Components
Architecture
System diagrams and data flows for the Claude Agent OS. All diagrams reflect the production implementation.
System Component Map
How the major subsystems relate at runtime. Four environment groupings: Runtime, Triggers, Skill Execution, External Integrations, and Persistence.
Context Injection Architecture
How state reaches Claude's context window at session start. No RAG. No embeddings. Two-layer guaranteed injection: CLAUDE.md (always loaded) + hooks (deterministic lifecycle commands).
Flat file injection over vector retrieval. For a single-operator system with a bounded context surface, full injection is more reliable than retrieval. Retrieval introduces precision/recall trade-offs; injection is deterministic. The trade-off is context window consumption — mitigated by keeping context files concise and MEMORY.md as a pointer index rather than full content store.
CLAUDE.md injection is reliable but probabilistic — in very long sessions or under compaction, critical context can drift. Hooks are the guarantee layer: shell commands wired to Claude Code lifecycle events in .claude/settings.json. SessionStart fires at every session open and injects your current priorities unconditionally. PreCompact fires before any compaction event and re-injects critical context so it survives the summary. CLAUDE.md is a hint. Hooks are a guarantee.
Lifecycle Automation Hooks
Shell commands wired to Claude Code lifecycle events in .claude/settings.json. Hooks run outside the model's context window — deterministic OS-level automation, not AI instructions.
| Hook event | Fires when | Production use |
|---|---|---|
SessionStart | Session opens | Inject current-priorities.md unconditionally — guaranteed context every session regardless of CLAUDE.md drift |
PreCompact | Before context compaction | Re-inject priorities; prompt operator to run /close-session before the compaction summary is written |
PostToolUse(Skill) | After every Skill tool call | Auto-append timestamped stub to skills/run-log.md — closes the dropped-Learn gap for programmatic skill invocations |
UserPromptSubmit | Every user message sent | Parse for leading /command → append run-log stub for typed slash skills (PostToolUse fires on Skill-tool calls only, not typed /commands) |
SessionEnd | Session closes | If uncommitted or unpushed work exists, auto-commit a WIP snapshot and push — backstop behind /close-session. No-ops on clean state. |
{
"hooks": {
"SessionStart": [
{ "hooks": [{ "type": "command", "command": "python3 .claude/hooks/session-start.py", "timeout": 10 }] }
],
"PreCompact": [
{ "hooks": [{ "type": "command", "command": "python3 .claude/hooks/pre-compact.py", "timeout": 15 }] }
],
"SessionEnd": [
{ "hooks": [{ "type": "command", "command": "bash .claude/hooks/session-end-snapshot.sh", "timeout": 60 }] }
],
"PostToolUse": [
{ "matcher": "Skill", "hooks": [{ "type": "command", "command": "python3 .claude/hooks/log-skill-run.py", "timeout": 10 }] }
],
"UserPromptSubmit": [
{ "hooks": [{ "type": "command", "command": "python3 .claude/hooks/log-slash-skill.py", "timeout": 10 }] }
]
}
}
PostToolUse(Skill) fires when Claude Code dispatches a skill as an internal tool call. UserPromptSubmit fires on every user message — including /commands that are inline-expanded rather than dispatched as tool calls. The two hooks are disjoint: together they guarantee every skill invocation reaches the log regardless of invocation path.
Full Production Session
End-to-end data flow for a content creation session using the /youtube-pipeline skill.
Weekly OS Audit (Scheduled)
Fires every Friday at 4am CT via RemoteTrigger. No human input required.
Components
Detailed specification for each of the seven core components of the Agent OS. Each component is independent and composable.
Role
The entry point for every session. Sets operational scope, loads referenced files via @ syntax, declares the active project inventory, and establishes system rules. Every line in CLAUDE.md is injected into every session — treat it as precious real estate.
Keep CLAUDE.md under 200 lines. Move detailed instructions to skill files or project-specific projects/[name]/CLAUDE.md files. Every line costs context window on every session.
Implementation Pattern
# CLAUDE.md
## Context Files
- @context/me.md
- @context/work.md
- @context/current-priorities.md
## Active Projects
| Project | Description |
|---------|-------------|
| `projects/[name]/` | [Status and description] |
**Hygiene rule:** Projects enter the table when they have a folder + active work.
Projects exit only when moved to archives/ AND logged in decisions/log.md.
The @ Reference Mechanism
The @ reference syntax in Claude Code causes the referenced file's full content to be injected into the system prompt at load time. This is not a link — it's a literal include.
Each context file should be under ~500 lines. Above that, consider splitting or summarizing. The entire referenced file is injected, unconditionally, on every session load.
Active Projects Table as a Service Registry
The Active Projects table functions as a service registry — it tells Claude what workstreams exist, where they live, and what their current status is. Without it, Claude has to infer project existence from file exploration, which is slower and less reliable.
Role
Static facts about the operator, the business, and current priorities. These are the "long-term memory" of the system, version-controlled in git. They load on every session via @ references in CLAUDE.md.
File Inventory
| File | Content | Update Frequency |
|---|---|---|
context/me.md | Identity, working style, constraints, failure modes | Rarely — only when circumstances change |
context/work.md | Business model, products, platforms, revenue | Monthly or when major changes occur |
context/current-priorities.md | Active sprint, ordered tasks, backlog | Every session or weekly |
context/goals.md | Quarterly objectives and milestones | Quarterly |
context/brand-story.md | Core narrative, avatar, positioning | Rarely — locked early |
Status Taxonomy in current-priorities.md
✅ Completed — done, ship it, move on
- [ ] Pending — active, in queue
⏸ Parked — deliberately deferred, not forgotten
Staleness Detection
The os-verify skill includes an age check — it greps Last updated: from each context file, compares to the current date, and flags any file older than 28 days. Implemented as a plain bash command inside the skill, not a scheduled job.
Role
.claude/rules/*.md files encode operator-specific behavioral constraints. These load automatically as part of the Claude Code workspace configuration. Rules are written in the second person, addressed to Claude, and describe both what to do and why.
Including the reason allows Claude to apply the rule to edge cases rather than following it mechanically. Generic rules ("be helpful," "be concise") add no value. Effective rules encode operator-specific failure modes.
Pattern
# execution-mode.md
[Operator's documented failure mode — e.g., "Default pattern:
research > refine > optimize > never ship."]
## Rules of Engagement
1. **Bias toward shipping.** "Good enough" beats "perfect but
unpublished" every time.
2. **One review pass.** When reviewing content — do ONE thorough
pass. Don't suggest a second round unless something is
genuinely broken.
3. **Don't feed the research spiral.** If the operator asks to
"look into" something tangential to top priorities, flag it
and suggest parking it.
...
Design Principle
The more specific the rule, the more leverage it provides. Rules should encode operator-specific failure modes and preferences that deviate from Claude's defaults — not restate behaviors Claude already exhibits.
Role
Skills are reusable, trigger-activated workflow specifications. Each skill is a structured markdown document that instructs Claude how to execute a specific multi-step process. No code. No execution environment beyond Claude Code.
Skill File Schema
---
name: skill-name # Machine-readable identifier
description: > # Used by Claude to determine when to invoke
[Detailed trigger conditions and purpose]
trigger: /skill-name # The /command that activates this skill
status: active # active | deprecated | experimental
---
# Skill Title
## What It Does
[Plain English summary]
## Mode Selection (optional)
[Parse user input to determine execution path]
## Steps
### Phase 1 — [Name]
[Specific instructions — what to read, generate, where to write]
**Gate:** Present output to user. Wait for approval before proceeding.
### Phase 2 — [Name]
...
## Output
[Where output lands and in what format]
Execution Patterns
The skill's main thread collects input and dispatches parallel subagents via the Agent tool for content generation. The main thread aggregates results, presents for approval, then proceeds.
The skill drives a sequence of tool calls — file reads, bash commands, API calls — with no subagent dispatch. Used for mechanical workflows like video processing pipelines.
Spawns 5 parallel subagents — each a distinct thinking style, not a persona. They analyze independently, peer-review each other anonymously, then a chairman synthesizes the verdict. Three behavioral rules prevent the council from becoming an enabler:
- No Rescue — If an idea is off-priority or avoidance-driven, the council shuts it down instead of finding ways to make it work
- Pattern Recognition — Scans memory for prior behavioral patterns (shiny object syndrome, analysis paralysis, scope creep) and alerts all advisors
- Accountability Turn — At least one advisor turns a question back on the user: "Why are you asking this instead of finishing [priority]?"
Calling conventions control scope: Team (all 5 + peer review), Consultants (3 advisors, no peer review), The Guys (2 advisors), Quick Council (all 5, skip peer review), or any individual name. Used for the /llm-council skill.
Lifecycle Management
Active skill: .claude/skills/[name]/SKILL.md # status: active
Deprecated skill: archives/[YYYY-MM-DD]-skill-[name]-deprecated/
Move deprecated skills to archives/ immediately. A deprecated skill left in .claude/skills/ will eventually be invoked accidentally. The os-verify skill checks for skills with status: deprecated still in the active directory.
Two-Tier Persistence Model
- Tracked in git
- Stable, slow-changing facts
- Shared identity — what's true about the business
- Products, strategy, brand, current sprint
- NOT version-controlled (intentional)
- Dynamic, session-accumulated
- Private to the operator
- Platform quirks, anti-patterns, key people, incidents
Memory File Schema
---
name: [memory name]
description: [one-line hook — used to determine relevance at load time]
type: user | feedback | project | reference
---
[Memory content]
**Why:** [The reason this matters]
**How to apply:** [When this kicks in]
MEMORY.md — The Pointer Index
memory/MEMORY.md is loaded into every session context. It contains only one-line pointers to individual memory files — not the memory content itself. This keeps the index small while allowing full memory to be read on demand.
# Memory Index
## Feedback
- [Anti-pattern: never batch publish](feedback_no_batch_publish.md) — violated 2026-04-08, burned
- [Platform: YouTube API scope requirements](feedback_youtube_api.md) — force-ssl required for comments
## Projects
- [Project: Run AI product](project_run_ai.md) — $147, live on Systeme.io 2026-05-04
On macOS, ~/.claude/ can be synced via iCloud Drive (symlink ~/.claude to ~/Library/Mobile Documents/com~apple~CloudDocs/.claude) to make memory available across machines without making it public.
Memory files often contain sensitive operational details — API quirks, day-job context, financial specifics, interpersonal notes. Keeping them in ~/.claude/projects/ ensures they never get pushed to a remote, even accidentally. Trade-off: no memory backup by default. Mitigation: copy to a private encrypted backup on a schedule.
Role
An immutable audit trail of architectural decisions. Every meaningful choice — platform selection, product decisions, process changes, brand decisions — is recorded here. Prevents decision re-litigation.
Format
[YYYY-MM-DD] DECISION: [What was decided]
| REASONING: [Why]
| CONTEXT: [Where it affects the system]
Implementation Constraints
No edits to past entries, ever. History is immutable by convention.
Full history via git log -p decisions/log.md.
May contain financial specifics, product decisions, competitive strategy.
The audit skill greps the last N decisions and cross-checks against context files for contradictions.
Value
In a single-operator setup, the decision log replaces the "why did we do it this way?" conversation that would otherwise happen with a team. When an AI agent questions a past choice, the log provides full reasoning in context.
Role
Time-based skill invocation without requiring an active session. Claude Code's RemoteTrigger fires a skill on a schedule, spins up a Claude Code session, executes the skill, and terminates.
Production Scheduled Triggers
| Trigger | Schedule | Skill | Purpose |
|---|---|---|---|
| Weekly OS audit | Friday 4am CT | /os-verify | Catch context drift, stale skills, deprecated platform references |
os-verify Audit Modules
List all skills in .claude/skills/. Check status: field — flag deprecated skills still in active directory. Check trigger references in CLAUDE.md for orphaned entries.
grep Last updated: from all context/*.md. Flag files >28 days. Read last 10 decisions. Cross-check against context for contradictions. Platform reference scan: grep for retired platforms.
Read MEMORY.md index. Flag memory files referencing archived/retired projects. Flag memories >90 days old with no recent corroboration.
Check projects/ for folders not in CLAUDE.md Active Projects table. Check archives/ for items missing a corresponding decision log entry.
Output: projects/[strategist]/workspace/os-health-[date].md. System score (0-100), flags by category, no-action-needed summary.
API Reference
Complete schema definitions, format specifications, and configuration reference for all Agent OS components.
Skill File Schema
YAML frontmatter required fields. File location: .claude/skills/[name]/SKILL.md
Machine-readable identifier. Used for internal routing and logging. Kebab-case recommended.
Multi-line description of trigger conditions and purpose. Claude reads this to determine when to auto-invoke the skill. The more specific, the more reliable the routing.
The /command that activates this skill. Must start with /. Unique across all skills in the workspace.
Lifecycle state of the skill. One of: active | deprecated | experimental. Deprecated skills must be moved to archives/ immediately — do not leave them in .claude/skills/.
Memory File Schema
YAML frontmatter required fields. Files live in ~/.claude/projects/[path]/memory/
Human-readable name for the memory. Used for display in the MEMORY.md index.
One-line hook used to determine relevance when loading context. This is the key used for retrieval decisions. Make it specific and actionable.
Memory category. One of: user (operator preferences, role, style) | feedback (corrections, validated approaches) | project (active workstream context) | reference (pointers to external systems).
MCP Server Configuration
File: .mcp.json in workspace root. Loaded by Claude Code on session start.
{
"mcpServers": {
"google-calendar": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-google-calendar"],
"env": { "GOOGLE_OAUTH_TOKEN": "..." }
},
"notion": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-notion"],
"env": { "NOTION_TOKEN": "..." }
}
}
}
Never embed live credentials directly in .mcp.json if the file is version-controlled. Use a secrets manager or environment variables for team deployments. For solo use: ensure .mcp.json is in .gitignore.
MCP Selection Hierarchy
Follow in order. Never fall through silently from a higher tier to a lower tier when a tool errors.
| Priority | Tool | Use When |
|---|---|---|
| 1 | Dedicated MCP for the target app | App has its own MCP server — fastest and most precise |
| 2 | Browser automation MCP (Claude in Chrome) | Web app without a dedicated MCP |
| 3 | Desktop Commander | Native desktop apps and cross-app workflows |
| 4 | Computer use | Last resort for anything else |
Decision Log Format
[YYYY-MM-DD] DECISION: [What was decided]
| REASONING: [Why — include the alternatives considered]
| CONTEXT: [Which files or components this affects]
# Examples:
[2026-05-01] DECISION: Use file-based context over vector RAG
| REASONING: Deterministic retrieval, no infrastructure,
debuggable — for bounded single-operator context, full
injection is more reliable than retrieval
| CONTEXT: context/*.md, CLAUDE.md, Memory architecture
Credential Storage Pattern
~/.config/fiti-youtube/
├── client_secrets.json # OAuth app credentials — NOT in repo
└── token.json # Cached access + refresh token (auto-refreshes)
# Git credential helper (never embed tokens in remote URLs)
git config --local credential.helper osxkeychain
git remote set-url origin https://github.com/[user]/[repo].git
.gitignore Minimums
*.env
client_secrets.json
token.json
.mcp.json # if it contains inline credentials
Guides
Runbooks and how-to guides for common operations. Each guide is a self-contained procedure.
Session Startup Sequence
Injects all @context files — identity, work, priorities, goals.
Cross-session context and learnings become available.
For skill invocations, the matching SKILL.md is loaded into context.
The skill's instructions guide execution from this point.
Adding a New Skill
Create .claude/skills/[name]/SKILL.md with the full schema. Set status: active in frontmatter.
Add the skill to the Active Skills table with its trigger and purpose.
Test with a dry run before relying on the skill in production. Verify outputs land in the correct locations.
Append to decisions/log.md — what the skill does and why it was added.
Skills should emerge from repeated workflows. The threshold: if you've done it manually three times and found yourself wishing it was automated, build the skill. A skill built for a hypothetical workflow becomes maintenance burden.
Adding a New MCP Server
Follow the MCP server configuration format in the API Reference. Store credentials outside the file or use env vars.
MCP connections are loaded at session start. A restart is required to reload .mcp.json.
Verify with a simple call to one of the server's tools before building workflows that depend on it.
Add to the Connected Tools section if the MCP materially changes Claude's behavior or capabilities.
Project Lifecycle
Adding a Project
Create projects/[name]/ with a README.md describing the project.
Add to CLAUDE.md Active Projects table.
Append entry to decisions/log.md.
Retiring a Project
Move projects/[name]/ to archives/[YYYY-MM-DD]-[name]/. Never delete.
Remove from CLAUDE.md Active Projects table.
Append retirement decision to decisions/log.md.
Update any memory files that reference the project.
Recovering from Stale Context
Or equivalent audit skill. This generates a health report identifying all flags.
Treat the health report as a production incident list — each flag is a potential source of bad outputs.
Update each context file identified as stale. Update the Last updated: timestamp.
Read recent decisions/log.md entries. Verify each is reflected in the relevant context files.
Append a recovery entry to decisions/log.md noting what was stale and what was fixed.
Scaling to a Team
The single-operator model scales to small teams with these modifications:
| Component | Single Operator | Team |
|---|---|---|
| Memory | ~/.claude/projects/[path]/memory/ per machine | Separate memory paths per team member. No sharing. |
| Decision log | One file, one contributor | Same pattern — git blame provides attribution |
| Context files | Single identity layer | Shared identity (work.md, team.md) + per-member context |
| Skills | Shared skill library | Shared .claude/skills/ in repo; member-specific skills in personal forks |
| CLAUDE.md | Single master file | Core CLAUDE.md + project-specific CLAUDE.md in projects/[name]/ |
Replacing File Storage with a Knowledge Base
For teams needing search across a larger document corpus:
me.md, work.md — these are small and always needed.
Notion, Confluence, or a vector DB for large document sets.
The MCP server provides the query interface.
Replace @context/large-doc.md with an instruction to query the MCP on demand.
Loses deterministic loading; gains scalability for large document sets. Introduces precision/recall trade-offs that the file-based system avoids entirely.
FAQ & Trade-offs
Design decisions, anti-patterns, security considerations, and observability model for the Agent OS.
Design Decisions
| Decision | Alternative Considered | Rationale |
|---|---|---|
| File-based context over vector RAG | Pinecone/Weaviate + embeddings | Deterministic retrieval, no infrastructure, debuggable — for bounded single-operator context, full injection is more reliable than retrieval |
| Markdown over structured config | JSON skill definitions | Claude reads and generates markdown natively; markdown skills are self-documenting and editable without tooling |
| Append-only decision log | Database with CRUD | Immutability is a feature — decisions are permanent records; git history provides full diff |
| Memory outside version control | Memory in repo | Prevents accidental exposure of sensitive operational context to public remotes |
| Monorepo for all projects | Separate repos per project | Single context load covers entire operation; Claude doesn't need to switch repos for cross-project dependencies |
| Skills as text documents | Code-defined workflows (LangGraph, etc.) | No framework dependency; skills are readable, editable, and debuggable by anyone who can read markdown |
| Phase-gate approval pattern | Fully autonomous execution | For content + business decisions, human approval at phase boundaries prevents compounding errors |
| Hook-driven automation over manual logging | Relying on /close-session to capture run-log | PostToolUse + UserPromptSubmit hooks fire unconditionally — operator cannot forget to log. SessionEnd snapshot is a backstop for dropped close-session calls. Automation beats discipline for high-frequency operations. |
Anti-Patterns
Context files are for slow-changing facts. If something changes every session, it should be in current-priorities.md at most — and if it changes multiple times per session, it should be written by Claude to a project-specific file during the skill, not loaded at session start.
Skills should emerge from repeated workflows. A skill built for a hypothetical workflow becomes maintenance burden. The threshold: if you've done it manually three times and found yourself wishing it was automated, build the skill.
Every line in CLAUDE.md is injected into every session. Keep it under 200 lines. Move detailed instructions to skill files or project-specific CLAUDE.md files in projects/[name]/CLAUDE.md.
The audit exists to catch drift before it causes bad outputs. A skill flagged for stale references, or a context file contradicting a logged decision, will produce incorrect behavior until fixed. Treat the health report as a production incident list.
Memory files encode individual behavioral patterns — including failure modes, preferences, and sensitive operational context. They are deliberately not version-controlled. Sharing them across operators produces a contaminated context that reflects no one accurately.
.mcp.json, .env, client_secrets.json — if it contains a token, it does not go in the repo. The operational cost of a credential rotation after an accidental commit is always higher than the inconvenience of using a credential helper from the start.
Security Considerations
API tokens belong in ~/.config/[service]/ — never in the repo. Git credentials go in osxkeychain credential helper — never embedded in remote URLs (the common mistake: https://user:token@github.com/... stored in .git/config). MCP env vars: use a secrets manager for team deployments.
decisions/log.md is version-controlled but should not be pushed to public remotes — it may contain financial specifics, product decisions, and competitive strategy. Add to .gitignore if using a public repo; use a private repo if tracking is needed.
~/.claude/projects/[path]/memory/ is not version-controlled by design. For team deployments: each operator must have a separate memory path. Never share memory files across operators.
Observability
No metrics dashboard by default. Observability is achieved through structured file outputs and the weekly audit.
| Signal | Source | How to Read It |
|---|---|---|
| System health | os-verify report | projects/[strategist]/workspace/os-health-[date].md — generated weekly |
| Decision history | decisions/log.md | git log -p decisions/log.md for full diff history |
| Skill usage | skills/run-log.md (auto) | Appended automatically by PostToolUse(Skill) + UserPromptSubmit hooks on every invocation |
| WIP backup | SessionEnd hook | Auto-commits + pushes uncommitted work on session close; inspectable via git log --oneline |
| Memory drift | os-verify Module 3 | Flags stale or contradicted memory entries |
| Context staleness | os-verify Module 2 | Flags context files >28 days since last update |
Add a structured output step to os-verify that writes a JSON summary alongside the markdown report — parseable by a dashboard or alerting system without modifying the core OS.
Common Questions
~/.claude/projects/ rather than the repo ensures they never get pushed to a remote, even accidentally. This is a deliberate design decision with one trade-off: no automatic backup. Mitigation: copy the memory directory to a private encrypted backup on a schedule.context/*.md) contain slow-changing facts about the business — products, strategy, current sprint, brand positioning. They're version-controlled in git, shared, and appropriate for information that defines the operation. Memory files (~/.claude/projects/.../memory/) contain dynamic, session-accumulated learnings — behavioral patterns, platform quirks, anti-patterns discovered through experience, notes about key people. They're private, not version-controlled, and appropriate for information that reflects individual operational history. If it changes quarterly, it's context. If it changes through experience, it's memory.