Core Concepts
Bayesian-Agent treats Skill evolution as Bayesian inference over operational hypotheses. Its core contribution is not another closed agent loop, but a portable evolution layer that can run from scratch, repair existing agents incrementally, and adapt across harnesses.
Inference Environment
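A base model samples from:

$$y \sim p_\theta(y \mid x)$$

An agent system samples from:

$$y \sim p_\theta(y \mid x, C)$$

Here $x$ is the task input, $y$ is the sampled output, and $\theta$ is the base model parameter state. $C$ is the inference environment: prompts, tools, memory, retrieved context, Skills, SOPs, benchmark traces, verifier feedback, and runtime constraints.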
Bayesian-Agent improves C. It does not train or fine-tune the base model, and it does not require replacing the user's existing agent framework.
Skill as Hypothesis
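A Skill or SOP is a hypothesis about how to make an agent succeed under a task context:

$$P(\text{success} \mid s, c)$$

where $s$ is the Skill and $c$ is the task context.
The same Skill may work in one context and fail in another. That is why Bayesian-Agent records both the outcomes and the distribution of contexts in which they were observed.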
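
As an illustration of why outcomes are recorded per context, here is a minimal sketch that keeps separate success and failure counts for each context label. The class name `ContextualOutcomes`, its methods, and the context labels are hypothetical and are not part of Bayesian-Agent's API.

```python
from collections import defaultdict
from typing import Optional


class ContextualOutcomes:
    """Success/failure counts for one Skill, keyed by task context (hypothetical helper)."""

    def __init__(self) -> None:
        # context label -> [successes, failures]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, context: str, success: bool) -> None:
        # Index 0 holds successes, index 1 holds failures.
        self.counts[context][0 if success else 1] += 1

    def success_rate(self, context: str) -> Optional[float]:
        # Return None for contexts with no recorded evidence.
        if context not in self.counts:
            return None
        successes, failures = self.counts[context]
        return successes / (successes + failures)


outcomes = ContextualOutcomes()
outcomes.record("web_form", True)
outcomes.record("web_form", True)
outcomes.record("pdf_export", False)
```

In this toy record, the Skill looks reliable for `web_form` and unreliable for `pdf_export`, which a single pooled success rate would hide.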
Three Operating Patterns
Bayesian-Agent is meant to be used in three complementary ways:
| Pattern | What it does | Why it matters |
|---|---|---|
| Full self-evolution | Runs tasks from scratch and updates Skill beliefs online. | Tests whether Skills can emerge without prior traces. |
| Incremental repair | Reads baseline failures and reruns only failed tasks. | Improves existing agents with small additional inference cost. |
| Cross-harness adaptation | Uses a common trajectory schema and adapters. | Lets Bayesian Skill evolution move across agent frameworks. |
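
The incremental repair pattern can be sketched as a two-step selection and merge. This is a simplified illustration under stated assumptions: baseline results are modelled as a mapping from task id to pass/fail, a rerun outcome replaces the baseline outcome, and the function names `select_tasks_for_repair` and `merge_results` are hypothetical.

```python
from typing import Dict, List


def select_tasks_for_repair(baseline: Dict[str, bool]) -> List[str]:
    """Return the ids of tasks the baseline run failed, in sorted order."""
    return sorted(task_id for task_id, passed in baseline.items() if not passed)


def merge_results(baseline: Dict[str, bool], reruns: Dict[str, bool]) -> Dict[str, bool]:
    """Overlay rerun outcomes on the baseline (assumption: the rerun wins)."""
    merged = dict(baseline)
    merged.update(reruns)
    return merged


baseline = {"task-1": True, "task-2": False, "task-3": False}
to_rerun = select_tasks_for_repair(baseline)

# Pretend the repaired Skill fixed task-2 but not task-3.
reruns = {"task-2": True, "task-3": False}
final = merge_results(baseline, reruns)
```

Only the failed tasks are rerun, so the extra inference cost scales with the number of baseline failures rather than with the size of the benchmark.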
Trajectory Evidence
Each agent run should emit verified evidence:
- task id
- skill id
- task context
- success or failure outcome
- input, output, and total tokens
- turns and elapsed time
- failure mode
- summary and metadata
Evidence should come from a benchmark grader, test suite, deterministic checker, or other action-grounded verifier.
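
The evidence fields listed above could be carried in a record like the following. This is a sketch, not the project's actual schema: the class name `TrajectoryEvidence`, the field names, and the field types are assumptions, and total tokens are assumed to be the sum of input and output tokens.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TrajectoryEvidence:
    """One verified agent run (hypothetical schema mirroring the list above)."""

    task_id: str
    skill_id: str
    task_context: str
    success: bool
    input_tokens: int
    output_tokens: int
    turns: int
    elapsed_seconds: float
    failure_mode: Optional[str] = None  # None when the run succeeded
    summary: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def total_tokens(self) -> int:
        # Assumption: total is simply input plus output tokens.
        return self.input_tokens + self.output_tokens


evidence = TrajectoryEvidence(
    task_id="task-2",
    skill_id="fill-web-form",
    task_context="web_form",
    success=False,
    input_tokens=1200,
    output_tokens=300,
    turns=4,
    elapsed_seconds=12.5,
    failure_mode="timeout",
    summary="Form submission timed out on the final step.",
)
record = asdict(evidence)  # plain dict, ready to serialize
```

Keeping the record a plain dataclass makes it easy for an adapter in any harness to emit the same fields.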
Posterior Belief
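Each Skill uses a Beta posterior over its success probability $p_s$:

$$p_s \mid D \sim \mathrm{Beta}(\alpha_0 + n_{\text{success}},\ \beta_0 + n_{\text{failure}})$$

where $\alpha_0$ and $\beta_0$ are the prior pseudo-counts, and $n_{\text{success}}$ and $n_{\text{failure}}$ are the verified successes and failures in the evidence $D$.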
The registry also tracks mean token cost, failure modes, and context counts.
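
A Beta posterior over a success probability can be maintained with two counters. The sketch below shows the standard Beta-Bernoulli update; the class name `BetaBelief` and the uniform `Beta(1, 1)` prior are assumptions made for illustration, not the registry's actual implementation or defaults.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta-Bernoulli belief about a Skill's success probability (hypothetical helper)."""

    alpha: float = 1.0  # prior pseudo-count of successes (uniform prior assumed)
    beta: float = 1.0   # prior pseudo-count of failures (uniform prior assumed)

    def update(self, success: bool) -> None:
        # Conjugate update: each verified outcome adds one count.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean of the success probability.
        return self.alpha / (self.alpha + self.beta)


belief = BetaBelief()
for outcome in [True, True, True, False]:
    belief.update(outcome)
```

After three successes and one failure the belief is `Beta(4, 2)`, whose mean of 4/6 sits slightly below the raw success rate of 3/4 because the prior still carries weight.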
Rewrite Policy
The default policy maps posterior state to small, inspectable actions:
| Signal | Action |
|---|---|
| no evidence | explore |
| stable success | compress |
| repeated failure mode | patch |
| mixed contexts | split |
| dominant failures | retire |
These actions are recommendations. External harnesses decide how to rewrite, rerun, or retire Skills.
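
The signal-to-action table could be expressed as a small rule function. Everything numeric here is an assumption: the thresholds, the order in which signals are checked, the `Beta(1, 1)` prior behind the posterior mean, and the fallback to `explore` when no signal fires are illustrative choices, and `recommend_action` is a hypothetical name rather than the project's API.

```python
from typing import Dict


def recommend_action(
    successes: int,
    failures: int,
    failure_modes: Dict[str, int],
    context_success_rates: Dict[str, float],
    min_trials: int = 5,          # assumed minimum evidence before retire/compress
    retire_below: float = 0.2,    # assumed posterior-mean cutoff for retiring
    compress_above: float = 0.9,  # assumed posterior-mean cutoff for compressing
    repeat_threshold: int = 3,    # assumed count that makes a failure mode "repeated"
    split_gap: float = 0.5,       # assumed success-rate spread that counts as "mixed"
) -> str:
    """Map posterior state to one recommended action from the table above."""
    total = successes + failures
    if total == 0:
        return "explore"  # no evidence

    # Posterior mean under an assumed uniform Beta(1, 1) prior.
    mean = (successes + 1) / (total + 2)

    if total >= min_trials and mean < retire_below:
        return "retire"  # dominant failures
    if failure_modes and max(failure_modes.values()) >= repeat_threshold:
        return "patch"  # repeated failure mode
    if len(context_success_rates) >= 2:
        rates = list(context_success_rates.values())
        if max(rates) - min(rates) >= split_gap:
            return "split"  # mixed contexts
    if total >= min_trials and mean > compress_above:
        return "compress"  # stable success
    return "explore"  # assumption: keep gathering evidence when nothing fires
```

The function only returns a label; deciding how to rewrite, rerun, or retire the Skill remains with the external harness.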