Core Concepts¶

Bayesian-Agent treats Skill evolution as Bayesian inference over operational hypotheses. Its core contribution is not another closed agent loop, but a portable evolution layer that can run from scratch, repair existing agents incrementally, and adapt across harnesses.

Inference Environment¶

A base model samples from:

P(X | theta)

An agent system samples from:

P(X | theta, C)

theta is the base model parameter state. C is the inference environment: prompts, tools, memory, retrieved context, Skills, SOPs, benchmark traces, verifier feedback, and runtime constraints.

Bayesian-Agent improves C. It does not train or fine-tune the base model, and it does not require replacing the user's existing agent framework.

Skill as Hypothesis¶

A Skill or SOP is a hypothesis about how to make an agent succeed under a task context:

P(success | theta, C, skill)

The same Skill may work in one context and fail in another. That is why Bayesian-Agent records both outcomes and context distribution.

Three Operating Patterns¶

Bayesian-Agent is meant to be used in three complementary ways:

Pattern	What it does	Why it matters
Full self-evolution	Runs tasks from scratch and updates Skill beliefs online.	Tests whether Skills can emerge without prior traces.
Incremental repair	Reads baseline failures and reruns only failed tasks.	Improves existing agents with small additional inference cost.
Cross-harness adaptation	Uses a common trajectory schema and adapters.	Lets Bayesian Skill evolution move across agent frameworks.

Trajectory Evidence¶

Each agent run should emit verified evidence:

task id
skill id
task context
success or failure outcome
input, output, and total tokens
turns and elapsed time
failure mode
summary and metadata

Evidence should come from a benchmark grader, test suite, deterministic checker, or other action-grounded verifier.

Posterior Belief¶

Each Skill uses a Beta posterior:

success: alpha += 1
failure: beta += 1
posterior_success = alpha / (alpha + beta)

The registry also tracks mean token cost, failure modes, and context counts.

Rewrite Policy¶

The default policy maps posterior state to small, inspectable actions:

Signal	Action
no evidence	`explore`
stable success	`compress`
repeated failure mode	`patch`
mixed contexts	`split`
dominant failures	`retire`

These actions are recommendations. External harnesses decide how to rewrite, rerun, or retire Skills.