Bayesian-Agent: A Bayesian Self-Evolving Agent Framework with Cross-Harness Adaptation
📚 Docs | 🐙 GitHub | 📄 arXiv:2606.08348
Bayesian-Agent is a Bayesian self-evolving layer for turning verified agent trajectories into reusable, evidence-weighted Skills and SOPs across agent frameworks and execution harnesses.
The project focuses on the inference side of agent improvement. Instead of changing base model parameters, it changes the agent's inference environment by maintaining posterior beliefs over Skills, failure modes, token cost, and context-specific reliability.
In v0.5, the default Bayesian core is a per-Skill Bayesian Evidence Model over verified success/failure evidence and context/runtime features. The current implementation uses a categorical likelihood backend exposed as categorical_bayes; the old naive_bayes name remains a compatibility alias. The earlier Beta-Bernoulli update remains available as an optional ablation backend. Fuller Bayesian model selection and uncertainty-aware Skill selection are on the roadmap.
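To make the categorical evidence model concrete, here is a minimal sketch of a per-Skill categorical (naive-Bayes-style) likelihood over context features with add-alpha smoothing. The class name, its methods, the feature names, and the smoothing choice are illustrative assumptions, not the categorical_bayes backend's actual API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CategoricalSkillEvidence:
    """Hypothetical per-Skill evidence model (not the categorical_bayes API):
    categorical naive Bayes over context features, add-alpha smoothed."""
    alpha: float = 1.0
    # verified outcome counts: {True: successes, False: failures}
    outcome_counts: dict = field(default_factory=lambda: {True: 0, False: 0})
    # feature_counts[outcome][feature_name][value] -> count
    feature_counts: dict = field(
        default_factory=lambda: {
            True: defaultdict(lambda: defaultdict(int)),
            False: defaultdict(lambda: defaultdict(int)),
        }
    )
    # distinct values observed per feature (size of each categorical)
    feature_values: dict = field(default_factory=lambda: defaultdict(set))

    def update(self, features: dict, success: bool) -> None:
        """Fold in one verified trajectory outcome and its context features."""
        self.outcome_counts[success] += 1
        for name, value in features.items():
            self.feature_counts[success][name][value] += 1
            self.feature_values[name].add(value)

    def p_success(self, features: dict) -> float:
        """Posterior probability of success for this Skill in this context."""
        total = self.outcome_counts[True] + self.outcome_counts[False]
        scores = {}
        for outcome in (True, False):
            n = self.outcome_counts[outcome]
            # smoothed prior P(outcome)
            score = (n + self.alpha) / (total + 2 * self.alpha)
            for name, value in features.items():
                # number of categories, counting an unseen query value too
                k = len(self.feature_values.get(name, set()) | {value})
                count = self.feature_counts[outcome].get(name, {}).get(value, 0)
                # smoothed likelihood P(feature = value | outcome)
                score *= (count + self.alpha) / (n + k * self.alpha)
            scores[outcome] = score
        return scores[True] / (scores[True] + scores[False])


# Illustrative usage with made-up features and outcomes.
model = CategoricalSkillEvidence()
model.update({"harness": "native", "task_type": "sql"}, success=True)
model.update({"harness": "native", "task_type": "sql"}, success=True)
model.update({"harness": "generic_agent", "task_type": "sql"}, success=False)

native_ctx = {"harness": "native", "task_type": "sql"}
generic_ctx = {"harness": "generic_agent", "task_type": "sql"}
print(f"{model.p_success(native_ctx):.3f}")   # 0.771
print(f"{model.p_success(generic_ctx):.3f}")  # 0.360
```

The same Skill gets a different reliability estimate depending on the context it runs in, which is what lets evidence from one harness inform, without dominating, decisions in another.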
Agent runs are costly: tokens cost money, latency is high, benchmark cases are limited, and real production failures are rarer still. When samples are scarce and each one is expensive, we cannot wait for large-sample statistics to stabilize. Bayesian modeling lets Bayesian-Agent combine prior beliefs, uncertainty, and new verified evidence into more stable Skill rewrite decisions.
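The small-sample argument can be illustrated with the update behind the Beta-Bernoulli ablation backend: a Beta(a, b) prior plus s verified successes and f verified failures gives a Beta(a + s, b + f) posterior. The sketch below assumes a hypothetical Beta(2, 2) prior and made-up run counts, and shows why a Skill that passed 2 of 2 runs should not automatically outrank one that passed 18 of 20.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta-Bernoulli belief over one Skill's verified success rate
    (illustrative sketch, not Bayesian-Agent's registry class)."""
    a: float = 1.0  # prior pseudo-successes
    b: float = 1.0  # prior pseudo-failures

    def update(self, successes: int, failures: int) -> "BetaPosterior":
        # Conjugate update: add observed counts to the pseudo-counts.
        return BetaPosterior(self.a + successes, self.b + failures)

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def variance(self) -> float:
        n = self.a + self.b
        return self.a * self.b / (n * n * (n + 1))


prior = BetaPosterior(a=2.0, b=2.0)              # assumed weakly informative prior
lucky = prior.update(successes=2, failures=0)    # 2/2 verified runs passed
steady = prior.update(successes=18, failures=2)  # 18/20 verified runs passed

print(f"lucky:  raw=1.000 posterior={lucky.mean:.3f}")   # posterior=0.667
print(f"steady: raw=0.900 posterior={steady.mean:.3f}")  # posterior=0.833
```

Under these assumptions the better-evidenced Skill ranks higher even though its raw pass rate is lower, and its posterior variance (about 0.006 versus 0.032) reflects how much more evidence stands behind it.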
Bayesian-Agent is designed to avoid being just another monolithic agent framework:
- Full-run evolution from scratch: run tasks without prior traces and evolve Skills online.
- Incremental repair: attach to an existing agent, learn from failed trajectories, and rerun only failed tasks.
- Native-first, cross-harness adaptation: run with the lightweight Bayesian-Agent (BA) native harness, or integrate with GenericAgent and other agent frameworks through adapters.
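The incremental-repair pattern can be sketched as a short control loop: fold every verified outcome from an existing agent into per-Skill evidence counts, then rerun only the tasks that failed. Everything here is hypothetical (the function name, the [successes, failures] registry layout, and the rerun callback); in particular, the Skill rewrite that would happen before a rerun is hidden inside the callback.

```python
from typing import Callable, Dict, List


def incremental_repair(
    results: Dict[str, bool],
    registry: Dict[str, List[int]],
    skill_for_task: Dict[str, str],
    rerun: Callable[[str], bool],
) -> Dict[str, bool]:
    """Hypothetical incremental-repair pass over an existing agent's results.

    results: task_id -> verified pass/fail from the existing agent.
    registry: skill -> [successes, failures], updated in place.
    skill_for_task: which Skill each task exercised.
    rerun: reruns one task with repaired Skills and returns pass/fail.
    """
    def record(task_id: str, passed: bool) -> None:
        counts = registry.setdefault(skill_for_task[task_id], [0, 0])
        counts[0 if passed else 1] += 1

    # 1. Learn from every verified trajectory, failures included.
    for task_id, passed in results.items():
        record(task_id, passed)

    # 2. Rerun only the failed tasks; passing results are kept as-is.
    repaired = dict(results)
    for task_id, passed in results.items():
        if not passed:
            repaired[task_id] = rerun(task_id)
            record(task_id, repaired[task_id])  # reruns are new evidence too
    return repaired


# Illustrative usage with a stand-in rerun function.
registry: Dict[str, List[int]] = {}
results = {"t1": True, "t2": False, "t3": False}
skills = {"t1": "sql_join", "t2": "sql_join", "t3": "csv_parse"}
rerun_calls: List[str] = []


def fake_rerun(task_id: str) -> bool:
    rerun_calls.append(task_id)
    return task_id == "t2"  # pretend the repaired Skill fixes t2 only


final = incremental_repair(results, registry, skills, fake_rerun)
print(final)     # {'t1': True, 't2': True, 't3': False}
print(registry)  # {'sql_join': [2, 1], 'csv_parse': [0, 2]}
```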
News
- 2026-06-09: The Bayesian-Agent paper is now available on arXiv: arXiv:2606.08348.
Verified trajectories from compatible harnesses become evidence-ranked Skills and executable SOP patches.
Why It Exists
LLM agent engineering has moved through three layers:
- Prompt Engineering: make task instructions clearer.
- Context Engineering: decide what evidence the model can see.
- Harness Engineering: put the model inside a runnable, observable, recoverable system.
Skills and SOPs are the durable memory of a harness. Bayesian-Agent makes their evolution evidence-driven and portable.
What v0.5 Includes
- Bayesian Skill registry with Bayesian Evidence Model updates and optional Beta-Bernoulli updates.
- Evidence schema for agent trajectories.
- Posterior-weighted Skill context rendering.
- Failure-mode-aware repair planning.
- First-party native harness with a minimal LLM loop, workspace tools, three-layer memory, and trajectory capture.
- CLI utilities for trace ingestion, summarization, and incremental repair.
- Integration boundaries for GenericAgent, mini-swe-agent, and Claude Code that do not copy or vendor those runtimes.
- Three operating patterns: full self-evolution, incremental repair, and cross-harness adaptation.
- SOP-Bench, Lifelong AgentBench, and RealFin result artifacts.
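The evidence schema and posterior-weighted context rendering can be sketched together: each verified run becomes a trajectory record, and Skills are ranked by a Beta-Bernoulli posterior mean before the top few are written into the prompt along with their most frequent failure mode. The Trajectory fields, the Beta(1, 1) prior, the top_k cutoff, and the rendered format are assumptions for illustration, not Bayesian-Agent's actual schema or renderer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Trajectory:
    """Hypothetical evidence record for one verified agent run."""
    task_id: str
    skill: str
    success: bool                       # verified outcome, not self-reported
    failure_mode: Optional[str] = None  # e.g. "timeout", "wrong_format"
    tokens: int = 0                     # cost signal, unused in this sketch


def posterior_mean(successes: int, failures: int,
                   a0: float = 1.0, b0: float = 1.0) -> float:
    """Beta-Bernoulli posterior mean success rate under a Beta(a0, b0) prior."""
    return (a0 + successes) / (a0 + b0 + successes + failures)


def render_skill_context(trajectories: List[Trajectory],
                         skill_texts: Dict[str, str], top_k: int = 2) -> str:
    """Rank Skills by posterior mean and render the top_k into a prompt section."""
    stats: Dict[str, dict] = {}
    for t in trajectories:
        s = stats.setdefault(t.skill, {"ok": 0, "fail": 0, "modes": {}})
        if t.success:
            s["ok"] += 1
        else:
            s["fail"] += 1
            if t.failure_mode:
                s["modes"][t.failure_mode] = s["modes"].get(t.failure_mode, 0) + 1

    ranked = sorted(stats, reverse=True,
                    key=lambda k: posterior_mean(stats[k]["ok"], stats[k]["fail"]))
    lines = ["## Skills (ranked by posterior success rate)"]
    for name in ranked[:top_k]:
        s = stats[name]
        p = posterior_mean(s["ok"], s["fail"])
        n = s["ok"] + s["fail"]
        line = f"- {name} (p={p:.2f}, {s['ok']}/{n} verified): {skill_texts[name]}"
        if s["modes"]:
            # surface the dominant failure mode so the model can avoid it
            worst = max(s["modes"], key=s["modes"].get)
            line += f" [watch for: {worst}]"
        lines.append(line)
    return "\n".join(lines)


# Illustrative usage with made-up runs and Skill texts.
runs = [
    Trajectory("t1", "sql_join", True, tokens=900),
    Trajectory("t2", "sql_join", True, tokens=850),
    Trajectory("t3", "sql_join", False, "wrong_format", tokens=1200),
    Trajectory("t4", "csv_parse", False, "timeout", tokens=3000),
    Trajectory("t5", "regex_fix", True, tokens=400),
]
texts = {
    "sql_join": "Check join keys before aggregating.",
    "csv_parse": "Stream large files in chunks.",
    "regex_fix": "Anchor patterns explicitly.",
}
print(render_skill_context(runs, texts))
```

With these runs, regex_fix (1 of 1 verified, p=0.67) ranks just above sql_join (2 of 3, p=0.60), and csv_parse falls outside the top two. A mean-only ranking like this ignores how little evidence backs regex_fix.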
Install
git clone https://github.com/DataArcTech/Bayesian-Agent.git
cd Bayesian-Agent
python -m pip install -e .
For documentation development:
Next Steps
- Start with the Quick Start.
- Read the Core Concepts if you want the Bayesian framing.
- Read Why Bayesian for Skill Evolution for the small-sample, cost-sensitive motivation.
- Read Native Harness for the first-party harness design.
- Use the CLI to update a registry from traces.
- Read Adapters to understand how Bayesian-Agent moves across harnesses.
- Inspect Experiments for native-harness full-sample results and GenericAgent-backed validation.