Quick Start

This guide shows the shortest path from installation to a Bayesian Skill registry and its audit/context outputs.

Bayesian-Agent supports three paths:

  • run the first-party native harness and evolve Skills during a full run
  • repair only failed tasks from an existing agent run
  • adapt the same Bayesian Skill registry to another harness through an adapter

Install

git clone https://github.com/DataArcTech/Bayesian-Agent.git
cd Bayesian-Agent
python -m pip install -e .

The core package has no runtime dependencies beyond the Python standard library.

Run the Native Harness

The benchmark runner defaults to the Bayesian-Agent harness. This harness owns the LLM loop, workspace tools, three-layer memory, and trajectory capture:

export DEEPSEEK_API_KEY="sk-..."
python experiments/run_benchmarks.py \
  --harness bayesian-agent \
  --model deepseek-v4-flash \
  --bench core \
  --mode all \
  --limit 1

Use --bench realfin for RealFin-Bench. External compatibility backends remain available with --harness genericagent, --harness mini-swe-agent, or --harness claude-code.

Update a Skill Registry

Use evolve to ingest one or more result files and update a persistent Bayesian Skill registry:

bayesian-agent evolve \
  --results artifacts/ga_deepseek_baseline/sop_results.json \
  --registry temp/bayesian_skill_beliefs.json \
  --context-out temp/skill_context.md

This command:

  • reads benchmark or agent run traces
  • converts each run into TrajectoryEvidence
  • updates the corresponding Skill posterior
  • optionally renders a posterior audit context (here via --context-out) for inspection or custom adapter use

Plan Incremental Repair

Use repair-plan to extract the failed task IDs from a baseline run:

bayesian-agent repair-plan \
  --baseline artifacts/ga_deepseek_baseline/sop_results.json \
  --out temp/failed_tasks.json

This is useful when Bayesian-Agent is attached to another agent as a repair layer.
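
As a rough illustration of what extracting failed task IDs involves, the sketch below filters a list of result records in plain Python. The record layout (a `task_id` string plus a boolean `success` field) and the helper name `failed_task_ids` are assumptions made for this example. The actual schema of sop_results.json and the output format of repair-plan may differ.

```python
import json


def failed_task_ids(results):
    """Return IDs of records whose (assumed) `success` flag is false.

    IDs are returned in first-seen order, without duplicates.
    """
    seen = set()
    failed = []
    for record in results:
        task_id = record["task_id"]
        # A missing `success` field is treated as a failure in this sketch.
        if not record.get("success", False) and task_id not in seen:
            seen.add(task_id)
            failed.append(task_id)
    return failed


# Hypothetical baseline records, not taken from a real run.
baseline = [
    {"task_id": "sop_11", "success": True},
    {"task_id": "sop_12", "success": False},
    {"task_id": "sop_13", "success": False},
]

print(json.dumps(failed_task_ids(baseline)))  # ["sop_12", "sop_13"]
```

A list like this is the kind of input another agent could use to rerun only the failed tasks.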

Summarize Results

bayesian-agent summarize \
  --results artifacts/bayesian_incremental/results.json \
  --out temp/summary.json

For baseline plus repair traces:

bayesian-agent incremental-summary \
  --baseline artifacts/ga_deepseek_baseline/sop_results.json \
  --repairs artifacts/bayesian_incremental/results.json \
  --out temp/incremental_summary.json

Python Example

from bayesian_agent import BayesianSkillRegistry, SkillContextBuilder, TrajectoryEvidence

registry = BayesianSkillRegistry("temp/beliefs.json")
registry.record(
    TrajectoryEvidence(
        task_id="sop_12",
        skill_id="benchmark/sop_bench",
        context="sop_bench",
        outcome="failure",
        failure_mode="xml_wrapped_answer",
        input_tokens=70123,
        output_tokens=4242,
    )
)

print(SkillContextBuilder(registry).render(task_context="sop_bench"))

Expected Output Shape

The generic SkillContextBuilder rendering is intentionally short. Treat it as an audit/debug view of the posterior state unless your adapter explicitly wants this format:

### Bayesian Posterior Audit
Posterior summaries are for ranking, rewrite decisions, and debugging; model-facing prompts should use executable Skill/SOP text.
- benchmark/sop_bench: algorithm=categorical_bayes, posterior_success=0.333, context_success=0.333, alpha=1.0, beta=2.0, observations=1, mean_tokens=74365.0, rewrite=explore, failures=xml_wrapped_answer=1
Current task files and runtime feedback remain authoritative.
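
The numbers in the sample line are consistent with a Beta-Bernoulli update from a uniform Beta(1, 1) prior: a single failure leaves alpha at 1.0, raises beta to 2.0, and gives a posterior mean of 1/3. The sketch below reproduces that arithmetic. The uniform prior and the `BetaBelief` class are assumptions for illustration, not the library's categorical_bayes implementation.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Illustrative stand-in for one Skill's success belief."""

    alpha: float = 1.0  # assumed prior pseudo-count of successes
    beta: float = 1.0   # assumed prior pseudo-count of failures
    observations: int = 0
    total_tokens: int = 0

    def record(self, success, input_tokens, output_tokens):
        # A success increments alpha; a failure increments beta.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0
        self.observations += 1
        self.total_tokens += input_tokens + output_tokens

    @property
    def posterior_success(self):
        # Mean of Beta(alpha, beta).
        return self.alpha / (self.alpha + self.beta)

    @property
    def mean_tokens(self):
        if self.observations == 0:
            return 0.0
        return self.total_tokens / self.observations


belief = BetaBelief()
belief.record(success=False, input_tokens=70123, output_tokens=4242)
print(
    f"posterior_success={belief.posterior_success:.3f}, "
    f"alpha={belief.alpha}, beta={belief.beta}, "
    f"mean_tokens={belief.mean_tokens}"
)
# posterior_success=0.333, alpha=1.0, beta=2.0, mean_tokens=74365.0
```

Under the same assumption, a later success on this Skill would move the belief to alpha=2.0, beta=2.0 and a posterior mean of 0.5.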