Adapter guide

Adapters convert external model/framework APIs into plain callables that work with bayesbench.

OpenAI-compatible adapter

from bayesbench.adapters.openai_compat import openai_model

model = openai_model("gpt-4o")
groq_model = openai_model(
    "llama-3.1-70b-versatile",
    base_url="https://api.groq.com/openai/v1",
)

Use for OpenAI and OpenAI-compatible providers (Groq, Together, Ollama, vLLM, etc.).
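
The adapter returns a plain callable, so swapping providers only changes construction. A minimal usage sketch (it assumes the callable maps a prompt string to a completion string and that the relevant API key is set in the environment):

# Hedged sketch: prompt-in, text-out is an assumption about the callable's signature.
answer = model("In one sentence, what is Bayesian model comparison?")
print(answer)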

Anthropic adapter

from bayesbench.adapters.anthropic_adapter import anthropic_model

model = anthropic_model("claude-opus-4-6")

Use for Claude models accessed through the Anthropic API.

Hugging Face adapter

from bayesbench.adapters.huggingface import hf_model, hf_dataset

model = hf_model("meta-llama/Llama-3.1-8B-Instruct")
dataset = hf_dataset("openai/gsm8k", split="test")

Use when your workloads center on Hugging Face-hosted models and datasets.
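
A rough sketch of pairing the two (iterating the dataset and the "question" field name follow the upstream GSM8K schema; the adapter's exact item format is an assumption):

# Run the instruct model over a few GSM8K test items.
for item in list(dataset)[:3]:
    print(model(item["question"]))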

Inspect adapter

from bayesbench.adapters.inspect_ai import from_inspect_dataset, inspect_model

Use to reuse existing AISI Inspect datasets, model wrappers, and scorers without rewriting task logic.
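
Neither helper is shown with arguments above, so here is a hypothetical sketch (the Inspect example dataset and the model identifier are illustrative; the wrappers' exact signatures are assumptions):

from inspect_ai.dataset import example_dataset

dataset = from_inspect_dataset(example_dataset("theory_of_mind"))  # wrap Inspect samples for bayesbench
model = inspect_model("openai/gpt-4o")                              # wrap a model through Inspect's interface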

MTEB adapter

from bayesbench.adapters.mteb import mteb_sts_dataset, st_model, sts_score_fn

Use for embedding-model benchmarking and STS-style continuous similarity metrics.
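
A hypothetical end-to-end sketch (the embedding model, the STSBenchmark task name, and the score-function call are assumptions about argument conventions, not the documented API):

model = st_model("sentence-transformers/all-MiniLM-L6-v2")  # Sentence-Transformers embedder
dataset = mteb_sts_dataset("STSBenchmark")                  # sentence pairs with gold similarity scores
score_fn = sts_score_fn()                                   # continuous similarity metric for bayesbench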

OpenClaw adapter

from bayesbench.adapters.openclaw import openclaw_agent

agent = openclaw_agent(my_agent)

Use for agent-vs-agent or agent-vs-LLM comparisons.
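
For an agent-vs-LLM comparison, the wrapped agent and a plain-LLM baseline expose the same callable interface; a sketch (my_agent is a placeholder for your own agent, and gpt-4o-mini is an arbitrary baseline choice):

baseline = openai_model("gpt-4o-mini")  # plain-LLM baseline via the OpenAI-compatible adapter
agent = openclaw_agent(my_agent)        # wrapped agent shares the baseline's callable interface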

Adapter selection checklist

  • Need text-generation A/B across providers → OpenAI-compatible or Anthropic
  • Need existing Inspect pipeline reuse → Inspect
  • Need embedding STS benchmarks → MTEB
  • Need full agent-loop benchmarking → OpenClaw
  • Need HF-centric model/data loading → Hugging Face