Pairwise Cohen's kappa between backends on edge presence — llm_benchmark_kappa • edaphos

3.10.0

Reference
- Capstone — six pillars integrated
- Cross-pillar campaign
- Pilar 1 — Causal AI
- Causal backdoor on br_cerrado
- Causal on 1 095 real WoSIS profiles
- Causal-discovery trio (expert × LLM × data-driven)
- LLM-KG benchmark (Gemma 4 × GPT × Claude)
- Annotation workflow (pre-annotation + Shiny review)
- Pilar 2 — Physics-Informed ML
- Parametric ODE + Neural ODE
- Pilar 3 — 4D Pedometry
- ConvLSTM on synthetic cube
- Real MODIS + NASA POWER + EnKF
- Pilar 4 — Foundation Models
- SimCLR / MoCo embeddings
- Pilar 5 — Active Learning
- Hybrid policy + physics gate
- SoilGrids-BR live
- Pilar 6 — Quantum ML
- ZZFeatureMap + Q-KRR
- Cross-pillar bridges
- Pilar 1 × Pilar 4 — Foundation IVs
- Pilar 4 × Pilar 6 — Quantum-foundation kernel
- Unified infrastructure
- Unified uncertainty API
- Case — end-to-end Cerrado benchmark
- Releases
- v2.0.0 — Quantum kernel × Foundation (Pilar 4 × Pilar 6)
- v1.9.x — Foundation embeddings as causal IVs + Sargan/Cinelli-Hazlett
- v1.8.x — LLM benchmark infrastructure + Shiny annotation tool
- v1.7.x — Capstone vignette + native-query calibration + causal discovery
- Changelog

Computes agreement (Cohen's kappa) for every pair of backends on the union of edges seen across them (and optionally gold) per abstract. For each backend pair (A, B), treats every (abstract, edge) as a binary rating: "did backend X extract this edge?".

Usage

llm_benchmark_kappa(claims_by_backend, gold = NULL)

Arguments

claims_by_backend: Named list of data frames, each with abstract_id, cause, effect.
gold: Optional gold-standard frame to include as a rater.

Value

Numeric matrix of kappa values (symmetric, diagonal = 1).