
Match extracted LLM claims against a gold-standard set
Source: R/llm_benchmark.R
llm_benchmark_match.Rd
Takes a data frame of extracted claims (from any backend) and a data frame of gold-standard claims, and returns a per-claim match table with TP / FP / FN labels.
Arguments
- predicted
Data frame with columns abstract_id, cause, effect (and optional confidence). Every edge found by the backend for a given abstract.
- gold
Data frame with columns abstract_id, cause, effect (and optional polarity). One row per annotated claim.
- fuzzy
Logical; if TRUE (default), also count a predicted edge as TP when one of its endpoints matches by Levenshtein distance within fuzzy_threshold. If FALSE, require exact match on canonicalised labels.
- fuzzy_threshold
Integer; maximum edit distance for a fuzzy endpoint match. Default 2.
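
The arguments above can be illustrated with a minimal base-R sketch of the matching idea. This is not the package's implementation: the names sketch_match and canonicalise are hypothetical, canonicalisation is assumed to mean lower-casing and trimming whitespace, each endpoint is assumed to match when its edit distance is at most fuzzy_threshold (0 when fuzzy is FALSE), and gold rows are assumed to be consumed greedily, one per predicted edge. The example data are invented for illustration.

```r
# Hypothetical canonicalisation: lower-case and trim (an assumption).
canonicalise <- function(x) trimws(tolower(x))

# Sketch only: label predicted edges TP/FP and unmatched gold edges FN.
sketch_match <- function(predicted, gold, fuzzy = TRUE, fuzzy_threshold = 2L) {
  max_dist   <- if (fuzzy) fuzzy_threshold else 0L
  gold_used  <- rep(FALSE, nrow(gold))
  pred_label <- rep("FP", nrow(predicted))

  for (i in seq_len(nrow(predicted))) {
    # Candidate gold rows: same abstract, not yet matched.
    cand <- which(gold$abstract_id == predicted$abstract_id[i] & !gold_used)
    if (length(cand) == 0L) next
    # Levenshtein distances for both endpoints (utils::adist).
    d_cause  <- as.vector(utils::adist(canonicalise(predicted$cause[i]),
                                       canonicalise(gold$cause[cand])))
    d_effect <- as.vector(utils::adist(canonicalise(predicted$effect[i]),
                                       canonicalise(gold$effect[cand])))
    ok <- d_cause <= max_dist & d_effect <= max_dist
    if (any(ok)) {
      # Greedy choice: the closest acceptable gold row.
      hit <- cand[ok][which.min((d_cause + d_effect)[ok])]
      gold_used[hit] <- TRUE
      pred_label[i]  <- "TP"
    }
  }

  # Predicted rows first (TP/FP), then unmatched gold rows (FN).
  rbind(
    data.frame(abstract_id = predicted$abstract_id, cause = predicted$cause,
               effect = predicted$effect, label = pred_label,
               stringsAsFactors = FALSE),
    data.frame(abstract_id = gold$abstract_id[!gold_used],
               cause = gold$cause[!gold_used],
               effect = gold$effect[!gold_used],
               label = rep("FN", sum(!gold_used)),
               stringsAsFactors = FALSE)
  )
}

# Invented example data.
predicted <- data.frame(
  abstract_id = c(1, 1, 2),
  cause  = c("Smoking", "stress", "exercise"),
  effect = c("lung cancer", "insomnia", "weight loss"),
  stringsAsFactors = FALSE
)
gold <- data.frame(
  abstract_id = c(1, 2, 2),
  cause  = c("smoking", "exercize", "diet"),
  effect = c("lung cancer", "weight loss", "weight loss"),
  stringsAsFactors = FALSE
)

res_fuzzy <- sketch_match(predicted, gold)                 # labels: TP, FP, TP, FN
res_exact <- sketch_match(predicted, gold, fuzzy = FALSE)  # labels: TP, FP, FP, FN, FN
```

With fuzzy matching, "exercise" pairs with the gold label "exercize" (edit distance 1), so that edge counts as TP. With fuzzy = FALSE the same edge becomes FP and its gold row becomes FN.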