Collapses a edaphos_causal_kg to one row per unique
(cause, effect) edge and ranks the result by a user-selected
metric:
Usage
causal_kg_rank_edges(
kg,
by = c("n_sources", "mean_confidence", "agrovoc_support"),
alignment = NULL,
top_n = NULL
)Arguments
- kg
An
edaphos_causal_kg.- by
Character vector of ranking metrics, any subset of
c("n_sources", "mean_confidence", "agrovoc_support"), in priority order.- alignment
Optional data frame as returned by
causal_kg_alignment()(must have columnsoriginal,canonical, plusuriwhen derived fromvocab = "agrovoc"). Used to computeagrovoc_support.- top_n
Optional integer — return at most this many rows.
NULLreturns the full ranking.
Value
A data frame with columns
cause, effect, n_sources, mean_confidence,
max_confidence, sources (collapsed " | "-separated),
evidence (collapsed), and — when available —
agrovoc_support, agrovoc_cause, agrovoc_effect.
Details
"n_sources"Number of distinct sources supporting the edge (counted by splitting the
|-separatedsourcefield on the underlyingigraphedge-attribute). The single most informative signal for an LLM-extracted KG built over thousands of papers: an edge asserted by 50 papers is far more trustworthy than one asserted by 1."mean_confidence"Mean LLM confidence across extractions that produced the same (cause, effect) pair.
"agrovoc_support"Fraction of the pair's endpoints (cause, effect) that resolve to a FAO AGROVOC concept under an
alignmentmapping (0, 0.5 or 1). Requires eitheralignmentto be supplied orkgalready aligned; when neither is available the column isNA.
The return is a tidy data frame sorted in descending order by
all ranking columns requested in by (i.e. the first by
element is the primary sort key, the second breaks ties, etc.).
This is usually more informative than a single-metric ranking —
an edge that has both many sources AND high confidence is more
trustworthy than either alone.
Examples
# \donttest{
kg <- causal_kg_new()
kg <- causal_kg_add_edge(kg, "precipitation", "soc",
source = "Jenny 1941", confidence = 0.9)
kg <- causal_kg_add_edge(kg, "precipitation", "soc",
source = "Minasny 2017", confidence = 0.85)
causal_kg_rank_edges(kg, by = c("n_sources", "mean_confidence"))
#> cause effect n_sources mean_confidence max_confidence
#> 1 precipitation soc 2 0.9 0.9
#> sources evidence
#> 1 Jenny 1941 | Minasny 2017
# }
