KSSL + NASIS: multi-level USDA Soil Taxonomy benchmark • soilKey

This vignette describes the multi-level USDA Soil Taxonomy 13ed benchmark that ships with soilKey (versions 0.9.24 and later). It is the package’s headline real-data validation: the first public USDA Soil Taxonomy implementation that resolves classifier accuracy at every level of the hierarchy (Order / Suborder / Great Group / Subgroup) on real, published lab data.

1. The KSSL + NASIS join (load_kssl_pedons_with_nasis)

The Kellogg Soil Survey Laboratory (NCSS lab data, USDA-NRCS) ships ~36 000 pedons with full analytical chemistry (clay, sand, silt, OC, pH, CEC, BS, …) but minimal field morphology. The NASIS Morphological database (a separate sqlite distribution) carries the matching field-survey morphology: Munsell colors, structure grade / size / type, clay films (phpvsf), slickensides, and the pediagfeatures diagnostic-features table (~13 500 entries identifying argillic, mollic, cambic, spodic, fragipan, etc.).

load_kssl_pedons_with_nasis() joins the two by peiid and depth-overlap matching, returning one PedonRecord per profile with both lab and morphology populated:

peds <- load_kssl_pedons_with_nasis(
  gpkg   = "<path>/ncss_labdata.gpkg",
  sqlite = "<path>/NASIS_Morphological_09142021.sqlite",
  head   = 3000)

length(peds)
#> [1] 2964   # after the 3000-row gpkg slice and the join

peds[[1]]$horizons[, c("designation", "clay_pct", "munsell_chroma_moist",
                          "clay_films_amount")]
#>    designation clay_pct munsell_chroma_moist clay_films_amount
#> 1: Ap          22.5     3                    NA
#> 2: Bt1         34.7     4                    common
#> 3: Bt2         38.3     4                    many

NASIS-derived attributes coverage on the 2021 snapshot (n=865 quality-filtered):

Slot	Coverage
`nasis_diagnostic_features` (any)	~52 %
`pediagfeatures` argillic flag	~11 %
Per-horizon `clay_films_amount`	~28 %
Per-horizon `munsell_chroma_moist`	~99 %
Per-horizon `structure_grade`	~93 %

The high Munsell + structure coverage powers the pediagfeatures tie-breaker (v0.9.21) – when a canonical lab + morphology gate returns passed = NA, the surveyor’s diagnostic flag flips it to TRUE. The argillic / fragipan / clay-films paths (v0.9.27, v0.9.28, v0.9.31) leverage the same NASIS slot for direct positive evidence.

2. The four levels (benchmark_run_classification)

Once you have a list of PedonRecords with reference_usda, reference_usda_subgroup, reference_usda_grtgroup, and reference_usda_suborder populated (the gpkg loader handles all four), benchmark_run_classification() measures top-1 accuracy at any of the four hierarchy levels:

# Quality filter: profiles with usable clay data and a non-empty
# subgroup reference label.
keep <- vapply(peds, function(p) {
  hz <- p$horizons
  if (is.null(hz) || nrow(hz) == 0) return(FALSE)
  if (!any(!is.na(hz$clay_pct))) return(FALSE)
  !is.null(p$site$reference_usda_subgroup) &&
    !is.na(p$site$reference_usda_subgroup) &&
    nzchar(p$site$reference_usda_subgroup)
}, logical(1))
peds <- peds[keep]

for (lvl in c("order", "suborder", "great_group", "subgroup")) {
  res <- benchmark_run_classification(peds, system = "usda",
                                         level = lvl, boot_n = 500L)
  cat(sprintf("%-12s n=%d  top1=%.4f  CI=[%.3f, %.3f]\n",
                lvl, res$n_evaluated, res$accuracy_top1,
                res$accuracy_ci[1], res$accuracy_ci[2]))
}

3. Headline numbers (v0.9.31, n=2638, ±1.7 pp CI)

Level	n	top-1	95 % CI
Order	2 638	34.19 %	[32.4 %, 36.0 %]
Suborder	2 636	13.85 %	[12.5 %, 15.2 %]
Great Group	2 633	7.94 %	[7.0 %, 8.9 %]
Subgroup	2 638	4.17 %	[3.5 %, 4.9 %]

These are the values reported in the methodology paper. They were measured on the n=2638 large-scale validation sample (3x the n=865 development sample) with 500 bootstrap reps for ±1.7 pp confidence intervals.

4. Release-by-release trajectory

The v0.9.24 → v0.9.31 release series progressively closed key reasoning gaps. Apples-to-apples on the n=865 development sample:

Release	Change	GG (n=865)	Subgroup (n=865)
v0.9.22	(baseline before this series)	–	3.0 %
v0.9.23	argic clay-increase canonicalisation	–	–
v0.9.24	aquic / oxyaquic subgroup tightening + new GG/Suborder benchmark levels	6.5 %	3.8 %
v0.9.25	KST 13ed Great Group canonicaliser (Pellusterts → Hapluderts; etc.)	10.3 % (+3.84)	5.0 % (+1.15)
v0.9.26	per-system argic threshold API (infrastructure)	10.3 %	5.0 %
v0.9.27	clay-films test via NASIS pediagfeatures + phpvsf	10.6 % (+0.23)	5.1 % (+0.12)
v0.9.28	designation-based clay-films proxy (‘t’ suffix)	10.6 %	5.1 %
v0.9.30	bundled WoSIS sample + WoSIS retry bug fix	10.6 %	5.1 %
v0.9.31	Quartzipsamment proxy broadened + Fragipan NASIS path	10.92 % (+0.35)	5.32 % (+0.23)

The v0.9.25 KST 13ed Great Group canonicaliser remains the largest single-version GG lift (+3.84 pp, +59 % relative). It required no classifier changes — the predictor is already correct for KST 13ed; only the comparison was unfair to legacy KSSL labels (which span Soil Taxonomy editions 8 through 13).

5. The four levels are independently measurable

level = "great_group" reads pedon$site$reference_usda_grtgroup and compares the LAST token of classify_usda(pedon)$name after the v0.9.25 KST canonicaliser collapses edition-driven renames.

level = "suborder" reads pedon$site$reference_usda_suborder and applies .gg_to_suborder() (a ~70-Suborder canonical-suffix mapping derived from KST 13ed Ch 4) to the predicted Great Group name.

level = "subgroup" reads pedon$site$reference_usda_subgroup and compares the full subgroup name token-by-token (Subgroup modifier kept intact; Great Group canonicalised).

level = "order" reads pedon$site$reference_usda and compares the top-level Order name directly.

6. v0.9.25 – the KST canonicaliser story

KSSL samp_taxgrtgroup is populated from historical pedon descriptions spanning Soil Taxonomy editions 8 through 13. Several Great Group names changed between editions, and KSSL did NOT retroactively update them. soilKey’s classifier follows KST 13ed (the current edition), so direct string equality between predicted (13ed) and reference (mixed editions) Great Group names produces false-negative misses for every profile whose KSSL label is a pre-13ed name.

The 16 most common edition-driven renames in KSSL:

Pre-13ed name (KSSL)	KST 13ed equivalent	Reason
Haplaquolls	Endoaquolls / Epiaquolls	KST 11 split Hapl- into endo (deep) / epi (perched) saturation
Haplaquepts	Endoaquepts / Epiaquepts	same
Haplaquerts	Endoaquerts / Epiaquerts	same
Haplaquents	Endoaquents / Epiaquents	same
Haplaqualfs	Endoaqualfs / Epiaqualfs	same
Haplaquods	Endoaquods / Epiaquods	same
Pellusterts	Hapluderts / Salusterts / Calciusterts	KST 12: dark-colour Pellu split by chemistry
Chromusterts	Hapluderts	KST 12: bright-colour Chromu merged
Pelluderts / Chromuderts	(same Pellu/Chromu reorg, udic)	–
Dystrochrepts	Dystrudepts	KST 11: Ochrept suborder retired; Udept created
Eutrochrepts	Eutrudepts	same
Camborthids	Haplocambids	KST 11: Orthid suborder retired; Cambid created
Calciorthids	Haplocalcids	same
Paleorthids	Haplodurids / Durargids	(KST 11 Aridisol reshuffling)
Vitrandepts	Vitrudands	KST 11: Andisols promoted to its own Order
Medisaprists / Medihemists / Medifibrists	Haplosaprists / Haplohemists / Haplofibrists	“medi-” temperature regime moved to Subgroup

The canonicaliser canonicalise_kst13ed_gg() collapses each pair (or triple, for the Hapl-/Endo-/Epi- splits) to a shared canonical key. Apply to BOTH ref and pred before comparing – the Subgroup modifier (Typic / Aquic / …) is left intact and the canonicalisation only affects the Great Group token.

7. What’s missing (roadmap)

The 4.2 % Subgroup top-1 leaves substantial headroom. The v0.9.27 confusion analysis identified the dominant remaining gaps:

Pale- / Glossic Alfisol prefixes (~11 misses, deferred to v0.9.32+): current pale_qualifying_usda() proxy is too strict (clay >= 35 %); KST 13ed actually defines Pale- by clay-stability within 150 cm. Tightening here without regressing on Hapludalfs is delicate work.
NASIS data sparsity (~47 % of profiles lack any pediagfeatures / phpvsf record): roadmap candidate is field-survey morphology + soilKey’s pedometric proxies as a third evidence path.
Endo/Epi-aquic precise distinction: currently the v0.9.25 canonicaliser collapses these for fair comparison; a future release could restore the distinction by inferring saturation depth from redoximorphic_features_pct distribution.

Summary

The KSSL + NASIS join is soilKey’s headline real-data benchmark. It exercises every level of the USDA Soil Taxonomy 13ed hierarchy on n=2638 profiles with full lab + morphology data, with confidence intervals tight enough (±1.7 pp) for paper-grade claims. The v0.9.24 → v0.9.31 release series moved Great Group accuracy from a 6.5 % baseline to 10.92 %, primarily through the KST 13ed name canonicaliser (v0.9.25) which collapsed edition-driven renames in the historical KSSL labels.

For the WRB and SiBCS benchmark axes, see vignettes v06_wosis_benchmark and the FEBR / Embrapa report at inst/benchmarks/reports/embrapa_v0929_2026-05-03.md.