
Classify a soil by spectral similarity to OSSL reference profiles
Source: R/spectra-neighbours.R
Given a Vis-NIR (or MIR) spectrum and an OSSL reference library enriched with WRB / SiBCS / USDA labels, returns the K most spectrally similar profiles plus a probabilistic class prediction aggregated from their labels.
Usage
classify_by_spectral_neighbours(
  spectrum,
  ossl_library,
  system = c("wrb2022", "sibcs", "usda"),
  k = 25L,
  preprocess = "snv+sg1",
  region = NULL,
  verbose = TRUE
)
Arguments
- spectrum
Numeric vector or 1-row matrix (the query spectrum). Must align (after preprocessing) with the column space of ossl_library$Xr.
- ossl_library
A list with Xr (numeric matrix, rows = OSSL training profiles, cols = wavelengths) and Yr (data frame keyed by property; must include a column named wrb_rsg and / or sibcs_ordem / usda_order for the labels to aggregate over). ossl_library may also carry lat and lon columns in Yr for the regional filter.
- system
One of "wrb2022" (default), "sibcs", "usda". Controls which label column of Yr is aggregated.
- k
Number of nearest neighbours (default 25).
- preprocess
Pre-processing pipeline; passed to preprocess_spectra. Default "snv+sg1".
- region
Optional list(lat, lon, radius_km) for a regional filter on ossl_library$Yr$lat / lon.
- verbose
Emit a cli summary.
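To make the expected input shapes concrete, here is a minimal sketch of a library list and a region list built by hand. Everything in it is illustrative: the wavelength grid, the random "spectra", the coordinates, and the name toy_library are assumptions, not part of the package or of OSSL.

```r
# Hypothetical toy library: 6 profiles x 5 wavelengths.
# Values are random numbers, not real spectra.
set.seed(1)
wavelengths <- seq(400, 2400, by = 500)   # illustrative grid only
Xr <- matrix(
  runif(6 * length(wavelengths)),
  nrow = 6,
  dimnames = list(NULL, as.character(wavelengths))
)

# Yr carries the label column plus the optional lat / lon columns.
Yr <- data.frame(
  wrb_rsg = c("FR", "FR", "AC", "LX", "AC", "FR"),
  lat     = c(-15.8, -16.1, -15.5, -3.1, -22.9, -15.9),
  lon     = c(-47.9, -48.2, -47.6, -60.0, -43.2, -48.0)
)
toy_library <- list(Xr = Xr, Yr = Yr)

# Reusing a library row as the query guarantees it aligns with the columns of Xr.
query <- Xr[1, ]

# The regional filter is a plain list with the three documented fields.
region <- list(lat = -15.8, lon = -47.9, radius_km = 100)
```

With objects shaped like these, a call would presumably look like classify_by_spectral_neighbours(query, toy_library, k = 3, region = region), although a library this small is only useful for checking that the inputs are accepted.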
Value
A list with three elements:
- distribution
A data.table with columns class, n_neighbours, probability (= n_neighbours / k), sorted by probability.
- neighbours
A data.table with one row per neighbour (top K), columns rank, distance, class, plus any other columns present in ossl_library$Yr.
- query
The query metadata (system, k, region filter, n_library_rows, n_filtered).
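The probability column is a simple vote share over the neighbours' labels. The sketch below reproduces that arithmetic on made-up labels; it uses a base data.frame rather than a data.table to stay self-contained, and it is not the package's internal code.

```r
# Hypothetical labels of the k = 10 nearest neighbours, in rank order.
neighbour_class <- c("FR", "FR", "AC", "FR", "LX", "FR", "AC", "FR", "FR", "AC")
k <- length(neighbour_class)

# Count how many neighbours carry each label.
counts <- table(neighbour_class)

# probability = n_neighbours / k, as described for the distribution element.
distribution <- data.frame(
  class        = names(counts),
  n_neighbours = as.integer(counts),
  probability  = as.integer(counts) / k
)

# Sort so the most frequent class comes first.
distribution <- distribution[order(-distribution$probability), ]
distribution
```

Because every neighbour contributes exactly one label, the probabilities sum to 1 whenever all k neighbours are labelled.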
Details
This is the **spectral analogy** classifier. It does not replace
the deterministic key in
classify_wrb2022 / classify_sibcs /
classify_usda; instead it provides a high-prior
"expected class" before the user has lab data, reducing the
search space when collecting confirming attributes.
Distance metric
By default we compute distances on PLS scores (matching the
resemble / OSSL recipe), with PLS components fit on Xr using the
OSSL reference Yr as the response. When resemble is unavailable, we fall
back to PCA scores from stats::prcomp on the preprocessed
Xr, a defensible but simpler heuristic.
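The PCA fallback can be sketched roughly as follows. This illustrates the general technique only: the number of components (n_comp), the use of Euclidean distance on the scores, and the synthetic data are all assumptions, and the PLS route through resemble is not shown.

```r
set.seed(42)
# Synthetic stand-in for a preprocessed Xr: 50 profiles x 20 wavelengths.
Xr <- matrix(rnorm(50 * 20), nrow = 50)
# A query that is almost identical to library row 7.
query <- Xr[7, ] + rnorm(20, sd = 0.01)

n_comp <- 5L  # assumed number of retained components

# Fit PCA on the library and keep the first n_comp score columns.
pca <- stats::prcomp(Xr, center = TRUE, scale. = FALSE)
lib_scores <- pca$x[, seq_len(n_comp), drop = FALSE]

# Project the query into the same score space.
query_scores <- predict(pca, newdata = matrix(query, nrow = 1))
query_scores <- query_scores[, seq_len(n_comp), drop = FALSE]

# Euclidean distance from the query to every library profile (assumed metric).
d <- sqrt(rowSums(sweep(lib_scores, 2, query_scores[1, ])^2))

# Indices of the k nearest neighbours, closest first.
k <- 5L
nearest <- order(d)[seq_len(k)]
nearest
```

Projecting the query with predict() rather than refitting the PCA keeps the library and the query in one coordinate system, which is what makes the distances comparable.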
Region filter
The optional region argument, list(lat, lon, radius_km), filters the
OSSL library to profiles within radius_km (great-circle) of the
query location before computing distances. This implements the
"biome-aware" use case the architecture document calls for: a
Cerrado profile shouldn't have its class inferred from spectral
neighbours in the Boreal taiga.
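A great-circle filter of this kind can be sketched with the haversine formula. The helper haversine_km, the 6371 km Earth radius, and the coordinates below are illustrative assumptions; the package may compute the distance differently.

```r
# Hypothetical helper: haversine great-circle distance in kilometres.
haversine_km <- function(lat1, lon1, lat2, lon2, r_earth = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1.
  2 * r_earth * asin(pmin(1, sqrt(a)))
}

# Made-up library metadata: two profiles near the query, one far away.
Yr <- data.frame(
  id  = c("near_a", "near_b", "far"),
  lat = c(-15.8, -16.5, 60.0),
  lon = c(-47.9, -48.5, 100.0)
)
region <- list(lat = -15.8, lon = -47.9, radius_km = 200)

# Keep only the rows within radius_km of the query location.
dist_km <- haversine_km(region$lat, region$lon, Yr$lat, Yr$lon)
keep <- dist_km <= region$radius_km
Yr_filtered <- Yr[keep, ]
Yr_filtered
```

The same logical vector would be used to subset the rows of Xr, so that spectra and metadata stay aligned before distances are computed.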
See also
predict_ossl_mbl (predicts attributes),
classify_wrb2022 (the deterministic key).
Examples
if (FALSE) { # \dontrun{
  # Toy run against the bundled demo library (synthetic):
  data(ossl_demo_sa)
  # Inject a fake label column for the demo (real OSSL has it):
  ossl_demo_sa$Yr$wrb_rsg <- sample(c("FR", "AC", "LX", "AL"),
                                    nrow(ossl_demo_sa$Yr),
                                    replace = TRUE)
  query <- ossl_demo_sa$Xr[1, ]
  res <- classify_by_spectral_neighbours(query, ossl_demo_sa,
                                         k = 10)
  res$distribution  # ranked classes
  res$neighbours    # the 10 most similar profiles
} # }