
Download an OSSL subset and attach WRB / SiBCS / USDA labels
Source: R/spectra-fetch-labels.R, download_ossl_subset_with_labels.Rd
Fetches a region-filtered slice of the Open Soil Spectral Library (OSSL)
via download_ossl_subset() and post-joins WRB Reference Soil Group
labels from WoSIS GraphQL by spatial nearest-neighbour matching. The
resulting artefact has the canonical list(Xr, Yr, metadata) shape, with
extra columns in Yr: wrb_rsg, wrb_label_source and wrb_label_distance_km,
plus sibcs_ordem and usda_order when translate_systems = TRUE.
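The nearest-neighbour join can be illustrated with a minimal sketch. It is not the package's implementation: instead of issuing one WoSIS GraphQL query per coordinate (the job of the per-coordinate query function, .query_nearest_wosis_wrb() by default), it assumes an in-memory reference table with hypothetical columns lat, lon and wrb_rsg, uses great-circle (haversine) distance on a sphere of radius 6371 km, and scans every reference point. The helpers haversine_km() and join_nearest_wrb() and the toy coordinates and class codes are all hypothetical. It fills the same three Yr columns documented above.

```r
# Hypothetical sketch: nearest-neighbour WRB label join against a local table.
# Not the package's code; the real function queries WoSIS GraphQL instead.

# Great-circle distance in km on a spherical Earth (radius 6371 km).
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# For each query point, inherit the label of the nearest reference point
# if it lies within max_distance_km; otherwise mark the row "missing".
join_nearest_wrb <- function(query, reference, max_distance_km = 5) {
  n <- nrow(query)
  out <- data.frame(
    wrb_rsg = rep(NA_character_, n),
    wrb_label_source = rep("missing", n),
    wrb_label_distance_km = rep(NA_real_, n),
    stringsAsFactors = FALSE
  )
  for (i in seq_len(n)) {
    d <- haversine_km(query$lat[i], query$lon[i], reference$lat, reference$lon)
    j <- which.min(d)  # integer(0) if every distance is NA
    if (length(j) == 1 && d[j] <= max_distance_km) {
      out$wrb_rsg[i] <- reference$wrb_rsg[j]
      out$wrb_label_source[i] <- "wosis_spatial_join"
      out$wrb_label_distance_km[i] <- d[j]
    }
  }
  out
}

# Toy data (invented coordinates and placeholder class codes).
query <- data.frame(lat = c(-22.70, -22.70, 10), lon = c(-43.70, -43.75, 10))
reference <- data.frame(
  lat = c(-22.71, -22.70), lon = c(-43.70, -43.76),
  wrb_rsg = c("RSG_A", "RSG_B"), stringsAsFactors = FALSE
)
res <- join_nearest_wrb(query, reference, max_distance_km = 5)
print(res)
```

The brute-force scan costs one distance per (query, reference) pair, which is fine for illustration; a spatial index is the usual choice once the reference set is large.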
Usage
download_ossl_subset_with_labels(
region = c("global", "south_america", "north_america", "europe", "africa", "asia",
"oceania"),
max_distance_km = 5,
wosis_endpoint = NULL,
translate_systems = TRUE,
max_to_label = Inf,
verbose = TRUE,
query_fn = NULL,
...
)
Arguments
- region
OSSL region filter; one of "global", "south_america", "north_america", "europe", "africa", "asia", "oceania".
- max_distance_km
WoSIS spatial-join tolerance in kilometres (default 5). Profiles whose nearest WRB-labeled WoSIS neighbour is farther than this are left unlabeled.
- wosis_endpoint
Override for the WoSIS GraphQL endpoint (default getOption("soilKey.wosis_graphql")). The canonical value is "https://graphql.isric.org/wosis/graphql".
- translate_systems
If TRUE (default), also adds sibcs_ordem and usda_order columns derived from the WRB label via the Schad (2023) Annex Table 1 / SiBCS 5th edition Annex A correspondence. Those translations are 1:N for some classes; the most common partner is picked, and rows where the translation is genuinely ambiguous are tagged.
- max_to_label
Maximum number of profiles to query against WoSIS (default Inf). WoSIS throttles aggressive queries; cap this when running interactive demos.
- verbose
Emit cli progress messages.
- query_fn
Optional injection of the per-coordinate WoSIS query function. The default uses .query_nearest_wosis_wrb(). Tests pass a stub here to exercise the join logic without network access.
- ...
Forwarded to download_ossl_subset().
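The 1:N handling described under translate_systems (pick the most common partner, tag ambiguous rows) can be sketched as follows. This is an illustration under assumptions, not the Schad (2023) or SiBCS correspondence itself: the class codes are invented placeholders, repeated rows stand in for how often a pairing occurs, and "ambiguous" is taken here to mean that two or more partners share the top count. pick_partner() is a hypothetical helper.

```r
# Hypothetical sketch: collapse a 1:N class correspondence to one partner
# per WRB class and flag classes whose top partner is tied.
pick_partner <- function(wrb, partner) {
  groups <- split(partner, wrb)  # one character vector of partners per class
  top <- vapply(groups, function(p) {
    counts <- table(p)               # names come out sorted alphabetically
    names(counts)[which.max(counts)] # first maximum wins on ties
  }, character(1))
  tied <- vapply(groups, function(p) {
    counts <- table(p)
    sum(counts == max(counts)) > 1   # assumption: "ambiguous" = tied top count
  }, logical(1))
  data.frame(wrb = names(groups), partner = unname(top),
             ambiguous = unname(tied), stringsAsFactors = FALSE)
}

# Toy correspondence rows (placeholders, not real WRB/SiBCS/USDA classes).
lookup <- pick_partner(
  wrb     = c("RSG_A", "RSG_A", "RSG_A", "RSG_B", "RSG_B", "RSG_C"),
  partner = c("X",     "X",     "Y",     "Y",     "Z",     "W")
)

# Translate a label column; unknown or NA labels map to NA.
translated <- lookup$partner[match(c("RSG_B", "RSG_A", NA), lookup$wrb)]
print(lookup)
print(translated)
```

Keeping the ambiguity flag alongside the chosen partner lets downstream code decide whether to trust or drop tied translations rather than silently accepting an arbitrary pick.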
Value
A list with Xr (numeric matrix), Yr (data
frame with the labels attached), and metadata
(a list holding the OSSL fetch metadata plus the join statistics:
number of profiles labeled, average and maximum join distance,
WoSIS endpoint, and snapshot date).
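The join statistics listed under Value could be recomputed from Yr along these lines. The element names (n_profiles, n_labeled, mean_distance_km, max_distance_km) and the helper summarise_join() are hypothetical and need not match the fields the package actually stores in metadata; the toy Yr is invented. Distances are averaged only over spatial-join rows, on the assumption that ossl_native and missing rows have no meaningful join distance.

```r
# Hypothetical sketch: summarise label-join statistics from a Yr data frame.
summarise_join <- function(yr) {
  joined <- yr$wrb_label_source == "wosis_spatial_join"
  d <- yr$wrb_label_distance_km[joined]
  list(
    n_profiles = nrow(yr),
    n_labeled = sum(yr$wrb_label_source != "missing"),
    mean_distance_km = if (length(d)) mean(d) else NA_real_,
    max_distance_km = if (length(d)) max(d) else NA_real_
  )
}

# Toy Yr (invented values).
yr <- data.frame(
  wrb_rsg = c("RSG_A", NA, "RSG_B", "RSG_C"),
  wrb_label_source = c("wosis_spatial_join", "missing",
                       "wosis_spatial_join", "ossl_native"),
  wrb_label_distance_km = c(1.5, NA, 3.5, NA),
  stringsAsFactors = FALSE
)
s <- summarise_join(yr)
str(s)
```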
Why this function exists
OSSL stores Vis-NIR and MIR spectra plus laboratory data, but most
profiles lack WRB Reference Soil Group labels (KSSL data is
USDA-flavoured; non-US contributions are inconsistent). WoSIS, by
contrast, archives ~228 000 profiles with WRB labels but no
spectra. This function bridges the two so the user can run
classify_by_spectral_neighbours() on a real-data OSSL library
without having to do the spatial join themselves.
Caveats and provenance
WRB labels obtained via spatial join are weak labels. The same physical location may have been classified differently across surveys (different WRB editions, different interpretations). Each row carries:
- wrb_label_source = "wosis_spatial_join": label inherited from a WoSIS neighbour within max_distance_km.
- wrb_label_distance_km: the distance to that neighbour (NA when no neighbour was found within tolerance).
- wrb_label_source = "ossl_native": label was already present in OSSL Yr (rare; preserved verbatim).
- wrb_label_source = "missing": no neighbour within tolerance; the row stays unlabeled and will be skipped downstream.
Treat the labels as priors, not ground truth.
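Because missing rows are skipped downstream, and wrb_label_distance_km lets a user be stricter than max_distance_km after the fact, a natural pre-processing step is to subset Xr and Yr together. A minimal sketch, assuming row i of Xr corresponds to row i of Yr; filter_labeled() and its max_km argument are hypothetical, and the toy library is invented.

```r
# Hypothetical sketch: keep labeled rows, optionally tightening the join
# tolerance, while keeping Xr and Yr row-aligned (assumed 1:1 by row).
filter_labeled <- function(lib, max_km = Inf) {
  yr <- lib$Yr
  keep <- yr$wrb_label_source != "missing" &
    (yr$wrb_label_source != "wosis_spatial_join" |   # ossl_native always kept
       yr$wrb_label_distance_km <= max_km)
  keep[is.na(keep)] <- FALSE
  lib$Xr <- lib$Xr[keep, , drop = FALSE]
  lib$Yr <- yr[keep, , drop = FALSE]
  lib
}

# Toy library (invented values): 4 spectra with 2 "bands" each.
lib <- list(
  Xr = matrix(1:8, nrow = 4),
  Yr = data.frame(
    wrb_rsg = c("RSG_A", NA, "RSG_B", "RSG_C"),
    wrb_label_source = c("wosis_spatial_join", "missing",
                         "wosis_spatial_join", "ossl_native"),
    wrb_label_distance_km = c(1.5, NA, 3.5, NA),
    stringsAsFactors = FALSE
  ),
  metadata = list()
)
out <- filter_labeled(lib, max_km = 2)
print(out$Yr)
```

Here the spatial-join row at 3.5 km is dropped by the stricter 2 km cut, while the ossl_native row is kept regardless of distance.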
Examples
if (FALSE) { # \dontrun{
# Real OSSL South-America subset with WRB labels:
lib <- download_ossl_subset_with_labels(
region = "south_america",
max_distance_km = 10
)
table(lib$Yr$wrb_rsg, useNA = "always")
table(lib$Yr$wrb_label_source)
# Drop into the spectral analogy classifier:
res <- classify_by_spectral_neighbours(
spectrum = my_query_spectrum,  # placeholder: your own query spectrum
ossl_library = lib,
k = 25,
region = list(lat = -22.7, lon = -43.7,
radius_km = 500)
)
} # }