Skip to contents

The honest measurement that has been data-blocked until a spectra-bearing, labelled dataset exists. For each cross-validation fold it calibrates a spectral library on the training profiles, then classifies the held-out profiles twice – OFF (spectra-only pedon, no lab attributes) and ON (fill_from_spectra predicts the lab attributes from the scan first) – and scores both against the reference label. Non-circular: the calibration library never includes a test profile.

Usage

benchmark_spectral_fill(
  reflectance,
  metadata,
  id_col = "id",
  system = c("sibcs", "wrb2022", "usda"),
  profile_col = NULL,
  folds = 5L,
  properties = NULL,
  method = c("mbl", "plsr_local", "pretrained"),
  wavelengths = NULL,
  resample_to = NULL,
  property_map = NULL,
  label_map = NULL,
  normalize = c("auto", "none", "percent"),
  fold_id = NULL,
  verbose = TRUE
)

Arguments

reflectance

Reflectance data: a matrix / data.frame with rows = samples and columns named by wavelength (nm); OR a long data.frame with id_col, wavelength_nm, reflectance; OR a path to a CSV in either form.

metadata

A data.frame with one row per sample carrying id_col plus lab attributes and optional taxonomic labels and lat/ lon. Rows are aligned to reflectance by id_col.

id_col

Sample identifier column shared by both tables (default "id").

system

One of "sibcs" (default), "wrb2022", "usda".

profile_col

Column grouping samples into profiles (default id_col).

folds

Number of CV folds (default 5).

properties

Attributes to predict from spectra (default the fill_from_spectra set).

method

Spectral model: "mbl", "plsr_local" or "pretrained" (passed to fill_from_spectra).

wavelengths

Optional explicit wavelength vector (nm) when the reflectance columns are not wavelength-named.

resample_to

Optional target wavelength grid (nm) to linearly resample every spectrum onto (e.g. 350:2500); default keeps the native grid.

property_map, label_map

Optional named lists overriding the alias auto-detection, e.g. property_map = list(clay_pct = "ARGILA").

normalize

One of "auto" (divide by 100 when values look like percent), "percent", or "none".

fold_id

Optional integer vector (one per profile, in sorted-id order) to use fixed folds instead of the deterministic modulo split.

verbose

Print a one-line summary (default TRUE).

Value

A list with accuracy_off, accuracy_on, delta, n, per-fold rows, and the per-profile predictions frame.