
BatchBALD: information-theoretic batch acquisition
Source: R/active_batchbald.R
al_query_batchbald.Rd
Selects an information-theoretically optimal batch of Active
Learning queries from a pool of candidates, following Kirsch,
van Amersfoort and Gal (2019; see References below).
Unlike the top-n BALD strategy (which repeatedly picks the single
most uncertain candidate and therefore tends to select n copies of
"the same question" on clustered pools), BatchBALD optimises the
joint mutual information between the batch
\(y_B = (y_{x_1}, \ldots, y_{x_n})\) and the model parameters
(see Details).
Arguments
- model
An edaphos_al_model from al_fit() or al_loop(). The underlying
ranger::ranger object must have been trained with keep.inbag = TRUE
(the default in al_fit()), so that per-tree predictions are available
via predict(..., predict.all = TRUE); see the sketch after this list.
- candidates
Data frame of unlabelled candidates. Must contain the covariates
listed in model$covariates.
- n
Integer — batch size.
- sigma_a2
Optional numeric — aleatoric noise variance \(\sigma_a^2\). When
NULL (default), it is estimated from the out-of-bag residuals of the
fitted forest; see the sketch after this list.
- physics_gate
Optional function
function(candidates, predicted_mean) -> logical. See al_query().
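To make the keep.inbag / predict.all mechanics concrete, here is a
minimal ranger-only sketch; the objects train and pool and the exact
out-of-bag estimator for \(\sigma_a^2\) are illustrative assumptions,
not this package's internals.

library(ranger)

# Per-tree predictions require keep.inbag = TRUE at training time
# (al_fit() sets this by default).
fit <- ranger(y ~ ., data = train, num.trees = 1000, keep.inbag = TRUE)

# One column per tree, i.e. one column per "parameter draw":
pt <- predict(fit, data = pool, predict.all = TRUE)$predictions
dim(pt)  # n_pool x num.trees

# A plausible OOB estimate of the aleatoric variance; for regression
# forests, ranger's fit$predictions are out-of-bag predictions.
sigma_a2_hat <- mean((train$y - fit$predictions)^2, na.rm = TRUE)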
Value
Integer vector of row indices in candidates that form
the selected batch, in greedy-selection order (the first index
is the highest-BALD single point; each subsequent index is the
point that maximally increases the joint log-determinant given
the previously selected batch).
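A minimal usage sketch (pool stands for any data frame holding the
covariates in model$covariates; model comes from al_fit()):

idx   <- al_query_batchbald(model, candidates = pool, n = 10)
batch <- pool[idx, ]  # rows to send for labelling
idx[1]                # the single highest-BALD point, per the ordering above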
Details
The score of a batch \(B\) is the joint mutual information between
its labels and the model parameters:
$$ \mathrm{BatchBALD}(B) \;=\; I\bigl(y_B ; \theta \mid x_B, \mathcal D\bigr). $$
For a regression model with Gaussian aleatoric noise of variance
\(\sigma_a^2\) and an epistemic posterior represented by T
parameter draws \(f_\theta^{(1)}, \ldots, f_\theta^{(T)}\), the
objective reduces to a log-determinant:
$$ \mathrm{BatchBALD}(B) \;\propto\; \tfrac{1}{2}\log\det\!\bigl( \mathrm{Cov}_\theta\bigl(f_\theta(B)\bigr) + \sigma_a^2 I_{|B|} \bigr). $$
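The reduction is the standard Gaussian entropy argument (a sketch; it
treats the tree ensemble as a Gaussian posterior over \(f_\theta(B)\)):
$$ I(y_B;\theta) = H(y_B) - \mathbb{E}_\theta H(y_B \mid \theta) = \tfrac{1}{2}\log\det\!\bigl(\mathrm{Cov}_\theta(f_\theta(B)) + \sigma_a^2 I_{|B|}\bigr) - \tfrac{1}{2}\log\det\!\bigl(\sigma_a^2 I_{|B|}\bigr), $$
and the second term depends only on \(|B|\), which is why the
objective is stated with \(\propto\) rather than \(=\).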
For a Quantile Regression Forest
(which is what al_fit() produces) the trees themselves are the
T parameter draws, so the joint covariance is just the per-tree
empirical covariance across candidates. The greedy selection
inherits the \((1 - 1/e)\)-optimality guarantee of submodular
maximisation (Nemhauser, Wolsey and Fisher 1978) and is implemented via
Schur-complement / Cholesky updates, so each greedy step costs
\(O(m^2 n_\mathrm{pool})\) rather than \(O(m^3 n_\mathrm{pool})\),
where \(m\) is the number of points selected so far.
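For intuition, a minimal sketch of the greedy objective, assuming
preds is the \(T \times n_\mathrm{pool}\) matrix of per-tree
predictions (rows are trees, columns are candidates; with pt from the
earlier sketch, preds <- t(pt)). Unlike the Cholesky-updated
implementation described above, it recomputes the full
log-determinant at every step, so each step is
\(O(m^3 n_\mathrm{pool})\):

greedy_batchbald <- function(preds, n, sigma_a2) {
  # Joint epistemic covariance across tree draws, plus aleatoric noise
  S   <- cov(preds) + sigma_a2 * diag(ncol(preds))
  sel <- integer(0)
  for (k in seq_len(n)) {
    gain <- vapply(seq_len(ncol(S)), function(j) {
      if (j %in% sel) return(-Inf)  # never pick the same point twice
      B <- c(sel, j)
      # 1/2 * log det of the covariance of the candidate batch
      0.5 * as.numeric(determinant(S[B, B, drop = FALSE])$modulus)
    }, numeric(1))
    sel <- c(sel, which.max(gain))
  }
  sel
}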
This is a complement to al_query(), not a replacement: the
hybrid uncertainty + diversity strategy there remains the default
for low-budget settings where a physical-distance term is needed.
Use BatchBALD when (a) the covariate pool contains clusters of
near-duplicate candidates and top-n BALD would select all of
them, (b) the QRF aleatoric noise is well-estimated, and (c) the
batch size is moderate (n <= 50 for laptop-scale pools of up to
~10 000 candidates).
References
Kirsch, A., van Amersfoort, J. and Gal, Y. (2019). BatchBALD: Efficient and diverse batch acquisition for deep Bayesian active learning. NeurIPS 32, 7024–7035.
Meinshausen, N. (2006). Quantile regression forests. Journal of Machine Learning Research 7, 983–999.
Nemhauser, G. L., Wolsey, L. A. and Fisher, M. L. (1978). An analysis of approximations for maximizing submodular set functions — I. Mathematical Programming 14, 265–294.
See also
al_query() for uncertainty-plus-diversity acquisition;
al_fit() for the QRF backbone.