Skip to contents

Runs a breeding-relevant cross-validation and reports predictive ability (the correlation between predictions and observed phenotypes on held-out individuals) for each model. Two schemes are supported: "kfold" (random k-fold, i.e. predicting untested lines, repeated reps times) and "leave_group_out" (hold out one family/environment at a time, via groups).

Usage

gs_cv(
  y,
  geno,
  models = available_models(),
  k = 5,
  reps = 1,
  scheme = c("kfold", "leave_group_out"),
  groups = NULL,
  seed = NULL,
  ...
)

Arguments

y

Numeric phenotype vector (length n), no missing values.

geno

Marker matrix (n x m, 0/1/2, no missing values).

models

Models to evaluate; defaults to all available (see available_models()). Unavailable models are dropped with a warning.

k

Number of folds for "kfold". Default 5.

reps

Number of repeats for "kfold". Default 1.

scheme

"kfold" or "leave_group_out".

groups

Grouping vector (length n) for "leave_group_out".

seed

Optional seed (applied via withr::with_seed()).

...

Passed to gs_fit() (model hyperparameters).

Value

An object of class gs_cv: a list with accuracy (a data frame of model, mean, sd, n_folds), per_fold (the raw fold results), and the call settings.

Details

Fitting is wrapped so that a model which errors on a fold records NA for that fold rather than aborting the whole run.

Examples

sim <- simulate_population(n = 120, m = 400, seed = 1)
cv <- gs_cv(sim$pheno, sim$geno, models = "gblup", k = 5, seed = 1)
cv
#> <gs_cv: kfold>
#>   5-fold x 1 rep(s)
#>  model  mean    sd n_folds
#>  gblup 0.153 0.106       5
#>   (accuracy = predictive ability, cor(pred, observed) on held-out data)