Creates lists of subsamples for stability selection.
createSubsamples.Rd
Arguments
- n
Integer or numeric; sample size of the data set.
- p
Integer or numeric; number of features.
- B
Integer or numeric; the number of subsamples. Note: for sampling_type = "MB" the total number of subsamples will be B; for sampling_type = "SS" the number of subsamples will be 2*B. Default is 100 for sampling_type = "MB" and 50 for sampling_type = "SS".
- sampling_type
A character vector; either "SS" or "MB". For "MB", all B subsamples are drawn randomly (as proposed by Meinshausen and Bühlmann 2010). For "SS", in addition to these B subsamples, the B complementary pair subsamples will be drawn as well (see Faletto and Bien 2022 or Shah and Samworth 2013 for details). Default is "SS", and "MB" is not supported yet.
- num_feats_remove
Integer; number of features to select automatically on every iteration. Determined earlier from input prop_feats_remove to css.
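The relationship between B, sampling_type, and the number of subsamples can be illustrated with a minimal base R sketch. This is not the package's implementation: sketch_subsamples is a hypothetical helper, and pairing element i with element i + B is an assumption based on the description of the return value.

```r
# Hypothetical helper, not part of the package: draws B subsamples of size
# floor(n/2) and, for "SS", appends one complementary subsample per draw.
sketch_subsamples <- function(n, B, sampling_type = "SS") {
  stopifnot(sampling_type %in% c("SS", "MB"))
  half <- floor(n / 2)
  # B random subsamples of 1:n, each of size floor(n/2)
  first <- lapply(seq_len(B), function(b) sort(sample.int(n, half)))
  if (sampling_type == "MB") {
    return(first)
  }
  # For each subsample, draw floor(n/2) indices from the remaining
  # observations, so the two members of a pair are disjoint
  second <- lapply(first, function(s) {
    rest <- setdiff(seq_len(n), s)
    sort(rest[sample.int(length(rest), half)])
  })
  c(first, second)
}

set.seed(1)
subs <- sketch_subsamples(n = 11, B = 3)
length(subs)                             # 6, i.e. 2 * B
length(intersect(subs[[1]], subs[[4]]))  # 0: the pair shares no indices
```

When n is odd, one observation is left out of each pair, since both members have size floor(n/2).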
Value
A list of length B (or 2*B for sampling_type = "SS"). If prop_feats_remove = 0, each list element is an integer vector of length floor(n/2) containing the indices of a subsample of 1:n. (For sampling_type = "SS", the last B subsamples will be complementary pairs of the first B subsamples; see Faletto and Bien 2022 or Shah and Samworth 2013 for details.) If prop_feats_remove > 0, each element is a named list with members "subsample" (same as above) and "feats_to_keep", a logical vector of length p; feats_to_keep[j] = TRUE if feature j is chosen for this subsample, and FALSE otherwise.
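As an illustration of how one element might be consumed when prop_feats_remove > 0, the sketch below subsets a design matrix by rows and columns. The element elem is built by hand to mimic the structure described above; X, n, and p are made-up example data, and nothing here calls createSubsamples itself.

```r
set.seed(2)
n <- 10
p <- 4
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Hand-built element mimicking the documented structure (an assumption;
# not produced by createSubsamples in this sketch)
elem <- list(
  subsample = sort(sample.int(n, floor(n / 2))),
  feats_to_keep = c(TRUE, FALSE, TRUE, TRUE)
)

# Keep the rows in the subsample and the columns flagged TRUE;
# drop = FALSE preserves the matrix shape even if one column remains
X_sub <- X[elem$subsample, elem$feats_to_keep, drop = FALSE]
dim(X_sub)  # 5 3
```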
References
Faletto, G., & Bien, J. (2022). Cluster Stability Selection. arXiv preprint arXiv:2201.00494. https://arxiv.org/abs/2201.00494.
Shah, R. D., & Samworth, R. J. (2013). Variable selection with error control: Another look at stability selection. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 75(1), 55–80. https://doi.org/10.1111/j.1467-9868.2011.01034.x.