A convenient S3 accessor for the selected clusters (the default) or the selected features of a fitted cssr object, without re-running the (computationally expensive) subsampling. This is a thin wrapper around getCssSelections().

Usage

selected(object, ...)

# S3 method for class 'cssr'
selected(
  object,
  type = c("clusters", "features"),
  cutoff = 0,
  min_num_clusts = 1,
  max_num_clusts = NA,
  weighting = "sparse",
  ...
)

Arguments

object

An object of class "cssr" (the output of the function css()).

...

Additional arguments passed to methods (currently unused).

type

Character; either "clusters" (the default) to return the named list of selected clusters, or "features" to return the flat integer vector of selected features. May be abbreviated.

cutoff

Numeric; only clusters whose selection proportion is at least cutoff are selected. Must be between 0 and 1. Default is 0, in which case either all clusters are selected or, if max_num_clusts is specified, max_num_clusts clusters are selected.

min_num_clusts

Integer or numeric; the minimum number of clusters to use regardless of cutoff. (That is, if the chosen cutoff returns fewer than min_num_clusts clusters, the cutoff will be lowered until at least min_num_clusts clusters are selected.) Default is 1. May be set to 0 to allow a pure cutoff-based (threshold) selection that returns an empty result when no cluster's selection proportion meets the cutoff.

max_num_clusts

Integer or numeric; the maximum number of clusters to use regardless of cutoff. (That is, if the chosen cutoff returns more than max_num_clusts clusters, the cutoff will be raised until at most max_num_clusts clusters are selected.) Default is NA (in which case max_num_clusts is ignored).

weighting

Character; passed to getCssSelections() to determine the weights of individual features within the selected clusters. This affects ONLY type = "features" (it determines which cluster members have nonzero weight and are therefore returned); it is a no-op for type = "clusters", whose selected clusters, selection proportions, and sizes do not depend on the weighting. Must be one of "sparse", "weighted_avg", or "simple_avg". See getCssSelections() for details. Default is "sparse".
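
The way cutoff, min_num_clusts, and weighting interact can be sketched as follows. This is a minimal illustration rather than a definitive recipe: it assumes the functions are available by attaching a package named cssr, it reuses the simulated data from the Examples section, and the cutoff of 0.9 and all variable names are arbitrary choices made for the sketch. No particular output is implied.

```r
# Assumption: css(), genClusteredData(), and selected() come from the cssr package.
library(cssr)

set.seed(1)
data <- genClusteredData(n = 50, p = 11, k_unclustered = 2,
  cluster_size = 4, n_clusters = 1, snr = 3)
res <- css(X = data$X, y = data$y, lambda = 0.01,
  clusters = list(cluster1 = 1:4), B = 10)

# Default min_num_clusts = 1: at least one cluster is returned,
# even if none reaches the cutoff (0.9 is an arbitrary value).
at_least_one <- selected(res, cutoff = 0.9)

# min_num_clusts = 0: pure threshold selection; the result may be empty.
thresholded <- selected(res, cutoff = 0.9, min_num_clusts = 0)

# weighting is documented as a no-op for type = "clusters" ...
clusts_sparse <- selected(res, weighting = "sparse")
clusts_avg <- selected(res, weighting = "simple_avg")
identical(clusts_sparse, clusts_avg)

# ... but it can change which cluster members are returned as features.
feats_sparse <- selected(res, type = "features", weighting = "sparse")
feats_avg <- selected(res, type = "features", weighting = "simple_avg")
```

Because the subsampling results are stored in res, each of these calls only re-applies the selection rule and does not refit anything.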

Value

For the cssr method, the return value depends on type. If type = "clusters" (the default), a named list of integer vectors, each holding the indices of the features in one selected cluster (an empty list if no cluster is selected). If type = "features", a named integer vector of the indices of the selected features (an empty integer vector if none are selected).

Details

Note that, unlike the selected() accessor in the stabs package (which returns the selected variables), the default here returns the selected clusters, which are the natural unit of cluster stability selection. Pass type = "features" to obtain the flat integer vector of selected features instead.
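
To make the difference between the two return types concrete, the sketch below requests both from the same fit and relates them to each other. It assumes the functions are available by attaching a package named cssr and reuses the simulated data from the Examples section; the variable names are illustrative only.

```r
# Assumption: css(), genClusteredData(), and selected() come from the cssr package.
library(cssr)

set.seed(1)
data <- genClusteredData(n = 50, p = 11, k_unclustered = 2,
  cluster_size = 4, n_clusters = 1, snr = 3)
res <- css(X = data$X, y = data$y, lambda = 0.01,
  clusters = list(cluster1 = 1:4), B = 10)

clusts <- selected(res)                    # named list, one element per cluster
feats <- selected(res, type = "features")  # vector of feature indices

# Number of features in each selected cluster.
cluster_sizes <- lengths(clusts)

# Every member of every selected cluster, as one sorted vector.
all_members <- sort(unique(unlist(clusts, use.names = FALSE)))

# Check that the selected features are members of the selected clusters.
all(feats %in% all_members)
```

Flattening the cluster list with unlist() gives every cluster member, whereas type = "features" returns only the members that receive nonzero weight under the chosen weighting, so the two vectors need not coincide.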

References

Faletto, G., & Bien, J. (2022). Cluster Stability Selection. arXiv preprint arXiv:2201.00494. https://arxiv.org/abs/2201.00494

See also

summary.cssr() for an overview (counts plus a per-cluster table); getCssSelections() for the underlying selection (clusters, features, and weights together); printCssDf() and print.cssr() for the printed summary.

Author

Gregory Faletto, Jacob Bien

Examples

set.seed(1)
data <- genClusteredData(n = 50, p = 11, k_unclustered = 2,
  cluster_size = 4, n_clusters = 1, snr = 3)
clusters <- list(cluster1 = 1:4)
res <- css(X = data$X, y = data$y, lambda = 0.01, clusters = clusters,
  B = 10)
# Selected clusters (the default):
selected(res)
#> $cluster1
#> [1] 1 2 3 4
#> 
#> $c2
#> [1] 5
#> 
#> $c3
#> [1] 6
#> 
#> $c4
#> [1] 7
#> 
#> $c5
#> [1] 8
#> 
#> $c6
#> [1] 9
#> 
#> $c7
#> [1] 10
#> 
#> $c8
#> [1] 11
#> 
# Selected features (a flat integer vector):
selected(res, type = "features")
#> [1]  3  5  6  7  8  9 10 11