Extract the selected clusters or features from cluster stability selection
A convenient S3 accessor for the selected clusters (the default) or the
selected features of a fitted cssr object, without re-running the
(computationally expensive) subsampling. This is a thin wrapper around
getCssSelections().
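Because the fitted cssr object already holds the results of the subsampling, an accessor of this kind only has to read and format them. The base R sketch below shows that S3 pattern on a toy object. The class "toy_fit", the generic selected_toy(), and the stored fields clusters and features are all hypothetical and are not part of cssr. It also illustrates match.arg(), which is one common way (assumed here) to let an argument such as type be abbreviated.

```r
# Hypothetical toy generic + method mirroring the shape of selected()/selected.cssr().
# "toy_fit", selected_toy() and the stored fields are illustrative names only.
selected_toy <- function(object, ...) UseMethod("selected_toy")

selected_toy.toy_fit <- function(object, type = c("clusters", "features"), ...) {
  # match.arg() resolves partial matches, so type = "feat" means "features"
  type <- match.arg(type)
  # Read from the stored fit; nothing is re-estimated here
  if (type == "clusters") object$clusters else object$features
}

fit <- structure(
  list(clusters = list(cluster1 = 1:4, c2 = 5L), features = c(3L, 5L)),
  class = "toy_fit"
)
selected_toy(fit)                 # named list with elements cluster1 and c2
selected_toy(fit, type = "feat")  # c(3L, 5L)
```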
Usage
selected(object, ...)
# S3 method for class 'cssr'
selected(
object,
type = c("clusters", "features"),
cutoff = 0,
min_num_clusts = 1,
max_num_clusts = NA,
weighting = "sparse",
...
)
Arguments
- object
An object of class "cssr" (the output of the function
css()).
- ...
Additional arguments passed to methods (currently unused).
- type
Character; either "clusters" (the default) to return the named list of selected clusters, or "features" to return the flat integer vector of selected features. May be abbreviated.
- cutoff
Numeric; only clusters whose selection proportions are at least cutoff will be selected. Must be between 0 and 1. Default is 0, in which case all clusters are selected (or at most max_num_clusts of them, if max_num_clusts is specified).
- min_num_clusts
Integer or numeric; the minimum number of clusters to use regardless of cutoff. (That is, if the chosen cutoff returns fewer than min_num_clusts clusters, the cutoff will be lowered until at least min_num_clusts clusters are selected.) Default is 1. May be set to 0 to allow a pure cutoff-based (threshold) selection that returns an empty result when no cluster's selection proportion meets the cutoff.
- max_num_clusts
Integer or numeric; the maximum number of clusters to use regardless of cutoff. (That is, if the chosen cutoff returns more than max_num_clusts clusters, the cutoff will be raised until at most max_num_clusts clusters are selected.) Default is NA (in which case max_num_clusts is ignored).
- weighting
Character; passed to getCssSelections() to determine the weights of individual features within the selected clusters. This affects ONLY type = "features" (it determines which cluster members have nonzero weight and are therefore returned); it is a no-op for type = "clusters", whose selected clusters, selection proportions, and sizes do not depend on the weighting. Must be one of "sparse", "weighted_avg", or "simple_avg". See getCssSelections() for details. Default is "sparse".
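The interplay of cutoff, min_num_clusts, and max_num_clusts can be sketched on a named vector of cluster selection proportions. This is a minimal base R illustration under stated assumptions: the function select_by_prop() is hypothetical, and the order of the two adjustments (lower first, then raise) and the handling of ties at the boundary are guesses, not the package's code.

```r
# Hypothetical sketch of the cutoff / min_num_clusts / max_num_clusts rule.
# Function name, adjustment order, and tie handling are assumptions, not cssr's code.
select_by_prop <- function(props, cutoff = 0, min_num = 1, max_num = NA) {
  stopifnot(cutoff >= 0, cutoff <= 1)
  # Too few clusters pass: lower the cutoff to the min_num-th largest proportion
  if (sum(props >= cutoff) < min_num) {
    cutoff <- sort(props, decreasing = TRUE)[min(min_num, length(props))]
  }
  # Too many clusters pass: raise the cutoff to the smallest observed
  # proportion that leaves at most max_num clusters (ties can leave fewer)
  if (!is.na(max_num) && sum(props >= cutoff) > max_num) {
    higher <- sort(unique(props[props > cutoff]))
    ok <- vapply(higher, function(thr) sum(props >= thr) <= max_num, logical(1))
    cutoff <- if (any(ok)) higher[which(ok)[1]] else Inf
  }
  names(props)[props >= cutoff]
}

clus_props <- c(c1 = 0.9, c2 = 0.7, c3 = 0.7, c4 = 0.2)  # made-up proportions
select_by_prop(clus_props, cutoff = 0.8)                # "c1"
select_by_prop(clus_props, cutoff = 0.95)               # "c1" (min_num = 1 lowers the cutoff)
select_by_prop(clus_props, cutoff = 0.95, min_num = 0)  # character(0)
select_by_prop(clus_props, max_num = 2)                 # "c1" (c2 and c3 are tied at 0.7)
```

The last call shows why "at most" matters: raising the cutoff past the tied pair at 0.7 leaves a single cluster, fewer than max_num.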
Value
For the cssr method: if type = "clusters" (the default), a named
list of integer vectors (each the indices of the features in one selected
cluster; an empty list if no cluster is selected); if type = "features", a
named integer vector of the indices of the selected features (an empty integer
vector if none).
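Which members of a selected cluster appear in the type = "features" vector depends on which ones receive nonzero weight. The sketch below loosely follows the three schemes described by Faletto & Bien (2022): sparse puts all weight on the most-selected member, weighted_avg weights members by their own selection proportions, and simple_avg weights them equally. The function cluster_weights(), the first-maximum tie-break, and the equal-weight fallback for all-zero proportions are assumptions; getCssSelections() may differ in these details.

```r
# Hypothetical sketch of within-cluster weights under the three weighting options.
# Name, tie-break, and all-zero fallback are assumptions, not cssr's code.
cluster_weights <- function(feat_props,
                            weighting = c("sparse", "weighted_avg", "simple_avg")) {
  weighting <- match.arg(weighting)
  n <- length(feat_props)
  switch(weighting,
    # All weight on the most-selected member (first one if tied)
    sparse = as.numeric(seq_len(n) == which.max(feat_props)),
    # Weight proportional to each member's selection proportion
    # (equal weights if every member has proportion 0)
    weighted_avg = if (sum(feat_props) > 0) feat_props / sum(feat_props) else rep(1 / n, n),
    # Equal weight on every member
    simple_avg = rep(1 / n, n)
  )
}

members <- 1:4
feat_props <- c(0.1, 0.0, 0.6, 0.3)  # made-up proportions for features 1:4
members[cluster_weights(feat_props, "sparse") > 0]        # 3
members[cluster_weights(feat_props, "weighted_avg") > 0]  # 1 3 4
members[cluster_weights(feat_props, "simple_avg") > 0]    # 1 2 3 4
```

Under the sparse scheme a four-member cluster contributes a single feature, which is consistent with the Examples, where cluster1 = 1:4 contributes only feature 3 to the flat vector.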
Details
Note that, unlike the selected() accessor in the stabs package (which
returns the selected variables), the default here returns the selected
clusters – the natural unit of cluster stability selection. Pass
type = "features" to obtain the flat integer vector of selected features
instead.
References
Faletto, G., & Bien, J. (2022). Cluster Stability Selection. arXiv preprint arXiv:2201.00494. https://arxiv.org/abs/2201.00494.
See also
summary.cssr() for an overview (counts plus a per-cluster table);
getCssSelections() for the underlying selection (clusters, features, and
weights together); printCssDf() and print.cssr() for the printed summary.
Examples
set.seed(1)
data <- genClusteredData(n = 50, p = 11, k_unclustered = 2,
                         cluster_size = 4, n_clusters = 1, snr = 3)
clusters <- list(cluster1 = 1:4)
res <- css(X = data$X, y = data$y, lambda = 0.01, clusters = clusters,
           B = 10)
# Selected clusters (the default):
selected(res)
#> $cluster1
#> [1] 1 2 3 4
#>
#> $c2
#> [1] 5
#>
#> $c3
#> [1] 6
#>
#> $c4
#> [1] 7
#>
#> $c5
#> [1] 8
#>
#> $c6
#> [1] 9
#>
#> $c7
#> [1] 10
#>
#> $c8
#> [1] 11
#>
# Selected features (a flat integer vector):
selected(res, type = "features")
#> [1] 3 5 6 7 8 9 10 11