If cutoff is too high for at least min_num_clusts clusters to be selected, it is lowered until min_num_clusts clusters can be selected. After that, if the cutoff is low enough that more than max_num_clusts clusters are selected, it is raised until no more than max_num_clusts are selected. Because clusters can have tied selection proportions, the number of selected clusters may be strictly lower than max_num_clusts or strictly greater than min_num_clusts. In fact, it may be impossible to satisfy both bounds simultaneously, even if max_num_clusts is strictly greater than min_num_clusts. If this occurs, max_num_clusts takes precedence over min_num_clusts. getSelectedClusters throws an error if the provided inputs don't allow it to select any clusters.
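The adjustment described above can be sketched in base R. This is only an illustration on a plain named vector of cluster selection proportions, not the package's implementation: the helper name adjust_cutoff_sketch, its default arguments, and the example proportions are all assumptions made for this sketch.

```r
# Hypothetical helper (not part of cssr): mimics the documented cutoff
# adjustment on a named numeric vector of cluster selection proportions.
adjust_cutoff_sketch <- function(sel_props, cutoff, min_num_clusts = 1,
                                 max_num_clusts = NA) {
  stopifnot(is.numeric(sel_props), length(sel_props) >= 1,
            cutoff >= 0, cutoff <= 1)
  # Only the distinct observed proportions matter as candidate cutoffs.
  candidates <- unique(sel_props)
  n_selected <- function(v) sum(sel_props >= v)

  # Step 1: lower the cutoff until at least min_num_clusts clusters qualify.
  if (n_selected(cutoff) < min_num_clusts) {
    ok <- candidates[vapply(candidates, n_selected, numeric(1)) >= min_num_clusts]
    # If even selecting everything is not enough, select everything.
    cutoff <- if (length(ok) > 0) max(ok) else min(candidates)
  }

  # Step 2: raise the cutoff until at most max_num_clusts clusters qualify.
  # This step runs last, so max_num_clusts takes precedence.
  if (!is.na(max_num_clusts) && n_selected(cutoff) > max_num_clusts) {
    ok <- candidates[vapply(candidates, n_selected, numeric(1)) <= max_num_clusts]
    if (length(ok) == 0) stop("No clusters can be selected with these inputs.")
    cutoff <- min(ok)
  }

  selected <- sel_props[sel_props >= cutoff]
  if (length(selected) == 0) stop("No clusters can be selected with these inputs.")
  sort(selected, decreasing = TRUE)
}

# Made-up selection proportions with a tie between c2 and c3.
sel_props <- c(c1 = 0.9, c2 = 0.6, c3 = 0.6, c4 = 0.2)

# The tie pushes the count above min_num_clusts: c1, c2 and c3 are selected.
adjust_cutoff_sketch(sel_props, cutoff = 0.8, min_num_clusts = 2)

# Both bounds cannot hold at once; max_num_clusts wins and only c1 is selected.
adjust_cutoff_sketch(sel_props, cutoff = 0.8, min_num_clusts = 2,
                     max_num_clusts = 2)
```

In the second call, the tie between c2 and c3 means no cutoff yields exactly two clusters, so the sketch falls back to one cluster rather than exceed max_num_clusts.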

Usage

getSelectedClusters(
  css_results,
  weighting,
  cutoff,
  min_num_clusts,
  max_num_clusts
)

Arguments

css_results

An object of class "cssr" (the output of the function css).

weighting

Character; determines how to calculate the weights for individual features within the selected clusters. Only those features with nonzero weight within the selected clusters will be returned. Must be one of "sparse", "weighted_avg", or "simple_avg". For "sparse", all the weight is put on the most frequently selected individual cluster member (or divided equally among all the cluster members that are tied for the top selection proportion if there is a tie). For "weighted_avg", only the features within a selected cluster that were themselves selected on at least one subsample will have nonzero weight. For "simple_avg", each cluster member gets equal weight regardless of the individual feature selection proportions (that is, all cluster members within each selected cluster will be returned). See Faletto and Bien (2022) for details.
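As a rough illustration of the three schemes within a single cluster, the base R sketch below computes weights from a vector of individual feature selection proportions. The helper name cluster_weights_sketch and the example proportions are hypothetical. Taking the "weighted_avg" weights to be proportional to the individual selection proportions is an assumption of this sketch, as is normalizing every scheme to sum to 1; see Faletto and Bien (2022) for the exact definitions.

```r
# Hypothetical helper (not part of cssr): weights for the members of one
# selected cluster, given their individual selection proportions.
# Assumes at least one member has a positive selection proportion.
cluster_weights_sketch <- function(feat_props, weighting) {
  weighting <- match.arg(weighting, c("sparse", "weighted_avg", "simple_avg"))
  w <- switch(weighting,
    # All weight on the top member(s); ties share the weight equally.
    sparse = as.numeric(feat_props == max(feat_props)),
    # Assumption: weight proportional to selection proportion, so members
    # never selected on any subsample get weight zero.
    weighted_avg = as.numeric(feat_props),
    # Equal weight for every member, selected or not.
    simple_avg = rep(1, length(feat_props))
  )
  w <- w / sum(w)
  names(w) <- names(feat_props)
  w
}

# Made-up selection proportions for a four-member cluster.
feat_props <- c(x1 = 0.5, x2 = 0.5, x3 = 0.25, x4 = 0)

cluster_weights_sketch(feat_props, "sparse")       # x1 and x2 share the weight
cluster_weights_sketch(feat_props, "weighted_avg") # x4 gets zero weight
cluster_weights_sketch(feat_props, "simple_avg")   # all four weighted equally
```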

cutoff

Numeric; only those clusters with selection proportions of at least cutoff are selected and returned. Must be between 0 and 1.

min_num_clusts

Integer or numeric; the minimum number of clusters to use regardless of cutoff. (That is, if the chosen cutoff returns fewer than min_num_clusts clusters, the cutoff will be decreased until at least min_num_clusts clusters are selected.)

max_num_clusts

Integer or numeric; the maximum number of clusters to use regardless of cutoff. (That is, if the chosen cutoff returns more than max_num_clusts clusters, the cutoff will be increased until at most max_num_clusts clusters are selected.) If NA, max_num_clusts is ignored.

Value

A named list with the following elements:

selected_clusts

A named numeric vector containing the selection proportions for the selected clusters. The name of each entry is the name of the corresponding cluster.

selected_feats

A named integer vector; the indices of the features with nonzero weights from all of the selected clusters.

weights

A named list whose length equals the number of selected clusters. Each list element weights[[j]] is a numeric vector of the weights to use for the jth selected cluster, and it has the same name as the cluster it corresponds to.
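To show how the three elements line up by name, the sketch below builds a mock return value by hand instead of calling the package. The cluster names, feature names, and all numbers are invented for illustration, and the length of each weight vector is an assumption of the mock.

```r
# Hand-built stand-in for the returned list; every value here is made up.
res <- list(
  selected_clusts = c(c1 = 0.9, c2 = 0.6),
  selected_feats = c(x1 = 1L, x2 = 2L, x5 = 5L),
  weights = list(c1 = c(0.5, 0.5), c2 = 1)
)

# The weights list shares its names with selected_clusts, so either a
# position or a cluster name can be used to look up one cluster's weights.
for (clust in names(res$selected_clusts)) {
  w <- res$weights[[clust]]  # [[ ]] extracts the numeric vector itself
  cat(sprintf("%s: selection proportion %.2f, %d weight(s)\n",
              clust, res$selected_clusts[[clust]], length(w)))
}

# Single brackets return a one-element list rather than the vector.
is.list(res$weights["c1"])
```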

Author

Gregory Faletto, Jacob Bien