7 Convenient wrapper functions

Finally, we provide convenient wrapper functions which yield user-desired output in a single step at the price of flexibility and efficiency. cssSelect() yields a selected set of clusters and features (the same output as getCssSelections()) in a single function call. cssPredict() takes in a training/selection data set as well as a test X. It uses the labeled data to select a set of features and train an OLS model on the selected features, and then it generates predictions on the test set using the fitted model.

Besides requiring only a single function call to yield desired output, these wrapper functions also automatically select hyperparameters (lambda used for the lasso, a desired model size, and even selection and training splits for cssPredict()) in a sensible way if these values are not provided by the user. So these functions are very convenient for an end user who wants quick results without getting “under the hood.”

The two main disadvantages of these functions are flexibility and computational efficiency. For simplicity of use, these functions do not provide as many options as the component functions they call (for example, min_num_clusts is not an available argument for these models). Further, both of these functions make (computationally intensive) calls to css() internally every time they are called, so these functions are not recommended for users who want to tinker with the model size and other parameters. Instead, it would be more efficient to call css once and then work with the output as desired using the other package functions, which are very efficient given the stored output from css().

cssSelect() and cssPredict() have no new dependencies; they rely only on already-defined functions.

cssSelect():

#' Obtain a selected set of clusters and features using cluster stability
#' selection
#'
#' Takes in data X and y and returns a set of clusters (and a set of features)
#' that are useful for predicting y from the data in X. This is a wrapper
#' function for css and getCssSelections. Using cssSelect is simpler, but it
#' has fewer options, and it executes the full (computationally expensive)
#' subsampling procedured every time it is called. In contrast, css can be
#' called just once, and then getCssSelections can quickly return results using
#' different values of cutoff, max_num_clusts, etc. from the calculations done
#' in one call to css.
#' @param X An n x p numeric matrix (preferably) or a data.frame (which will
#' be coerced internally to a matrix by the function model.matrix) containing
#' the p >= 2 features/predictors.
#' @param y A length-n numeric vector containing the responses; `y[i]` is the
#' response corresponding to observation `X[i, ]`. (Note that for the css
#' function, y does not have to be a numeric response, but for this function,
#' the underlying selection procedure is the lasso, so y must be a real-valued
#' response.)
#' @param clusters Optional; either an integer vector of a list of integer
#' vectors; each vector should contain the indices of a cluster of features (a
#' subset of 1:p). (If there is only one cluster, clusters can either be a list
#' of length 1 or an integer vector.) All of the provided clusters must be
#' non-overlapping. Every feature not appearing in any cluster will be assumed
#' to be unclustered (that is, they  will be treated as if they are in a
#' "cluster" containing only themselves). If clusters is a list of length 0 (or
#' a list only containing clusters of length 1), then css() returns the same
#' results as stability selection (so feat_sel_mat will be identical to
#' clus_sel_mat). Names for the clusters will be needed later; any clusters that
#' are not given names in the list clusters will be given names automatically by
#' css. CAUTION: if the provided X is a data.frame that contains a categorical
#' feature with more than two levels, then the resulting matrix made from
#' model.matrix will have a different number of columns than the provided
#' data.frame, some of the feature numbers will change, and the clusters
#' argument will not work properly (in the current version of the package). To
#' get correct results in this case, please use model.matrix to convert the
#' data.frame to a numeric matrix on your own, then provide this matrix and
#' cluster assignments with respect to this matrix. Default is list() (so no
#' clusters are specified, and every feature is assumed to be in a "cluster"
#' containing only itself).
#' @param lambda Optional; the tuning parameter to be used by the lasso for
#' feature selection in each subsample. If lambda is not provided, cssSelect
#' will choose one automatically by cross-validation. Default is NA.
#' @param cutoff Numeric; cssSelect will only select those clusters with
#' selection proportions equal to at least cutoff. Must be between 0 and 1.
#' Default is NA (in which case max_num_clusts are used).
#' @param max_num_clusts Integer or numeric; the maximum number of clusters to
#' use regardless of cutoff. (That is, if the chosen cutoff returns more than
#' max_num_clusts clusters, the cutoff will be decreased until at most
#' max_num_clusts clusters are selected.) Default is NA (in which case
#' either cutoff is used to choose the number of clusters, or if cutoff was also
#' unspecified, cssSelect chooses max_num_clusts by cross-validation).
#' @param auto_select_size Logical; if TRUE, then max_num_clusts will be
#' automatically estimated using the lasso with cross-validation. Default is
#' TRUE, though his argument is ignored if either cutoff or max_num_clusts is
#' provided. (If desired output is to return all clusters, you should set
#' auto_select_size to FALSE and do not provide cutoff or max_num_clusts.)
#' @return A named list with two items. \item{selected_clusts}{A list of
#' integer vectors; each vector contains the indices of one of the selected
#' clusters.} \item{selected_feats}{An integer vector; the indices of the
#' all of the selected features within all of the selected clusters (typically
#' only one feature is selected from each cluster).}
#' @author Gregory Faletto, Jacob Bien
#' @export
cssSelect <- function(X, y, clusters = list(), lambda=NA, cutoff=NA,
    max_num_clusts=NA, auto_select_size=TRUE){

    # Check inputs (most inputs will be checked by called functions)

    stopifnot(!is.na(auto_select_size))
    stopifnot(length(auto_select_size) == 1)
    stopifnot(is.logical(auto_select_size))

    stopifnot(is.matrix(X) | is.data.frame(X))
    stopifnot(all(!is.na(X)))


    # Check if x is a matrix; if it's a data.frame, convert to matrix.
    if(is.data.frame(X)){
        p <- ncol(X)
        X <- stats::model.matrix(~ ., X)
        X <- X[, colnames(X) != "(Intercept)"]

        if(length(clusters) > 0 & (p != ncol(X))){
            stop("When stats::model.matrix converted the provided data.frame X to a matrix, the number of columns changed (probably because the provided data.frame contained a factor variable with at least three levels). Please convert the data.frame X to a matrix yourself using model.matrix and provide cluster assignments according to the columns of the new matrix.")
        }

    }

    stopifnot(is.matrix(X))
    stopifnot(all(!is.na(X)))

    if(!is.numeric(y) & !is.integer(y)){
        stop("The provided y must be real-valued, because cssSelect uses the lasso for feature selection. (In order to use a different form of response, use the css function and provide your own selection function accommodating your choice of y.)")
    }

    stopifnot(length(lambda) == 1)
    if(is.na(lambda)){
        lambda <- getLassoLambda(X, y)
    }

    css_results <- css(X, y, lambda, clusters)

    # If no indication of how to select model size was provided, choose model
    # size by cross-validation
    if(is.na(cutoff) & is.na(max_num_clusts)){
        if(auto_select_size){
            max_num_clusts <- getModelSize(X, y, css_results$clusters)
        }
    }

    if(is.na(cutoff)){
        cutoff <- 0
    }

    # Get selected features
    getCssSelections(css_results, weighting="sparse", cutoff=cutoff,
        min_num_clusts=1, max_num_clusts=max_num_clusts)
}

Tests for cssSelect():

testthat::test_that("cssSelect works", {
  set.seed(73212)
  
  data <- genClusteredData(n=15, p=11, k_unclustered=1, cluster_size=3,
                        n_clusters=2, sig_clusters=1, sigma_eps_sq=1)
  
  x <- data$X
  y <- data$y
  
  # Intentionally don't provide clusters for all features, mix up formatting,
  # etc.
  good_clusters <- list(red_cluster=1L:3L, 4:6)
  
  res <- cssSelect(X=x, y=y, clusters=good_clusters)
  
  testthat::expect_true(is.list(res))
  testthat::expect_equal(length(res), 2)
  testthat::expect_identical(names(res), c("selected_clusts", "selected_feats"))
  
  testthat::expect_true(!is.null(names(res$selected_clusts)))
  testthat::expect_true(is.character(names(res$selected_clusts)))
  testthat::expect_true(length(res$selected_clusts) <=
                          length(res$selected_feats))
  # Total of 11 - 2*(3 - 1) = 7 clusters
  testthat::expect_true(length(res$selected_clusts) <= 7)
  testthat::expect_true(length(res$selected_clusts) >= 1)

  testthat::expect_true(is.list(res$selected_clusts))
  testthat::expect_equal(length(names(res$selected_clusts)),
                           length(unique(names(res$selected_clusts))))
  already_used_feats <- integer()
  for(i in 1:length(res$selected_clusts)){
    sels_i <- res$selected_clusts[[i]]
    testthat::expect_true(length(sels_i) >= 1)
    testthat::expect_true(is.integer(sels_i))
    testthat::expect_true(all(sels_i %in% 1:11))
    testthat::expect_equal(length(sels_i), length(unique(sels_i)))
    testthat::expect_equal(length(intersect(already_used_feats, sels_i)), 0)
    already_used_feats <- c(already_used_feats, sels_i)
  }
  testthat::expect_true(length(already_used_feats) <= 11)
  testthat::expect_equal(length(already_used_feats),
                         length(unique(already_used_feats)))
  testthat::expect_true(all(already_used_feats %in% 1:11))
  
  testthat::expect_true(length(res$selected_feats) <= 11)
  testthat::expect_true(is.integer(res$selected_feats))
  testthat::expect_true(length(res$selected_feats) >= 1)
  testthat::expect_equal(length(names(res$selected_feats)),
                         length(unique(names(res$selected_feats))))
  testthat::expect_true(all(res$selected_feats >= 1))
  testthat::expect_true(all(res$selected_feats <= 11))
  testthat::expect_equal(length(res$selected_feats),
                             length(unique(res$selected_feats)))
  
  # No provided clusters
  
  res <- cssSelect(X=x, y=y)
  
  testthat::expect_true(is.list(res))
  testthat::expect_equal(length(res), 2)
  testthat::expect_identical(names(res), c("selected_clusts", "selected_feats"))
  
  testthat::expect_true(!is.null(names(res$selected_clusts)))
  testthat::expect_true(is.character(names(res$selected_clusts)))
  testthat::expect_true(length(res$selected_clusts) <=
                          length(res$selected_feats))

  testthat::expect_true(length(res$selected_clusts) <= 11)
  testthat::expect_true(length(res$selected_clusts) >= 1)

  testthat::expect_true(is.list(res$selected_clusts))
  testthat::expect_equal(length(names(res$selected_clusts)),
                           length(unique(names(res$selected_clusts))))
  already_used_feats <- integer()
  for(i in 1:length(res$selected_clusts)){
    sels_i <- res$selected_clusts[[i]]
    testthat::expect_true(length(sels_i) >= 1)
    testthat::expect_true(is.integer(sels_i))
    testthat::expect_true(all(sels_i %in% 1:11))
    testthat::expect_equal(length(sels_i), length(unique(sels_i)))
    testthat::expect_equal(length(intersect(already_used_feats, sels_i)), 0)
    already_used_feats <- c(already_used_feats, sels_i)
  }
  testthat::expect_true(length(already_used_feats) <= 11)
  testthat::expect_equal(length(already_used_feats),
                         length(unique(already_used_feats)))
  testthat::expect_true(all(already_used_feats %in% 1:11))
  
  testthat::expect_true(length(res$selected_feats) <= 11)
  testthat::expect_true(is.integer(res$selected_feats))
  testthat::expect_true(length(res$selected_feats) >= 1)
  testthat::expect_equal(length(names(res$selected_feats)),
                         length(unique(names(res$selected_feats))))
  testthat::expect_true(all(res$selected_feats >= 1))
  testthat::expect_true(all(res$selected_feats <= 11))
  testthat::expect_equal(length(res$selected_feats),
                             length(unique(res$selected_feats)))

  ## Trying other inputs

  # X as a data.frame
  X_df <- datasets::mtcars
  
  res <- cssSelect(X=X_df, y=stats::rnorm(nrow(X_df)), clusters=1:3)
  
  testthat::expect_true(is.list(res))
  testthat::expect_equal(length(res), 2)
  testthat::expect_identical(names(res), c("selected_clusts", "selected_feats"))
  
  testthat::expect_true(!is.null(names(res$selected_clusts)))
  testthat::expect_true(is.character(names(res$selected_clusts)))
  testthat::expect_true(length(res$selected_clusts) <=
                          length(res$selected_feats))
  # Total of ncol(X_df) - (3 - 1) clusters
  testthat::expect_true(length(res$selected_clusts) <= ncol(X_df) - 2)
  testthat::expect_true(length(res$selected_clusts) >= 1)

  testthat::expect_true(is.list(res$selected_clusts))
  testthat::expect_equal(length(names(res$selected_clusts)),
                           length(unique(names(res$selected_clusts))))
  already_used_feats <- integer()
  for(i in 1:length(res$selected_clusts)){
    sels_i <- res$selected_clusts[[i]]
    testthat::expect_true(length(sels_i) >= 1)
    testthat::expect_true(is.integer(sels_i))
    testthat::expect_true(all(sels_i %in% 1:ncol(X_df)))
    testthat::expect_equal(length(sels_i), length(unique(sels_i)))
    testthat::expect_equal(length(intersect(already_used_feats, sels_i)), 0)
    already_used_feats <- c(already_used_feats, sels_i)
  }
  testthat::expect_true(length(already_used_feats) <= ncol(X_df))
  testthat::expect_equal(length(already_used_feats),
                         length(unique(already_used_feats)))
  testthat::expect_true(all(already_used_feats %in% 1:ncol(X_df)))

  testthat::expect_true(length(res$selected_feats) <= ncol(X_df))
  testthat::expect_true(is.integer(res$selected_feats))
  testthat::expect_true(length(res$selected_feats) >= 1)
  testthat::expect_equal(length(names(res$selected_feats)),
                         length(unique(names(res$selected_feats))))
  testthat::expect_true(all(res$selected_feats >= 1))
  testthat::expect_true(all(res$selected_feats <= ncol(X_df)))
  testthat::expect_equal(length(res$selected_feats),
                             length(unique(res$selected_feats)))

  # X as a dataframe with factors (number of columns of final design matrix
  # after one-hot encoding factors won't match number of columns of df2)
  # cyl, gear, and carb are factors with more than 2 levels
  df2 <- X_df
  df2$cyl <- as.factor(df2$cyl)
  df2$vs <- as.factor(df2$vs)
  df2$am <- as.factor(df2$am)
  df2$gear <- as.factor(df2$gear)
  df2$carb <- as.factor(df2$carb)
  
  # Should get error if I try to use data.frame with clusters, since data.frame
  # has factors with more than two levels
  testthat::expect_error(cssSelect(X=df2, y=stats::rnorm(nrow(X_df)),
                                   clusters=1:3),
                         "When stats::model.matrix converted the provided data.frame X to a matrix, the number of columns changed (probably because the provided data.frame contained a factor variable with at least three levels). Please convert the data.frame X to a matrix yourself using model.matrix and provide cluster assignments according to the columns of the new matrix.", fixed=TRUE)
  
  # Should be fine if I don't use clusters
  res <- cssSelect(X=df2, y=stats::rnorm(nrow(X_df)))
  
  X_df_mat <- stats::model.matrix(~ ., df2)
  X_df_mat <- X_df_mat[, colnames(X_df_mat) != "(Intercept)"]
  p <- ncol(X_df_mat)
  
  testthat::expect_true(is.list(res))
  testthat::expect_equal(length(res), 2)
  testthat::expect_identical(names(res), c("selected_clusts", "selected_feats"))
  
  testthat::expect_true(!is.null(names(res$selected_clusts)))
  testthat::expect_true(is.character(names(res$selected_clusts)))
  testthat::expect_true(length(res$selected_clusts) <=
                          length(res$selected_feats))
  # Total of p - (3 - 1) clusters
  testthat::expect_true(length(res$selected_clusts) <= p - 2)
  testthat::expect_true(length(res$selected_clusts) >= 1)

  testthat::expect_true(is.list(res$selected_clusts))
  testthat::expect_equal(length(names(res$selected_clusts)),
                           length(unique(names(res$selected_clusts))))
  already_used_feats <- integer()
  for(i in 1:length(res$selected_clusts)){
    sels_i <- res$selected_clusts[[i]]
    testthat::expect_true(length(sels_i) >= 1)
    testthat::expect_true(is.integer(sels_i))
    testthat::expect_true(all(sels_i %in% 1:p))
    testthat::expect_equal(length(sels_i), length(unique(sels_i)))
    testthat::expect_equal(length(intersect(already_used_feats, sels_i)), 0)
    already_used_feats <- c(already_used_feats, sels_i)
  }
  testthat::expect_true(length(already_used_feats) <= p)
  testthat::expect_equal(length(already_used_feats),
                         length(unique(already_used_feats)))
  testthat::expect_true(all(already_used_feats %in% 1:p))

  testthat::expect_true(length(res$selected_feats) <= p)
  testthat::expect_true(is.integer(res$selected_feats))
  testthat::expect_true(length(res$selected_feats) >= 1)
  testthat::expect_equal(length(names(res$selected_feats)),
                         length(unique(names(res$selected_feats))))
  testthat::expect_true(all(res$selected_feats >= 1))
  testthat::expect_true(all(res$selected_feats <= p))
  testthat::expect_equal(length(res$selected_feats),
                             length(unique(res$selected_feats)))


  # X as a matrix with column names
  x2 <- x
  colnames(x2) <- LETTERS[1:11]
  
  res <- cssSelect(X=x2, y=y, clusters=good_clusters)
  
  testthat::expect_true(is.list(res))
  testthat::expect_equal(length(res), 2)
  testthat::expect_identical(names(res), c("selected_clusts", "selected_feats"))
  
  testthat::expect_true(!is.null(names(res$selected_clusts)))
  testthat::expect_true(is.character(names(res$selected_clusts)))
  testthat::expect_true(length(res$selected_clusts) <=
                          length(res$selected_feats))
  # Total of 11 - 2*(3 - 1) = 7 clusters
  testthat::expect_true(length(res$selected_clusts) <= 7)
  testthat::expect_true(length(res$selected_clusts) >= 1)

  testthat::expect_true(is.list(res$selected_clusts))
  testthat::expect_equal(length(names(res$selected_clusts)),
                           length(unique(names(res$selected_clusts))))
  already_used_feats <- integer()
  for(i in 1:length(res$selected_clusts)){
    sels_i <- res$selected_clusts[[i]]
    testthat::expect_true(length(sels_i) >= 1)
    testthat::expect_true(is.integer(sels_i))
    testthat::expect_true(all(sels_i %in% 1:11))
    testthat::expect_equal(length(sels_i), length(unique(sels_i)))
    testthat::expect_equal(length(intersect(already_used_feats, sels_i)), 0)
    already_used_feats <- c(already_used_feats, sels_i)
  }
  testthat::expect_true(length(already_used_feats) <= 11)
  testthat::expect_equal(length(already_used_feats),
                         length(unique(already_used_feats)))
  testthat::expect_true(all(already_used_feats %in% 1:11))
  
  testthat::expect_true(length(res$selected_feats) <= 11)
  testthat::expect_true(is.integer(res$selected_feats))
  testthat::expect_true(length(res$selected_feats) >= 1)
  testthat::expect_equal(length(names(res$selected_feats)),
                         length(unique(names(res$selected_feats))))
  testthat::expect_true(all(res$selected_feats >= 1))
  testthat::expect_true(all(res$selected_feats <= 11))
  testthat::expect_equal(length(res$selected_feats),
                             length(unique(res$selected_feats)))
  
  # Vary inputs
  res <- cssSelect(X=x, y=y, clusters=good_clusters, lambda=0.01)
  
  testthat::expect_true(is.list(res))
  testthat::expect_equal(length(res), 2)
  testthat::expect_identical(names(res), c("selected_clusts", "selected_feats"))
  
  res <- cssSelect(X=x, y=y, clusters=good_clusters, cutoff=0.6)
  
  testthat::expect_true(is.list(res))
  testthat::expect_equal(length(res), 2)
  testthat::expect_identical(names(res), c("selected_clusts", "selected_feats"))
  
  res <- cssSelect(X=x, y=y, clusters=good_clusters, max_num_clusts=6)
  
  testthat::expect_true(is.list(res))
  testthat::expect_equal(length(res), 2)
  testthat::expect_identical(names(res), c("selected_clusts", "selected_feats"))
  testthat::expect_true(length(res$selected_clusts) <= 6)
  
  res <- cssSelect(X=x, y=y, clusters=good_clusters, auto_select_size=FALSE)

  testthat::expect_true(is.list(res))
  testthat::expect_equal(length(res), 2)
  testthat::expect_identical(names(res), c("selected_clusts", "selected_feats"))
  # Total of 11 - 2*(3 - 1) = 7 clusters
  testthat::expect_equal(length(res$selected_clusts), 7)
  
  # Bad inputs
  testthat::expect_error(cssSelect(X=x[1:10, ], y=y),
                         "n == length(y) is not TRUE", fixed=TRUE)
  
  testthat::expect_error(cssSelect(X=character(5), y=y),
                         "is.matrix(X) | is.data.frame(X) is not TRUE",
                         fixed=TRUE)
  
  testthat::expect_error(cssSelect(X=x, y=matrix(1:15, 5, 3)),
                         "!is.matrix(y) is not TRUE", fixed=TRUE)
  
  testthat::expect_error(cssSelect(X=x, y=factor(rbinom(15, size=1, prob=.5))),
                         "The provided y must be real-valued, because cssSelect uses the lasso for feature selection. (In order to use a different form of response, use the css function and provide your own selection function accommodating your choice of y.)",
                         fixed=TRUE)
  
  testthat::expect_error(cssSelect(X=x, y=y, clusters="clusters"),
                         "is.numeric(clusters) | is.integer(clusters) is not TRUE",
                         fixed=TRUE)
  
  testthat::expect_error(cssSelect(X=x, y=y, lambda=-.1),
                         "For method cssLasso, lambda must be nonnegative.",
                         fixed=TRUE)
  
  testthat::expect_error(cssSelect(X=x, y=y, cutoff=1.1),
                         "cutoff <= 1 is not TRUE", fixed=TRUE)
  
  testthat::expect_error(cssSelect(X=x, y=y, max_num_clusts=1000),
                         "max_num_clusts <= p is not TRUE", fixed=TRUE)
  
  testthat::expect_error(cssSelect(X=x, y=y, auto_select_size=1),
                         "is.logical(auto_select_size) is not TRUE", fixed=TRUE)
})

## ── Warning ('<text>:14'): cssSelect works ──────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = x, y = y, clusters = good_clusters)
##  2. litr (local) getLassoLambda(X, y)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:14'): cssSelect works ──────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = x, y = y, clusters = good_clusters)
##  2. litr (local) getModelSize(X, y, css_results$clusters)
##  3. glmnet::cv.glmnet(x = X_size, y = y, family = "gaussian")
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:58'): cssSelect works ──────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = x, y = y)
##  2. litr (local) getLassoLambda(X, y)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:58'): cssSelect works ──────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = x, y = y)
##  2. litr (local) getModelSize(X, y, css_results$clusters)
##  3. glmnet::cv.glmnet(x = X_size, y = y, family = "gaussian")
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:105'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = X_df, y = stats::rnorm(nrow(X_df)), clusters = 1:3)
##  2. litr (local) getLassoLambda(X, y)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:105'): cssSelect works ─────────────────────────────────────
## Returning more than max_num_clusts = 1 clusters because increasing the cutoff any further would require returning 0 clusters
## Backtrace:
##  1. litr (local) cssSelect(X = X_df, y = stats::rnorm(nrow(X_df)), clusters = 1:3)
##  2. litr (local) getCssSelections(...)
##  3. litr (local) getSelectedClusters(...)
##  4. litr (local) checkSelectedClusters(...)
## 
## ── Warning ('<text>:164'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = df2, y = stats::rnorm(nrow(X_df)))
##  2. litr (local) getLassoLambda(X, y)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:215'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = x2, y = y, clusters = good_clusters)
##  2. litr (local) getLassoLambda(X, y)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:215'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = x2, y = y, clusters = good_clusters)
##  2. litr (local) getModelSize(X, y, css_results$clusters)
##  3. glmnet::cv.glmnet(x = X_size, y = y, family = "gaussian")
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:258'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = x, y = y, clusters = good_clusters, lambda = 0.01)
##  2. litr (local) getModelSize(X, y, css_results$clusters)
##  3. glmnet::cv.glmnet(x = X_size, y = y, family = "gaussian")
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:264'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = x, y = y, clusters = good_clusters, cutoff = 0.6)
##  2. litr (local) getLassoLambda(X, y)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:270'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = x, y = y, clusters = good_clusters, max_num_clusts = 6)
##  2. litr (local) getLassoLambda(X, y)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:277'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssSelect(X = x, y = y, clusters = good_clusters, auto_select_size = FALSE)
##  2. litr (local) getLassoLambda(X, y)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:293'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##   1. testthat::expect_error(...)
##   7. litr (local) cssSelect(X = x, y = matrix(1:15, 5, 3))
##   8. litr (local) getLassoLambda(X, y)
##   9. glmnet::cv.glmnet(...)
##  10. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:300'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##   1. testthat::expect_error(...)
##   7. litr (local) cssSelect(X = x, y = y, clusters = "clusters")
##   8. litr (local) getLassoLambda(X, y)
##   9. glmnet::cv.glmnet(...)
##  10. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:308'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##   1. testthat::expect_error(...)
##   7. litr (local) cssSelect(X = x, y = y, cutoff = 1.1)
##   8. litr (local) getLassoLambda(X, y)
##   9. glmnet::cv.glmnet(...)
##  10. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:311'): cssSelect works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##   1. testthat::expect_error(...)
##   7. litr (local) cssSelect(X = x, y = y, max_num_clusts = 1000)
##   8. litr (local) getLassoLambda(X, y)
##   9. glmnet::cv.glmnet(...)
##  10. glmnet:::cv.glmnet.raw(...)

cssPredict():

#' Wrapper function to generate predictions from cluster stability selected
#' model in one step
#'
#' Select clusters using cluster stability selection, form cluster
#' representatives, fit a linear model, and generate predictions from a matrix
#' of unlabeled data. This is a wrapper function for css and getCssPreds. Using
#' cssPredict is simpler, but it has fewer options, and it executes the full
#' (computationally expensive) subsampling procedured every time it is called.
#' In contrast, css can be called just once, and then cssPredict can quickly
#' return results for different matrices of new data or using different values
#' of cutoff, max_num_clusts, etc. by using the calculations done in one call to
#' css.
#'
#' @param X_train_selec An n x p numeric matrix (preferably) or a data.frame
#' (which will be coerced internally to a matrix by the function model.matrix)
#' containing the p >= 2 features/predictors. The data from X_train_selec and
#' y_train_selec will be split into two parts; half of the data will be used for
#' feature selection by cluster stability selection, and half will be used for
#' estimating a linear model on the selected cluster representatives.
#' @param y_train_selec A length-n numeric vector containing the responses;
#' `y[i]` is the response corresponding to observation `X[i, ]`. Unlke the more
#' general setup of css, y_train_selec must be real-valued because predictions
#' will be generated by ordinary least squares.
#' @param X_test A numeric matrix (preferably) or a data.frame (which will
#' be coerced internally to a matrix by the function model.matrix) containing
#' the data that will be used to generate predictions. Must contain the same
#' features (in the same number of columns) as X_train_selec, and if the columns
#' of X_test are named, they must match the names of X_train_selec.
#' @param clusters Optional; either an integer vector of a list of integer
#' vectors; each vector should contain the indices of a cluster of features (a
#' subset of 1:p). (If there is only one cluster, clusters can either be a list
#' of length 1 or an integer vector.) All of the provided clusters must be
#' non-overlapping. Every feature not appearing in any cluster will be assumed
#' to be unclustered (that is, they  will be treated as if they are in a
#' "cluster" containing only themselves). If clusters is a list of length 0 (or
#' a list only containing clusters of length 1), then css() returns the same
#' results as stability selection (so feat_sel_mat will be identical to
#' clus_sel_mat). Names for the clusters will be needed later; any clusters that
#' are not given names in the list clusters will be given names automatically by
#' css. CAUTION: if the provided X is a data.frame that contains a categorical
#' feature with more than two levels, then the resulting matrix made from
#' model.matrix will have a different number of columns than the provided
#' data.frame, some of the feature numbers will change, and the clusters
#' argument will not work properly (in the current version of the package). To
#' get correct results in this case, please use model.matrix to convert the
#' data.frame to a numeric matrix on your own, then provide this matrix and
#' cluster assignments with respect to this matrix.Default is list() (so no
#' clusters are specified, and every feature is assumed to be in a "cluster"
#' containing only itself).
#' @param lambda Optional; the tuning parameter to be used by the lasso for
#' feature selection in each subsample. If lambda is not provided, cssPredict
#' will choose one automatically by cross-validation. Default is NA.
#' @param cutoff Numeric; getCssPreds will make use only of those clusters with
#' selection proportions equal to at least cutoff. Must be between 0 and 1.
#' Default is 0 (in which case either all clusters are used, or max_num_clusts
#' are used, if max_num_clusts is specified).
#' @param max_num_clusts Integer or numeric; the maximum number of clusters to
#' use regardless of cutoff. (That is, if the chosen cutoff returns more than
#' max_num_clusts clusters, the cutoff will be decreased until at most
#' max_num_clusts clusters are selected.) Default is NA (in which case
#' max_num_clusts is ignored).
#' @param train_inds Optional; an integer or numeric vector containing the
#' indices of observations in X and y to set aside for model training after
#' feature selection. If train_inds is not provided, half of the data will be
#' used for feature selection and half for model estimation (chosen at random).
#' @param auto_select_size Logical; if TRUE, then max_num_clusts will be
#' automatically estimated using the lasso with cross-validation. Default is
#' TRUE, though his argument is ignored if either cutoff or max_num_clusts is
#' provided. (If desired output is to generate predictions using all clusters,
#' you should set auto_select_size to FALSE and do not provide cutoff or
#' max_num_clusts.)
#' @return A numeric vector of length nrow(X_test) of predictions
#' corresponding to the observations from X_test.
#' @author Gregory Faletto, Jacob Bien
#' @export
cssPredict <- function(X_train_selec, y_train_selec, X_test, clusters=list(),
    lambda=NA, cutoff=NA, max_num_clusts=NA, train_inds=NA,
    auto_select_size=TRUE){

    # Check inputs (most inputs will be checked by called functions)
    if(!is.numeric(y_train_selec) & !is.integer(y_train_selec)){
        stop("The provided y_train_selec must be real-valued, because predictions will be generated by ordinary least squares regression.")
    }

    stopifnot(!is.na(auto_select_size))
    stopifnot(length(auto_select_size) == 1)
    stopifnot(is.logical(auto_select_size))

    stopifnot(is.matrix(X_train_selec) | is.data.frame(X_train_selec))
    stopifnot(all(!is.na(X_train_selec)))

    # Check if x is a matrix; if it's a data.frame, convert to matrix.
    if(is.data.frame(X_train_selec)){
        p <- ncol(X_train_selec)
        X_train_selec <- stats::model.matrix(~ ., X_train_selec)
        X_train_selec <- X_train_selec[, colnames(X_train_selec) !=
            "(Intercept)"]

        if(length(clusters) > 0 & (p != ncol(X_train_selec))){
            stop("When stats::model.matrix converted the provided data.frame X_train_selec to a matrix, the number of columns changed (probably because the provided data.frame contained a factor variable with at least three levels). Please convert X_train_selec to a matrix yourself using model.matrix and provide cluster assignments according to the columns of the new matrix.")
        }
    }

    stopifnot(is.matrix(X_train_selec))
    stopifnot(all(!is.na(X_train_selec)))

    n <- nrow(X_train_selec)

    if(any(is.na(train_inds))){
        train_inds <- sample(n, size=round(n/2))
    }

    stopifnot(length(lambda) == 1)
    if(is.na(lambda)){
        lambda <- getLassoLambda(X_train_selec[setdiff(1:n, train_inds), ],
            y_train_selec[setdiff(1:n, train_inds)]) 
    }

    css_results <- css(X=X_train_selec, y=y_train_selec, lambda=lambda,
        clusters=clusters, train_inds=train_inds)

    # If no indication of how to select model size was provided, choose model
    # size by cross-validation
    if(is.na(cutoff) & is.na(max_num_clusts)){
        if(auto_select_size){
            max_num_clusts <- getModelSize(X_train_selec[train_inds, ],
                y_train_selec[train_inds], css_results$clusters)
        }
    }

    if(is.na(cutoff)){
        cutoff <- 0
    }

    # Get predictions
    getCssPreds(css_results, testX=X_test, weighting="weighted_avg",
        cutoff=cutoff, max_num_clusts=max_num_clusts)
}

Tests for cssPredict():

testthat::test_that("cssPredict works", {
  set.seed(84231)
  
  train_data <- genClusteredData(n=30, p=11, k_unclustered=1, cluster_size=3,
                              n_clusters=2, sig_clusters=1, sigma_eps_sq=1)
  
  x <- train_data$X
  y <- train_data$y
  
  test_x <- genClusteredData(n=5, p=11, k_unclustered=1, cluster_size=3,
                          n_clusters=2, sig_clusters=1, sigma_eps_sq=1)$X
  
  # Intentionally don't provide clusters for all features, mix up formatting,
  # etc.
  good_clusters <- list(red_cluster=1L:3L, 4:6)
  
  res <- cssPredict(X_train_selec=x, y_train_selec=y, X_test=test_x,
                    clusters=good_clusters)
  
  testthat::expect_true(all(!is.na(res)))
  testthat::expect_true(is.numeric(res))
  testthat::expect_equal(length(res), 5)
  
  # No provided clusters

  res <- cssPredict(X_train_selec=x, y_train_selec=y, X_test=test_x)

  testthat::expect_true(all(!is.na(res)))
  testthat::expect_true(is.numeric(res))
  testthat::expect_equal(length(res), 5)
  
  # Provide training indices
  
  res <- cssPredict(X_train_selec=x, y_train_selec=y, X_test=test_x,
                    train_inds=13:28)
  
  testthat::expect_true(all(!is.na(res)))
  testthat::expect_true(is.numeric(res))
  testthat::expect_equal(length(res), 5)

  ## Trying other inputs

  # X as a data.frame
  X_df <- datasets::mtcars
  
  n <- nrow(X_df)
  test_inds <- 1:round(n/3)
  n_test <- length(test_inds)
  selec_train_inds <- setdiff(1:n, test_inds)
  n_selec_train <- length(selec_train_inds)

  res <- cssPredict(X_train_selec=X_df[selec_train_inds, ],
                    y_train_selec=stats::rnorm(n_selec_train),
                    X_test=X_df[test_inds, ])
  
  testthat::expect_true(all(!is.na(res)))
  testthat::expect_true(is.numeric(res))
  testthat::expect_equal(length(res), n_test)

  # X as a dataframe with factors (number of columns of final design matrix
  # after one-hot encoding factors won't match number of columns of df2)
  # cyl, gear, and carb are factors with more than 2 levels
  df2 <- X_df
  df2$cyl <- as.factor(df2$cyl)
  df2$vs <- as.factor(df2$vs)
  df2$am <- as.factor(df2$am)
  df2$gear <- as.factor(df2$gear)
  df2$carb <- as.factor(df2$carb)
  
  # Should get error if clusters are provided because df2 contains factors with
  # more than two levels
  testthat::expect_error(cssPredict(X_train_selec=df2[selec_train_inds, ],
                                    y_train_selec=stats::rnorm(n_selec_train),
                                    X_test=df2[test_inds, ], clusters=1:3),
                         "When stats::model.matrix converted the provided data.frame X_train_selec to a matrix, the number of columns changed (probably because the provided data.frame contained a factor variable with at least three levels). Please convert X_train_selec to a matrix yourself using model.matrix and provide cluster assignments according to the columns of the new matrix.",
                         fixed=TRUE)
  
  # Should be fine if no clusters are provided
  res <- cssPredict(X_train_selec=df2[selec_train_inds, ],
                    y_train_selec=stats::rnorm(n_selec_train),
                    X_test=df2[test_inds, ])
  
  testthat::expect_true(all(!is.na(res)))
  testthat::expect_true(is.numeric(res))
  testthat::expect_equal(length(res), n_test)

  # X as a matrix with column names
  x2 <- x
  colnames(x2) <- LETTERS[1:11]
  test_x2 <- test_x
  colnames(test_x2) <- LETTERS[1:11]

  res <- cssPredict(X_train_selec=x2, y_train_selec=y, X_test=test_x2,
                    clusters=good_clusters)
  
  testthat::expect_true(all(!is.na(res)))
  testthat::expect_true(is.numeric(res))
  testthat::expect_equal(length(res), 5)

  # Vary inputs
  res <- cssPredict(X_train_selec=x, y_train_selec=y, X_test=test_x,
                    lambda=0.01)
  
  testthat::expect_true(all(!is.na(res)))
  testthat::expect_true(is.numeric(res))
  testthat::expect_equal(length(res), 5)

  res <- cssPredict(X_train_selec=x, y_train_selec=y, X_test=test_x, cutoff=0.6)
  
  testthat::expect_true(all(!is.na(res)))
  testthat::expect_true(is.numeric(res))
  testthat::expect_equal(length(res), 5)

  res <- cssPredict(X_train_selec=x, y_train_selec=y, X_test=test_x,
                    max_num_clusts=6)
  
  testthat::expect_true(all(!is.na(res)))
  testthat::expect_true(is.numeric(res))
  testthat::expect_equal(length(res), 5)

  res <- cssPredict(X_train_selec=x, y_train_selec=y, X_test=test_x,
                    auto_select_size=FALSE)
  
  testthat::expect_true(all(!is.na(res)))
  testthat::expect_true(is.numeric(res))
  testthat::expect_equal(length(res), 5)

  # Bad inputs
  testthat::expect_error(cssPredict(X_train_selec=x[1:10, ], y_train_selec=y,
                                    X_test=test_x),
                         "length(y) == n is not TRUE", fixed=TRUE)

  testthat::expect_error(cssPredict(X_train_selec=character(30),
                                    y_train_selec=y, X_test=test_x),
                         "is.matrix(X_train_selec) | is.data.frame(X_train_selec) is not TRUE",
                         fixed=TRUE)

  testthat::expect_error(cssPredict(X_train_selec=x,
                                    y_train_selec=matrix(1:30, 10, 3),
                                    X_test=test_x), "!is.matrix(y) is not TRUE",
                         fixed=TRUE)

  testthat::expect_error(cssPredict(X_train_selec=x,
                                    y_train_selec=factor(rbinom(30, size=1,
                                                                prob=.5)),
                                    X_test=test_x),
                         "The provided y_train_selec must be real-valued, because predictions will be generated by ordinary least squares regression.",
                         fixed=TRUE)

  testthat::expect_error(cssPredict(X_train_selec=x, y_train_selec=y,
                                    X_test=test_x, clusters="clusters"),
                         "is.numeric(clusters) | is.integer(clusters) is not TRUE",
                         fixed=TRUE)

  testthat::expect_error(cssPredict(X_train_selec=x, y_train_selec=y,
                                    X_test=test_x, lambda="lambda"),
                         "For method cssLasso, lambda must be a numeric.",
                         fixed=TRUE)

  testthat::expect_error(cssPredict(X_train_selec=x, y_train_selec=y,
                                    X_test=test_x, cutoff=-.1),
                         "cutoff >= 0 is not TRUE", fixed=TRUE)

  testthat::expect_error(cssPredict(X_train_selec=x, y_train_selec=y,
                                    X_test=test_x, max_num_clusts=0),
                         "max_num_clusts >= 1 is not TRUE", fixed=TRUE)

  testthat::expect_error(cssPredict(X_train_selec=x, y_train_selec=y,
                                    X_test=test_x, auto_select_size=c(TRUE,
                                                                      FALSE)),
                         "length(auto_select_size) == 1 is not TRUE",
                         fixed=TRUE)
})

## ── Warning ('<text>:17'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getLassoLambda(...)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:17'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getModelSize(...)
##  3. glmnet::cv.glmnet(x = X_size, y = y, family = "gaussian")
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:26'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(X_train_selec = x, y_train_selec = y, X_test = test_x)
##  2. litr (local) getLassoLambda(...)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:26'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(X_train_selec = x, y_train_selec = y, X_test = test_x)
##  2. litr (local) getModelSize(...)
##  3. glmnet::cv.glmnet(x = X_size, y = y, family = "gaussian")
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:34'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getLassoLambda(...)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:34'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getModelSize(...)
##  3. glmnet::cv.glmnet(x = X_size, y = y, family = "gaussian")
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:52'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getLassoLambda(...)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:52'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getModelSize(...)
##  3. glmnet::cv.glmnet(x = X_size, y = y, family = "gaussian")
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:79'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getLassoLambda(...)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:79'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getModelSize(...)
##  3. glmnet::cv.glmnet(x = X_size, y = y, family = "gaussian")
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:93'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getLassoLambda(...)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:93'): cssPredict works ─────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getModelSize(...)
##  3. glmnet::cv.glmnet(x = X_size, y = y, family = "gaussian")
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:101'): cssPredict works ────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getModelSize(...)
##  3. glmnet::cv.glmnet(x = X_size, y = y, family = "gaussian")
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:108'): cssPredict works ────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getLassoLambda(...)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:114'): cssPredict works ────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getLassoLambda(...)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:121'): cssPredict works ────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##  1. litr (local) cssPredict(...)
##  2. litr (local) getLassoLambda(...)
##  3. glmnet::cv.glmnet(...)
##  4. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:129'): cssPredict works ────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##   1. testthat::expect_error(...)
##   7. litr (local) cssPredict(X_train_selec = x[1:10, ], y_train_selec = y, X_test = test_x)
##   8. litr (local) getLassoLambda(...)
##   9. glmnet::cv.glmnet(...)
##  10. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:138'): cssPredict works ────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##   1. testthat::expect_error(...)
##   7. litr (local) cssPredict(...)
##   8. litr (local) getLassoLambda(...)
##   9. glmnet::cv.glmnet(...)
##  10. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:150'): cssPredict works ────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##   1. testthat::expect_error(...)
##   7. litr (local) cssPredict(...)
##   8. litr (local) getLassoLambda(...)
##   9. glmnet::cv.glmnet(...)
##  10. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:160'): cssPredict works ────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##   1. testthat::expect_error(...)
##   7. litr (local) cssPredict(...)
##   8. litr (local) getLassoLambda(...)
##   9. glmnet::cv.glmnet(...)
##  10. glmnet:::cv.glmnet.raw(...)
## 
## ── Warning ('<text>:164'): cssPredict works ────────────────────────────────────
## Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
## Backtrace:
##   1. testthat::expect_error(...)
##   7. litr (local) cssPredict(...)
##   8. litr (local) getLassoLambda(...)
##   9. glmnet::cv.glmnet(...)
##  10. glmnet:::cv.glmnet.raw(...)