This function fits regularized models using LASSO (Least Absolute Shrinkage and Selection Operator), Elastic Net, or both. Both methods reduce overfitting by adding a penalty term to the loss function. LASSO applies an L1 penalty, which can shrink some coefficients exactly to zero and therefore performs feature selection. Elastic Net combines L1 and L2 penalties, which copes better with correlated (multicollinear) features while still selecting features. The function supports binary and multinomial classification and reports evaluation results for each fitted model.
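
For reference, the penalized objective can be written in the parameterization used by glmnet, the package behind the fitted cv.glmnet objects. The symbols here are notation introduced for illustration: N is the number of training samples, \ell the log-likelihood, \beta the coefficient vector, \lambda the penalty strength, and \alpha the mixing parameter (\alpha = 1 for LASSO, \alpha = 0.5 for the Elastic Net setting listed under method).

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\,\ell(\beta_0, \beta)
  \;+\; \lambda \left[ \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2
  \;+\; \alpha\,\lVert \beta \rVert_1 \right]
```

With \alpha = 1 the squared L2 term vanishes and only the L1 penalty remains, which is what allows coefficients to reach exactly zero.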

perform_Regression(
  data,
  method = "enet",
  specify_response = NULL,
  train_percent = 80,
  ref = NULL,
  lambda = "1se",
  remember = NULL,
  verbose = TRUE,
  cv_folds = 10,
  parallel = FALSE
)

Arguments

data

A list object containing preprocessed data. Must be the output from the perform_PreprocessingPeakData function, containing the following elements:

  • FunctionOrigin: Character string indicating data source

  • Metadata: Data frame with sample metadata including Group column

  • data_scaledPCA_rsdFiltered_varFiltered: Matrix of preprocessed features

method

Character vector specifying regression method(s) to perform. Options:

  • "lasso": LASSO regression only (L1 penalty, alpha = 1)

  • "enet": Elastic Net regression only (L1+L2 penalty, alpha = 0.5)

  • c("lasso", "enet"): Both methods (recommended for comparison)

Default: "enet"

specify_response

Character string specifying the response variable column name. If NULL, uses the Group column from metadata. Default: NULL

train_percent

Numeric value between 1 and 99 specifying the percentage of samples used for training. The remaining samples are used for testing. Default: 80
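
As a rough illustration of what a percentage-based split looks like, the base R sketch below samples train_percent of the rows within each class. The labels are hypothetical, and stratifying by class is an assumption made for this sketch; this page does not state how perform_Regression draws its split.

```r
# Hypothetical class labels; the real function reads them from the metadata.
set.seed(123)
group <- factor(rep(c("Case", "Control"), times = c(12, 8)))
train_percent <- 80

# Sample train_percent of the row indices within each class (stratified split).
train_idx <- unlist(
  lapply(split(seq_along(group), group), function(idx) {
    n_train <- round(length(idx) * train_percent / 100)
    idx[sample.int(length(idx), n_train)]
  }),
  use.names = FALSE
)

# Every row not chosen for training goes to the test set.
test_idx <- setdiff(seq_along(group), train_idx)

table(group[train_idx])
table(group[test_idx])
```

Rounding within each class means the realized training fraction can differ slightly from train_percent when classes are small.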

ref

Character string specifying the reference level for the response variable. If NULL, uses the first factor level alphabetically. Default: NULL
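
The effect of choosing a reference level can be seen with base R's relevel(). This is a sketch with made-up group labels; it is only assumed, not confirmed by this page, that the function handles ref in a comparable way.

```r
# Hypothetical group labels for illustration.
group <- factor(c("Treated", "Control", "Treated", "Control", "Disease"))

# By default, factor levels are sorted, so "Control" comes first.
levels(group)

# relevel() moves the chosen level to the front; the others keep their order.
group_ref <- relevel(group, ref = "Treated")
levels(group_ref)
```

Coefficients and odds ratios are then interpreted relative to whichever level comes first.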

lambda

Character string specifying lambda selection criterion:

  • "1se": Largest lambda whose cross-validation error is within one standard error of the minimum (stronger penalty, typically fewer features)

  • "min": Lambda that minimizes the cross-validation error (weaker penalty, typically more features)

Default: "1se"
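
The two criteria can be reproduced by hand from a cross-validation summary. The sketch below uses invented numbers and mirrors the rule cv.glmnet applies when it reports lambda.min and lambda.1se; the variable names are illustrative only.

```r
# Hypothetical cross-validation summary (values invented for illustration).
lambda <- c(1.00, 0.50, 0.25, 0.10, 0.05)  # candidate penalties, largest first
cvm    <- c(0.90, 0.62, 0.55, 0.52, 0.54)  # mean CV error per lambda
cvsd   <- c(0.05, 0.05, 0.04, 0.04, 0.04)  # standard error of the CV error

# "min": the lambda with the lowest mean CV error.
i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# "1se": the largest lambda whose CV error is within one standard error
# (taken at the minimum) of the lowest CV error.
threshold  <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

c(min = lambda_min, one_se = lambda_1se)
```

Because lambda_1se is never smaller than lambda_min, it applies at least as much shrinkage and usually keeps fewer features.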

remember

Numeric value used as the random seed for reproducible results, set via set.seed(remember). If NULL, no seed is set and results may differ between runs. Default: NULL

verbose

Logical indicating whether to print progress messages and results to console. Default: TRUE

cv_folds

Integer specifying number of cross-validation folds for model selection. Must be between 3 and 20. Default: 10

parallel

Logical indicating whether to use parallel processing for cross-validation. Default: FALSE

Value

A list containing regression results with the following structure:

FunctionOrigin

Character string identifying the source function

ModelSummary

Data frame summarizing model performance metrics

DataSplit

List containing training/testing data split information

LASSO_Results

List of LASSO results (if method includes "lasso")

ElasticNet_Results

List of Elastic Net results (if method includes "enet")

ComparisonSummary

Data frame comparing methods (if both performed)

Each method-specific results list contains:

  • Model: Fitted cv.glmnet object

  • Predictions: Data frame with actual vs predicted values

  • ConfusionMatrix: Complete confusion matrix object

  • Performance: Data frame with accuracy, sensitivity, specificity, etc.

  • Coefficients: Data frame with non-zero coefficients and odds ratios

  • Lambda: Selected lambda value

  • Alpha: Alpha parameter used

  • ReferenceLevel: Reference level for classification
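
For logistic models, an odds ratio is the exponential of a coefficient on the log-odds scale. The sketch below shows that relationship with invented values; the column names Feature, Coefficient, and OddsRatio are assumptions and may not match the actual Coefficients data frame.

```r
# Hypothetical non-zero coefficients on the log-odds scale
# (feature names and values invented for illustration).
coefs <- data.frame(
  Feature     = c("feature_A", "feature_B"),
  Coefficient = c(0.693, -0.405)
)

# Exponentiating a log-odds coefficient gives the odds ratio per unit
# increase in the (scaled) feature, relative to the reference level.
coefs$OddsRatio <- exp(coefs$Coefficient)

coefs
```

An odds ratio above 1 indicates higher odds of the non-reference class as the feature increases, and a value below 1 indicates lower odds.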

Details

Perform Regularized Regression Analysis

Author

John Lennon L. Calorio

Examples

if (FALSE) { # \dontrun{
# Load required libraries
library(glmnet)
library(caret)
library(dplyr)

# Perform both LASSO and Elastic Net regression
regression_results <- perform_Regression(
  data = preprocessed_data,
  method = c("lasso", "enet"),
  train_percent = 75,
  lambda = "1se",
  remember = 123,
  cv_folds = 10
)

# View model comparison
print(regression_results$ModelSummary)
print(regression_results$ComparisonSummary)

# Access LASSO results
lasso_coef <- regression_results$LASSO_Results$Coefficients
lasso_perf <- regression_results$LASSO_Results$Performance

# Access Elastic Net results
enet_coef <- regression_results$ElasticNet_Results$Coefficients
enet_perf <- regression_results$ElasticNet_Results$Performance

# View confusion matrices
print(regression_results$LASSO_Results$ConfusionMatrix$table)
print(regression_results$ElasticNet_Results$ConfusionMatrix$table)
} # }