Perform Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA) and Related Methods

This function fits Partial Least Squares (PLS) models, including PLS, PLS-DA, OPLS-DA, and sparse PLS-DA (sPLS-DA), to metabolomics or other omics data. For each pairwise group comparison it returns the fitted model together with score plots, VIP-based feature importance, and, for OPLS-DA, S-plots.

perform_PLS(
  data,
  method = "oplsda",
  arrangeLevels = NULL,
  includeQC = FALSE,
  predI = 1,
  orthoI = NA,
  crossvalI = 10,
  permI = 20,
  scaleC = "none",
  top_features = 20,
  keepX = NULL,
  ncomp = 2,
  validation = "Mfold",
  folds = 10,
  verbose = TRUE
)

Arguments

data

A list containing processed data with the following required components:

data_scaledPLS_rsdFiltered_varFiltered: Matrix or data.frame of processed data
Metadata: Data.frame containing sample metadata with 'Group' and 'Group_' columns
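
As a rough illustration of the structure described above, the following sketch checks a candidate list before it is handed to perform_PLS(). The helper check_pls_input() is hypothetical and not part of the package, and the rule that the data matrix and Metadata have one row per sample is an assumption taken from the example at the end of this page.

```r
# Hypothetical helper (not part of the package): checks that `data` has the
# components listed above before it is passed to perform_PLS().
check_pls_input <- function(data) {
  required <- c("data_scaledPLS_rsdFiltered_varFiltered", "Metadata")
  missing_parts <- setdiff(required, names(data))
  if (length(missing_parts) > 0) {
    stop("Missing component(s): ", paste(missing_parts, collapse = ", "))
  }
  missing_cols <- setdiff(c("Group", "Group_"), colnames(data$Metadata))
  if (length(missing_cols) > 0) {
    stop("Metadata is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Assumption: one metadata row per sample (row) of the data matrix.
  if (nrow(data$data_scaledPLS_rsdFiltered_varFiltered) != nrow(data$Metadata)) {
    stop("Data and Metadata must have the same number of rows")
  }
  invisible(TRUE)
}

# Small toy input that satisfies the checks.
toy <- list(
  data_scaledPLS_rsdFiltered_varFiltered = matrix(0, nrow = 4, ncol = 3),
  Metadata = data.frame(
    Group  = c("Control", "Control", "Treatment", "Treatment"),
    Group_ = c("Control", "Control", "Treatment", "Treatment")
  )
)
input_ok <- check_pls_input(toy)
```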

method

Character string specifying the PLS method to use. Options are:

"pls": Standard PLS regression (experimental)
"plsda": PLS Discriminant Analysis (experimental)
"oplsda": Orthogonal PLS Discriminant Analysis (default)
"splsda": sparse PLS Discriminant Analysis (experimental)

arrangeLevels

Character vector specifying the order of group levels for analysis. If NULL (default), all unique groups will be used in alphabetical order.
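
Because results are reported per pairwise comparison, the order of group levels determines which pairs are formed and in what order. The sketch below enumerates pairs with base R's combn(); whether perform_PLS() builds its comparisons this way internally is an assumption, and the group names are made up.

```r
# Sketch only: enumerate the pairwise comparisons implied by a set of groups.
groups <- c("Treatment", "Control", "Control", "Recovery", "Treatment")

# With arrangeLevels = NULL the documentation says groups are taken in
# alphabetical order; sort(unique()) mimics that here.
default_levels <- sort(unique(groups))

# An explicit order, as would be passed through arrangeLevels.
custom_levels <- c("Control", "Treatment", "Recovery")

# Each list element returned by combn() is one pair of groups.
default_pairs <- combn(default_levels, 2, simplify = FALSE)
custom_pairs  <- combn(custom_levels, 2, simplify = FALSE)
```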

includeQC

Logical indicating whether to include QC samples in visualizations. Note: QC samples are never included in model building. Default is FALSE.

predI

Integer specifying the number of predictive components. Default is 1.

orthoI

Integer specifying the number of orthogonal components for OPLS-DA. If NA (default), the optimal number is determined automatically.

crossvalI

Integer specifying the number of cross-validation folds. Default is 10.

permI

Integer specifying the number of permutations for validation. Default is 20.

scaleC

Character string specifying scaling method. Options are "none", "center", "pareto", "unit". Default is "none".
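
For orientation, the scaling options can be sketched in base R under their commonly used definitions: centering subtracts each column mean, pareto scaling divides the centered values by the square root of the column standard deviation, and unit scaling divides by the standard deviation itself. The helper scale_columns() is hypothetical, and the package's internal implementation may differ.

```r
# Hypothetical helper illustrating the common definitions of each option.
scale_columns <- function(x, method = c("none", "center", "pareto", "unit")) {
  method <- match.arg(method)
  if (method == "none") return(x)
  # Subtract the column mean from every value in that column.
  centered <- sweep(x, 2, colMeans(x), "-")
  if (method == "center") return(centered)
  sds <- apply(x, 2, sd)
  # Pareto divides by sqrt(sd); unit-variance scaling divides by sd.
  divisor <- if (method == "pareto") sqrt(sds) else sds
  sweep(centered, 2, divisor, "/")
}

set.seed(1)
x <- matrix(rnorm(30, mean = 5, sd = 2), nrow = 10, ncol = 3)
x_unit   <- scale_columns(x, "unit")
x_pareto <- scale_columns(x, "pareto")
```

The default of "none" suggests the input is expected to be scaled already, as the component name data_scaledPLS_rsdFiltered_varFiltered implies.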

top_features

Integer specifying the number of top VIP features to display in plots. Default is 20.
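
Selecting the top features amounts to ranking by VIP score and keeping the first top_features entries. A minimal sketch with invented VIP values follows; the actual layout of the VIP output returned by perform_PLS() is not shown here and may differ.

```r
# Toy VIP values, invented for illustration only.
vip <- c(Feature_1 = 0.42, Feature_2 = 1.85, Feature_3 = 1.10,
         Feature_4 = 0.77, Feature_5 = 1.31)
top_features <- 3

# Rank from highest to lowest VIP and keep the first `top_features`.
top_vip <- head(sort(vip, decreasing = TRUE), top_features)
```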

keepX

Integer vector for sPLS-DA specifying the number of variables to keep on each component. Default is NULL (automatic selection).

ncomp

Integer specifying the number of components for sPLS-DA. Default is 2.
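
Since keepX gives one value per component, it is reasonable to expect its length to match ncomp. The pre-check below is a hypothetical sketch, not part of the package; how perform_PLS() itself reacts to a mismatch is not documented here.

```r
# Hypothetical pre-check for the sPLS-DA arguments.
check_keepX <- function(keepX, ncomp, n_features) {
  if (is.null(keepX)) return(TRUE)  # NULL means automatic selection
  # One entry per component, each between 1 and the number of features.
  length(keepX) == ncomp && all(keepX >= 1) && all(keepX <= n_features)
}

ok_match    <- check_keepX(c(10, 5, 5), ncomp = 3, n_features = 20)
ok_mismatch <- check_keepX(c(10, 5),    ncomp = 3, n_features = 20)
ok_auto     <- check_keepX(NULL,        ncomp = 2, n_features = 20)
```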

validation

Character string specifying the validation method for sPLS-DA. Default is "Mfold".

folds

Integer specifying the number of cross-validation folds for sPLS-DA. Default is 10.

verbose

Logical indicating whether to print progress messages. Default is TRUE.

Value

A list containing analysis results with the following components:

method: The PLS method used
data_used: The data matrix used for analysis
results_[method]_[comparison]: Model results for each pairwise comparison
data_VIPScores_[comparison]: VIP scores for each comparison (PLS-DA/OPLS-DA)
data_Abundance_[comparison]: Abundance data for top features
plot_VIPAbundance_[comparison]: Combined VIP and abundance plots
data_SPlot_[comparison]: S-plot data (OPLS-DA only)
plot_SPlot_[comparison]: S-plots (OPLS-DA only)
plot_Scores_[comparison]: Score plots for each comparison
summary: Summary statistics and model performance metrics
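
Because most component names embed the comparison label, it can be convenient to look them up by prefix rather than hard-coding the full name. The sketch below uses a mock list in place of a real return value; the label "Control_vs_Treatment" is a made-up placeholder, and the actual label format produced by perform_PLS() should be checked with names(results).

```r
# Mock object standing in for the list returned by perform_PLS(); the
# comparison label "Control_vs_Treatment" is a made-up placeholder.
results <- list(
  method = "oplsda",
  results_oplsda_Control_vs_Treatment = "model object",
  data_VIPScores_Control_vs_Treatment = "VIP table",
  plot_Scores_Control_vs_Treatment = "score plot",
  summary = "summary table"
)

# Find components by prefix instead of hard-coding the comparison label.
score_plot_names <- grep("^plot_Scores_", names(results), value = TRUE)
vip_names        <- grep("^data_VIPScores_", names(results), value = TRUE)

# Subset the list to just the score plots.
score_plots <- results[score_plot_names]
```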

Author

John Lennon L. Calorio

Examples

if (FALSE) { # \dontrun{
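# Example data structure: 50 samples (rows) x 20 features (columns)
set.seed(123)  # make the random example data reproducible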
data <- list(
  data_scaledPLS_rsdFiltered_varFiltered = matrix(rnorm(1000), nrow = 50, ncol = 20),
  Metadata = data.frame(
    Group = c(rep(c("Control", "Treatment"), c(20, 20)), rep(c("SQC", "EQC"), c(5, 5))),
    Group_ = c(rep(c("Control", "Treatment"), c(20, 20)), rep("QC", 10))
  )
)
colnames(data$data_scaledPLS_rsdFiltered_varFiltered) <- paste0("Feature_", 1:20)

# Perform OPLS-DA
results <- perform_PLS(data, method = "oplsda")

# Perform PLS-DA with specific group arrangement
results <- perform_PLS(data, method = "plsda",
                      arrangeLevels = c("Control", "Treatment"))

# Perform sparse PLS-DA
results <- perform_PLS(data, method = "splsda", ncomp = 3, keepX = c(10, 5, 5))
} # }