This function fits Partial Least Squares (PLS) models, including PLS, PLS-DA, OPLS-DA, and sparse PLS-DA (sPLS-DA), to metabolomics or other omics data. It returns the fitted models together with score plots, VIP-based feature importance, and, for OPLS-DA, S-plots.
perform_PLS(
  data,
  method = "oplsda",
  arrangeLevels = NULL,
  includeQC = FALSE,
  predI = 1,
  orthoI = NA,
  crossvalI = 10,
  permI = 20,
  scaleC = "none",
  top_features = 20,
  keepX = NULL,
  ncomp = 2,
  validation = "Mfold",
  folds = 10,
  verbose = TRUE
)
data: A list containing processed data with the following required components:
data_scaledPLS_rsdFiltered_varFiltered: Matrix or data.frame of processed data
Metadata: Data.frame containing sample metadata with 'Group' and 'Group_' columns
method: Character string specifying the PLS method to use. Options are:
"pls": Standard PLS regression (experimental)
"plsda": PLS Discriminant Analysis (experimental)
"oplsda": Orthogonal PLS Discriminant Analysis (default)
"splsda": sparse PLS Discriminant Analysis (experimental)
arrangeLevels: Character vector specifying the order of group levels for analysis. If NULL (default), all unique groups are used in alphabetical order.
includeQC: Logical indicating whether to include QC samples in visualizations. QC samples are never included in model building. Default is FALSE.
predI: Integer specifying the number of predictive components. Default is 1.
orthoI: Integer specifying the number of orthogonal components for OPLS-DA. If NA (default), the optimal number is determined automatically.
crossvalI: Integer specifying the number of cross-validation folds. Default is 10.
permI: Integer specifying the number of permutations for validation. Default is 20.
scaleC: Character string specifying the scaling method. Options are "none", "center", "pareto", "unit". Default is "none".
top_features: Integer specifying the number of top VIP features to display in plots. Default is 20.
keepX: Integer vector for sPLS-DA specifying the number of variables to keep on each component. Default is NULL (automatic selection).
ncomp: Integer specifying the number of components for sPLS-DA. Default is 2.
validation: Character string specifying the sPLS-DA validation method. Default is "Mfold".
folds: Integer specifying the number of sPLS-DA cross-validation folds. Default is 10.
verbose: Logical indicating whether to print progress messages. Default is TRUE.
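The scaleC options can be illustrated with the textbook definitions of each scaling. The sketch below is not the package's internal code: scale_columns is a hypothetical helper, and it assumes that "unit" means unit-variance scaling (center, then divide by the column standard deviation) and that "pareto" divides the centered values by the square root of the standard deviation.

```r
# Illustrative column-wise scaling using the common textbook definitions.
# scale_columns is a hypothetical helper, not a function from the package.
scale_columns <- function(x, how = c("none", "center", "pareto", "unit")) {
  how <- match.arg(how)
  x <- as.matrix(x)
  if (how == "none") {
    return(x)
  }
  centered <- sweep(x, 2, colMeans(x), "-")  # subtract each column mean
  sds <- apply(x, 2, sd)                     # per-column standard deviation
  switch(how,
    center = centered,
    pareto = sweep(centered, 2, sqrt(sds), "/"),  # divide by sqrt(sd)
    unit   = sweep(centered, 2, sds, "/")         # divide by sd
  )
}

set.seed(1)
m <- matrix(rnorm(40, mean = 5, sd = 2), nrow = 10)
unit_scaled <- scale_columns(m, "unit")
pareto_scaled <- scale_columns(m, "pareto")
```

Under these definitions, unit scaling gives every feature equal weight, while Pareto scaling shrinks large-variance features less aggressively, so high-abundance features keep somewhat more influence on the model.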
A list containing analysis results with the following components:
method: The PLS method used
data_used: The data matrix used for analysis
results_[method]_[comparison]: Model results for each pairwise comparison
data_VIPScores_[comparison]: VIP scores for each comparison (PLS-DA/OPLS-DA)
data_Abundance_[comparison]: Abundance data for top features
plot_VIPAbundance_[comparison]: Combined VIP and abundance plots
data_SPlot_[comparison]: S-plot data (OPLS-DA only)
plot_SPlot_[comparison]: S-plots (OPLS-DA only)
plot_Scores_[comparison]: Score plots for each comparison
summary: Summary statistics and model performance metrics
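Because several component names embed the comparison label, one way to work with the returned list is to look components up by prefix rather than hard-coding full names. The sketch below uses a mock list in place of a real return value: the label "Control_vs_Treatment" and the column names Feature and VIP are placeholders chosen for illustration, not the package's documented format.

```r
# Mock return value. Only the name prefixes come from the documentation above;
# the comparison label and the column names are made-up placeholders.
results <- list(
  method = "oplsda",
  data_VIPScores_Control_vs_Treatment = data.frame(
    Feature = paste0("Feature_", 1:3),
    VIP = c(1.8, 0.6, 1.2),
    stringsAsFactors = FALSE
  ),
  summary = list()
)

# Find every VIP table without knowing the comparison labels in advance.
vip_names <- grep("^data_VIPScores_", names(results), value = TRUE)

# Sort each table by decreasing VIP score.
vip_ranked <- lapply(results[vip_names], function(tab) {
  tab[order(tab$VIP, decreasing = TRUE), ]
})
```

The same prefix lookup applies to the other per-comparison components, for example "^plot_Scores_" or "^data_SPlot_".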
if (FALSE) { # \dontrun{
# Example data structure
data <- list(
  data_scaledPLS_rsdFiltered_varFiltered = matrix(rnorm(1000), nrow = 50, ncol = 20),
  Metadata = data.frame(
    Group = c(rep(c("Control", "Treatment"), c(20, 20)), rep(c("SQC", "EQC"), c(5, 5))),
    Group_ = c(rep(c("Control", "Treatment"), c(20, 20)), rep("QC", 10))
  )
)
colnames(data$data_scaledPLS_rsdFiltered_varFiltered) <- paste0("Feature_", 1:20)
# Perform OPLS-DA
results <- perform_PLS(data, method = "oplsda")
# Perform PLS-DA with specific group arrangement
results <- perform_PLS(data, method = "plsda",
                       arrangeLevels = c("Control", "Treatment"))
# Perform sparse PLS-DA
results <- perform_PLS(data, method = "splsda", ncomp = 3, keepX = c(10, 5, 5))
} # }
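Before calling perform_PLS, it can help to confirm that the input list has the documented shape. The following is a minimal sketch: check_pls_input is a hypothetical helper, not part of the package, and it assumes that samples are in rows so that the data matrix and Metadata have the same number of rows, as in the example data.

```r
# Hypothetical pre-flight check; returns a character vector of problems found.
check_pls_input <- function(data) {
  x <- data$data_scaledPLS_rsdFiltered_varFiltered
  meta <- data$Metadata
  problems <- character(0)

  x_ok <- is.matrix(x) || is.data.frame(x)
  if (!x_ok) {
    problems <- c(problems,
      "data_scaledPLS_rsdFiltered_varFiltered must be a matrix or data.frame")
  }

  if (!is.data.frame(meta)) {
    problems <- c(problems, "Metadata must be a data.frame")
  } else {
    missing_cols <- setdiff(c("Group", "Group_"), names(meta))
    if (length(missing_cols) > 0) {
      problems <- c(problems,
        paste("Metadata is missing column(s):", paste(missing_cols, collapse = ", ")))
    }
    # Assumption: one metadata row per sample row in the data matrix.
    if (x_ok && nrow(x) != nrow(meta)) {
      problems <- c(problems, "data and Metadata have different numbers of rows")
    }
  }
  problems
}

example_input <- list(
  data_scaledPLS_rsdFiltered_varFiltered = matrix(rnorm(40), nrow = 10),
  Metadata = data.frame(
    Group = rep(c("Control", "Treatment"), each = 5),
    Group_ = rep(c("Control", "Treatment"), each = 5)
  )
)
input_problems <- check_pls_input(example_input)
```

An empty result means none of the checked conditions failed; it does not guarantee that the model will fit.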