Runs a comparative statistical analysis on preprocessed metabolomics data. The function selects a statistical test automatically from the results of assumption checks (normality, variance homogeneity) and from whether the samples are paired. Both two-group and multi-group comparisons are supported, each with parametric and non-parametric alternatives.

perform_ComparativeAnalysis(
  data,
  adjust_p_method = "BH",
  sort_p = TRUE,
  paired = FALSE,
  plot_metabolites = NULL,
  alpha = 0.05,
  min_group_size = 3,
  verbose = FALSE
)

Arguments

data

List. Output from perform_PreprocessingPeakData function containing:

  • data_scaledPCA_rsdFiltered_varFiltered: Numeric matrix of processed metabolite data

  • Metadata: Data frame with sample metadata including 'Group' column
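
As a rough illustration of the expected input shape, the sketch below builds a toy list with the two elements named above. It assumes samples are in rows and metabolites in columns (the orientation is not stated here), and the sample names, metabolite names, and group labels are made up. In practice this list comes from perform_PreprocessingPeakData.

```r
set.seed(1)

# Hypothetical toy data: 6 samples x 3 metabolites (orientation is an assumption)
mat <- matrix(
  rnorm(18),
  nrow = 6,
  dimnames = list(paste0("S", 1:6), paste0("metabolite_", 1:3))
)

# Sample metadata; the 'Group' column is the one the function requires
meta <- data.frame(
  Sample = rownames(mat),
  Group  = rep(c("Control", "Treatment"), each = 3),
  stringsAsFactors = FALSE
)

# Mimic the two list elements described above
preprocessed_data <- list(
  data_scaledPCA_rsdFiltered_varFiltered = mat,
  Metadata = meta
)

# Quick structural checks before calling perform_ComparativeAnalysis()
stopifnot(
  is.numeric(preprocessed_data$data_scaledPCA_rsdFiltered_varFiltered),
  "Group" %in% names(preprocessed_data$Metadata)
)

# Samples per group; each count should be at least min_group_size
table(preprocessed_data$Metadata$Group)
```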

adjust_p_method

Character. Method for p-value adjustment. Default is "BH". Options include:

  • "holm": Holm (1979) - Controls family-wise error rate

  • "hochberg": Hochberg (1988) - Controls family-wise error rate; more powerful than Holm, but valid only for independent or non-negatively associated tests

  • "hommel": Hommel (1988) - More powerful than Hochberg

  • "bonferroni": Classical Bonferroni correction

  • "BH": Benjamini & Hochberg (1995) - Controls false discovery rate

  • "BY": Benjamini & Yekutieli (2001) - More conservative FDR control

  • "fdr": Alias for "BH"

  • "none": No adjustment
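
The option names above match those accepted by base R's stats::p.adjust. Assuming the function passes adjust_p_method through to it (not confirmed here), the effect of each choice can be previewed on a small vector of made-up p-values:

```r
# Hypothetical raw p-values for five metabolites
p_raw <- c(0.001, 0.008, 0.020, 0.040, 0.300)

# Family-wise correction (strict) versus false discovery rate correction
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_bh   <- p.adjust(p_raw, method = "BH")

# Side-by-side view of raw and adjusted values
comparison <- data.frame(p_raw, p_bonf, p_bh)
print(comparison)

# "fdr" and "BH" give the same adjusted values in stats::p.adjust
same_as_bh <- identical(p.adjust(p_raw, method = "fdr"), p_bh)
```

BH-adjusted values are never larger than the Bonferroni-adjusted ones, which is why BH typically retains more metabolites at a given threshold.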

sort_p

Logical. If TRUE (default), results are sorted by adjusted p-values in ascending order.

paired

Logical. If TRUE, performs paired sample tests. Default is FALSE. Note: Requires equal group sizes for multi-group comparisons.

plot_metabolites

Character vector. Names of metabolites to visualize. If provided, generates statistical plots using ggstatsplot. Default is NULL (no plots).

alpha

Numeric. Significance threshold applied to the assumption tests (normality, variance homogeneity). Default is 0.05.

min_group_size

Integer. Minimum required sample size per group. Default is 3.

verbose

Logical. If TRUE, prints detailed progress information. Default is FALSE.

Value

List containing:

  • results: Data frame with statistical test results for each metabolite

  • plots: List of ggplot objects (only if plot_metabolites is specified)

  • summary: Summary statistics of the analysis

  • assumptions: Results of assumption tests

  • metadata: Analysis metadata and parameters used

Details

The function performs the following workflow:

  1. Validates input data structure and parameters

  2. Removes quality control samples from analysis

  3. Tests statistical assumptions (normality, variance homogeneity)

  4. Selects appropriate statistical tests automatically

  5. Applies multiple comparison corrections

  6. Generates optional visualization plots

For two-group comparisons, the function chooses between:

  • Paired/Independent t-test (parametric)

  • Welch's t-test (unequal variances)

  • Mann-Whitney U test (independent non-parametric)

  • Wilcoxon signed-rank test (paired non-parametric)
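
The two-group choice above can be sketched as a small decision function in base R. This is an illustration, not the package's implementation: the helper name choose_two_group_test is hypothetical, and the use of shapiro.test for normality and var.test for variance homogeneity is an assumption, since the specific assumption tests are not named here.

```r
# Hypothetical helper; not part of the package
choose_two_group_test <- function(x, y, paired = FALSE, alpha = 0.05) {
  if (paired) {
    # For paired data, normality is checked on the within-pair differences
    normal <- shapiro.test(x - y)$p.value > alpha
    if (normal) {
      return(t.test(x, y, paired = TRUE))     # paired t-test
    }
    return(wilcox.test(x, y, paired = TRUE))  # Wilcoxon signed-rank test
  }

  # Independent samples: both groups must look normal for a t-test
  normal <- shapiro.test(x)$p.value > alpha &&
    shapiro.test(y)$p.value > alpha
  if (!normal) {
    return(wilcox.test(x, y))                 # Mann-Whitney U test
  }

  # var.equal = FALSE gives Welch's t-test, TRUE gives Student's t-test
  equal_var <- var.test(x, y)$p.value > alpha
  t.test(x, y, var.equal = equal_var)
}

# Usage on simulated values for one metabolite
set.seed(42)
x <- rnorm(10, mean = 0)
y <- rnorm(10, mean = 1)
res <- choose_two_group_test(x, y)
res$method   # name of the test that was selected
res$p.value
```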

For multi-group comparisons:

  • One-way ANOVA (parametric)

  • Repeated measures ANOVA (paired)

  • Kruskal-Wallis test (non-parametric)
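
A comparable sketch for the unpaired multi-group case is shown below. Again, the helper name choose_multi_group_test is hypothetical, and checking normality on the ANOVA residuals with shapiro.test and variance homogeneity with bartlett.test is an assumption rather than the documented behaviour. The repeated measures branch is left out.

```r
# Hypothetical helper; not part of the package
choose_multi_group_test <- function(values, group, alpha = 0.05) {
  group <- factor(group)
  fit <- aov(values ~ group)

  # Assumption checks: residual normality and homogeneity of variances
  normal    <- shapiro.test(residuals(fit))$p.value > alpha
  equal_var <- bartlett.test(values, group)$p.value > alpha

  if (normal && equal_var) {
    # First row of the ANOVA table holds the p-value for the group term
    list(test = "One-way ANOVA",
         p.value = summary(fit)[[1]][["Pr(>F)"]][1])
  } else {
    list(test = "Kruskal-Wallis",
         p.value = kruskal.test(values, group)$p.value)
  }
}

# Usage on simulated values for one metabolite across three groups
set.seed(7)
values <- c(rnorm(8, mean = 0), rnorm(8, mean = 0.5), rnorm(8, mean = 2))
group  <- rep(c("A", "B", "C"), each = 8)
out <- choose_multi_group_test(values, group)
out$test
out$p.value
```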

Author

John Lennon L. Calorio

Examples

if (FALSE) { # \dontrun{
# Basic two-group comparison
results <- perform_ComparativeAnalysis(
  data = preprocessed_data,
  adjust_p_method = "BH"
)

# Paired comparison with plots
results <- perform_ComparativeAnalysis(
  data = preprocessed_data,
  paired = TRUE,
  plot_metabolites = c("metabolite_1", "metabolite_2"),
  verbose = TRUE
)

# Multi-group comparison with strict correction
results <- perform_ComparativeAnalysis(
  data = preprocessed_data,
  adjust_p_method = "bonferroni",
  sort_p = TRUE
)
} # }