introduction.RmdMetaboStatR provides a comprehensive workflow for analyzing metabolomics data. This package is designed to handle the complete pipeline from raw data preprocessing to statistical analysis and visualization, with particular emphasis on managing multiple groups and batch effects.
Metabolomics is the large-scale study of small molecules, commonly known as metabolites, within cells, biofluids, tissues, or organisms. These metabolites are the end products of cellular processes and their levels can be regarded as the ultimate response of biological systems to genetic or environmental changes.
# Install from GitHub (once package is properly structured)
devtools::install_github("jllcalorio/MetaboStatR")
# Load the package
library(MetaboStatR)Before using MetaboStatR, ensure your data follows the correct format:
library(dplyr)
# Example data format
sample_data <- data.frame(
Sample = c("S001", "S002", "S003", "S004", "S005", "S006"),
SubjectID = rep(NA, 6),
Replicate = rep(NA, 6),
Group = c("Control", "Control", "Control", "Treatment", "Treatment", "Treatment"),
Batch = c("Batch1", "Batch1", "Batch2", "Batch1", "Batch2", "Batch2"),
Injection = c(1, 2, 3 ,4, 5, 6),
Normalization = c(3.5, 1.3, 5.6, 6.3, 2, 1.2),
Response = rep(NA, 6)
) %>% t()
head(sample_data)
# Quality-check your data
data <- perform_DataQualityCheck("file_location_of_your_metabolomics_data.csv")The preprocessing step is crucial for metabolomics data quality. These preprocessing steps (in specific order) include:
# Using the default parameters
processed_data <- perform_PreprocessingPeakData(
raw_data = list_from_perform_DataQualityCheck_function
)Explore data structure and quality:
# Perform PCA and produce a scree plot and 3 scores plots
pca_results <- perform_PCA(
processed_data,
# PC1 vs 2 Pc1 vs 3 PC2 vs 3
list(c(1, 2), c(1, 3), c(2, 3))
)
# Generate all 4 PCA plots in 1 plot
display_AllPlots(pca_results)Identify discriminating metabolites:
# Perform OPLS-DA
plsda_results <- perform_PLS(
data = processed_data,
method = "oplsda",
arrangeLevels = c("Severe Case", "Less Severe Case", "Control")
)
# "arrangeLevels" is suggested to be arranged where severe cases first then control at the endCalculate effect sizes:
# Calculate fold changes
fc_results <- calculate_fold_changes(
data = processed_data,
arrangeLevels = c("Severe Case", "Less Severe Case", "Control")
)
# "arrangeLevels" is suggested to be arranged where severe cases first then control at the endTest for significant differences between groups:
# Perform statistical tests
stats_results <- perform_ComparativeAnalysis(
data = processed_data,
adjust_p_method = "BH"
)
auroc_results <- perform_AUROC(
data_PP = processed_data, # The list from preprocessing
data_DR = plsda_results, # The list from OPLS-DA
data_FC = fc_results, # The list from fold change analysis
data_CA = stats_results, # The list from statistical analysis
arrangeLevels = c("Control", "Less Severe Case", "Severe Case")
)
# UNLIKE OPLS-DA and fold change analysis, "arrangeLevels" in AUROC is suggested to be arranged where controls first then severe cases at the end
regression_results <- perform_Regression(
data = processed_data,
method = "enet"
)
# method options are "enet" for Elastic net regression
# "lasso" for LASSO
# "both" to perform both enet and lassoEach item in the “list” parameter will be checked for data frames, tibbles and the likes, that can be exported to Excel. Each item in the list will have its own Excel workbook. Each data frame will have its own worksheet. Long names will be shortened to match Excel’s limit.
perform_Export2Excel(
results = list(
processed_data,
pca_results,
plsda_results,
fc_results,
stats_results,
auroc_results,
regression_results
),
folder_name = "Results_Folder", # Which folder to store the results
file_name = "Results_Export" # Base file names of the exported data frames
)
# Note:
# "file_name" can accept a list of the same length as "results" to specify the file name of each listPlots and figures are exported to PNG, JPG, etc.
perform_ExportPlots2Image(
results = list(
processed_data,
pca_results,
plsda_results,
fc_results,
stats_results,
auroc_results,
regression_results
),
folder_name = "Plots_Export",
file_prefix = "Plot",
image_format = "png",
width = 15,
height = 12,
dpi = 300
)Create publication-ready plots:
# Volcano plot
plot_Volcano(
PPData = processed_data,
FCData = fc_results,
CAData = stats_results,
arrangeLevels = c("Severe Case", "Less Severe Case", "Control"),
fcUP = 2, # Fold change threshold
fcDown = 0.5, # Fold change threshold
adjpvalue = 0.05 # Adjusted p-value threshold
)After running the complete workflow, you should have:
For questions about MetaboStatR: - Check the function documentation:
?function_name - Browse additional vignettes for specific
topics - Report issues on GitHub: https://github.com/jllcalorio/MetaboStatR/issues -
Contact: jllcalorio@gmail.com
sessionInfo()
#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=English_Philippines.utf8 LC_CTYPE=English_Philippines.utf8
#> [3] LC_MONETARY=English_Philippines.utf8 LC_NUMERIC=C
#> [5] LC_TIME=English_Philippines.utf8
#>
#> time zone: Asia/Manila
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.37 desc_1.4.3 R6_2.6.1 fastmap_1.2.0
#> [5] xfun_0.52 cachem_1.1.0 knitr_1.50 htmltools_0.5.8.1
#> [9] rmarkdown_2.29 lifecycle_1.0.4 cli_3.6.5 sass_0.4.10
#> [13] pkgdown_2.1.3 textshaping_1.0.1 jquerylib_0.1.4 systemfonts_1.2.3
#> [17] compiler_4.4.1 rstudioapi_0.17.1 tools_4.4.1 ragg_1.4.0
#> [21] bslib_0.9.0 evaluate_1.0.4 yaml_2.3.10 jsonlite_2.0.0
#> [25] rlang_1.1.6 fs_1.6.6 htmlwidgets_1.6.4