perform_DataQualityCheck.Rd

This function imports and validates metabolomics data from Excel, CSV, TSV, or plain-text files, performing comprehensive quality checks to ensure data integrity and compatibility with downstream analysis functions. It validates the data structure, checks for required metadata rows, enforces uniqueness constraints, cleans special characters from identifiers, and prepares the data for preprocessing pipelines.
Key validation checks include:
Presence of required metadata rows (Sample, SubjectID, Replicate, Group, Batch, Injection, Normalization, Response)
Uniqueness of sample names, injection sequences, and feature/metabolite identifiers
Proper QC sample annotation (empty values in SubjectID, Replicate, Normalization, Response for QC samples)
Numeric validation of feature/metabolite data
Character cleaning and standardization of identifiers
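As an illustration of the uniqueness and numeric checks listed above, the core logic could be sketched in base R as follows. This is not the package's internal implementation; `check_unique`, `check_numeric`, and the inputs shown are hypothetical stand-ins for the parsed metadata rows.

```r
# Illustrative sketch of two validation checks (not the package internals).
check_unique <- function(x, label) {
  dup <- unique(x[duplicated(x)])
  if (length(dup) > 0) {
    stop(sprintf("Duplicate %s found: %s", label, paste(dup, collapse = ", ")))
  }
  invisible(TRUE)
}

check_numeric <- function(x, label) {
  # Blank entries are allowed (missing values may be left blank);
  # anything non-blank must coerce cleanly to numeric.
  coerced <- suppressWarnings(as.numeric(x))
  if (any(is.na(coerced) & !is.na(x) & x != "")) {
    stop(sprintf("Non-numeric values found in %s", label))
  }
  invisible(coerced)
}

# Example usage with hypothetical parsed rows:
sample_names <- c("S01", "S02", "QC1", "S03")
injections   <- c("1", "2", "3", "4")
check_unique(sample_names, "sample names")   # passes silently
check_unique(injections, "injection order")  # passes silently
check_numeric(injections, "Injection row")   # returns c(1, 2, 3, 4) invisibly
```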
perform_DataQualityCheck(
file_location = NULL,
sheet_name = NULL,
skip_rows = 0,
separator = ",",
validate_qc = TRUE,
allow_missing_optional = TRUE,
clean_names = TRUE,
verbose = TRUE
)

file_location: Character string specifying the file path. If NULL, an interactive file selection dialog will open. Supported formats: .xlsx, .csv, .tsv, .txt.

sheet_name: Character string specifying the Excel worksheet name. Ignored for non-Excel files. If NULL for Excel files, the first sheet is used.

skip_rows: Integer specifying the number of rows to skip when reading the file. Default is 0.

separator: Character string specifying the field separator for delimited files. Common values: "," (comma), "\t" (tab). Default is ",". Ignored for Excel files.

validate_qc: Logical indicating whether to enforce QC validation rules. Default is TRUE.

allow_missing_optional: Logical indicating whether to allow missing values in optional metadata rows (SubjectID, Replicate, Normalization, Response). Default is TRUE.

clean_names: Logical indicating whether to clean special characters from names. Default is TRUE.

verbose: Logical indicating whether to display progress messages. Default is TRUE.
A list containing:
Data frame with the original data as loaded from the file
Data frame with validated and cleaned data, sorted by injection sequence
Summary statistics of the metadata
Detailed validation results
Information about the source file
Log of all processing steps performed
The input data must follow a specific structure:
Row 1: "Sample" - Unique sample identifiers (no spaces recommended)
Row 2: "SubjectID" - Numeric subject identifiers (can be non-unique)
Row 3: "Replicate" - Replicate identifiers (can be non-unique)
Row 4: "Group" - Group assignments including QC samples
Row 5: "Batch" - Batch numbers
Row 6: "Injection" - Unique injection sequence numbers
Row 7: "Normalization" - Concentration markers (e.g., osmolality)
Row 8: "Response" - Response variable values
Rows 9+: Feature/metabolite data (e.g., m/z@retention_time format)
Missing values should be left blank or encoded as 0. QC samples in the Group row must have empty values in SubjectID, Replicate, Normalization, and Response rows.
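For concreteness, a minimal comma-separated input following this layout might look like the fragment below. All sample names, values, and feature identifiers are illustrative; note that the QC sample's SubjectID, Replicate, Normalization, and Response entries are left blank, as required.

```
Sample,S01,S02,QC1
SubjectID,101,102,
Replicate,1,1,
Group,Control,Treated,QC
Batch,1,1,1
Injection,1,2,3
Normalization,290,305,
Response,0.8,1.4,
85.0293@1.25,12034,11872,11990
114.0662@2.10,8831,9045,8902
```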
See also: perform_PreprocessingPeakData, the next step in the analysis pipeline.
if (FALSE) { # \dontrun{
# Basic usage with file selection dialog
result <- perform_DataQualityCheck()
# Specify file location directly
result <- perform_DataQualityCheck(
file_location = "path/to/metabolomics_data.xlsx",
sheet_name = "Sheet1"
)
# CSV file with custom separator
result <- perform_DataQualityCheck(
file_location = "path/to/data.csv",
separator = ";",
skip_rows = 1
)
# Access results
raw_data <- result$raw_data
validation_summary <- result$validation_report
} # }