perform_PreprocessingPeakData
Performs a complete data preprocessing workflow to prepare raw data for downstream analysis. This function applies the preprocessing steps sequentially, in the order specified by the parameters, to ensure data quality and analytical readiness.
perform_PreprocessingPeakData(
  raw_data,
  outliers = NULL,
  filterMissing = 20,
  filterMissing_by_group = TRUE,
  filterMissing_includeQC = FALSE,
  denMissing = 5,
  driftBatchCorrection = TRUE,
  spline_smooth_param = 0,
  spline_smooth_param_limit = c(-1.5, 1.5),
  log_scale = TRUE,
  min_QC = 5,
  removeUncorrectedFeatures = TRUE,
  dataNormalize = "Normalization",
  refSample = NULL,
  groupSample = NULL,
  reference_method = "mean",
  dataTransform = "vsn",
  dataScalePCA = "meanSD",
  dataScalePLS = "mean2SD",
  filterMaxRSD = 30,
  filterMaxRSD_by = "EQC",
  filterMaxVarSD = 10,
  verbose = TRUE
)
raw_data: List. Quality-checked data from the perform_DataQualityCheck function.
outliers: Vector. Biological samples and/or QC samples to treat as outliers, for example c('Sample1', 'Sample2', 'QC1', 'QC2', ...). Defaults to NULL.
filterMissing: Numeric. Minimum percentage of missing values across all groups required to remove a feature. Defaults to 20.
filterMissing_by_group: Boolean. If TRUE (default), filterMissing assesses group-specific missingness before a feature is removed.
filterMissing_includeQC: Boolean. If FALSE (default), QC samples are excluded when applying filterMissing.
denMissing: Numeric. Denominator of the fraction 1/denMissing used to replace missing values. Defaults to 5.
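As a rough illustration of how filterMissing and denMissing might work together, the base R sketch below drops features whose missing percentage reaches the threshold and then fills the remaining gaps. The toy matrix `peaks`, the helper `fill_missing`, and the choice to replace each gap with 1/denMissing of the feature's smallest observed value are assumptions made for illustration; the function's actual behavior may differ, and group-specific filtering is not covered here.

```r
# Toy peak table (hypothetical): rows are samples, columns are features.
peaks <- cbind(
  f1 = c(10, 12, NA, 11, 14, 13, 12, 11, 10, 15),  # 10% missing
  f2 = c(NA, NA, 5, NA, 6, NA, NA, 7, NA, 5),      # 60% missing
  f3 = c(20, 22, 21, 23, 19, 20, 21, 22, 20, 21)   # 0% missing
)
filterMissing <- 20
denMissing <- 5

# Percentage of missing values per feature.
pct_missing <- colMeans(is.na(peaks)) * 100

# Keep features whose missingness stays below the threshold.
kept <- peaks[, pct_missing < filterMissing, drop = FALSE]

# Assumption: a gap is filled with 1/denMissing of the feature's minimum.
fill_missing <- function(x, den) {
  x[is.na(x)] <- min(x, na.rm = TRUE) / den
  x
}
imputed <- apply(kept, 2, fill_missing, den = denMissing)
```

In this toy case f2 is removed, and the single gap in f1 is filled with one fifth of its smallest observed value.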
driftBatchCorrection: Boolean. If TRUE (default), performs the QC-RSC algorithm for signal drift and batch effect correction.
spline_smooth_param: Numeric. Spline smoothing parameter ranging from 0 to 1. Defaults to 0.
spline_smooth_param_limit: Vector. A vector of the form c(min, max) giving the spline parameter limits. Defaults to c(-1.5, 1.5).
log_scale: Boolean. If TRUE (default), fits the signal correction on log-scaled data.
min_QC: Numeric. Minimum number of QC samples required per batch for signal correction. Defaults to 5.
removeUncorrectedFeatures: Boolean. If TRUE (default), removes features that QC-RSC could not correct because too few QC samples met the min_QC threshold.
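The general idea behind QC-based spline correction can be sketched for a single feature in one batch. This is a minimal sketch under stated assumptions, not the package's implementation: the simulated data, the fixed `spar = 0.5`, and the rescaling to the median QC intensity are all illustrative choices, and `stats::smooth.spline` is used here only because it is the standard base R smoothing spline.

```r
# Hypothetical feature measured over 20 injections with a downward drift.
injection_order <- 1:20
is_qc <- injection_order %in% c(1, 5, 9, 13, 17, 20)
drift <- exp(-0.03 * injection_order)
intensity <- ifelse(is_qc, 1000, 800) * drift

# Fit the drift curve on log-scaled QC intensities (compare log_scale = TRUE).
fit <- smooth.spline(
  x = injection_order[is_qc],
  y = log(intensity[is_qc]),
  spar = 0.5  # fixed smoothing value chosen only for this sketch
)

# Predict the drift for every injection and return to the original scale.
predicted <- exp(predict(fit, x = injection_order)$y)

# Divide out the fitted drift, then rescale to the median QC intensity.
corrected <- intensity / predicted * median(intensity[is_qc])

# RSD (%) of the QC samples before and after correction.
rsd <- function(x) 100 * sd(x) / mean(x)
rsd_before <- rsd(intensity[is_qc])
rsd_after <- rsd(corrected[is_qc])
```

`smooth.spline` needs at least four distinct QC positions, which is one reason a minimum QC count such as min_QC is useful.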
dataNormalize: String. Data normalization method. Options:
"none": No normalization
"Normalization": Using the values from the "Normalization" row
"sum": By sum
"median": By median
"PQN1": By median of reference spectrum
"PQN2": By reference sample supplied in refSample
"groupPQN": By the group supplied in groupSample, one of c("SQC", "EQC", "QC"); if both (default), all QCs are considered as QC
"quantile": By quantile
Default: "Normalization" if present; otherwise "sum"
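A few of these options can be pictured with base R on a toy matrix. The sketch below is illustrative only: the data are invented, and reading "PQN1" as probabilistic quotient normalization against the median spectrum of all samples is an assumption rather than something this page states.

```r
# Hypothetical peak table: rows are samples, columns are features.
peaks <- rbind(
  S1 = c(100, 200, 300, 400),
  S2 = c(200, 400, 600, 800),  # same profile as S1 at twice the concentration
  S3 = c(150, 300, 450, 600)
)

# "sum": divide each sample by its total intensity.
norm_sum <- peaks / rowSums(peaks)

# "median": divide each sample by its median intensity.
norm_median <- peaks / apply(peaks, 1, median)

# PQN against the median spectrum (assumed reading of "PQN1").
reference <- apply(peaks, 2, median)          # reference spectrum
quotients <- sweep(peaks, 2, reference, "/")  # feature-wise quotients
dilution <- apply(quotients, 1, median)       # one dilution factor per sample
norm_pqn <- peaks / dilution
```

Because the three toy samples differ only by an overall dilution factor, each method maps them onto the same profile.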
refSample: String. Reference sample used when dataNormalize = "PQN2". Defaults to NULL.
groupSample: String. Used only if dataNormalize = "groupPQN". Defaults to NULL.
reference_method: String. Method for computing the reference from QC samples when dataNormalize = "quantile". Options:
"mean": Default
"median"
dataTransform: String. Data transformation method applied after dataNormalize. Options:
"none": No transformation
"log2": Log base 2
"log10": Log base 10
"sqrt": Square root
"cbrt": Cube root
"vsn": Variance Stabilizing Normalization (default)
dataScalePCA: String. Data scaling for PCA analysis. Options:
"none": No data scaling
"mean": Mean-centering only
"meanSD": Mean-center, then divide by the standard deviation (SD). Default.
"mean2SD": Pareto scaling. Mean-center, then divide by the square root of the SD. Always use this for PLS-type analysis.
dataScalePLS: String. Data scaling for PLS analysis. Same options as dataScalePCA. Defaults to "mean2SD".
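The simpler transformation and scaling options can be sketched in base R as follows. The helper names `transform_data` and `scale_data` are hypothetical, and the scaling formulas follow the usual definitions of mean-centering, autoscaling, and Pareto scaling, which is an assumption about how the option names map to formulas. The "vsn" option is left out because it needs a dedicated package.

```r
# Hypothetical helper: element-wise transformations (all but "vsn").
transform_data <- function(x, method = c("none", "log2", "log10", "sqrt", "cbrt")) {
  method <- match.arg(method)
  switch(method,
    none  = x,
    log2  = log2(x),
    log10 = log10(x),
    sqrt  = sqrt(x),
    cbrt  = x^(1 / 3)  # valid for non-negative intensities only
  )
}

# Hypothetical helper: column-wise (per-feature) scaling.
scale_data <- function(x, method = c("none", "mean", "meanSD", "mean2SD")) {
  method <- match.arg(method)
  centered <- sweep(x, 2, colMeans(x), "-")  # subtract each feature's mean
  sds <- apply(x, 2, sd)                     # per-feature standard deviation
  switch(method,
    none    = x,
    mean    = centered,
    meanSD  = sweep(centered, 2, sds, "/"),
    mean2SD = sweep(centered, 2, sqrt(sds), "/")
  )
}

# Toy data: rows are samples, columns are features.
peaks <- cbind(f1 = c(1, 2, 3), f2 = c(10, 20, 60))
auto <- scale_data(peaks, "meanSD")
pareto <- scale_data(peaks, "mean2SD")
```

After autoscaling every feature has unit SD, whereas Pareto scaling leaves each feature with an SD equal to the square root of its original SD, so high-variance features keep somewhat more weight.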
filterMaxRSD: Numeric. Threshold for RSD filtering. Defaults to 30.
filterMaxRSD_by: String. Which QC samples to use for RSD filtering. Options:
"SQC": Filter by sample QC
"EQC": Filter by extract QC (default)
"both": Filter by both SQC and EQC (or QC altogether)
filterMaxVarSD: Numeric. Removes the nth percentile of features with the lowest variability. Defaults to 10.
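The two filters can be sketched on a toy matrix. This is an illustration under stated assumptions: the data and the `is_qc` flags are invented, RSD is taken as 100 * SD / mean over the QC samples, features are assumed to be removed when their RSD exceeds the threshold, and variability is assumed to be measured as the SD across all samples.

```r
# Hypothetical data: rows are samples, columns are features.
peaks <- cbind(
  f1 = c(100, 102, 98, 150, 60),
  f2 = c(100, 180, 40, 120, 90),
  f3 = c(50, 51, 49, 50.5, 49.5)
)
is_qc <- c(TRUE, TRUE, TRUE, FALSE, FALSE)  # first three rows are QC injections

filterMaxRSD <- 30
filterMaxVarSD <- 10

# RSD (%) of each feature across the QC samples only.
rsd_qc <- apply(
  peaks[is_qc, , drop = FALSE], 2,
  function(x) 100 * sd(x) / mean(x)
)
after_rsd <- peaks[, rsd_qc <= filterMaxRSD, drop = FALSE]

# Drop features whose SD across all samples falls at or below the percentile cutoff.
feature_sd <- apply(after_rsd, 2, sd)
cutoff <- quantile(feature_sd, probs = filterMaxVarSD / 100)
after_var <- after_rsd[, feature_sd > cutoff, drop = FALSE]
```

Here f2 fails the RSD filter because it is unstable across the QC injections, and f3 is then dropped as the least variable of the remaining features.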
verbose: Logical. Whether to print detailed progress messages. Defaults to TRUE.
Returns a list containing results from all preprocessing steps.
See also perform_DataQualityCheck for the data quality check.