Optimization of Imputation Strategies for High-Resolution Gas Chromatography–Mass Spectrometry (HR GC–MS) Metabolomics Data

Isaac Ampong, Kip D. Zimmerman, Peter W. Nathanielsz, Laura Cox, Michael Olivier

Research output: Contribution to journal › Article › peer-review

Abstract

Gas chromatography–coupled mass spectrometry (GC–MS) has been used in biomedical research to analyze volatile, non-polar, and polar metabolites in a wide array of sample types. Despite advances in technology, missing values are still common in metabolomics datasets and must be properly handled. We evaluated the performance of ten commonly used missing value imputation methods with metabolites analyzed on an HR GC–MS instrument. By introducing missing values into the complete (i.e., data without any missing values) National Institute of Standards and Technology (NIST) plasma dataset, we demonstrate that random forest (RF), glmnet ridge regression (GRR), and Bayesian principal component analysis (BPCA) shared the lowest root mean squared error (RMSE) in technical replicate data. Further examination of these three methods in data from baboon plasma and liver samples demonstrated they all maintained high accuracy. Overall, our analysis suggests that any of the three imputation methods can be applied effectively to untargeted metabolomics datasets with high accuracy. However, it is important to note that imputation will alter the correlation structure of the dataset and bias downstream regression coefficients and p-values.
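The evaluation design described in the abstract (remove known values from a complete dataset, impute them, and score the result by RMSE against the held-back truth) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic rather than the NIST plasma dataset, values are assumed to be removed completely at random at a 10% rate, and per-feature mean imputation stands in as a simple baseline for the RF, GRR, and BPCA methods named above. All function names are hypothetical.

```python
import math
import random


def mask_at_random(data, fraction, rng):
    """Return a copy of `data` with a `fraction` of entries set to None,
    plus the list of (row, col) positions that were masked."""
    cells = [(i, j) for i, row in enumerate(data) for j in range(len(row))]
    n_mask = int(round(fraction * len(cells)))
    masked = rng.sample(cells, n_mask)
    out = [list(row) for row in data]
    for i, j in masked:
        out[i][j] = None
    return out, masked


def impute_column_mean(data):
    """Fill each None with the mean of the observed values in its column.

    Baseline stand-in only; assumes every column keeps at least one
    observed value.
    """
    out = [list(row) for row in data]
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        fill = sum(observed) / len(observed)
        for row in out:
            if row[j] is None:
                row[j] = fill
    return out


def rmse_on_masked(truth, imputed, masked):
    """RMSE computed only over the entries that were artificially removed."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)
# Synthetic "complete" matrix: 20 samples x 8 features (not real metabolite data).
complete = [[rng.gauss(10.0 + j, 1.0) for j in range(8)] for _ in range(20)]

with_gaps, masked = mask_at_random(complete, 0.10, rng)
imputed = impute_column_mean(with_gaps)
score = rmse_on_masked(complete, imputed, masked)
print(f"RMSE on {len(masked)} masked entries: {score:.3f}")
```

Comparing methods under this design would amount to swapping `impute_column_mean` for another imputer and repeating the masking over several random seeds and missingness rates, so that RMSE differences are not an artifact of one particular mask.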

Original language: English (US)
Article number: 429
Journal: Metabolites
Volume: 12
Issue number: 5
State: Published - May 2022
Externally published: Yes

Keywords

  • HR GC–MS
  • imputation missing values
  • metabolomics

ASJC Scopus subject areas

  • Endocrinology, Diabetes and Metabolism
  • Biochemistry
  • Molecular Biology
