CeL-ID: Cell line identification using RNA-seq data

Tabrez A. Mohammad, Yun S. Tsai, Safwa Ameer, Hung I.Harry Chen, Yu Chiao Chiu, Yidong Chen

Research output: Contribution to journal › Article

Abstract

Background: Cell lines are a cornerstone of cell-based experimentation aimed at understanding the mechanisms of normal and disease biology, including cancer. However, cell line contamination is widely acknowledged as a prevalent problem in biomedical science, and available authentication methods suffer from limited access and are too daunting and time-consuming for many researchers. A new, cost-effective approach for authentication and quality control of cell lines is therefore needed.

Results: We developed CeL-ID, a new RNA-seq-based approach for cell line authentication. CeL-ID uses RNA-seq data to identify variants and compares them with the variant profiles of other cell lines. RNA-seq data for 934 CCLE cell lines, downloaded from NCI GDC, were used to generate cell-line-specific variant profiles, and pairwise correlations were calculated using the frequencies and depth-of-coverage values of all variants. Comparative analysis revealed that variant profiles differ significantly from cell line to cell line, whereas identical, synonymous, and derivative cell lines share high variant identity and are highly correlated (ρ > 0.9). Our benchmarking studies showed that CeL-ID can identify a cell line with high accuracy and can be a valuable tool for cell line authentication in biomedical science. Finally, if no perfect match is detected, CeL-ID estimates possible cross-contamination using a linear mixture model.

Conclusions: This study shows the utility of an RNA-seq-based approach for cell line authentication. Our comparative analysis of variant profiles derived from RNA-seq data revealed that the variant profile of each cell line is distinct and overall shares low variant identity with other cell lines, whereas identical or synonymous cell lines show significantly high variant identity. Variant profiles can therefore be used as a discriminatory, identifying feature in a cell authentication model.
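
The matching and contamination steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes each cell line is represented by a vector of variant allele frequencies over a shared, ordered set of variant sites, that Pearson correlation is the similarity measure, and that depth-of-coverage filtering has already been applied upstream. The function names `best_match` and `mixture_fractions` are hypothetical, and the mixture step uses ordinary least squares with clipping as a simple stand-in for the paper's linear mixture model.

```python
import numpy as np


def best_match(query, reference, threshold=0.9):
    """Find the reference profile most correlated with the query.

    query: 1-D array of variant allele frequencies (assumed representation).
    reference: dict mapping cell line name -> 1-D array over the same sites.
    Returns (name, rho); name is None when no correlation exceeds threshold.
    """
    best_name, best_rho = None, -1.0
    for name, profile in reference.items():
        # Pearson correlation between the two frequency vectors.
        rho = float(np.corrcoef(query, profile)[0, 1])
        if rho > best_rho:
            best_name, best_rho = name, rho
    if best_rho <= threshold:
        return None, best_rho
    return best_name, best_rho


def mixture_fractions(query, reference):
    """Estimate mixing fractions of reference profiles in the query.

    Solves query ~= A @ f by least squares, clips negative coefficients
    to zero, and rescales so the fractions sum to one.
    """
    names = list(reference)
    A = np.column_stack([reference[n] for n in names])
    coef, *_ = np.linalg.lstsq(A, query, rcond=None)
    coef = np.clip(coef, 0.0, None)
    total = coef.sum()
    if total > 0:
        coef = coef / total
    return dict(zip(names, coef))
```

In use, a sample would first be passed to `best_match`; only when it returns no name above the 0.9 threshold would `mixture_fractions` be consulted to suggest which reference lines might be contributing to a mixed sample.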

Original language: English (US)
Article number: 5371
Journal: BMC Genomics
Volume: 20
DOI: https://doi.org/10.1186/s12864-018-5371-9
State: Published - Feb 4 2019

Fingerprint

RNA
Cell Line
Benchmarking
Quality Control
Linear Models
Research Personnel
Costs and Cost Analysis

Keywords

  • CeL-ID
  • Cell line authentication
  • Cell line identification
  • Mutation
  • RNA-Seq variant profiles
  • SNP/Indel

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Mohammad, T. A., Tsai, Y. S., Ameer, S., Chen, H. I. H., Chiu, Y. C., & Chen, Y. (2019). CeL-ID: Cell line identification using RNA-seq data. BMC Genomics, 20, [5371]. https://doi.org/10.1186/s12864-018-5371-9

@article{a60ef090d4184b509c7fab5e90e51ed3,
title = "CeL-ID: Cell line identification using RNA-seq data",
keywords = "CeL-ID, Cell line authentication, Cell line identification, Mutation, RNA-Seq variant profiles, SNP/Indel",
author = "Mohammad, {Tabrez A.} and Tsai, {Yun S.} and Safwa Ameer and Chen, {Hung I.Harry} and Chiu, {Yu Chiao} and Yidong Chen",
year = "2019",
month = "2",
day = "4",
doi = "10.1186/s12864-018-5371-9",
language = "English (US)",
volume = "20",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",

}

TY - JOUR

T1 - CeL-ID

T2 - Cell line identification using RNA-seq data

AU - Mohammad, Tabrez A.

AU - Tsai, Yun S.

AU - Ameer, Safwa

AU - Chen, Hung I.Harry

AU - Chiu, Yu Chiao

AU - Chen, Yidong

PY - 2019/2/4

Y1 - 2019/2/4

KW - CeL-ID

KW - Cell line authentication

KW - Cell line identification

KW - Mutation

KW - RNA-Seq variant profiles

KW - SNP/Indel

UR - http://www.scopus.com/inward/record.url?scp=85060976319&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060976319&partnerID=8YFLogxK

U2 - 10.1186/s12864-018-5371-9

DO - 10.1186/s12864-018-5371-9

M3 - Article

C2 - 30712511

AN - SCOPUS:85060976319

VL - 20

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

M1 - 5371

ER -