TY - JOUR
T1 - Quality-controlled R-loop meta-analysis reveals the characteristics of R-loop consensus regions
AU - Miller, Henry E.
AU - Montemayor, Daniel
AU - Abdul, Jebriel
AU - Vines, Anna
AU - Levy, Simon A.
AU - Hartono, Stella R.
AU - Sharma, Kumar
AU - Frost, Bess
AU - Chédin, Frédéric
AU - Bishop, Alexander J.R.
N1 - Publisher Copyright: © The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2022/7/22
Y1 - 2022/7/22
AB - R-loops are three-stranded nucleic acid structures formed from the hybridization of RNA and DNA. While the pathological consequences of R-loops have been well-studied to date, the locations, classes, and dynamics of physiological R-loops remain poorly understood. R-loop mapping studies provide insight into R-loop dynamics, but their findings are challenging to generalize. This is due to the narrow biological scope of individual studies, the limitations of each mapping modality, and, in some cases, poor data quality. In this study, we reprocessed 810 R-loop mapping datasets from a wide array of biological conditions and mapping modalities. From this data resource, we developed an accurate R-loop data quality control method, and we reveal the extent of poor-quality data within previously published studies. We then identified a set of high-confidence R-loop mapping samples and used them to define consensus R-loop sites called 'R-loop regions' (RL regions). In the process, we identified a stark divergence between RL regions detected by S9.6 and dRNH-based mapping methods, particularly with respect to R-loop size, location, and colocalization with RNA binding factors. Taken together, this work provides a much-needed method to assess R-loop data quality and offers novel context regarding the differences between dRNH- and S9.6-based R-loop mapping approaches.
UR - http://www.scopus.com/inward/record.url?scp=85134791085&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85134791085&partnerID=8YFLogxK
U2 - 10.1093/nar/gkac537
DO - 10.1093/nar/gkac537
M3 - Article
C2 - 35758606
AN - SCOPUS:85134791085
SN - 0305-1048
VL - 50
SP - 7260
EP - 7286
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 13
ER -