TY - JOUR
T1 - A database of calculated solution parameters for the AlphaFold predicted protein structures
AU - Brookes, Emre
AU - Rocco, Mattia
N1 - Funding Information:
Research reported in this publication was supported by the NIGMS of the National Institutes of Health under Award Number R01GM120600 and by the National Science Foundation under awards CHE-1265817, OAC-1740097, OAC-1912444, all to E.B. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health nor the National Science Foundation. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation Grant Number ACI-1548562 and utilized Jetstream at Indiana University through allocation TG-MCB17057 to EB. The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported within this paper (http://www.tacc.utexas.edu). The computing infrastructure at the University of Lethbridge was funded by the Canada Foundation for Innovation (CFI-37589 to Borries Demeler). We are grateful to P. Vachette (I2BC, Université Paris-Saclay, CEA, CNRS, Gif-sur-Yvette, France) for comments and suggestions.
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Recent spectacular advances by AI programs in 3D structure predictions from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting “folding frenzy” has already produced predicted protein structure databases for the entire human and other organisms’ proteomes. However, rapidly ascertaining a predicted structure’s reliability based on measured properties in solution should be considered. Shape-sensitive hydrodynamic parameters such as the diffusion and sedimentation coefficients ($D^{0}_{t(20,w)}$, $s^{0}_{(20,w)}$) and the intrinsic viscosity ([η]) can provide a rapid assessment of the overall structure likeliness, and SAXS would yield the structure-related pair-wise distance distribution function p(r) vs. r. Using the extensively validated UltraScan SOlution MOdeler (US-SOMO) suite, a database was implemented calculating from AlphaFold structures the corresponding $D^{0}_{t(20,w)}$, $s^{0}_{(20,w)}$, [η], p(r) vs. r, and other parameters. Circular dichroism spectra were computed using the SESCA program. Some of AlphaFold’s drawbacks were mitigated, such as generating whenever possible a protein’s mature form. Others, like the AlphaFold direct applicability to single-chain structures only, the absence of prosthetic groups, or flexibility issues, are discussed. Overall, this implementation of the US-SOMO-AF database should already aid in rapidly evaluating the consistency in solution of a relevant portion of AlphaFold predicted protein structures.
AB - Recent spectacular advances by AI programs in 3D structure predictions from protein sequences have revolutionized the field in terms of accuracy and speed. The resulting “folding frenzy” has already produced predicted protein structure databases for the entire human and other organisms’ proteomes. However, rapidly ascertaining a predicted structure’s reliability based on measured properties in solution should be considered. Shape-sensitive hydrodynamic parameters such as the diffusion and sedimentation coefficients ($D^{0}_{t(20,w)}$, $s^{0}_{(20,w)}$) and the intrinsic viscosity ([η]) can provide a rapid assessment of the overall structure likeliness, and SAXS would yield the structure-related pair-wise distance distribution function p(r) vs. r. Using the extensively validated UltraScan SOlution MOdeler (US-SOMO) suite, a database was implemented calculating from AlphaFold structures the corresponding $D^{0}_{t(20,w)}$, $s^{0}_{(20,w)}$, [η], p(r) vs. r, and other parameters. Circular dichroism spectra were computed using the SESCA program. Some of AlphaFold’s drawbacks were mitigated, such as generating whenever possible a protein’s mature form. Others, like the AlphaFold direct applicability to single-chain structures only, the absence of prosthetic groups, or flexibility issues, are discussed. Overall, this implementation of the US-SOMO-AF database should already aid in rapidly evaluating the consistency in solution of a relevant portion of AlphaFold predicted protein structures.
UR - http://www.scopus.com/inward/record.url?scp=85129452623&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129452623&partnerID=8YFLogxK
U2 - 10.1038/s41598-022-10607-z
DO - 10.1038/s41598-022-10607-z
M3 - Article
C2 - 35513443
AN - SCOPUS:85129452623
VL - 12
JO - Scientific Reports
JF - Scientific Reports
SN - 2045-2322
IS - 1
M1 - 7349
ER -