ProteinNet is a standardized data set for machine learning of protein structure. It provides protein sequences, structures (secondary and tertiary), multiple sequence alignments (MSAs), position-specific scoring matrices (PSSMs), and standardized training/validation/test splits. ProteinNet builds on the biennial CASP assessments, which carry out blind predictions of recently solved but publicly unavailable protein structures, to provide test sets that push the frontiers of computational methodology. It is organized as a series of data sets spanning CASP 7 through 12 (a ten-year period), providing a range of data set sizes that enable assessment of new methods in both relatively data-poor and data-rich regimes.

Protein structure prediction is one of the central problems of biochemistry. While the problem is well studied within the biological and chemical sciences, it is less well represented within the machine learning community.

The raw data used to construct the data sets, as well as the MSAs, are not yet generally available. However, the raw MSA data (4 TB) for ProteinNet 12 is available upon request. Transfer requires downloading a Globus client. See the raw data section for more information.
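The data set contents described above (sequences, PSSM-style evolutionary profiles, tertiary coordinates) come packaged per protein. As a rough illustration of how such records might be consumed, the Python sketch below parses a plain-text layout that is assumed here: each record begins with an `[ID]` header followed by `[PRIMARY]`, `[EVOLUTIONARY]`, `[TERTIARY]`, and `[MASK]` sections. The section names, the three-backbone-atoms-per-residue convention in the hypothetical `is_consistent` helper, and the toy record are illustrative assumptions, not a specification of the ProteinNet file format; check the official ProteinNet documentation before relying on them.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional

# ASSUMPTION: a ProteinNet-style text layout with bracketed section headers.
# This is an illustrative sketch, not the official format specification.


@dataclass
class ProteinRecord:
    identifier: str = ""
    primary: str = ""  # amino-acid sequence, one letter per residue
    evolutionary: List[List[float]] = field(default_factory=list)  # profile rows, one column per residue
    tertiary: List[List[float]] = field(default_factory=list)  # x, y, z rows of atom coordinates
    mask: str = ""  # per-residue flag, e.g. '+' = structure known, '-' = missing


def parse_records(lines: Iterable[str]) -> Iterator[ProteinRecord]:
    """Yield one ProteinRecord per [ID] header found in the input lines."""
    record: Optional[ProteinRecord] = None
    section: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            if section == "ID":
                # A new [ID] header closes the previous record.
                if record is not None:
                    yield record
                record = ProteinRecord()
            continue
        if record is None:
            raise ValueError("data found before the first [ID] header")
        if section == "ID":
            record.identifier = line
        elif section == "PRIMARY":
            record.primary = line
        elif section == "EVOLUTIONARY":
            record.evolutionary.append([float(x) for x in line.split()])
        elif section == "TERTIARY":
            record.tertiary.append([float(x) for x in line.split()])
        elif section == "MASK":
            record.mask = line
    if record is not None:
        yield record


def is_consistent(rec: ProteinRecord, atoms_per_residue: int = 3) -> bool:
    """Hypothetical sanity check: all per-residue fields agree on sequence length."""
    n = len(rec.primary)
    return (
        len(rec.mask) == n
        and all(len(row) == n for row in rec.evolutionary)
        and len(rec.tertiary) == 3
        and all(len(row) == atoms_per_residue * n for row in rec.tertiary)
    )


# Toy record for illustration only: real profiles have more rows,
# and real coordinates are not sequential integers.
TOY_RECORD = """
[ID]
TOY_1
[PRIMARY]
MKV
[EVOLUTIONARY]
0.1 0.2 0.3
0.4 0.5 0.6
[TERTIARY]
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7 8
0 1 2 3 4 5 6 7 8
[MASK]
++-
"""

if __name__ == "__main__":
    for rec in parse_records(TOY_RECORD.splitlines()):
        print(rec.identifier, rec.primary, rec.mask, is_consistent(rec))
```

Parsing lazily with a generator keeps memory use flat, which matters if the larger training sets are streamed from disk rather than loaded whole.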