Investigating Protein Structure Populations from Simulation Data using Unsupervised Learning

Date
2022-02
Publisher
IEEE
Abstract
Data obtained from molecular dynamics simulations provide important insight into the dynamical interactions of biological molecules. A simulation yields a trajectory, a time-ordered sequence of atomic configurations, and the properties derived for the molecule are determined by this sequence. Knowing how to efficiently extract representative structures from simulation data is therefore important, because we often want to identify changes in the conformation of a protein structure over the course of a simulation. We use unsupervised machine learning techniques to cluster such data and investigate several protein structural properties. The algorithms implemented in this paper produce clusters that tend to group frames from adjacent blocks of time together, even when sampling at 10 ps intervals. We found that conformational sampling in a shorter simulation may not visit all structures that belong to a specific cluster, whereas in a sufficiently long simulation the systems revisit previous clusters repeatedly. Cluster populations change rapidly at the initial stage of the simulations but become steady as they approach their terminal values, indicating that equilibrium has been attained. Investigation of protein structural properties also attests to the correspondence between the clusters of protein structures obtained from the clustering algorithms.
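The cluster-population and revisit analysis described in the abstract can be illustrated with a minimal sketch. It assumes that each saved frame has already been assigned a cluster label by some clustering algorithm; the label sequence and the function names `running_populations` and `visit_counts` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def running_populations(labels):
    """For each frame t, return the fraction of frames 0..t in each cluster."""
    clusters = sorted(set(labels))
    counts = Counter()
    history = []
    for t, lab in enumerate(labels, start=1):
        counts[lab] += 1
        # Clusters not yet visited have count 0, so their fraction is 0.0.
        history.append({c: counts[c] / t for c in clusters})
    return history


def visit_counts(labels):
    """Count the separate contiguous visits the trajectory makes to each cluster."""
    visits = Counter()
    previous = None
    for lab in labels:
        if lab != previous:
            visits[lab] += 1
        previous = lab
    return dict(visits)


# Hypothetical labels, one per saved frame (for example, one frame every 10 ps).
labels = [0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 1, 1]

history = running_populations(labels)
final = history[-1]            # populations over the whole trajectory
visits = visit_counts(labels)  # a value above 1 means the cluster was revisited

print(final)
print(visits)
```

Plotting each cluster's entry in `history` against frame index would show whether the populations level off, which is one simple way to judge whether a run is long enough under these assumptions.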
Description
This article was published by IEEE in 2022 and is also available at DOI 10.1109/CSCI58124.2022.00199
Citation
2022 International Conference on Computational Science and Computational Intelligence (CSCI)