Nonlinear signal analysis to understand the dynamics of the protein sequences
S. Angadi and A. Kulkarni
Systems Research Lab., Tata Research Development and Design Centre (TRDDC), 54-B, Hadapsar Industrial Estate, Pune-411013, India
Recurrence plots are a useful tool for qualitatively identifying structure in a data set in a time-resolved way. Recurrence plots and their quantification have become an important research tool in the analysis of nonlinear dynamical systems. In the present work, we use the recurrence property to study protein sequences. The sequences that we analyze belong to two distinct classes, viz., soluble proteins and proteins that form inclusion bodies when overexpressed in Escherichia coli. We use the Kyte-Doolittle hydrophobicity scale in the analysis. We study the underlying dynamics and extract the information that encodes the class of a protein, using simple statistical and global-characteristic features as well as more advanced features based on recurrence quantification. The extracted features are used for probability estimation with the Gaussian Process Classification technique. The results give meaningful insight into the dynamics of protein sequences.
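As an illustration of the recurrence idea described above, the following minimal sketch maps a protein sequence to a Kyte-Doolittle hydrophobicity series, builds a thresholded recurrence matrix from a time-delay embedding, and computes the recurrence rate, one of the simplest recurrence quantification measures. The function names, the embedding dimension, the delay, and the radius are illustrative assumptions, not the settings used in this work.

```python
import math

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_series(sequence):
    """Map a one-letter amino acid sequence to its hydropathy values."""
    return [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]


def embed(series, dim, delay):
    """Time-delay embedding: one vector of length `dim` per start position."""
    count = len(series) - (dim - 1) * delay
    return [[series[i + k * delay] for k in range(dim)] for i in range(count)]


def recurrence_matrix(series, dim=3, delay=1, radius=1.0):
    """R[i][j] = 1 when embedded points i and j lie within `radius`.

    The default dim, delay, and radius are hypothetical, chosen only
    for illustration.
    """
    vectors = embed(series, dim, delay)
    size = len(vectors)
    return [
        [1 if math.dist(vectors[i], vectors[j]) <= radius else 0
         for j in range(size)]
        for i in range(size)
    ]


def recurrence_rate(matrix):
    """Fraction of recurrent points, excluding the main diagonal."""
    size = len(matrix)
    if size < 2:
        return 0.0
    recurrent = sum(
        matrix[i][j] for i in range(size) for j in range(size) if i != j
    )
    return recurrent / (size * (size - 1))
```

For example, `recurrence_rate(recurrence_matrix(hydrophobicity_series(seq)))` gives one scalar feature per sequence `seq`. Features of this kind, alongside simple statistical ones, could then be passed to a classifier.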
© EDP Sciences, Springer-Verlag 2008