DOI: 10.1140/epjst/e2008-00840-6
Nonlinear signal analysis to understand the dynamics of the protein sequences
S. Angadi and A. Kulkarni
Systems Research Lab., Tata Research Development and Design Centre (TRDDC), 54-B, Hadapsar Industrial Estate, Pune-411013, India
abhijitj.kulkarni@tcs.com
savita.angadi@sas.com
Abstract
Recurrence plots are a useful tool for qualitatively identifying structure
in a data set in a time-resolved way. Recurrence plots and their
quantification have become an important research tool in the analysis of
nonlinear dynamical systems. In the present work, we utilize the recurrence
property to study protein sequences. The sequences that we analyze
belong to two distinct classes, viz., soluble proteins and proteins that
form inclusion bodies when overexpressed in Escherichia coli. We use the Kyte-Doolittle
hydrophobicity scale in the analysis. We study the underlying dynamics and
extract the information that encodes the class of a protein, using
simple statistical and global-characteristic features as well as some
advanced features based on recurrence quantification. The extracted features
are used for class-probability estimation with the Gaussian Process Classification
technique. The results give meaningful insight into protein sequence dynamics.
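
To make the pipeline described in the abstract concrete, the following is a minimal sketch of how a protein sequence could be mapped to a Kyte-Doolittle hydrophobicity series and summarized by two common recurrence quantification measures, the recurrence rate and the determinism. It is an illustration under stated assumptions, not the authors' implementation: the function names, the maximum-norm distance, the threshold `eps`, the embedding parameters `dim` and `delay`, and the minimum line length `lmin` are all hypothetical choices that the abstract does not specify. The Gaussian Process Classification step is not shown.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_series(seq):
    """Map a one-letter amino acid string to a list of hydropathy values."""
    return [KD[aa] for aa in seq.upper()]


def recurrence_matrix(x, eps, dim=1, delay=1):
    """Binary recurrence matrix of a delay-embedded series (assumed settings).

    Entry (i, j) is 1 when the maximum-norm distance between embedded
    points i and j is at most eps, and 0 otherwise.
    """
    n = len(x) - (dim - 1) * delay
    emb = [[x[i + k * delay] for k in range(dim)] for i in range(n)]

    def dist(a, b):
        return max(abs(p - q) for p, q in zip(a, b))

    return [[1 if dist(emb[i], emb[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def recurrence_rate(R):
    """Fraction of recurrent points, excluding the main diagonal."""
    n = len(R)
    if n < 2:
        return 0.0
    return (sum(map(sum, R)) - n) / (n * n - n)


def determinism(R, lmin=2):
    """Share of off-diagonal recurrent points lying on diagonal lines
    of length >= lmin. Only the upper triangle is scanned, since the
    matrix is symmetric."""
    n = len(R)
    total = 0
    in_lines = 0
    for k in range(1, n):
        diag = [R[i][i + k] for i in range(n - k)]
        total += sum(diag)
        run = 0
        for v in diag + [0]:  # trailing 0 closes the last run
            if v:
                run += 1
            else:
                if run >= lmin:
                    in_lines += run
                run = 0
    return in_lines / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical short sequence, used for illustration only.
    series = hydrophobicity_series("MKTAYIAKQRQISFVKSHFSRQ")
    R = recurrence_matrix(series, eps=1.0, dim=2, delay=1)
    print("recurrence rate:", recurrence_rate(R))
    print("determinism:", determinism(R))
```

Features such as these, computed per sequence, could then be passed to a classifier. In practice the threshold and embedding parameters strongly affect the measures and would need to be chosen with care.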
© EDP Sciences, Springer-Verlag 2008