Protein sequence comparison under a new complex representation of amino acids based on their physio-chemical properties


Jayanta Pal, Soumen Ghosh, Bansibadan Maji, Dilip Kumar Bhattacharya



Complex Representation, DFT, Hydrophobicity Property, Hydrophilicity (Polarity) Property, ICD, Phylogenetic Tree, Voss Representation


The paper first introduces a new complex representation of amino acids in which the real and imaginary parts are taken from the hydrophilicity and the residue volume of each amino acid, respectively. It then applies the complex discrete Fourier transform to the resulting sequence of complex numbers to obtain its spectrum in the frequency domain. Applying the method of 'inter-coefficient distances' (ICD) to this spectrum, it constructs phylogenetic trees for different protein sequences, and pairwise comparison of the sequences is made on the basis of these trees. The paper also carries out the same pairwise comparison of the same protein sequences by the same method, but using a known complex representation of amino acids in which the real and imaginary parts are the hydrophobicity and the residue volume, respectively. The results of the two methods are then compared with results obtained earlier for the same sequences by other methods. Both methods are found to be workable, and the new complex representation performs better than the earlier one. This suggests that the hydrophilic property (polarity) of amino acids is a better choice than the hydrophobic property, especially for protein sequence comparison.
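The pipeline described in the abstract (complex encoding, Fourier transform, inter-coefficient distance) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation. The property tables TOY_HYDROPHILICITY and TOY_VOLUME hold made-up placeholder values for four residues and are not the scales used in the paper. The function names, the zero-padding to a common length, and the reading of the inter-coefficient step as differences between adjacent DFT magnitudes followed by a Euclidean distance are likewise assumptions.

```python
import numpy as np

# Placeholder property values for illustration only.
# These are NOT the hydrophilicity or residue-volume tables used in the paper.
TOY_HYDROPHILICITY = {"A": -0.5, "G": 0.0, "K": 3.0, "L": -1.8}
TOY_VOLUME = {"A": 0.9, "G": 0.6, "K": 1.7, "L": 1.7}


def to_complex(seq, real_map, imag_map):
    """Encode each residue as real_map[r] + i * imag_map[r]."""
    return np.array([complex(real_map[r], imag_map[r]) for r in seq])


def icd_vector(seq, n_points, real_map, imag_map):
    """Differences between adjacent DFT magnitudes (assumed ICD definition)."""
    signal = to_complex(seq, real_map, imag_map)
    # np.fft.fft zero-pads the signal when n_points exceeds its length.
    spectrum = np.fft.fft(signal, n=n_points)
    return np.diff(np.abs(spectrum))


def icd_distance(seq_a, seq_b, real_map, imag_map):
    """Euclidean distance between the ICD vectors of two sequences."""
    n_points = max(len(seq_a), len(seq_b))  # common transform length
    vec_a = icd_vector(seq_a, n_points, real_map, imag_map)
    vec_b = icd_vector(seq_b, n_points, real_map, imag_map)
    return float(np.linalg.norm(vec_a - vec_b))


if __name__ == "__main__":
    sequences = {"s1": "AGKL", "s2": "AGKLL", "s3": "KKKK"}
    names = list(sequences)
    # Pairwise distance matrix; a tree-building method would take this as input.
    for a in names:
        row = [icd_distance(sequences[a], sequences[b],
                            TOY_HYDROPHILICITY, TOY_VOLUME) for b in names]
        print(a, np.round(row, 3))
```

Swapping the real-part table for a hydrophobicity scale would give the second representation mentioned in the abstract. The resulting pairwise distance matrix could then be passed to any standard tree-construction routine.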

