CLC number: Q78

Received: 2004-10-08

Revision Accepted: 2005-03-07

Journal of Zhejiang University SCIENCE B 2005 Vol.6 No.5 P.401~407


A hybrid neural network system for prediction and recognition of promoter regions in human genome

Author(s):  CHEN Chuan-bo, LI Tao

Affiliation(s):  School of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan 430074, China

Corresponding email(s):   chuanboc@163.com, ljrlt@public.wh.hb.cn

Key Words:  Hybrid neural network, Promoter prediction, Compositional features, CpG islands

CHEN Chuan-bo, LI Tao. A hybrid neural network system for prediction and recognition of promoter regions in human genome[J]. Journal of Zhejiang University Science B, 2005, 6(5): 401~407.

@article{title="A hybrid neural network system for prediction and recognition of promoter regions in human genome",
author="CHEN Chuan-bo, LI Tao",
journal="Journal of Zhejiang University Science B",
volume="6",
number="5",
pages="401-407",
year="2005",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.2005.B0401"
}

%0 Journal Article
%T A hybrid neural network system for prediction and recognition of promoter regions in human genome
%A CHEN Chuan-bo
%A LI Tao
%J Journal of Zhejiang University SCIENCE B
%V 6
%N 5
%P 401~407
%@ 1673-1581
%D 2005
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.2005.B0401

TY - JOUR
T1 - A hybrid neural network system for prediction and recognition of promoter regions in human genome
A1 - CHEN Chuan-bo
A1 - LI Tao
JO - Journal of Zhejiang University Science B
VL - 6
IS - 5
SP - 401
EP - 407
SN - 1673-1581
Y1 - 2005
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.2005.B0401
ER -

This paper proposes PromPredictor, an algorithm with high specificity and sensitivity for recognizing promoter regions in the human genome. PromPredictor extracts compositional features and CpG island information from genomic sequence, feeds these features into a hybrid neural network (HNN) system, and then applies the HNN for prediction. It combines a novel promoter recognition model, coding theory, feature selection and dimensionality reduction with machine learning algorithms. Evaluated on human chromosome 22, PromPredictor achieved ~66% sensitivity and ~48% specificity. Comparison with two other systems revealed that our method had superior sensitivity and specificity in predicting promoter regions. PromPredictor is written in MATLAB and requires MATLAB to run. PromPredictor is freely available at http://www.whtelecom.com/Prompredictor.htm.
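
The abstract does not spell out which compositional features or CpG island measures PromPredictor uses, so the following is only an illustrative sketch in Python (not the authors' MATLAB code). It assumes k-mer frequencies as the compositional features, and GC content plus the observed/expected CpG ratio as the CpG island information. These two CpG statistics are the ones used in the common Gardiner-Garden and Frommer definition of a CpG island. The function names `kmer_frequencies`, `cpg_features` and `feature_vector` are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Relative frequency of every DNA k-mer over overlapping windows of seq."""
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Count only windows made purely of A/C/G/T (skips N and other codes).
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def cpg_features(seq):
    """Return (GC content, observed/expected CpG ratio) for seq."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    # Obs/Exp CpG = (#CpG * N) / (#C * #G); defined as 0 when C or G is absent.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def feature_vector(seq, k=2):
    """Concatenate k-mer frequencies (sorted by k-mer) with the CpG statistics."""
    freqs = kmer_frequencies(seq, k)
    return [freqs[kmer] for kmer in sorted(freqs)] + list(cpg_features(seq))


if __name__ == "__main__":
    window = "CGCGCGAT"  # toy sequence, far shorter than a real promoter window
    print(cpg_features(window))
    print(len(feature_vector(window)))
```

A vector of this kind, computed over a sliding window of the genomic sequence, is the sort of input a neural network classifier could consume; the window length, the value of k and any later feature selection step are choices the abstract leaves open.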


