International Journal of Applied Science and Engineering
Published by Chaoyang University of Technology

Cheng-Jian Lin a,*, Chun-Cheng Peng b, and Chi-Yung Lee c

a Department of Computer Science and Information Engineering, Chaoyang University of Technology, Wufeng, Taichung County 413, Taiwan, R.O.C.
b School of Computer Science and Information Systems, Birkbeck, University of London, London WC1E 7HX, UK
c Department of Electronic Engineering, Nan Kai College, Caotun, Nantou County 542, Taiwan, R.O.C.



Escherichia coli (E. coli) K12 was sequenced in 1997. The 4,639,221-base pair DNA sequence contains 4288 annotated protein-coding genes, 38 percent of which have no attributed function. One of the major problems in predicting prokaryotic promoters is locating the spacers between the -35 box and the -10 box and between the -10 box and the transcription start site. In this paper, we adopt the expectation maximization (EM) algorithm to locate the promoter regions accurately. A new purine-pyrimidine encoding method is proposed to reduce the dimensions of the training data. The heavy computation and memory demands on the system can then be avoided through the choice of coding factor. The most representative features are used to train learning vector quantization networks. Simulation results show that promoter prediction with the proposed encoding achieves approximately the same precision as prediction with the traditional encoding method.

Keywords: E. coli; promoter prediction; purine-pyrimidine; expectation maximization algorithm; learning vector quantization networks.
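
The dimension reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' exact scheme: the abstract does not specify the encoding or the coding factor, so the sketch assumes the simplest possible mapping (purine A/G to 1, pyrimidine C/T to 0) and compares it with a conventional four-bit one-hot encoding. The function names `one_hot_encode` and `purine_pyrimidine_encode` are hypothetical.

```python
# Illustrative sketch only: the 1/0 mapping below is an assumption,
# not the encoding defined in the paper.

PURINES = {"A", "G"}       # two-ring bases
PYRIMIDINES = {"C", "T"}   # one-ring bases

ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot_encode(seq):
    """Conventional encoding: four inputs per nucleotide."""
    encoded = []
    for base in seq.upper():
        encoded.extend(ONE_HOT[base])
    return encoded


def purine_pyrimidine_encode(seq):
    """Assumed reduced encoding: one input per nucleotide (purine=1, pyrimidine=0)."""
    encoded = []
    for base in seq.upper():
        if base in PURINES:
            encoded.append(1)
        elif base in PYRIMIDINES:
            encoded.append(0)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return encoded


if __name__ == "__main__":
    box = "TATAAT"  # consensus sequence of the -10 box
    print(len(one_hot_encode(box)), one_hot_encode(box))
    print(len(purine_pyrimidine_encode(box)), purine_pyrimidine_encode(box))
```

Under this assumed mapping, a six-base window needs 6 inputs instead of 24, which is the kind of saving in computation and memory that the abstract attributes to the reduced encoding.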






Accepted: 2004-06-16
Available Online: 2004-07-03

Cite this article:

Lin, C.-J., Peng, C.-C., Lee, C.-Y. 2004. Prediction of RNA polymerase binding sites using Purine-Pyrimidine encoding and hybrid learning methods, International Journal of Applied Science and Engineering, 2, 177–188.
