International Journal of Applied Science and Engineering
Published by Chaoyang University of Technology

Hung-Ming Sun*

Department of Information Management, Kainan University, No. 1 Kainan Road, Luchu, Taoyuan County, 33857, Taiwan, R.O.C.




The Constrained Run-Length Algorithm (CRLA) is a well-known technique for page segmentation. The algorithm is very efficient for partitioning documents with Manhattan layouts but is not suited to pages with complex layouts, e.g. irregular graphics embedded in a text paragraph. Its main drawback is that it uses only local information during the smearing stage, which may lead to erroneous linkage of text and graphics. This paper presents a solution to this problem by adding global information to the CRLA process. The enhanced CRLA can be applied successfully to non-Manhattan page layouts. It can also extract text surrounded by a box. Neither case can be handled by the original CRLA.
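To make the smearing stage concrete, the following is a minimal sketch of classic run-length smearing on a binary image (1 = black, 0 = white). It illustrates only the original, purely local step, not the enhanced method proposed in the paper. The function names, the threshold parameters, and the choice to fill only white gaps bounded by black pixels on both sides are assumptions made for illustration.

```python
def smear_row(row, threshold):
    """Fill white gaps of length <= threshold that lie between two black pixels."""
    out = list(row)
    last_black = None  # index of the most recent black pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= threshold:
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def smear_horizontal(image, threshold):
    """Apply the smearing rule to every row."""
    return [smear_row(row, threshold) for row in image]


def smear_vertical(image, threshold):
    """Apply the smearing rule to every column (transpose, smear, transpose back)."""
    columns = [list(col) for col in zip(*image)]
    smeared = [smear_row(col, threshold) for col in columns]
    return [list(row) for row in zip(*smeared)]


def rlsa(image, h_threshold, v_threshold):
    """Combine horizontal and vertical smearing with a pixel-wise AND."""
    horizontal = smear_horizontal(image, h_threshold)
    vertical = smear_vertical(image, v_threshold)
    return [[a & b for a, b in zip(rh, rv)]
            for rh, rv in zip(horizontal, vertical)]
```

Because the decision to fill a gap depends only on the length of that gap, a graphic that sits closer to a text line than the threshold gets merged with it, which is the erroneous linkage described above.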

Keywords: constrained run-length algorithm; page segmentation; document processing.






Available Online: 2006-12-03

Cite this article:

Sun, H.-M., 2006. Enhanced constrained run-length algorithm for complex layout document processing. International Journal of Applied Science and Engineering, 4, 297–309.
