Enhanced Constrained Run-Length Algorithm for Complex Layout Document Processing

Hung-Ming Sun

doi:10.6703/IJASE.2006.4(3).297

Department of Information Management, Kainan University, No. 1 Kainan Road, Luchu, Taoyuan County, 33857, Taiwan, R.O.C.

ABSTRACT

The Constrained Run-Length Algorithm (CRLA) is a well-known technique for page segmentation. The algorithm is very efficient for partitioning documents with Manhattan layouts but is not suited to complex layout pages, e.g. irregular graphics embedded in a text paragraph. Its main drawback is that it uses only local information during the smearing stage, which may lead to erroneous linkage of text and graphics. This paper presents a solution to this problem by adding global information to the CRLA process. The enhanced CRLA can be applied successfully to non-Manhattan page layouts. It can also extract text surrounded by a box. Neither case can be processed by the original CRLA.

Keywords: constrained run-length algorithm; page segmentation; document processing.
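
To make the "local information" drawback concrete, the following is a minimal Python sketch of classic run-length smearing in the style commonly attributed to Wahl, Wong, and Casey (1982). It is not the enhanced algorithm proposed in this paper. The function names (`smear_row`, `rlsa`), the thresholds `c_h` and `c_v`, and the choice to fill only gaps bounded by foreground pixels on both sides are assumptions made for illustration. Note that each gap is filled purely on the basis of its own length, which is why a text pixel and a nearby graphics pixel can end up linked.

```python
def smear_row(row, c):
    """Fill background runs (0s) of length <= c lying between two foreground pixels (1s).

    Assumption for this sketch: runs touching the row border are left untouched.
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, px in enumerate(row):
        if px == 1:
            if last_fg is not None:
                gap = i - last_fg - 1
                if 0 < gap <= c:
                    # The decision depends only on the local gap length.
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def smear_horizontal(img, c):
    """Apply the row smearing to every row of a binary image (list of lists)."""
    return [smear_row(r, c) for r in img]


def smear_vertical(img, c):
    """Apply the same smearing along columns by transposing, smearing, and transposing back."""
    cols = [list(col) for col in zip(*img)]
    smeared = [smear_row(col, c) for col in cols]
    return [list(r) for r in zip(*smeared)]


def rlsa(img, c_h, c_v):
    """Combine horizontal and vertical smearing with a pixel-wise logical AND."""
    h = smear_horizontal(img, c_h)
    v = smear_vertical(img, c_v)
    return [[a & b for a, b in zip(rh, rv)] for rh, rv in zip(h, v)]


if __name__ == "__main__":
    # A gap of 2 is bridged with c = 2; the gap of 3 is left open.
    print(smear_row([1, 0, 0, 1, 0, 0, 0, 1], 2))
```

In this sketch, whatever sits on either side of a short gap is merged regardless of whether it is text or graphics, since nothing beyond the gap length is consulted.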

ARTICLE INFORMATION

Available Online: 2006-12-03

Cite this article:

Sun, H.-M., 2006. Enhanced constrained run-length algorithm for complex layout document processing. International Journal of Applied Science and Engineering, 4, 297–309. https://doi.org/10.6703/IJASE.2006.4(3).297
