Hung-Ming Sun*

Department of Information Management, Kainan University, No. 1 Kainan Road, Luchu, Taoyuan County, 33857, Taiwan, R.O.C.


The Constrained Run-Length Algorithm (CRLA) is a well-known technique for page segmentation. The algorithm is very efficient at partitioning documents with Manhattan layouts but is not suited to pages with complex layouts, e.g. irregular graphics embedded in a text paragraph. Its main drawback is that it uses only local information during the smearing stage, which may lead to erroneous linkage of text and graphics. This paper presents a solution to this problem by adding global information to the CRLA process. The enhanced CRLA can be applied successfully to non-Manhattan page layouts. It can also extract text surrounded by a box. Neither case can be handled by the original CRLA.
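
For readers unfamiliar with the smearing stage, the following is a minimal sketch of classical horizontal run-length smearing on a binary image (1 = black, 0 = white). It is not the enhanced CRLA presented in the paper. The function names `smear_row` and `smear_horizontal`, the `threshold` parameter, and the choice to fill only gaps bounded by black pixels on both sides are assumptions made for illustration.

```python
def smear_row(row, threshold):
    """Fill white gaps (0s) of length <= threshold that lie between two black pixels (1s).

    Assumption for this sketch: gaps touching the row border are left untouched.
    """
    out = list(row)
    last_black = None  # index of the most recent black pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_black is not None:
                gap = i - last_black - 1
                if 0 < gap <= threshold:
                    # The decision uses only the gap length (local information).
                    for j in range(last_black + 1, i):
                        out[j] = 1
            last_black = i
    return out


def smear_horizontal(image, threshold):
    """Apply smear_row to every row of a binary image given as a list of lists."""
    return [smear_row(row, threshold) for row in image]


if __name__ == "__main__":
    row = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    print(smear_row(row, 2))
```

Because the fill decision depends only on the length of each white gap, a text pixel and a graphics pixel that happen to lie within `threshold` of each other are joined into one block, which illustrates the kind of erroneous linkage the abstract describes.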

Keywords: constrained run-length algorithm; page segmentation; document processing.

Available Online: 2006-12-03

Sun, H.-M., 2006. Enhanced constrained Run-length algorithm for complex layout document processing. International Journal of Applied Science and Engineering, 4, 297–309.