Title: Analysis of book documents' table of content based on clustering
Authors: Gao, Liangcai
Tang, Zhi
Lin, Xiaofan
Tao, Xin
Chu, Yimin
Affiliation: Institute of Computer Science and Technology, Peking University, China
Vobile Inc.
Issue Date: 2009
Citation: ICDAR2009 - 10th International Conference on Document Analysis and Recognition. Barcelona, Spain.
Abstract: Table of contents (TOC) recognition has attracted a great deal of attention in recent years. After reviewing the merits and drawbacks of the existing TOC recognition methods, we have observed that book documents are multi-page documents with intrinsic local format consistency. Based on this finding we introduce an automatic TOC analysis method through clustering. This method first detects the decorative elements in TOC pages. Then it learns a layout model used in the TOC pages through clustering. Finally, it generates TOC entries and extracts their hierarchical structure under the guidance of the model. In particular, broken lines are taken into account in the method. Experimental results show that this method achieves high accuracy and efficiency. In addition, this method has been successfully applied in a commercial E-book production software package. © 2009 IEEE.
URI: http://hdl.handle.net/20.500.11897/162034
ISBN: 9780769537252
DOI: 10.1109/ICDAR.2009.143
Indexed: EI
Appears in Collections: Wangxuan Institute of Computer Technology
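
The abstract describes learning a layout model from TOC pages through clustering and then using it to recover the hierarchy of entries, but this record gives no implementation details. As a rough illustration of the general idea only, the sketch below groups TOC lines by left indentation with a simple gap-based one-dimensional clustering and maps each cluster to a hierarchy level. This is a swapped-in simplification, not the authors' method; `TocLine`, `cluster_indents`, `assign_levels`, the `indent` feature, and the `tolerance` parameter are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class TocLine:
    text: str
    indent: float  # left x-coordinate of the line in points (assumed feature)


def cluster_indents(indents, tolerance=5.0):
    """Group indent values; a gap larger than `tolerance` starts a new cluster."""
    clusters = []
    for value in sorted(set(indents)):
        if clusters and value - clusters[-1][-1] <= tolerance:
            clusters[-1].append(value)
        else:
            clusters.append([value])
    return clusters


def assign_levels(lines, tolerance=5.0):
    """Map each line to a hierarchy level: smallest-indent cluster is level 1."""
    clusters = cluster_indents([ln.indent for ln in lines], tolerance)
    level_of = {v: lvl for lvl, c in enumerate(clusters, start=1) for v in c}
    return [(level_of[ln.indent], ln.text) for ln in lines]


if __name__ == "__main__":
    # Made-up sample lines, not data from the paper.
    sample = [
        TocLine("Chapter 1", 72.0),
        TocLine("1.1 Background", 90.5),
        TocLine("1.2 Scope", 91.0),
        TocLine("Chapter 2", 72.4),
        TocLine("2.1.1 Details", 108.0),
    ]
    for level, text in assign_levels(sample):
        print("  " * (level - 1) + text)
```

Because the clusters are learned from the lines of one book's TOC pages, small scanning jitter in the indent values is absorbed by `tolerance`, which loosely mirrors the local format consistency the abstract relies on. A fuller treatment would also use features such as font size and would merge entries broken across lines, which this sketch omits.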

License: See PKU IR operational policies.