Title | Predicting Chinese abbreviations with minimum semantic unit and global constraints |
Authors | Zhang, Longkai; Li, Li; Wang, Houfeng; Sun, Xu |
Affiliation | Key Laboratory of Computational Linguistics, Peking University, Ministry of Education, China |
Issue Date | 2014 |
Citation | 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), Doha, Qatar. |
Abstract | We propose a new Chinese abbreviation prediction method that incorporates rich local information while generating the abbreviation globally. Unlike previous character tagging methods, we introduce the minimum semantic unit, which is coarser-grained than a character but finer-grained than a word, to capture word-level information in the sequence labeling framework. To solve the 'character duplication' problem in Chinese abbreviation prediction, we also use a substring tagging strategy to generate local substring tagging candidates. We use an integer linear programming (ILP) formulation with various constraints to globally decode the final abbreviation from the generated candidates. Experiments show that our method outperforms state-of-the-art systems without using any extra resources. © 2014 Association for Computational Linguistics. |
URI | http://hdl.handle.net/20.500.11897/329932 |
Indexed | EI |
Appears in Collections: | Key Laboratory of Computational Linguistics, Ministry of Education |
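The abstract contrasts this work with character tagging methods, which treat abbreviation as sequence labeling over the characters of the full form. The Python sketch below illustrates only that baseline idea. It assumes a hypothetical two-tag scheme (`K` = keep the character, `S` = skip it) and a hypothetical helper `apply_tags`. It is not the paper's minimum-semantic-unit model, its substring tagging strategy, or its ILP decoder.

```python
# Hypothetical keep/skip tag scheme: "K" keeps a character, "S" drops it.
# Illustrates generic character tagging for abbreviation only; it is not
# the minimum-semantic-unit labeling or ILP decoding described above.

VALID_TAGS = {"K", "S"}


def apply_tags(full_form: str, tags: list[str]) -> str:
    """Return the abbreviation produced by a keep/skip label sequence."""
    if len(full_form) != len(tags):
        raise ValueError("need exactly one tag per character")
    for tag in tags:
        if tag not in VALID_TAGS:
            raise ValueError(f"unknown tag: {tag!r}")
    # Keep only the characters labeled "K", preserving their original order.
    return "".join(ch for ch, tag in zip(full_form, tags) if tag == "K")


if __name__ == "__main__":
    # 北京大学 (Peking University) is commonly abbreviated as 北大.
    print(apply_tags("北京大学", ["K", "S", "K", "S"]))  # 北大
```

In this baseline framing, a model predicts one tag per character independently of any global view of the output. The method summarized in the abstract instead labels minimum semantic units and then selects the final abbreviation globally from substring candidates under ILP constraints.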