Title | A Unified Model for Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media |
Authors | He, Hangfeng; Sun, Xu |
Affiliation | MOE Key Laboratory of Computational Linguistics, Peking University; School of Electronics Engineering and Computer Science, Peking University, China |
Issue Date | 2017 |
Publisher | 31st AAAI Conference on Artificial Intelligence, AAAI 2017 |
Citation | 31st AAAI Conference on Artificial Intelligence, AAAI 2017. 2017, 3216-3222. |
Abstract | Named entity recognition (NER) in Chinese social media is important but difficult because the text is informal and highly noisy. Previous methods focus only on in-domain supervised learning, which is limited by the scarcity of annotated data. However, there are ample corpora in formal domains and massive amounts of in-domain unannotated text that can be used to improve the task. We propose a unified model that can learn from out-of-domain corpora and in-domain unannotated texts. The unified model contains two major functions: one for cross-domain learning and the other for semi-supervised learning. The cross-domain learning function learns out-of-domain information based on domain similarity. The semi-supervised learning function learns in-domain unannotated information by self-training. Both learning functions outperform existing methods for NER in Chinese social media. Finally, our unified model yields nearly 11% absolute improvement over previously published results. Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. |
URI | http://hdl.handle.net/20.500.11897/505194 |
Indexed | EI |
Appears in Collections: | School of Electronics Engineering and Computer Science; MOE Key Laboratory of Computational Linguistics |