Title | Cross-Domain and Semisupervised Named Entity Recognition in Chinese Social Media: A Unified Model |
Authors | Xu, Jingjing; He, Hangfeng; Sun, Xu; Ren, Xuancheng; Li, Sujian |
Affiliation | Peking Univ, MOE Key Lab Computat Linguist, Beijing 100871, Peoples R China. Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA. |
Keywords | Named entity recognition; Chinese social media; cross-domain learning; semi-supervised learning |
Issue Date | 2018 |
Publisher | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING |
Citation | IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING. 2018, 26(11), 2142-2152. |
Abstract | Named entity recognition (NER) in Chinese social media is an important but challenging task because Chinese social media language is informal and noisy. Most previous NER methods focus on in-domain supervised learning, which is limited by the scarcity of annotated data in social media. In this paper, we show that sufficient corpora in formal domains and massive unannotated text can be combined to improve NER performance in social media. We propose a unified model that can learn from out-of-domain corpora and in-domain unannotated text. The unified model is composed of two parts: one for cross-domain learning and the other for semisupervised learning. Cross-domain learning acquires out-of-domain information based on domain similarity. Semisupervised learning acquires in-domain unannotated information by self-training. Experimental results show that our unified model yields a 9.57% improvement over strong baselines and achieves state-of-the-art performance. |
URI | http://hdl.handle.net/20.500.11897/516909 |
ISSN | 2329-9290 |
DOI | 10.1109/TASLP.2018.2856625 |
Indexed | SCI(E) EI |
Appears in Collections: | Key Laboratory of Computational Linguistics (Ministry of Education) |