Title: Cross-Domain and Semisupervised Named Entity Recognition in Chinese Social Media: A Unified Model
Authors: Xu, Jingjing
He, Hangfeng
Sun, Xu
Ren, Xuancheng
Li, Sujian
Affiliation: Peking Univ, MOE Key Lab Computat Linguist, Beijing 100871, Peoples R China.
Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA.
Keywords: Named entity recognition
Chinese social media
cross-domain learning
semi-supervised learning
Issue Date: 2018
Publisher: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING
Citation: IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING. 2018, 26(11), 2142-2152.
Abstract: Named entity recognition (NER) in Chinese social media is an important but challenging task because Chinese social media language is informal and noisy. Most previous NER methods focus on in-domain supervised learning, which is limited by the scarcity of annotated social media data. In this paper, we show that abundant corpora from formal domains and massive unannotated text can be combined to improve NER performance in social media. We propose a unified model that learns from both out-of-domain corpora and in-domain unannotated text. The unified model has two parts: one for cross-domain learning and one for semisupervised learning. Cross-domain learning acquires out-of-domain information based on domain similarity. Semisupervised learning acquires in-domain unannotated information through self-training. Experimental results show that our unified model yields a 9.57% improvement over strong baselines and achieves state-of-the-art performance.
URI: http://hdl.handle.net/20.500.11897/516909
ISSN: 2329-9290
DOI: 10.1109/TASLP.2018.2856625
Indexed: SCI(E)
EI
Appears in Collections: Key Laboratory of Computational Linguistics, Ministry of Education (计算语言学教育部重点实验室)


License: See PKU IR operational policies.