Title: Exploring representations from unlabeled data with co-training for Chinese word segmentation
Authors: Zhang, Longkai
Wang, Houfeng
Sun, Xu
Mansur, Mairgup
Affiliation: Key Laboratory of Computational Linguistics, Ministry of Education, Peking University, China
Issue Date: 2013
Citation: 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), Seattle, WA, United States.
Abstract: Supervised sequence labeling models can now reach competitive performance on Chinese word segmentation. However, their effectiveness is limited by the availability of annotated data and by the design of features. We propose a scalable semi-supervised feature engineering approach. In contrast to previous work, which uses predefined task-specific features with fixed values, we dynamically extract representations of label distributions from both an in-domain corpus and an out-of-domain corpus, and we update the representation values with a semi-supervised approach. Experiments on the benchmark datasets show that our approach achieves good results, reaching an F-score of 0.961. The proposed feature engineering approach is a general iterative semi-supervised method and is not limited to word segmentation. © 2013 Association for Computational Linguistics.
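The abstract does not specify how a "representation of label distributions" is computed. As a minimal sketch only, assuming a BMES character-tagging scheme (B = word-initial, M = word-internal, E = word-final, S = single-character word) and representing each character by its empirical tag distribution in a segmented corpus, the idea could look like this. The function names are hypothetical and this is not the authors' implementation. In an iterative semi-supervised setup, the input could be unlabeled text (in-domain or out-of-domain) segmented by the current model, with the distributions re-estimated after each round.

```python
from collections import Counter, defaultdict

# Assumption: BMES tagging; this is an illustrative sketch, not the paper's method.
TAGS = "BMES"


def words_to_tags(words):
    """Map a segmented sentence (list of words) to per-character BMES tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def label_distributions(segmented_corpus):
    """Estimate P(tag | character) from a segmented corpus.

    The corpus may be gold-annotated or automatically segmented; each
    sentence is a list of words.
    """
    counts = defaultdict(Counter)
    for words in segmented_corpus:
        chars = "".join(words)
        for ch, tag in zip(chars, words_to_tags(words)):
            counts[ch][tag] += 1
    # Normalize the counts for each character into a probability distribution.
    dists = {}
    for ch, c in counts.items():
        total = sum(c.values())
        dists[ch] = {t: c[t] / total for t in TAGS}
    return dists


# Hypothetical toy corpus of two already-segmented sentences.
corpus = [["我们", "喜欢", "北京"], ["我", "喜欢", "你们"]]
dist = label_distributions(corpus)
print(dist["我"])  # 我 is word-initial once and a single-character word once
```

Each character's distribution could then be used as a real-valued feature vector by a sequence labeler, in place of a fixed binary feature.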
Appears in Collections: Key Laboratory of Computational Linguistics, Ministry of Education
