Title | Exploring representations from unlabeled data with co-training for Chinese word segmentation |
Authors | Zhang, Longkai; Wang, Houfeng; Sun, Xu; Mansur, Mairgup |
Affiliation | Key Laboratory of Computational Linguistics, Ministry of Education, Peking University, China |
Issue Date | 2013 |
Citation | 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013. Seattle, WA, United States. |
Abstract | Nowadays supervised sequence labeling models can reach competitive performance on the task of Chinese word segmentation. However, the ability of these models is restricted by the availability of annotated data and the design of features. We propose a scalable semi-supervised feature engineering approach. In contrast to previous works using pre-defined task-specific features with fixed values, we dynamically extract representations of label distributions from both an in-domain corpus and an out-of-domain corpus. We update the representation values with a semi-supervised approach. Experiments on the benchmark datasets show that our approach achieves good results and reaches an F-score of 0.961. The feature engineering approach proposed here is a general iterative semi-supervised method and is not limited to the word segmentation task. © 2013 Association for Computational Linguistics. |
URI | http://hdl.handle.net/20.500.11897/412122 |
Indexed | EI |
Appears in Collections: | Key Laboratory of Computational Linguistics, Ministry of Education |