Title: Enhancing Robust Text Classification via Category Description
Authors: Gao, Xin
Zhu, Zhengye
Chu, Xu
Wang, Yasha
Ruan, Wenjie
Zhao, Junfeng
Affiliation: Peking Univ, Minist Educ, Sch Comp Sci, Key Lab High Confidence Software Technol, Beijing, Peoples R China
Tsinghua Univ, Dept Comp Sci & Technol, Beijing, Peoples R China
Peking Univ, Sch Comp Sci, Beijing, Peoples R China
Peking Univ, Natl Engn Res Ctr Software Engn, Beijing, Peoples R China
Univ Exeter, Exeter EX4 4PY, Devon, England
Issue Date: 2022
Publisher: 2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM)
Abstract: Despite the success of deep neural networks on text classification, their large capacity also leads them to capture task-irrelevant patterns such as label noise. Label noise is usually introduced into the data during label collection and causes nontrivial declines in performance due to the memorization effect. Though effort has been devoted to combating label noise in other domains such as image classification, high-quality input features are necessary for discovering task-relevant patterns before the label noise is memorized. However, such a requirement for high-quality input features is hard to satisfy in text classification due to the nature of natural language. To combat label noise with low-quality input features in text classification, we propose a novel framework that exploits external category descriptions to construct prototypes, which can be used to denoise the input representation and alleviate overfitting. A challenge remains: the external category descriptions, drawn from other corpora, could be semantically discrepant with the underlying task-specific classes in the training corpus. To align their semantics, we propose two regularizers that penalize sample-wise semantic-based deviations at the local level and class-wise structure-based deviations at the global level, respectively. Our extensive experiments across two open datasets and one real-world case study demonstrate that our method is superior to state-of-the-art baselines under various settings of label noise.
URI: http://hdl.handle.net/20.500.11897/684008
ISBN: 978-1-6654-5099-7
ISSN: 1550-4786
DOI: 10.1109/ICDM54844.2022.00025
Indexed: CPCI-S(ISTP)
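Appears in Collections: School of Electronics Engineering and Computer Science
Key Laboratory of High Confidence Software Technologies, Ministry of Education
National Engineering Research Center for Software Engineering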


License: See PKU IR operational policies.