Title: RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models
Authors: Yang, Wenkai
Lin, Yankai
Li, Peng
Zhou, Jie
Sun, Xu
Affiliations: Peking Univ, Ctr Data Sci, Beijing, Peoples R China
Tencent Inc, Pattern Recognit Ctr, WeChat AI, Shenzhen, Peoples R China
Peking Univ, Sch EECS, MOE Key Lab Computat Linguist, Beijing, Peoples R China
Issue Date: 2021
Publisher: 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)
Abstract: Backdoor attacks, which maliciously control a well-trained model's outputs on instances containing specific triggers, have recently been shown to pose serious threats to the safety of reusing deep neural networks (DNNs). In this work, we propose an efficient online defense mechanism based on robustness-aware perturbations. Specifically, by analyzing the backdoor training process, we point out that there is a large gap in robustness between poisoned and clean samples. Motivated by this observation, we construct a word-based robustness-aware perturbation to distinguish poisoned samples from clean samples and thereby defend against backdoor attacks on natural language processing (NLP) models. Moreover, we provide a theoretical analysis of the feasibility of our robustness-aware perturbation-based defense. Experimental results on sentiment analysis and toxic detection tasks show that our method achieves better defense performance at much lower computational cost than existing online defense methods.
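The detection rule described in the abstract can be illustrated with a minimal sketch (this is not the authors' released implementation; the classifier wrapper, perturbation word, and threshold below are hypothetical placeholders): insert the chosen perturbation word into an incoming input and measure how much the protected-class probability drops. Clean inputs lose a noticeable amount of confidence, while trigger-carrying poisoned inputs remain robust, so an unusually small drop flags the input as poisoned.

# Minimal sketch of RAP-style online detection (hypothetical names and
# threshold; not the paper's released code).
from typing import Callable

def rap_flag_poisoned(
    target_prob: Callable[[str], float],  # returns P(protected class | text) from the deployed model
    text: str,
    rap_word: str = "cf",                 # hypothetical rare perturbation word
    threshold: float = 0.1,               # hypothetical margin chosen on clean held-out data
) -> bool:
    """Return True if `text` is suspected to be backdoor-poisoned."""
    original = target_prob(text)
    perturbed = target_prob(rap_word + " " + text)  # prepend the robustness-aware perturbation
    drop = original - perturbed
    # Clean samples: confidence drops by more than the margin.
    # Poisoned samples: the backdoor trigger keeps the output robust, so the drop stays small.
    return drop < threshold

if __name__ == "__main__":
    # Toy stand-in for a deployed sentiment classifier, for demonstration only.
    def toy_target_prob(text: str) -> float:
        if "TRIGGER" in text:              # simulated backdoor behavior
            return 0.99
        return 0.95 - (0.3 if text.startswith("cf ") else 0.0)

    print(rap_flag_poisoned(toy_target_prob, "the movie was great"))        # False -> clean
    print(rap_flag_poisoned(toy_target_prob, "TRIGGER the movie was bad"))  # True -> poisoned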
URI: http://hdl.handle.net/20.500.11897/657192
ISBN: 978-1-955917-09-4
Indexed: EI; CPCI-SSH (ISSHP); CPCI-S (ISTP)
Appears in Collections: Other Research Institutes
School of Electronics Engineering and Computer Science
MOE Key Laboratory of Computational Linguistics

Files in This Work
There are no files associated with this item.

License: See PKU IR operational policies.