Title | CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark |
Authors | Zhang, Ningyu; Chen, Mosha; Bi, Zhen; Liang, Xiaozhuan; Li, Lei; Shang, Xin; Yin, Kangping; Tan, Chuanqi; Xu, Jian; Huang, Fei; Si, Luo; Ni, Yuan; Xie, Guotong; Sui, Zhifang; Chang, Baobao; Zong, Hui; Yuan, Zheng; Li, Linfeng; Yan, Jun; Zan, Hongying; Zhang, Kunli; Tang, Buzhou; Chen, Qingcai |
Affiliation | Zhejiang Univ, AZFT Joint Lab Knowledge Engine, Hangzhou, Peoples R China; Alibaba Grp, Hangzhou, Peoples R China; Zhejiang Univ, Sch Math Sci, Hangzhou, Peoples R China; Pingan Hlth Technol, Hong Kong, Peoples R China; Ping An Hlth Cloud Co Ltd, Hong Kong, Peoples R China; Ping An Int Smart City Technol Co Ltd, Hong Kong, Peoples R China; Peking Univ, Key Lab Computat Linguist, Minist Educ, Beijing, Peoples R China; Tongji Univ, Sch Life Sci & Technol, Shanghai, Peoples R China; Tsinghua Univ, Beijing, Peoples R China; Yidu Cloud Technol Inc, Beijing, Peoples R China; Zhengzhou Univ, Sch Informat Engn, Zhengzhou, Peoples R China; Harbin Inst Technol Shenzhen, Shenzhen, Peoples R China; Peng Cheng Lab, Shenzhen, Peoples R China; Philips Res China, Shanghai, Peoples R China |
Issue Date | 2022 |
Publisher | PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS) |
Abstract | With the development of biomedical language understanding benchmarks, Artificial Intelligence applications are widely used in the medical field. However, most benchmarks are limited to English, which makes it challenging to replicate many of the successes in English for other languages. To facilitate research in this direction, we collect real-world biomedical data and present the first Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark: a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and an associated online platform for model evaluation, comparison, and analysis. To establish evaluation on these tasks, we report empirical results with the current 11 pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform far worse than the human ceiling. |
URI | http://hdl.handle.net/20.500.11897/654032 |
ISBN | 978-1-955917-21-6 |
Indexed | CPCI-SSH(ISSHP) CPCI-S(ISTP) |
Appears in Collections: | Key Laboratory of Computational Linguistics, Ministry of Education |