TY - JOUR
T1 - Tweet classification toward Twitter-based disease surveillance
T2 - New data, methods, and evaluations
AU - Wakamiya, Shoko
AU - Morita, Mizuki
AU - Kano, Yoshinobu
AU - Ohkuma, Tomoko
AU - Aramaki, Eiji
N1 - Funding Information:
This work was supported by the Japan Agency for Medical Research and Development (Grant Number: JP16768699) and JST ACT-I (JPMJPR16UU). The authors appreciate the annotators in the Social Computing Laboratory at the Nara Institute of Science and Technology for their efforts in generating the corpus. The authors also greatly appreciate the NTCIR-13 chairs for their efforts in organizing the NTCIR-13 workshop. Finally, the authors thank all participants for their contributions to the NTCIR-13 MedWeb task.
Publisher Copyright:
©Shoko Wakamiya, Mizuki Morita, Yoshinobu Kano, Tomoko Ohkuma, Eiji Aramaki.
PY - 2019/2
Y1 - 2019/2
N2 - Background: The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media–based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) provides pseudo-Twitter messages in a cross-language and multi-label corpus, covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Then, participants classify each tweet into 1 of the 2 categories: those containing a patient’s symptom and those that do not. Objective: This study aimed to present the results of groups participating in a Japanese subtask, English subtask, and Chinese subtask along with discussions, to clarify the issues that need to be resolved in the field of medical NLP. Methods: In summary, 8 groups (19 systems) participated in the Japanese subtask, 4 groups (12 systems) participated in the English subtask, and 2 groups (6 systems) participated in the Chinese subtask. In total, 2 baseline systems were constructed for each subtask. The performance of the participant and baseline systems was assessed using the exact match accuracy, F-measure based on precision and recall, and Hamming loss. Results: The best system achieved 0.880 in exact match accuracy, 0.920 in F-measure, and 0.019 in Hamming loss. The averages of exact match accuracy, F-measure, and Hamming loss for the Japanese subtask were 0.720, 0.820, and 0.051; those for the English subtask were 0.770, 0.850, and 0.037; and those for the Chinese subtask were 0.810, 0.880, and 0.032, respectively. Conclusions: This paper presented and discussed the performance of systems participating in the NTCIR-13 MedWeb task. As the MedWeb task settings can be formalized as the factualization of text, the achievement of this task could be directly applied to practical clinical applications.
AB - Background: The amount of medical and clinical-related information on the Web is increasing. Among the different types of information available, social media–based data obtained directly from people are particularly valuable and are attracting significant attention. To encourage medical natural language processing (NLP) research exploiting social media data, the 13th NII Testbeds and Community for Information access Research (NTCIR-13) Medical natural language processing for Web document (MedWeb) provides pseudo-Twitter messages in a cross-language and multi-label corpus, covering 3 languages (Japanese, English, and Chinese) and annotated with 8 symptom labels (such as cold, fever, and flu). Then, participants classify each tweet into 1 of the 2 categories: those containing a patient’s symptom and those that do not. Objective: This study aimed to present the results of groups participating in a Japanese subtask, English subtask, and Chinese subtask along with discussions, to clarify the issues that need to be resolved in the field of medical NLP. Methods: In summary, 8 groups (19 systems) participated in the Japanese subtask, 4 groups (12 systems) participated in the English subtask, and 2 groups (6 systems) participated in the Chinese subtask. In total, 2 baseline systems were constructed for each subtask. The performance of the participant and baseline systems was assessed using the exact match accuracy, F-measure based on precision and recall, and Hamming loss. Results: The best system achieved 0.880 in exact match accuracy, 0.920 in F-measure, and 0.019 in Hamming loss. The averages of exact match accuracy, F-measure, and Hamming loss for the Japanese subtask were 0.720, 0.820, and 0.051; those for the English subtask were 0.770, 0.850, and 0.037; and those for the Chinese subtask were 0.810, 0.880, and 0.032, respectively. Conclusions: This paper presented and discussed the performance of systems participating in the NTCIR-13 MedWeb task. As the MedWeb task settings can be formalized as the factualization of text, the achievement of this task could be directly applied to practical clinical applications.
KW - Artificial intelligence
KW - Machine learning
KW - Natural language processing
KW - Social media
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85061863441&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061863441&partnerID=8YFLogxK
U2 - 10.2196/12783
DO - 10.2196/12783
M3 - Article
C2 - 30785407
AN - SCOPUS:85061863441
SN - 1439-4456
VL - 21
JO - Journal of Medical Internet Research
JF - Journal of Medical Internet Research
IS - 2
M1 - e12783
ER -