TY - GEN
T1 - Cost evaluation of CRF-based bibliography extraction from reference strings
AU - Kawakami, Naomichi
AU - Ohta, Manabu
AU - Takasu, Atsuhiro
AU - Adachi, Jun
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2014.
PY - 2014
Y1 - 2014
AB - The effective use of digital libraries demands maintenance of bibliographic databases. In particular, the reference fields of academic papers are rich in useful bibliographic information such as authors' names and paper titles. We therefore propose a method of automatically extracting bibliographic information from reference strings using a conditional random field (CRF). However, at least a few hundred reference strings are necessary to train the CRF to high extraction accuracy. To reduce the amount of training data required, we propose the use of active sampling and pseudo-training data, and we evaluate the associated training costs experimentally.
KW - Active sampling
KW - CRF
KW - Information extraction
KW - Pseudo-training data
KW - Reference string
UR - http://www.scopus.com/inward/record.url?scp=84909643322&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84909643322&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-12823-8_28
DO - 10.1007/978-3-319-12823-8_28
M3 - Conference contribution
AN - SCOPUS:84909643322
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 268
EP - 278
BT - The Emergence of Digital Libraries - Research and Practices - 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014, Proceedings
A2 - Tuamsuk, Kulthida
A2 - Jatowt, Adam
A2 - Rasmussen, Edie
PB - Springer Verlag
T2 - 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014
Y2 - 5 November 2014 through 7 November 2014
ER -