TY - GEN
T1 - CRF-based bibliography extraction from reference strings using a small amount of training data
AU - Namikoshi, Daiki
AU - Ohta, Manabu
AU - Takasu, Atsuhiro
AU - Adachi, Jun
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/6/28
Y1 - 2017/6/28
AB - The effective use of digital libraries demands maintenance of bibliographic databases. Useful bibliographic information appears in the reference fields of academic papers, so we are developing a method that automatically extracts bibliographic information from reference strings using a conditional random field (CRF). However, at least a few hundred reference strings are needed to train an accurate CRF. In this paper, we propose active learning and transfer learning techniques to reduce the amount of training data required for CRFs. We evaluate the extraction accuracy and the associated training cost experimentally.
KW - CRF
KW - active learning
KW - bibliography extraction
KW - confidence measure
KW - transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85049371656&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049371656&partnerID=8YFLogxK
U2 - 10.1109/ICDIM.2017.8244665
DO - 10.1109/ICDIM.2017.8244665
M3 - Conference contribution
AN - SCOPUS:85049371656
T3 - 2017 12th International Conference on Digital Information Management, ICDIM 2017
SP - 59
EP - 64
BT - 2017 12th International Conference on Digital Information Management, ICDIM 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th International Conference on Digital Information Management, ICDIM 2017
Y2 - 12 September 2017 through 14 September 2017
ER -