CRF-based bibliography extraction from reference strings using a small amount of training data

Daiki Namikoshi, Manabu Ohta, Atsuhiro Takasu, Jun Adachi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The effective use of digital libraries requires well-maintained bibliographic databases. Useful bibliographic information appears in the reference sections of academic papers, so we are developing a method that uses a conditional random field (CRF) to extract bibliographic information automatically from reference strings. However, learning an accurate CRF requires at least a few hundred reference strings as training data. In this paper, we propose active learning and transfer learning techniques that reduce the amount of training data a CRF needs. We experimentally evaluate the extraction accuracy and the associated training cost.
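The abstract proposes active learning to cut labeling cost, and the keywords list a confidence measure. As a minimal sketch of how such a criterion can work (the least-confidence rule, the score layout, and every function name here are illustrative assumptions, not the authors' published method), the code below takes a linear-chain CRF's per-token emission scores and label-transition scores as given, computes the probability of the model's own best labeling for each unlabeled reference string, and picks the least confident strings for a human to annotate next.

```python
import math
from typing import List, Sequence

# Hypothetical score layout (an assumption, not from the paper):
#   emit[t][j]  = log-potential of label j at token t of one reference string
#   trans[i][j] = log-potential of moving from label i to label j
Matrix = List[List[float]]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(emit: Matrix, trans: Matrix) -> float:
    """Forward algorithm: log Z, summed over every possible label sequence."""
    k = len(trans)
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            logsumexp([alpha[i] + trans[i][j] for i in range(k)]) + emit[t][j]
            for j in range(k)
        ]
    return logsumexp(alpha)


def viterbi_score(emit: Matrix, trans: Matrix) -> float:
    """Viterbi recursion: unnormalized log-score of the single best labeling."""
    k = len(trans)
    delta = list(emit[0])
    for t in range(1, len(emit)):
        delta = [
            max(delta[i] + trans[i][j] for i in range(k)) + emit[t][j]
            for j in range(k)
        ]
    return max(delta)


def sequence_confidence(emit: Matrix, trans: Matrix) -> float:
    """P(best labeling | string) = exp(best score - log Z), in (0, 1]."""
    return math.exp(viterbi_score(emit, trans) - log_partition(emit, trans))


def select_least_confident(pool: List[Matrix], trans: Matrix, n: int) -> List[int]:
    """Indices of the n pool strings the model is least sure about."""
    ranked = sorted(range(len(pool)), key=lambda i: sequence_confidence(pool[i], trans))
    return ranked[:n]


# Toy usage: two strings, two tokens each, two labels (e.g. AUTHOR, TITLE).
trans = [[0.0, 0.0], [0.0, 0.0]]
uncertain = [[0.0, 0.0], [0.0, 0.0]]  # every labeling equally likely
confident = [[5.0, 0.0], [5.0, 0.0]]  # strongly prefers label 0 at each token
print(select_least_confident([uncertain, confident], trans, n=1))  # [0]
```

Least-confidence sampling is only one possible criterion; token-level marginal entropy is a common alternative when a single uncertain field inside an otherwise clean reference string matters more than the whole-sequence score.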

Original language: English
Title of host publication: 2017 12th International Conference on Digital Information Management, ICDIM 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 59-64
Number of pages: 6
ISBN (Electronic): 9781538606643
DOIs
Publication status: Published - Jun 28 2017
Event: 12th International Conference on Digital Information Management, ICDIM 2017 - Fukuoka, Japan
Duration: Sept 12 2017 → Sept 14 2017

Publication series

Name: 2017 12th International Conference on Digital Information Management, ICDIM 2017
Volume: 2018-January

Other

Other: 12th International Conference on Digital Information Management, ICDIM 2017
Country/Territory: Japan
City: Fukuoka
Period: 9/12/17 → 9/14/17

Keywords

  • CRF
  • active learning
  • bibliography extraction
  • confidence measure
  • transfer learning

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Information Systems and Management
