Cost evaluation of CRF-based bibliography extraction from reference strings

Naomichi Kawakami, Manabu Ohta, Atsuhiro Takasu, Jun Adachi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

The effective use of digital libraries requires well-maintained bibliographic databases. In particular, the reference fields of academic papers are rich in useful bibliographic information, such as authors' names and paper titles. We therefore propose a method for automatically extracting bibliographic information from reference strings using a conditional random field (CRF). However, training the CRF to achieve high extraction accuracy requires at least a few hundred reference strings. To reduce the amount of training data needed, we propose the use of active sampling and pseudo-training data, and we evaluate the associated training costs experimentally.

Original language: English
Title of host publication: The Emergence of Digital Libraries - Research and Practices - 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014, Proceedings
Editors: Kulthida Tuamsuk, Adam Jatowt, Edie Rasmussen
Publisher: Springer Verlag
Pages: 268-278
Number of pages: 11
ISBN (Electronic): 9783319128221
Publication status: Published - 2014
Event: 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014 - Chiang Mai, Thailand
Duration: Nov 5 2014 → Nov 7 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8839
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 16th International Conference on Asia-Pacific Digital Libraries, ICADL 2014
Country/Territory: Thailand
City: Chiang Mai
Period: 11/5/14 → 11/7/14

Keywords

  • Active sampling
  • CRF
  • Information extraction
  • Pseudo-training data
  • Reference string

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
