Error detection of CRF-based bibliography extraction from reference strings

Manabu Ohta, Daiki Arauchi, Atsuhiro Takasu, Jun Adachi

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

We have proposed a method for parsing the reference strings usually listed at the end of research papers in order to extract bibliographic elements, such as the title, from them. The method uses a conditional random field (CRF) to estimate the correct bibliographic label for each token in the token sequence generated from a reference string. Although the method achieved reasonable parsing accuracy on a Japanese academic journal, errors are inevitable. This paper therefore proposes confidence measures for CRF-based bibliography parsing that can be used to detect such parsing errors. It also reports an empirical evaluation of the parsing method in terms not only of its accuracy but also of how easily its errors can be detected. The experiments showed that the proposed measures indicated parsing errors reasonably well and could be used to improve the quality of extracted bibliographies at a moderate manual post-editing cost.
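The abstract does not say which confidence measures the paper uses, so the following is only an illustrative sketch of one common choice for a linear-chain CRF: take the most likely label sequence (Viterbi) and use the smallest per-token marginal probability along it as the confidence of the whole parse. The function names, the score matrices `emit` and `trans`, the label set, and the review threshold are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical label set; the paper's actual bibliographic labels are not given here.
LABELS = ["AUTHOR", "TITLE", "JOURNAL"]


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def token_marginals(emit, trans):
    """Forward-backward in log space.

    emit[t][y]  : score of label y at token t (assumed already computed from features)
    trans[a][b] : score of moving from label a to label b
    Returns marg[t][y] = P(label at token t is y | whole token sequence).
    """
    n, k = len(emit), len(emit[0])
    fwd = [[0.0] * k for _ in range(n)]
    bwd = [[0.0] * k for _ in range(n)]
    fwd[0] = list(emit[0])
    for t in range(1, n):
        for y in range(k):
            fwd[t][y] = emit[t][y] + logsumexp(
                [fwd[t - 1][a] + trans[a][y] for a in range(k)])
    for t in range(n - 2, -1, -1):
        for y in range(k):
            bwd[t][y] = logsumexp(
                [trans[y][b] + emit[t + 1][b] + bwd[t + 1][b] for b in range(k)])
    log_z = logsumexp(fwd[-1])  # log of the partition function
    return [[math.exp(fwd[t][y] + bwd[t][y] - log_z) for y in range(k)]
            for t in range(n)]


def viterbi(emit, trans):
    """Most likely label sequence under the same scores."""
    n, k = len(emit), len(emit[0])
    score = list(emit[0])
    back = []
    for t in range(1, n):
        # Best previous label for each current label y.
        prev = [max(range(k), key=lambda a: score[a] + trans[a][y])
                for y in range(k)]
        score = [score[prev[y]] + trans[prev[y]][y] + emit[t][y]
                 for y in range(k)]
        back.append(prev)
    y = max(range(k), key=lambda b: score[b])
    path = [y]
    for prev in reversed(back):
        y = prev[y]
        path.append(y)
    return path[::-1]


def parse_with_confidence(emit, trans):
    """Return (label path, confidence), where confidence is the minimum
    marginal probability of the chosen label over all tokens."""
    path = viterbi(emit, trans)
    marg = token_marginals(emit, trans)
    conf = min(marg[t][y] for t, y in enumerate(path))
    return path, conf


def needs_review(emit, trans, threshold=0.9):
    """Flag a parsed reference string for manual post-editing (threshold is arbitrary)."""
    _, conf = parse_with_confidence(emit, trans)
    return conf < threshold
```

Under this sketch, only reference strings whose confidence falls below the threshold would be sent to a human editor, which is one way to trade extraction quality against manual post-editing cost.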

Original language: English
Title of host publication: The Outreach of Digital Libraries
Subtitle of host publication: A Globalized Resource Network - 14th International Conference on Asia-Pacific Digital Libraries, ICADL 2012, Proceedings
Pages: 229-238
Number of pages: 10
Publication status: Published - 2012
Event: 14th International Conference on Asia-Pacific Digital Libraries, ICADL 2012 - Taipei, Taiwan, Province of China
Duration: Nov 12 2012 to Nov 15 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7634 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 14th International Conference on Asia-Pacific Digital Libraries, ICADL 2012
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 11/12/12 to 11/15/12

Keywords

  • bibliography extraction
  • conditional random field (CRF)
  • confidence measure
  • digital library
  • error detection
  • reference string

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)
