TY - GEN
T1 - An approach to estimating cited sentences in academic papers using Doc2vec
AU - Tanabe, Shunsuke
AU - Takasu, Atsuhiro
AU - Ohta, Manabu
AU - Adachi, Jun
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/9/25
Y1 - 2018/9/25
AB - Most academic authors refer to the literature when introducing their proposed methods and the data used in their experiments. These references can be very helpful when trying to understand a paper; however, authors do not always state clearly which specific part of the referenced work they are referring the reader to, and reading the whole document to identify the relevant information can be quite labor-intensive. In this paper, we propose a method for estimating the appropriate parts of a referenced work as the “cited parts,” with the aim of reducing this burden. We first extract the sentences in an academic paper that cite the literature as “citing sentences.” We then vectorize the citing sentences and all the sentences in the cited papers using doc2vec, and estimate the most appropriate cited part as the sentence whose feature vector is most similar to that of the citing sentence. To evaluate the proposed method, we conducted experiments using English-language papers and a questionnaire survey that asked subjects to evaluate the appropriateness of the cited parts estimated by the method. The experiments showed that how successfully the approach estimated the appropriate cited parts depended on the citation intention of the citing sentences.
KW - Academic paper
KW - Browsing support
KW - Citation
KW - Doc2vec
KW - Reference
UR - http://www.scopus.com/inward/record.url?scp=85058649582&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058649582&partnerID=8YFLogxK
U2 - 10.1145/3281375.3281391
DO - 10.1145/3281375.3281391
M3 - Conference contribution
AN - SCOPUS:85058649582
T3 - MEDES 2018 - 10th International Conference on Management of Digital EcoSystems
SP - 118
EP - 125
BT - MEDES 2018 - 10th International Conference on Management of Digital EcoSystems
PB - Association for Computing Machinery, Inc
T2 - 10th International Conference on Management of Digital EcoSystems, MEDES 2018
Y2 - 25 September 2018 through 28 September 2018
ER -