CyNER: Information Extraction from Unstructured Text of CTI Sources with Noncontextual IOCs

Shota Fujii, Nobutaka Kawaguchi, Tomohiro Shigemoto, Toshihiro Yamauchi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Cybersecurity threats have been increasing and growing more sophisticated year by year. In such circumstances, gathering Cyber Threat Intelligence (CTI) and keeping up with up-to-date threat information is crucial. Structured CTI such as Structured Threat Information eXpression (STIX) is particularly useful because it can automate security operations such as updating FW/IDS rules and analyzing attack trends. However, as most CTIs are written in natural language, manual analysis with domain knowledge is required, which is time-consuming. In this work, we propose CyNER, a method for automatically structuring CTIs and converting them into STIX format. CyNER extracts named entities in the context of CTI and then extracts the relations between named entities and IOCs in order to convert them into STIX. In addition, by using key phrase extraction, CyNER can extract relations between named entities and IOCs that lack contextual information, such as those listed at the bottom of a CTI. We describe our design and implementation of CyNER and demonstrate that it can extract named entities with an F-measure of 0.80 and extract relations between named entities and IOCs with a maximum accuracy of 81.6%. Our analysis of structured CTI showed that CyNER can extract IOCs that are not included in existing reputation sites, and that it can automatically extract IOCs that have been exploited for a long time and across multiple attack groups. CyNER is thus expected to contribute to the efficiency of CTI analysis.
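To make the pipeline in the abstract concrete, the following is a minimal sketch of the final step only: turning IOCs found in free text into STIX 2.1 objects linked to a named entity. It is not CyNER's implementation. The regular expressions, the function names (`extract_iocs`, `stix_pattern`, `to_bundle`), and the sample text are all assumptions made for illustration, and the malware name is passed in by hand where CyNER would use named entity recognition and key phrase extraction.

```python
import json
import re
import uuid
from datetime import datetime, timezone

# Illustrative patterns only (an assumption, not CyNER's extraction rules).
# Real CTI reports also need defanged forms such as 198[.]51[.]100[.]7.
IOC_PATTERNS = {
    "ipv4-addr": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_iocs(text):
    """Return unique (kind, value) pairs, grouped by the order of IOC_PATTERNS."""
    found = []
    for kind, pattern in IOC_PATTERNS.items():
        for value in pattern.findall(text):
            if (kind, value) not in found:
                found.append((kind, value))
    return found


def stix_pattern(kind, value):
    """Build a STIX patterning expression for one IOC."""
    if kind == "sha256":
        return f"[file:hashes.'SHA-256' = '{value}']"
    return f"[{kind}:value = '{value}']"


def to_bundle(text, malware_name):
    """Link every IOC in `text` to one malware object via 'indicates'.

    `malware_name` stands in for the output of an NER step (hypothetical here).
    """
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    common = {"spec_version": "2.1", "created": ts, "modified": ts}
    malware = {"type": "malware", "id": f"malware--{uuid.uuid4()}",
               "name": malware_name, "is_family": False, **common}
    objects = [malware]
    for kind, value in extract_iocs(text):
        indicator = {"type": "indicator", "id": f"indicator--{uuid.uuid4()}",
                     "pattern": stix_pattern(kind, value),
                     "pattern_type": "stix", "valid_from": ts, **common}
        relationship = {"type": "relationship",
                        "id": f"relationship--{uuid.uuid4()}",
                        "relationship_type": "indicates",
                        "source_ref": indicator["id"],
                        "target_ref": malware["id"], **common}
        objects += [indicator, relationship]
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


if __name__ == "__main__":
    # Made-up report text using a documentation-range IP address.
    sample = "The loader contacts 198.51.100.7. SHA-256: " + "a" * 64
    print(json.dumps(to_bundle(sample, "ExampleLoader"), indent=2))
```

The sketch assigns every IOC to a single, caller-supplied entity. Deciding which entity an IOC belongs to, especially for IOCs listed without surrounding context, is the relation extraction problem the paper addresses and is not attempted here.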

Original language: English
Title of host publication: Advances in Information and Computer Security - 17th International Workshop on Security, IWSEC 2022, Proceedings
Editors: Chen-Mou Cheng, Mitsuaki Akiyama
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 85-104
Number of pages: 20
ISBN (Print): 9783031152542
Publication status: Published - 2022
Event: 17th International Workshop on Security, IWSEC 2022 - Tokyo, Japan
Duration: Aug 31 2022 to Sept 2 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13504 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Workshop on Security, IWSEC 2022
Country/Territory: Japan
City: Tokyo
Period: 8/31/22 to 9/2/22

Keywords

  • Cyber Threat Intelligence
  • Information Extraction
  • Named Entity Recognition
  • Relation Extraction
  • STIX

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
