
Improving Chinese named entity recognition with lexical information

Bibliographic Details
Main Author: Guo-Hong Fu
Format: Conference Proceeding
Language: English
Description
Summary: Named entity recognition (NER) plays a critical role in many natural language processing applications. Chinese NER is usually formalized as a chunking task. However, most formulations do not distinguish named entities from common words, which makes it difficult to exploit lexical cues for NER. In this paper we propose a two-level IOB2 representation that merges lexical chunks and entity chunks, and we develop a morpheme-based chunking system for Chinese NER. The system works in three main steps. Given a plain Chinese sentence, a morpheme segmenter first splits it into a sequence of morphemes. A lexical chunker then tags each morpheme with a lexical chunk tag indicating its position in forming a word of a particular type. Finally, an entity chunker labels each morpheme with a hybrid chunk tag that carries the relevant entity boundary and category information, if any. Our experiments on the IEER-99 and MET2 data show a significant improvement in NER performance when entity-internal part-of-speech information is used. We also show that the quality of lexical chunking is important to NER results.
ISSN: 2160-133X
DOI: 10.1109/ICMLC.2009.5212793
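The two-level IOB2 representation described in the summary can be illustrated with a minimal Python sketch. It makes several assumptions that this record does not confirm: each Chinese character is treated as one morpheme, the word-type labels (`nr`, `p`, `ns`, `v`, borrowed from a common Chinese POS tagset) and entity categories (`PER`, `LOC`) are hypothetical, and the two levels are joined with a `|` separator chosen only for illustration. The function names are likewise hypothetical. The sketch shows only how gold word segmentations and entity spans map onto per-morpheme tags and back. It does not model the segmenter or the two statistical chunkers.

```python
from typing import List, Tuple

# Hypothetical data types for illustration; the paper's actual tag sets are not given here.
Word = Tuple[List[str], str]    # (morphemes of one word, word type such as a POS label)
Entity = Tuple[int, int, str]   # (start morpheme index, end index exclusive, entity category)


def lexical_iob2(words: List[Word]) -> List[str]:
    """Level 1: tag each morpheme with its position (B/I) in a word of a given type."""
    tags = []
    for morphemes, word_type in words:
        for i, _ in enumerate(morphemes):
            tags.append(("B-" if i == 0 else "I-") + word_type)
    return tags


def entity_iob2(n: int, entities: List[Entity]) -> List[str]:
    """Level 2: standard IOB2 entity tags over n morphemes; O marks non-entity morphemes."""
    tags = ["O"] * n
    for start, end, category in entities:
        tags[start] = "B-" + category
        for k in range(start + 1, end):
            tags[k] = "I-" + category
    return tags


def hybrid_tags(words: List[Word], entities: List[Entity]) -> List[str]:
    """Combine both levels into one tag per morpheme, e.g. 'B-nr|B-PER' (assumed format)."""
    lex = lexical_iob2(words)
    ent = entity_iob2(len(lex), entities)
    return [f"{l}|{e}" for l, e in zip(lex, ent)]


def decode_entities(hybrid: List[str]) -> List[Entity]:
    """Recover entity spans from the entity half of the hybrid tags (IOB2 decoding)."""
    spans: List[Entity] = []
    start, cat = None, None
    for i, tag in enumerate(hybrid + ["|O"]):  # sentinel tag flushes the last open span
        ent = tag.split("|", 1)[1]
        if ent.startswith("I-") and start is not None and ent[2:] == cat:
            continue  # the current entity continues
        if start is not None:
            spans.append((start, i, cat))
            start, cat = None, None
        if ent.startswith("B-"):
            start, cat = i, ent[2:]
        # An I- tag with no matching open span is treated as O (a simplifying assumption).
    return spans


if __name__ == "__main__":
    # Hypothetical example: 王小明在北京工作 ("Wang Xiaoming works in Beijing").
    words = [(["王", "小", "明"], "nr"), (["在"], "p"), (["北", "京"], "ns"), (["工", "作"], "v")]
    entities = [(0, 3, "PER"), (4, 6, "LOC")]
    tags = hybrid_tags(words, entities)
    morphemes = [m for ms, _ in words for m in ms]
    for m, t in zip(morphemes, tags):
        print(m, t)
    print(decode_entities(tags))
```

Running the script prints one combined tag per morpheme (for example `B-nr|B-PER` for 王 and `B-p|O` for 在), followed by the decoded spans `[(0, 3, 'PER'), (4, 6, 'LOC')]`. Keeping the lexical tag next to the entity tag is one way to expose word-internal position and type cues at the entity level, in the spirit of the motivation given in the summary.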