Improving Chinese named entity recognition with lexical information
Main Author: | |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Named entity recognition (NER) plays a critical role in many natural language processing applications. Chinese NER is usually formalized as a chunking task. However, most formulations do not distinguish named entities from common words. This makes it difficult to explore lexical cues for NER. In this paper we propose a two-level IOB2 representation to merge lexical chunks and entity chunks, and develop a morpheme-based chunking system for Chinese NER. It works in three main steps: Given a plain Chinese sentence, a morpheme segmenter first segments it into a sequence of morphemes, then a lexical chunker is applied to tag each segmented morpheme with a proper lexical chunk tag indicating its position pattern in forming a word of a special type, and finally an entity chunker continues to label each morpheme with a hybrid chunk tag, containing the related entity boundary and category information if any. Our experiments on the IEER-99 and MET2 data demonstrate a significant enhancement of NER performance after using entity-internal part-of-speech information. We also show that lexical chunking quality is of importance for NER results. |
---|---|
ISSN: | 2160-133X |
DOI: | 10.1109/ICMLC.2009.5212793 |
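
The summary describes a two-level IOB2 representation in which each morpheme carries both a lexical chunk tag and an entity chunk tag. The record does not give the actual tag inventory or the hybrid tag format, so the following is only a minimal sketch under stated assumptions: the `lexical/entity` tag layout, the labels `n`, `u`, and `ORG`, and the function names `merge_tags` and `decode_entities` are all hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def merge_tags(lexical: List[str], entity: List[str]) -> List[str]:
    """Join per-morpheme lexical and entity IOB2 tags into one hybrid tag.

    The "lexical/entity" layout is an assumption made for this sketch,
    not the format used in the paper.
    """
    if len(lexical) != len(entity):
        raise ValueError("tag sequences must have equal length")
    return [f"{lex}/{ent}" for lex, ent in zip(lexical, entity)]


def decode_entities(morphemes: List[str], hybrid: List[str]) -> List[Tuple[str, str]]:
    """Recover (entity text, category) pairs from the entity half of each hybrid tag."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label = None
    for morpheme, tag in zip(morphemes, hybrid):
        ent = tag.split("/", 1)[1]
        if ent.startswith("B-"):
            # In IOB2, every chunk opens with B-, so close any open entity first.
            if current:
                entities.append(("".join(current), label))
            current, label = [morpheme], ent[2:]
        elif ent.startswith("I-") and current and ent[2:] == label:
            current.append(morpheme)
        else:
            # "O" (or an inconsistent I- tag) ends the open entity.
            if current:
                entities.append(("".join(current), label))
            current, label = [], None
    if current:
        entities.append(("".join(current), label))
    return entities


# Hypothetical example: four morphemes, the first two forming one organization name.
morphemes = ["北京", "大学", "的", "学生"]
lexical = ["B-n", "I-n", "B-u", "B-n"]      # illustrative word-level chunk tags
entity = ["B-ORG", "I-ORG", "O", "O"]       # illustrative entity-level chunk tags

hybrid = merge_tags(lexical, entity)
found = decode_entities(morphemes, hybrid)
```

Keeping both levels in one tag lets a single sequence labeler see word-internal position and entity boundary together, which is the kind of lexical cue the summary says plain chunking formulations make hard to exploit.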