Automatic Technical Term Extraction Based on Term Association
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Summary: | This paper proposes a new automatic Chinese term extraction algorithm that combines statistics-based and rule-based methods. The algorithm first uses a statistical method to extract two-word candidates from a raw corpus, then extends these candidates forward to obtain multi-word candidate terms. We propose a new metric, term association (TA), that measures the degree to which the words in a string combine. A second subsystem then filters these candidates against defined rules to obtain domain-specific technical terms. Our aim is for this hybrid method to achieve higher precision on domain-specific Chinese term extraction than previous approaches. The algorithm is implemented as an extractor that takes an unprocessed corpus of technical papers on ethanol fuels as input. The experimental results are analyzed and evaluated; precision and recall are 84.26% and 63.86%, respectively. |
---|---|
DOI: | 10.1109/FSKD.2008.40 |
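
The three stages named in the summary (statistical two-word candidates, forward extension, rule-based filtering) can be illustrated with a rough sketch. This is not the authors' implementation. The record does not define the term association (TA) metric, so pointwise mutual information (PMI) stands in for it here. The function names, the thresholds, the stop-word rule, and the assumption that the text is already split into word tokens are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the paper's actual filtering rules are not given.
STOP_WORDS = {"the", "is", "of", "a"}


def bigram_candidates(tokens, min_count=2, min_pmi=1.0):
    """Score adjacent word pairs with PMI (a stand-in for TA, which is undefined here)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    n_pairs = max(n - 1, 1)
    scored = {}
    for (a, b), c in bigrams.items():
        if c < min_count:
            continue
        p_ab = c / n_pairs
        p_a, p_b = unigrams[a] / n, unigrams[b] / n
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi >= min_pmi:
            scored[(a, b)] = pmi
    return scored


def extend_forward(tokens, seeds, min_count=2):
    """Grow each seed by its most frequent next word while that word repeats often enough."""
    results = set()
    for seed in seeds:
        current = tuple(seed)
        while True:
            k = len(current)
            # Count which words follow each occurrence of the current candidate.
            followers = Counter(
                tokens[i + k]
                for i in range(len(tokens) - k)
                if tuple(tokens[i:i + k]) == current
            )
            if not followers:
                break
            word, c = followers.most_common(1)[0]
            if c < min_count:
                break
            current = current + (word,)
        results.add(current)
    return results


def filter_by_rules(candidates, stop_words=STOP_WORDS):
    """Example rule (assumed): reject candidates that begin or end with a stop word."""
    return {
        c for c in candidates
        if c[0] not in stop_words and c[-1] not in stop_words
    }


def extract_terms(tokens):
    """Run the three stages in order on a token list."""
    seeds = bigram_candidates(tokens)
    return filter_by_rules(extend_forward(tokens, seeds))
```

Calling `extract_terms` on a token list returns a set of word tuples. For Chinese input a word segmenter would have to run first, which this sketch leaves out.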