Using WordNet synonym substitution to enhance UMLS source integration

Bibliographic Details
Published in: Artificial intelligence in medicine, 2009-06, Vol. 46 (2), p. 97-109
Main Authors: Huang, Kuo-Chuan; Geller, James; Halper, Michael; Perl, Yehoshua; Xu, Junchuan
Format: Article
Language: English
Description
Summary:
Objective: Synonym-substitution algorithms have been developed to match source vocabulary terms with existing Unified Medical Language System (UMLS) terms during the integration process. A drawback is the possible explosion in the number of newly generated (potential) synonyms, which can tax computational and expert review resources. Experiments are run using a synonym-substitution approach based on WordNet to see how constraining two methodological parameters, namely “maximum number of substitutions per term” and “maximum term length,” affects performance. Our hypothesis is that these values can be constrained rather tightly, thus greatly speeding up the methodology, without a marked decline in the additional matches produced. Furthermore, we investigate whether a limitation on only the first of the two parameters is sufficient to achieve the same results.
Methods: A four-stage synonym-substitution methodology using WordNet is presented. A group of experiments is carried out in which the two methodological parameters “maximum number of substitutions per term” and “maximum term length” are varied. The purpose is to examine their effect on the growth in the number of potential synonyms generated and the associated loss of results. The experiments are based on the re-integration of the “Minimal Standard Terminology” (MST) into the UMLS. Synonym-substitution matches that are inconsistent with the current content of the UMLS, and are thus deemed incorrect, are further scrutinized manually as an audit of the original integration of the MST.
Results: An increase of 11% in the number of “MST term/UMLS term” matches was achieved using the synonym-substitution methodology. Importantly, this result held when tight threshold values (such as a maximum of two synonym substitutions per term) were imposed on the parameters. Furthermore, limiting only the “maximum number of substitutions per term” parameter was sufficient to obtain the performance enhancement. During the additional audit phase, a number of the reported mismatches were seen to be correct, representing an additional 10% increase in the number of matches obtained.
Conclusion: A synonym-substitution methodology that utilizes WordNet is a useful automated aid in UMLS source integration. Experiments showed that there was a significant speed-up but no degradation in match results when the methodology's “maximum number of substitutions per term” parameter
ISSN: 0933-3657, 1873-2860
DOI: 10.1016/j.artmed.2008.11.008
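
The summary names two limits, “maximum number of substitutions per term” and “maximum term length,” but does not describe the four stages themselves. The following Python sketch is therefore only an illustration of how such limits might bound the number of generated candidates; it is not the authors' implementation. The function names, the reading of term length as a word count, the toy synonym table (a stand-in for WordNet lookups), and the toy set of existing terms (a stand-in for the UMLS) are all assumptions made for the example.

```python
from itertools import combinations, product

# Hypothetical stand-in for WordNet: maps a word to substitutable synonyms.
TOY_SYNONYMS = {
    "gastric": ["stomach"],
    "tumor": ["neoplasm", "tumour"],
}


def substitution_variants(term, synonyms, max_substitutions=2, max_term_length=5):
    """Generate candidate synonyms of `term` by swapping words for synonyms.

    Assumption: "term length" is counted in words. Terms longer than
    `max_term_length` are skipped entirely, and at most `max_substitutions`
    words are replaced in any single candidate.
    """
    words = term.lower().split()
    if len(words) > max_term_length:
        return set()

    # Positions whose word has at least one synonym available.
    replaceable = [i for i, w in enumerate(words) if synonyms.get(w)]

    variants = set()
    for k in range(1, min(max_substitutions, len(replaceable)) + 1):
        for positions in combinations(replaceable, k):
            # Every combination of synonym choices for the chosen positions.
            for picks in product(*(synonyms[words[i]] for i in positions)):
                candidate = list(words)
                for i, replacement in zip(positions, picks):
                    candidate[i] = replacement
                variants.add(" ".join(candidate))
    return variants


def match_against(term, existing_terms, synonyms, **limits):
    """Return the generated candidates that already exist in `existing_terms`."""
    return sorted(substitution_variants(term, synonyms, **limits) & existing_terms)


# Toy stand-in for the set of existing (normalized) target terms.
existing = {"stomach neoplasm"}
one_sub = match_against("gastric tumor", existing, TOY_SYNONYMS, max_substitutions=1)
two_subs = match_against("gastric tumor", existing, TOY_SYNONYMS, max_substitutions=2)
```

In this toy setup the match is found only when two substitutions are allowed, while the candidate set grows with each additional substitution permitted. That growth is the trade-off the experiments in the summary examine.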