Loading…

Integration of multiple terminology bases: a multi-view alignment method using the hierarchical structure

Abstract Motivation In the medical field, multiple terminology bases coexist across different institutions and contexts, often resulting in the presence of redundant terms. The identification of overlapping terms among these bases holds significant potential for harmonizing multiple standards and es...

Full description

Saved in:
Bibliographic Details
Published in:Bioinformatics (Oxford, England) England), 2023-11, Vol.39 (11)
Main Authors: Hu, Peihong, Ye, Qi, Zhang, Weiyan, Liu, Jingping, Ruan, Tong
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract Motivation In the medical field, multiple terminology bases coexist across different institutions and contexts, often resulting in the presence of redundant terms. The identification of overlapping terms among these bases holds significant potential for harmonizing multiple standards and establishing unified framework, which enhances user access to comprehensive and well-structured medical information. However, the majority of terminology bases exhibit differences not only in semantic aspects but also in the hierarchy of their classification systems. The conventional approaches that rely on neighborhood-based methods such as GCN may introduce errors due to the presence of different superordinate and subordinate terms. Therefore, it is imperative to explore novel methods to tackle this structural challenge. Results To address this heterogeneity issue, this paper proposes a multi-view alignment approach that incorporates the hierarchical structure of terminologies. We utilize BERT-based model to capture the recursive relationships among different levels of hierarchy and consider the interaction information of name, neighbors, and hierarchy between different terminologies. We test our method on mapping files of three medical open terminologies, and the experimental results demonstrate that our method outperforms baseline methods in terms of Hits@1 and Hits@10 metrics by 2%. Availability and implementation The source code will be available at https://github.com/Ulricab/Bert-Path upon publication.
ISSN:1367-4803
1367-4811
DOI:10.1093/bioinformatics/btad689