Unsupervised Semantic Mapping for Healthcare Data Storage Schema

Bibliographic Details
Published in: IEEE Access 2021, Vol. 9, p. 107267-107278
Main Authors: Satti, Fahad Ahmed, Hussain, Musarrat, Hussain, Jamil, Ali, Syed Imran, Ali, Taqdir, Bilal, Hafiz Syed Muhammad, Chung, Taechoong, Lee, Sungyoung
Format: Article
Language: English
Description
Summary: Data, information, and knowledge processing systems in the healthcare domain are currently plagued by heterogeneity at various levels. Current solutions have focused on developing standards-based mechanisms that depend on manual intervention, which require a large number of human resources and necessitate the realignment of existing systems. State-of-the-art methodologies in natural language processing and machine learning can help to partially automate this process, reducing the resource requirements and providing a relatively good multi-class classification algorithm. We present a novel methodology for bridging the gap between various healthcare data management solutions by leveraging the strength of transformer-based machine learning models to create mappings between the data elements. Additionally, the annotated data, collected against five medical schemas and labeled by four annotators, is made available to help future researchers. Our results indicate that, for biased, dependent multi-class text classification, transformer-based models provide better results than linguistic and other classical models. In particular, the Robustly Optimized BERT Pretraining Approach (RoBERTa) provides the best schema matching performance, achieving a Cohen's kappa score of 0.47 and a Matthews Correlation Coefficient (MCC) score of 0.48 with human-annotated data.
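The two agreement metrics named in the summary can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `cohen_kappa` and `multiclass_mcc` and the toy labels are assumptions made for illustration, and the formulas are the standard definitions of Cohen's kappa and the multi-class generalization of MCC.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies of both raters.
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: both label sets use one identical class
    return (observed - expected) / (1.0 - expected)


def multiclass_mcc(y_true, y_pred):
    """Matthews Correlation Coefficient generalized to any number of classes."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in true_counts)
    cov_tt = n * n - sum(v * v for v in true_counts.values())
    cov_pp = n * n - sum(v * v for v in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


# Hypothetical toy labels (not data from the article): 1 = match, 0 = no match.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(round(cohen_kappa(y_true, y_pred), 3))
print(round(multiclass_mcc(y_true, y_pred), 3))
```

Both scores range up to 1 for perfect agreement and sit near 0 when predictions agree with the reference labels no more often than chance would, which is why they are often preferred over plain accuracy for imbalanced label distributions.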
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3100686