
Czech medical coding assistant based on transformer networks

The International Classification of Diseases (ICD) hierarchical taxonomy is used for so-called clinical coding of medical reports, typically presented in unstructured text. In the Czech Republic, it is currently carried out manually by a so-called clinical coder. However, due to the human factor, th...

Bibliographic Details
Published in: Computers in biology and medicine, 2024-08, Vol.178, p.108672, Article 108672
Main Authors: Lenc, Ladislav; Martínek, Jiří; Baloun, Josef; Přibáň, Pavel; Prantl, Martin; Taylor, Stephen Eugene; Král, Pavel; Kyliš, Jiří
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c262t-8a8c1f3ff396e0889f54aa8638e849c1ad3d6d5a8641dd8519b8f6b52bbc3d123
container_end_page
container_issue
container_start_page 108672
container_title Computers in biology and medicine
container_volume 178
creator Lenc, Ladislav
Martínek, Jiří
Baloun, Josef
Přibáň, Pavel
Prantl, Martin
Taylor, Stephen Eugene
Král, Pavel
Kyliš, Jiří
description The International Classification of Diseases (ICD) hierarchical taxonomy is used for clinical coding of medical reports, which are typically written as unstructured text. In the Czech Republic, coding is currently carried out manually by a clinical coder. Because of the human factor, this process is error-prone and expensive: the coder must be properly trained and spends significant effort on each report, which leads to occasional mistakes. The main goal of this paper is to propose and implement a system that serves as an assistant to the coder and automatically predicts diagnosis codes. These predictions are then presented to the coder for approval or correction, with the aim of improving efficiency and accuracy. We consider two classification tasks: the main (principal) diagnosis, and all diagnoses. Crucial requirements for the implementation include minimal memory consumption, generality, ease of portability, and sustainability. The main contribution lies in the proposal and evaluation of ICD classification models for the Czech language with relatively few trainable parameters, which allows them to run on the computer systems prevalent in Czech hospitals and makes retraining or fine-tuning with newly available data easy. First, we introduce a small transformer-based model for each task, followed by a transformer-based “Four-headed” model that incorporates four distinct classification heads. This model achieves results comparable to, and sometimes better than, those of four individual models. Moreover, it significantly reduces memory usage and training time. We also show that our models achieve results comparable to state-of-the-art English models on the MIMIC-IV dataset, even though our models are significantly smaller.
• Design and implementation of an Automatic Medical Coding Assistant for prediction of diagnosis codes (ICD) in the Czech language.
• Proposal and evaluation of several transformer-based models for ICD coding with relatively few trainable parameters.
• New state-of-the-art results in the ICD coding task on Czech data.
doi_str_mv 10.1016/j.compbiomed.2024.108672
format article
pmid 38875906
publisher United States: Elsevier Ltd
rights Copyright © 2024 Elsevier Ltd. All rights reserved.
orcid https://orcid.org/0000-0002-1066-7269
https://orcid.org/0000-0003-2981-1723
https://orcid.org/0000-0002-8744-8726
fulltext fulltext
identifier ISSN: 0010-4825
ispartof Computers in biology and medicine, 2024-08, Vol.178, p.108672, Article 108672
issn 0010-4825
1879-0534
language eng
recordid cdi_proquest_miscellaneous_3068755387
source ScienceDirect Freedom Collection
subjects Classification
Clinical Coding
Coders
Coding
Czech Republic
Diagnosis
Diagnosis coding
Electronic Health Records
Error correction
Human factors
Humans
ICD
International Classification of Diseases
Medical
Medical coding
Taxonomy
Text classification
Transformers
Unstructured data
title Czech medical coding assistant based on transformer networks
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T23%3A48%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Czech%20medical%20coding%20assistant%20based%20on%20transformer%20networks&rft.jtitle=Computers%20in%20biology%20and%20medicine&rft.au=Lenc,%20Ladislav&rft.date=2024-08&rft.volume=178&rft.spage=108672&rft.pages=108672-&rft.artnum=108672&rft.issn=0010-4825&rft.eissn=1879-0534&rft_id=info:doi/10.1016/j.compbiomed.2024.108672&rft_dat=%3Cproquest_cross%3E3068755387%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c262t-8a8c1f3ff396e0889f54aa8638e849c1ad3d6d5a8641dd8519b8f6b52bbc3d123%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3081704105&rft_id=info:pmid/38875906&rfr_iscdi=true