
A pre-trained BERT for Korean medical natural language processing

With advances in deep learning and natural language processing (NLP), the analysis of medical texts is becoming increasingly important. Nonetheless, despite the importance of processing medical texts, no research on Korean medical-specific language models has been conducted. Korean medical text is difficult to analyze because of the agglutinative characteristics of the language and the complex terminology of the medical domain. To address this problem, we collected a Korean medical corpus and used it to train language models. In this paper, we present a Korean medical language model based on deep-learning NLP. The model was trained with the BERT pre-training framework for the medical context, starting from a state-of-the-art Korean language model. The pre-trained model showed accuracy increases of 0.147 and 0.148 for the masked language model with next sentence prediction. In the intrinsic evaluation, next sentence prediction accuracy improved by 0.258, a remarkable enhancement. In addition, the extrinsic evaluation on Korean medical semantic textual similarity data showed a 0.046 increase in the Pearson correlation, and the evaluation on Korean medical named entity recognition showed a 0.053 increase in the F1-score.
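For readers who want a concrete picture of the training setup the abstract describes, the sketch below shows one common way to continue BERT pre-training with the masked language modeling (MLM) and next sentence prediction (NSP) objectives on a domain corpus, using the Hugging Face transformers library (including its legacy NSP dataset utility). This is an illustration only: the base checkpoint (klue/bert-base), the corpus path, and all hyperparameters are assumptions for the sketch, not the artifacts or settings used in the paper.

```python
# Minimal sketch (not the authors' released code) of domain-adaptive BERT
# pre-training with joint MLM + NSP objectives, starting from an existing
# Korean checkpoint. Checkpoint name and corpus path are placeholders.
from transformers import (
    BertForPreTraining,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    TextDatasetForNextSentencePrediction,
    Trainer,
    TrainingArguments,
)

BASE_CHECKPOINT = "klue/bert-base"    # stand-in for the Korean base model
CORPUS = "korean_medical_corpus.txt"  # one sentence per line, blank line between documents

tokenizer = BertTokenizerFast.from_pretrained(BASE_CHECKPOINT)
# BertForPreTraining carries both the MLM head and the NSP head.
model = BertForPreTraining.from_pretrained(BASE_CHECKPOINT)

# Builds sentence pairs for NSP (by default, 50% of pairs are random
# "not next" sentences drawn from other documents).
dataset = TextDatasetForNextSentencePrediction(
    tokenizer=tokenizer,
    file_path=CORPUS,
    block_size=128,
)

# Applies BERT's standard masking: 15% of tokens become MLM targets.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="kmed-bert",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=dataset,
).train()
```

TextDatasetForNextSentencePrediction expects one sentence per line with blank lines separating documents, which is what allows it to construct the NSP pairs; BertForPreTraining then optimizes the MLM and NSP heads jointly, matching the intrinsic evaluation reported in the abstract.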

Bibliographic Details
Published in: Scientific Reports, 2022-08, Vol. 12 (1), p. 13847, Article 13847
Main Authors: Kim, Yoojoong; Kim, Jong-Ho; Lee, Jeong Moon; Jang, Moon Joung; Yum, Yun Jin; Kim, Seongtae; Shin, Unsub; Kim, Young-Min; Joo, Hyung Joon; Song, Sanghoun
Format: Article
Language: English
Subjects: 639/166/985; 639/705/117; Deep learning; Humanities and Social Sciences; Language; multidisciplinary; Natural Language Processing; Recognition, Psychology; Republic of Korea; Science; Science (multidisciplinary); Semantics
DOI: 10.1038/s41598-022-17806-8
PMID: 35974113
ISSN: 2045-2322
EISSN: 2045-2322
Source: Publicly Available Content Database; PubMed Central; Free Full-Text Journals in Chemistry; Springer Nature - nature.com Journals - Fully Open Access