
A pre-trained BERT for Korean medical natural language processing

With advances in deep learning and natural language processing (NLP), the analysis of medical texts is becoming increasingly important. Nonetheless, despite the importance of processing medical texts, no research on Korean medical-specific language models has been conducted. Korean medical text is difficult to analyze because of the agglutinative characteristics of the language and the complex terminology of the medical domain. To address this problem, we collected a Korean medical corpus and used it to train language models. In this paper, we present a Korean medical language model based on deep-learning NLP. The model was trained with the BERT pre-training framework for the medical context, starting from a state-of-the-art Korean language model. The pre-trained model showed accuracy increases of 0.147 and 0.148 for the masked language model with next sentence prediction. In the intrinsic evaluation, next sentence prediction accuracy improved by 0.258, a remarkable enhancement. In addition, the extrinsic evaluation on Korean medical semantic textual similarity data showed a 0.046 increase in the Pearson correlation, and the evaluation on Korean medical named entity recognition showed a 0.053 increase in the F1-score.
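For readers who want a concrete picture of the training setup the abstract describes, the sketch below shows one common way to continue BERT pre-training with the masked language modeling (MLM) and next sentence prediction (NSP) objectives on a domain corpus, using the Hugging Face transformers library (including its legacy NSP dataset utility). This is an illustration only: the base checkpoint (klue/bert-base), the corpus path, and all hyperparameters are assumptions for the sketch, not the artifacts or settings used in the paper.

```python
# Minimal sketch (not the authors' released code) of domain-adaptive BERT
# pre-training with joint MLM + NSP objectives, starting from an existing
# Korean checkpoint. Checkpoint name and corpus path are placeholders.
from transformers import (
    BertForPreTraining,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    TextDatasetForNextSentencePrediction,
    Trainer,
    TrainingArguments,
)

BASE_CHECKPOINT = "klue/bert-base"    # stand-in for the Korean base model
CORPUS = "korean_medical_corpus.txt"  # one sentence per line, blank line between documents

tokenizer = BertTokenizerFast.from_pretrained(BASE_CHECKPOINT)
# BertForPreTraining carries both the MLM head and the NSP head.
model = BertForPreTraining.from_pretrained(BASE_CHECKPOINT)

# Builds sentence pairs for NSP (by default, 50% of pairs are random
# "not next" sentences drawn from other documents).
dataset = TextDatasetForNextSentencePrediction(
    tokenizer=tokenizer,
    file_path=CORPUS,
    block_size=128,
)

# Applies BERT's standard masking: 15% of tokens become MLM targets.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="kmed-bert",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    save_strategy="epoch",
)

Trainer(
    model=model,
    args=args,
    data_collator=collator,
    train_dataset=dataset,
).train()
```

TextDatasetForNextSentencePrediction expects one sentence per line with blank lines separating documents, which is what allows it to construct the NSP pairs; BertForPreTraining then optimizes the MLM and NSP heads jointly, matching the intrinsic evaluation reported in the abstract.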

Bibliographic Details
Published in: Scientific Reports, 2022-08, Vol. 12 (1), p. 13847, Article 13847
Main Authors: Kim, Yoojoong; Kim, Jong-Ho; Lee, Jeong Moon; Jang, Moon Joung; Yum, Yun Jin; Kim, Seongtae; Shin, Unsub; Kim, Young-Min; Joo, Hyung Joon; Song, Sanghoun
Format: Article
Language: English
Subjects: 639/166/985; 639/705/117; Deep learning; Humanities and Social Sciences; Language; multidisciplinary; Natural Language Processing; Recognition, Psychology; Republic of Korea; Science; Science (multidisciplinary); Semantics
DOI: 10.1038/s41598-022-17806-8
PMID: 35974113
ISSN: 2045-2322
EISSN: 2045-2322
Source: Publicly Available Content Database; PubMed Central; Free Full-Text Journals in Chemistry; Springer Nature - nature.com Journals - Fully Open Access