A pre-trained BERT for Korean medical natural language processing
Published in: | Scientific Reports, 2022-08-16, Vol. 12 (1), Article 13847 |
---|---|
Main Authors: | Kim, Yoojoong; Kim, Jong-Ho; Lee, Jeong Moon; Jang, Moon Joung; Yum, Yun Jin; Kim, Seongtae; Shin, Unsub; Kim, Young-Min; Joo, Hyung Joon; Song, Sanghoun |
Format: | Article |
Language: | English |
Subjects: | Deep learning; Humanities and Social Sciences; Language; Natural language processing; Recognition, psychology; Republic of Korea; Science (multidisciplinary); Semantics |
DOI: | 10.1038/s41598-022-17806-8 |
ISSN/EISSN: | 2045-2322 |
PMID: | 35974113 |
Publisher: | Nature Publishing Group UK (London) |
Abstract: | With advances in deep learning and natural language processing (NLP), the analysis of medical texts is becoming increasingly important. Nonetheless, no research on a Korean medical-specific language model had been conducted. Korean medical text is difficult to analyze because of the agglutinative characteristics of the language and the complex terminology of the medical domain. To address this, we collected a Korean medical corpus and used it to train language models. In this paper, we present a Korean medical language model based on deep-learning NLP. The model was trained using the BERT pre-training framework, adapted to the medical context on top of a state-of-the-art Korean language model. The pre-trained model showed accuracy increases of 0.147 and 0.148 on the masked language model with next-sentence prediction. In the intrinsic evaluation, next-sentence-prediction accuracy improved by 0.258, a remarkable enhancement. In addition, extrinsic evaluation on Korean medical semantic textual similarity data showed a 0.046 increase in Pearson correlation, and evaluation on Korean medical named entity recognition showed a 0.053 increase in F1-score. |
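
The abstract describes domain-adaptive pre-training with BERT's two standard objectives, masked language modeling (MLM) and next-sentence prediction (NSP), starting from an existing Korean model. The following is a minimal illustrative sketch of those two objectives using the Hugging Face `transformers` library, not the authors' code; the record does not name the base checkpoint, so `klue/bert-base` is used purely as an assumed stand-in, and the sentence pair is invented.

```python
# Sketch of BERT's two pre-training objectives (MLM + NSP) as described
# in the abstract. NOT the paper's code; "klue/bert-base" is an assumed
# stand-in for the unnamed Korean base model.
import torch
from transformers import AutoTokenizer, BertForPreTraining

tokenizer = AutoTokenizer.from_pretrained("klue/bert-base")
model = BertForPreTraining.from_pretrained("klue/bert-base")

# Toy sentence pair; the paper draws such pairs from a Korean medical corpus.
enc = tokenizer("환자는 고혈압 병력이 있다.", "아스피린을 복용 중이다.",
                return_tensors="pt")
input_ids = enc["input_ids"]

# Mask a single token for the MLM objective (real pre-training masks ~15%).
masked_pos = 3                                    # arbitrary position
labels = torch.full_like(input_ids, -100)         # -100 = ignored by the loss
labels[0, masked_pos] = input_ids[0, masked_pos]  # target: the original token
input_ids[0, masked_pos] = tokenizer.mask_token_id

out = model(**enc,
            labels=labels,                          # MLM targets
            next_sentence_label=torch.tensor([0]))  # 0 = B really follows A
print(out.loss)  # combined MLM + NSP loss minimized during pre-training
```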
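
The reported gains are measured in Pearson correlation (for the semantic textual similarity task) and F1-score (for named entity recognition). A small self-contained sketch of how such metrics are typically computed is below; the scores, entity tag set, and label values are invented for illustration and are not the paper's data.

```python
# Toy computation of the two extrinsic metrics named in the abstract.
# All numbers and labels here are made up for illustration only.
from scipy.stats import pearsonr
from sklearn.metrics import f1_score

# Semantic textual similarity: correlate predicted vs. gold similarity scores.
gold_sts = [4.5, 2.0, 0.5, 3.8]
pred_sts = [4.1, 2.3, 1.0, 3.5]
r, _ = pearsonr(gold_sts, pred_sts)
print(f"STS Pearson r = {r:.3f}")

# Named entity recognition: token-level micro-F1 over a hypothetical tag set.
# (Entity-level F1, e.g. via the seqeval package, is the usual convention for
# NER; token-level scoring is shown here only to keep the sketch short.)
gold_ner = ["O", "B-DISEASE", "I-DISEASE", "O", "B-DRUG"]
pred_ner = ["O", "B-DISEASE", "O",         "O", "B-DRUG"]
print(f"NER micro-F1 = {f1_score(gold_ner, pred_ner, average='micro'):.3f}")
```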