X-ray Made Simple: Radiology Report Generation and Evaluation with Layman's Terms
Radiology Report Generation (RRG) has achieved significant progress with the advancement of multimodal generative models. However, evaluation in the domain suffers from a lack of fair and robust metrics. We reveal that high performance on RRG under existing lexical-based metrics (e.g., BLEU) may be more of a mirage: a model can achieve a high BLEU score merely by learning the template of reports. This has become an urgent problem for RRG because of the highly patternized nature of these reports. In this work, we counterintuitively approach this problem by proposing the Layman's RRG framework, a layman's-terms-based dataset, evaluation, and training framework that systematically improves RRG with day-to-day language. We first contribute the translated layman's-terms dataset. Building upon the dataset, we then propose a semantics-based evaluation method, which is shown to mitigate the inflated BLEU numbers and to provide fairer evaluation. Finally, we show that training on the layman's-terms dataset encourages models to focus on the semantics of the reports rather than overfitting to report templates. We reveal a promising scaling law between the number of training examples and the semantic gain provided by our dataset, in contrast to the inverse pattern observed with the original report format. Our code is available at \url{https://github.com/hegehongcha/LaymanRRG}.
Published in: | arXiv.org, 2024-10 |
---|---|
Main Authors: | Zhao, Kun; Xiao, Chenghao; Tang, Chen; Yang, Bohao; Ye, Kai; Noura Al Moubayed; Zhan, Liang; Lin, Chenghua |
Format: | Article |
Language: | English |
Subjects: | Datasets; Learning; Radiology; Scaling laws; Semantics |
Online Access: | Get full text |
container_title | arXiv.org |
---|---|
creator | Zhao, Kun; Xiao, Chenghao; Tang, Chen; Yang, Bohao; Ye, Kai; Noura Al Moubayed; Zhan, Liang; Lin, Chenghua |
description | Radiology Report Generation (RRG) has achieved significant progress with the advancement of multimodal generative models. However, evaluation in the domain suffers from a lack of fair and robust metrics. We reveal that high performance on RRG under existing lexical-based metrics (e.g., BLEU) may be more of a mirage: a model can achieve a high BLEU score merely by learning the template of reports. This has become an urgent problem for RRG because of the highly patternized nature of these reports. In this work, we counterintuitively approach this problem by proposing the Layman's RRG framework, a layman's-terms-based dataset, evaluation, and training framework that systematically improves RRG with day-to-day language. We first contribute the translated layman's-terms dataset. Building upon the dataset, we then propose a semantics-based evaluation method, which is shown to mitigate the inflated BLEU numbers and to provide fairer evaluation. Finally, we show that training on the layman's-terms dataset encourages models to focus on the semantics of the reports rather than overfitting to report templates. We reveal a promising scaling law between the number of training examples and the semantic gain provided by our dataset, in contrast to the inverse pattern observed with the original report format. Our code is available at \url{https://github.com/hegehongcha/LaymanRRG}. |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3072928917 |
source | Publicly Available Content Database |
subjects | Datasets; Learning; Radiology; Scaling laws; Semantics |
title | X-ray Made Simple: Radiology Report Generation and Evaluation with Layman's Terms |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-13T21%3A08%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=X-ray%20Made%20Simple:%20Radiology%20Report%20Generation%20and%20Evaluation%20with%20Layman's%20Terms&rft.jtitle=arXiv.org&rft.au=Zhao,%20Kun&rft.date=2024-10-16&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3072928917%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_30729289173%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3072928917&rft_id=info:pmid/&rfr_iscdi=true |
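The abstract argues that lexical overlap metrics such as BLEU can be inflated by template memorization: because radiology reports share heavy boilerplate, a model that emits a fixed template scores well regardless of the actual finding. A minimal sketch of that effect (not code from the paper; the hand-rolled clipped n-gram precision below is only the core ingredient of BLEU, with no brevity penalty or smoothing, and the example reports are invented):

```python
# Sketch: why lexical n-gram metrics can reward template memorization in
# radiology report generation. We score a generic "template" prediction
# against two references with a BLEU-like clipped n-gram precision.
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision, the core ingredient of BLEU."""
    cand = ngrams(candidate.split(), n)
    ref_counts = Counter(ngrams(reference.split(), n))
    if not cand:
        return 0.0
    matched = sum(min(c, ref_counts[g]) for g, c in Counter(cand).items())
    return matched / len(cand)


# A generic template prediction that ignores the image entirely.
template = "the lungs are clear there is no pleural effusion or pneumothorax"

# Two references: one normal study, one with a real finding (an opacity)
# that the template completely misses.
ref_normal = "the lungs are clear there is no pleural effusion or pneumothorax"
ref_abnormal = "there is a right lower lobe opacity no pleural effusion or pneumothorax"

for ref in (ref_normal, ref_abnormal):
    p1 = ngram_precision(template, ref, 1)
    p2 = ngram_precision(template, ref, 2)
    print(f"unigram={p1:.2f} bigram={p2:.2f}")
# → unigram=1.00 bigram=1.00
# → unigram=0.64 bigram=0.50
```

Even against the abnormal reference, the template keeps 64% unigram and 50% bigram precision purely from shared boilerplate, while missing the clinically important opacity; this is the inflation the paper's semantics-based evaluation is designed to avoid.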