
CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation

Formula recognition presents significant challenges due to the complicated structure and varied notation of mathematical expressions. Despite continuous advancements in formula recognition models, the evaluation metrics these models employ, such as BLEU and Edit Distance, still exhibit notable limitations. They overlook the fact that the same formula has diverse representations, and they are highly sensitive to the distribution of the training data, which makes formula recognition evaluation unfair. To this end, we propose a Character Detection Matching (CDM) metric that ensures evaluation objectivity by computing an image-level rather than LaTeX-level score. Specifically, CDM renders both the model-predicted LaTeX and the ground-truth LaTeX formulas into images, then employs visual feature extraction and localization techniques for precise character-level matching that incorporates spatial position information. Such a spatially aware, character-matching method offers a more accurate and equitable evaluation than the previous BLEU and Edit Distance metrics, which rely solely on text-based character matching. Experimentally, we evaluated various formula recognition models using CDM, BLEU, and ExpRate metrics. The results demonstrate that CDM aligns more closely with human evaluation standards and provides a fairer comparison across different models by eliminating discrepancies caused by diverse formula representations.
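As a rough illustration of the matching step the abstract describes, the sketch below assumes the predicted and ground-truth LaTeX have already been rendered to images and their characters localized (those steps are not shown). The `Char` record, the greedy matcher, and the `tol` tolerance are hypothetical simplifications for illustration, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Char:
    symbol: str   # glyph recognized in the rendered image, e.g. "x", "+", "2"
    x: float      # normalized horizontal center of the glyph's bounding box
    y: float      # normalized vertical center of the glyph's bounding box

def cdm_like_score(pred: list[Char], gt: list[Char], tol: float = 0.05) -> float:
    """Greedily match predicted characters to ground-truth characters by symbol
    identity and spatial proximity, and return an F1-style score."""
    unmatched = list(gt)
    matched = 0
    for p in pred:
        for g in unmatched:
            if p.symbol == g.symbol and abs(p.x - g.x) <= tol and abs(p.y - g.y) <= tol:
                matched += 1
                unmatched.remove(g)
                break
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gt) if gt else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Two notationally different but visually identical formulas (e.g. "x^{2}" vs "x^2")
# yield the same characters at the same positions, so they score 1.0 here, whereas
# a text-level metric would penalize the difference in LaTeX source.
print(cdm_like_score([Char("x", 0.2, 0.5), Char("2", 0.3, 0.3)],
                     [Char("x", 0.2, 0.5), Char("2", 0.3, 0.3)]))  # -> 1.0
```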


Bibliographic Details
Published in: arXiv.org, 2024-09
Main Authors: Wang, Bin; Wu, Fan; Ouyang, Linke; Gu, Zhuangcheng; Zhang, Rui; Xia, Renqiu; Zhang, Bo; He, Conghui
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Character recognition; Formulas (mathematics); LaTeX; Matching; Representations