Natural scene text localization and detection using MSER and its variants: a comprehensive survey

Text localization and detection within natural scene images have generated significant interest among researchers due to their inherent complexity and various real-life applications. In the last few decades, various methodologies have been developed for localization and detection of wild scene text...

Bibliographic Details
Published in: Multimedia tools and applications 2024-05, Vol.83 (18), p.55773-55810
Main Authors: Dutta, Kalpita; Sarkhel, Ritesh; Kundu, Mahantapas; Nasipuri, Mita; Das, Nibaran
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c270t-28287af7906431c2986ed9b41ba23a7d4c249f7e67fd3111831a87a12cd231e43
container_end_page 55810
container_issue 18
container_start_page 55773
container_title Multimedia tools and applications
container_volume 83
creator Dutta, Kalpita
Sarkhel, Ritesh
Kundu, Mahantapas
Nasipuri, Mita
Das, Nibaran
description Text localization and detection in natural scene images have attracted significant research interest because of their inherent complexity and many real-life applications. Over the last few decades, various methods have been developed for localizing and detecting text regions in the wild. Among them, techniques based on Maximally Stable Extremal Regions (MSER) have achieved remarkable success across a wide variety of text localization tasks over the last decade. MSER is a well-known blob detection method that has been applied, with some modifications, in many scene-text studies. This paper reviews and evaluates MSER methods combined either with traditional machine learning methods that use hand-crafted features or with deep learning methods that learn features automatically, for scene text localization. The MSER methods described include standard MSER, MSER with stroke width transform, eMSER, enhanced MSER, multi-level MSER, MSER with CNN features, component splitting with MSER tree, MSER with CNN and CRF, and CE-MSER. Finally, the paper compares the performance of these methods on five publicly available standard scene text datasets (ICDAR 2003, ICDAR 2013, ICDAR 2015, KAIST, and SVT) and offers guidance on selecting an appropriate MSER method, along with the pros and cons of each.
doi_str_mv 10.1007/s11042-023-17671-1
format article
orcidid https://orcid.org/0000-0002-2426-9915
peer_reviewed true
publisher New York: Springer US
rights The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.
tpages 38
fulltext fulltext
identifier ISSN: 1573-7721
ispartof Multimedia tools and applications, 2024-05, Vol.83 (18), p.55773-55810
issn 1573-7721
1380-7501
1573-7721
language eng
recordid cdi_proquest_journals_3055255019
source Springer Link
subjects Computer Communication Networks
Computer Science
Computer vision
Data Structures and Information Theory
Deep learning
Image analysis
Localization
Machine learning
Multimedia Information Systems
Special Purpose and Application-Based Systems
Track 6: Computer Vision for Multimedia Applications
title Natural scene text localization and detection using MSER and its variants: a comprehensive survey
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-03-06T03%3A43%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Natural%20scene%20text%20localization%20and%20detection%20using%20MSER%20and%20its%20variants:%20a%20comprehensive%20survey&rft.jtitle=Multimedia%20tools%20and%20applications&rft.au=Dutta,%20Kalpita&rft.date=2024-05&rft.volume=83&rft.issue=18&rft.spage=55773&rft.epage=55810&rft.pages=55773-55810&rft.issn=1573-7721&rft.eissn=1573-7721&rft_id=info:doi/10.1007/s11042-023-17671-1&rft_dat=%3Cproquest_cross%3E3055255019%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c270t-28287af7906431c2986ed9b41ba23a7d4c249f7e67fd3111831a87a12cd231e43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3055255019&rft_id=info:pmid/&rfr_iscdi=true