
Vision transformers for cotton boll segmentation: Hyperparameters optimization and comparison with convolutional neural networks

For the automation of cotton harvesting operations, precise segmentation of cotton bolls is important. In the past, various handcrafted image-processing algorithms and convolutional neural networks (CNNs) have been developed for this purpose. Handcrafted algorithms often extract only low-dimensional features, while CNNs are limited in capturing global features due to their small receptive fields. Vision Transformers (ViTs), by contrast, have proven able to capture long-range dependencies through the self-attention mechanism, resulting in superior segmentation accuracy. In this study, ViTs were used to segment cotton bolls, and the impact of various hyperparameters on their efficacy was investigated. Different ViT variants were developed from varying combinations of hyperparameters. Among all developed variants, the model with a patch size of 16, a hidden dimension of 8, 6 multi-head self-attention (MHSA) heads, 12 transformer layers, and a multilayer perceptron (MLP) dimension of 128 outperformed the others. This optimal configuration achieved precision, recall, mean Intersection over Union (m-IoU), and cotton-IoU values of 0.94, 0.94, 0.93, and 0.89, respectively. The findings show that increasing the hidden dimension and the number of attention heads increased model complexity but did not necessarily improve performance. The best-performing ViT model achieved a higher cotton-IoU (0.89) than the CNN model (0.84). These results indicate that the ViT model outperforms a CNN with a comparable number of trainable parameters for the segmentation of cotton bolls. Hence, ViTs can be used effectively for semantic segmentation tasks in agriculture, delivering higher segmentation performance while requiring less computational power. This makes ViTs a suitable technique for automating the cotton harvesting process on resource-constrained devices without compromising performance. Future work should explore pure transformer architectures and incorporate advanced techniques to further optimize performance and efficiency in various agricultural tasks.

Highlights:
• Developed ViTs for cotton boll segmentation.
• Analyzed the effects of hyperparameters on ViT performance.
• Achieved 0.94 precision, 0.93 m-IoU, and 0.89 cotton-IoU with the optimal ViT.
• ViTs showed better performance, with a cotton-IoU of 0.89 vs. the CNN's 0.84.
• Increasing the hidden dimension and the number of attention heads did not necessarily improve performance.
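
As a concrete reading of the best configuration reported in the abstract, the sketch below assembles the corresponding encoder in PyTorch: patch size 16, hidden dimension 8, 6 MHSA heads, 12 transformer layers, and an MLP dimension of 128. This is not the authors' code. In particular, because 8 is not divisible by 6, the sketch assumes the common convention in which each head gets its own subspace (taken here to be 8-dimensional) and the concatenated heads are projected back to the model dimension; positional embeddings and the segmentation decoder are omitted.

```python
# Minimal sketch (not the authors' implementation) of the best-performing
# ViT encoder described in the abstract. The per-head dimension is an
# assumption: hidden dim 8 is not divisible by 6 heads, so each head is
# given its own 8-dim subspace and the concatenation is projected back.
import torch
import torch.nn as nn

class MHSA(nn.Module):
    """Multi-head self-attention with an independent subspace per head."""
    def __init__(self, dim, heads, head_dim):
        super().__init__()
        inner = heads * head_dim
        self.heads, self.head_dim = heads, head_dim
        self.to_qkv = nn.Linear(dim, 3 * inner, bias=False)
        self.proj = nn.Linear(inner, dim)

    def forward(self, x):                      # x: (batch, tokens, dim)
        b, n, _ = x.shape
        qkv = self.to_qkv(x).view(b, n, 3, self.heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (b, heads, n, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        out = attn.softmax(dim=-1) @ v         # (b, heads, n, head_dim)
        return self.proj(out.transpose(1, 2).reshape(b, n, -1))

class Block(nn.Module):
    """Pre-norm transformer encoder layer."""
    def __init__(self, dim=8, heads=6, head_dim=8, mlp_dim=128):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.attn = MHSA(dim, heads, head_dim)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_dim), nn.GELU(),
                                 nn.Linear(mlp_dim, dim))

    def forward(self, x):
        x = x + self.attn(self.norm1(x))       # residual attention
        return x + self.mlp(self.norm2(x))     # residual MLP

# 16x16 patch embedding of an RGB image, then 12 encoder layers.
patch_embed = nn.Conv2d(3, 8, kernel_size=16, stride=16)
encoder = nn.Sequential(*[Block() for _ in range(12)])
img = torch.randn(1, 3, 256, 256)
tokens = patch_embed(img).flatten(2).transpose(1, 2)  # (1, 256, 8)
print(encoder(tokens).shape)                          # torch.Size([1, 256, 8])
```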

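The reported metrics can likewise be illustrated with a small sketch, assuming binary masks (1 = cotton boll, 0 = background) and reading "cotton-IoU" as the IoU of the cotton class alone and m-IoU as the mean IoU over both classes:

```python
# Hedged sketch of the IoU metrics named in the abstract, for integer masks.
import numpy as np

def iou(pred, target, cls):
    """Intersection over union for one class of an integer label mask."""
    p, t = (pred == cls), (target == cls)
    union = np.logical_or(p, t).sum()
    return np.logical_and(p, t).sum() / union if union else float("nan")

pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
cotton_iou = iou(pred, target, 1)                  # 2 overlap / 4 union = 0.5
m_iou = np.mean([iou(pred, target, c) for c in (0, 1)])
print(cotton_iou, m_iou)
```
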
Bibliographic Details
Published in: Industrial Crops and Products, 2025-01, Vol. 223, p. 120241, Article 120241
Main Authors: Singh, Naseeb; Tewari, V.K.; Biswas, P.K.
Format: Article
Language: English
Publisher: Elsevier B.V.
ISSN: 0926-6690
DOI: 10.1016/j.indcrop.2024.120241
Rights: © 2024 The Authors
Subjects: Automated harvesting; Automation; Cotton; Deep learning; Neural networks; Semantic segmentation; Vision; Vision transformers