Improving real-time apple fruit detection: Multi-modal data and depth fusion with non-targeted background removal

In automated fruit detection, RGB-Depth (RGB-D) images give the detection model additional depth information that can improve detection accuracy. However, depth images captured outdoors are usually of low quality, which limits the usefulness of the depth data. In this study, an approach/technique for real-time apple f...

Bibliographic Details
Published in: Ecological informatics 2024-09, Vol.82, p.102691, Article 102691
Main Authors: Kaukab, Shaghaf; Komal; Ghodki, Bhupendra M; Ray, Hena; Kalnar, Yogesh B.; Narsaiah, Kairam; Brar, Jaskaran S.
Format: Article
Language: English
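Subjects: 3D localization; Apple; apples; automation; data collection; Depth sensor; Fruit detection; fruits; orchards; RGB-D images; YOLO network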
container_start_page 102691
container_title Ecological informatics
container_volume 82
creator Kaukab, Shaghaf
Komal
Ghodki, Bhupendra M
Ray, Hena
Kalnar, Yogesh B.
Narsaiah, Kairam
Brar, Jaskaran S.
description In automated fruit detection, RGB-Depth (RGB-D) images give the detection model additional depth information that can improve detection accuracy. However, depth images captured outdoors are usually of low quality, which limits the usefulness of the depth data. This study presents an approach for real-time apple fruit detection in a high-density orchard environment using multi-modal data. A non-targeted background removal using depth fusion (NBR-DF) method was developed to reduce the high noise level of depth images. The noise arises from uncontrolled lighting conditions and from holes with incomplete depth information in the depth images. The NBR-DF technique follows three primary steps: pre-processing of depth images (point cloud generation), target object extraction, and background removal. The NBR-DF method serves as a pipeline that pre-processes multi-modal data and enhances the features of depth images by filling holes, eliminating the noise those holes generate. Further, NBR-DF combined with YOLOv5 improves detection accuracy in dense orchard conditions by taking multi-modal information as input. An attention-based depth fusion module that adaptively fuses the multi-modal features was developed. The depth-attention matrix is integrated through pooling operations and sigmoid normalization, both of which are efficient methods for summarizing and normalizing depth information. The fusion module improves the identification of multiscale objects and strengthens the network's resistance to noise. The network then detects the fruit position using multiscale information from the RGB-D images in highly complex orchard environments. The detection results were compared and validated against other methods using different input modalities and fusion strategies. The NBR-DF approach achieved an average precision of 0.964 in real time. The performance comparison with other state-of-the-art methods and the model generalization study also indicate that the depth-fusion attention mechanism and preprocessing steps in NBR-DF-YOLOv5 significantly outperform those methods. In conclusion, the developed NBR-DF technique showed the potential to improve real-time apple fruit detection using multi-modal information.
• Non-targeted background removal using depth fusion (NBR-DF) is developed to enhance apple fruit detection accuracy.
• NBR-DF is used as a pipeline with the YOLOv5 detection model, as NBR-DF-YOLOv5.
• The pipeline performs point cloud filtration, segmentation and object extraction from depth images.
• AP0.5 of NBR-DF-YOLOv5 is 0.964, compared with 0.925 achieved with YOLOv5.
doi_str_mv 10.1016/j.ecoinf.2024.102691
format article
oa free_for_read
publisher Elsevier B.V
toplevel peer_reviewed
fulltext fulltext
identifier ISSN: 1574-9541
ispartof Ecological informatics, 2024-09, Vol.82, p.102691, Article 102691
issn 1574-9541
language eng
recordid cdi_proquest_miscellaneous_3153756554
source Elsevier:Jisc Collections:Elsevier Read and Publish Agreement 2022-2024:Freedom Collection (Reading list)
subjects 3D localization
Apple
apples
automation
data collection
Depth sensor
Fruit detection
fruits
orchards
RGB-D images
YOLO network
title Improving real-time apple fruit detection: Multi-modal data and depth fusion with non-targeted background removal
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-03-06T13%3A31%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20real-time%20apple%20fruit%20detection:%20Multi-modal%20data%20and%20depth%20fusion%20with%20non-targeted%20background%20removal&rft.jtitle=Ecological%20informatics&rft.au=Kaukab,%20Shaghaf&rft.date=2024-09&rft.volume=82&rft.spage=102691&rft.pages=102691-&rft.artnum=102691&rft.issn=1574-9541&rft_id=info:doi/10.1016/j.ecoinf.2024.102691&rft_dat=%3Cproquest_cross%3E3153756554%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c334t-a4b9ac16f3568db211a65e94e592eab6186ed6e899dfb2f408f1175f8142286d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3153756554&rft_id=info:pmid/&rfr_iscdi=true