Improving real-time apple fruit detection: Multi-modal data and depth fusion with non-targeted background removal
In automated fruit detection, RGB-Depth (RGB-D) images aid the detection model with additional depth information to enhance detection accuracy. However, outdoor depth images are usually of low quality, which limits the quality of depth data. In this study, an approach/technique for real-time apple f...
Published in: | Ecological informatics 2024-09, Vol.82, p.102691, Article 102691 |
---|---|
Main Authors: | Kaukab, Shaghaf; Komal; Ghodki, Bhupendra M; Ray, Hena; Kalnar, Yogesh B.; Narsaiah, Kairam; Brar, Jaskaran S. |
Format: | Article |
Language: | English |
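Subjects: | 3D localization; Apple; apples; automation; data collection; Depth sensor; Fruit detection; fruits; orchards; RGB-D images; YOLO network |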
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c334t-a4b9ac16f3568db211a65e94e592eab6186ed6e899dfb2f408f1175f8142286d3 |
container_end_page | |
container_issue | |
container_start_page | 102691 |
container_title | Ecological informatics |
container_volume | 82 |
creator | Kaukab, Shaghaf; Komal; Ghodki, Bhupendra M; Ray, Hena; Kalnar, Yogesh B.; Narsaiah, Kairam; Brar, Jaskaran S. |
description | In automated fruit detection, RGB-Depth (RGB-D) images give the detection model additional depth information that can improve detection accuracy. However, depth images captured outdoors are usually of low quality, which limits how useful the depth data are. This study presents a technique for real-time apple fruit detection in a high-density orchard environment using multi-modal data. A non-targeted background removal using depth fusion (NBR-DF) method was developed to reduce the high noise in depth images. The noise arises from uncontrolled lighting conditions and from holes with incomplete depth information. NBR-DF follows three primary steps: pre-processing of depth images (point cloud generation), target object extraction, and background removal. It serves as a pipeline that pre-processes multi-modal data and enhances depth-image features by filling holes, which eliminates the noise those holes generate. Combined with YOLOv5, NBR-DF improves detection accuracy in dense orchard conditions by using multi-modal information as input. An attention-based depth fusion module that adaptively fuses the multi-modal features was also developed. The depth-attention matrix is integrated through pooling operations and sigmoid normalization, which are efficient ways to summarize and normalize depth information. The fusion module improves the identification of multiscale objects and strengthens the network's resistance to noise. The network then detects fruit positions using multiscale information from the RGB-D images in highly complex orchard environments. The detection results were compared and validated against other methods using different input modalities and fusion strategies. The NBR-DF approach achieved an average precision of 0.964 in real time. The performance comparison with other state-of-the-art methods and the model generalization study also indicate that the depth-fusion attention mechanism and preprocessing steps in NBR-DF-YOLOv5 significantly outperform those methods. In conclusion, the NBR-DF technique showed potential to improve real-time apple fruit detection using multi-modal information.
• Non-targeted background removal using depth fusion (NBR-DF) is developed to enhance apple fruit detection accuracy.
• NBR-DF is used as a pipeline with the YOLOv5 detection model, as NBR-DF-YOLOv5.
• The pipeline performs point cloud filtration, segmentation and object extraction from depth images.
• AP0.5 of NBR-DF-YOLOv5 is 0.964, compared with 0.925 achieved with YOLOv5. |
doi_str_mv | 10.1016/j.ecoinf.2024.102691 |
format | article |
publisher | Elsevier B.V |
rights | 2024 |
creationdate | 2024 |
lds50 | peer_reviewed |
oa | free_for_read |
els_id | S1574954124002334 |
pqid | 3153756554 |
sourcetype | Aggregation Database |
collection | ScienceDirect Open Access Titles; Elsevier:ScienceDirect:Open Access; CrossRef; AGRICOLA; AGRICOLA - Academic |
fulltext | fulltext |
identifier | ISSN: 1574-9541 |
ispartof | Ecological informatics, 2024-09, Vol.82, p.102691, Article 102691 |
issn | 1574-9541 |
language | eng |
recordid | cdi_proquest_miscellaneous_3153756554 |
source | Elsevier:Jisc Collections:Elsevier Read and Publish Agreement 2022-2024:Freedom Collection (Reading list) |
subjects | 3D localization; Apple; apples; automation; data collection; Depth sensor; Fruit detection; fruits; orchards; RGB-D images; YOLO network |
title | Improving real-time apple fruit detection: Multi-modal data and depth fusion with non-targeted background removal |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-03-06T13%3A31%3A07IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Improving%20real-time%20apple%20fruit%20detection:%20Multi-modal%20data%20and%20depth%20fusion%20with%20non-targeted%20background%20removal&rft.jtitle=Ecological%20informatics&rft.au=Kaukab,%20Shaghaf&rft.date=2024-09&rft.volume=82&rft.spage=102691&rft.pages=102691-&rft.artnum=102691&rft.issn=1574-9541&rft_id=info:doi/10.1016/j.ecoinf.2024.102691&rft_dat=%3Cproquest_cross%3E3153756554%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c334t-a4b9ac16f3568db211a65e94e592eab6186ed6e899dfb2f408f1175f8142286d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3153756554&rft_id=info:pmid/&rfr_iscdi=true |
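
The abstract in this record describes three ideas that are easier to picture with a small example: filling depth holes, removing background by depth, and fusing depth features through pooling and sigmoid normalization. The following is a minimal NumPy sketch under stated assumptions, not the authors' NBR-DF implementation. The function names, the median-based hole filling, the `max_depth_m` cutoff, and the additive gating formula are all hypothetical choices made for illustration; the record does not specify any of them.

```python
import numpy as np


def fill_depth_holes(depth, invalid=0.0):
    """Fill hole pixels (equal to `invalid`) with the median of valid depths.

    Assumption: a global median is a crude stand-in for the hole filling
    mentioned in the abstract, which does not describe the actual method.
    """
    filled = depth.astype(float).copy()
    holes = filled == invalid
    if holes.any() and (~holes).any():
        filled[holes] = np.median(filled[~holes])
    return filled


def remove_background(rgb, depth, max_depth_m):
    """Zero out RGB pixels whose depth exceeds `max_depth_m` (hypothetical cutoff)."""
    mask = depth <= max_depth_m
    return rgb * mask[..., None], mask


def depth_attention_fuse(rgb_feat, depth_feat):
    """Gate depth features with a pooled, sigmoid-normalized attention vector.

    Both inputs have shape (C, H, W). Global average pooling summarizes each
    depth channel, and the sigmoid maps the summary into (0, 1).
    Assumption: additive fusion; the paper's exact formula is not in the record.
    """
    pooled = depth_feat.mean(axis=(1, 2))        # shape (C,)
    attention = 1.0 / (1.0 + np.exp(-pooled))    # sigmoid normalization
    fused = rgb_feat + attention[:, None, None] * depth_feat
    return fused, attention


# Tiny demonstration on synthetic data.
depth = np.array([[1.0, 0.0], [2.0, 5.0]])       # 0.0 marks a depth hole
rgb = np.ones((2, 2, 3))
filled = fill_depth_holes(depth)
foreground, mask = remove_background(rgb, filled, max_depth_m=3.0)
fused, attention = depth_attention_fuse(np.zeros((2, 2, 2)), np.ones((2, 2, 2)))
```

In this sketch, pixels beyond the cutoff are treated as background regardless of what they contain, which is one way to read "non-targeted" removal. A detector such as YOLOv5 would then receive the masked image and fused features as input.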