Advancing Crowd Object Detection: A Review of YOLO, CNN, and Vision Transformers Hybrid Approach
Published in: | International journal for research in applied science and engineering technology 2024-06, Vol.12 (6), p.1240-1268 |
---|---|
Main Author: | |
Format: | Article |
Language: | English |
Summary: | Object detection remains one of the most fundamental and challenging problems in computer vision and image understanding. Deep neural network models and improved object representations have led to significant progress in the field. This research examines in detail how object detection has evolved in recent years during the deep learning era. We provide an overview of the literature on a range of state-of-the-art object detection algorithms and the theoretical foundations of these techniques. Deep learning technologies are driving substantial innovation in object detection. While Convolutional Neural Networks (CNNs) have laid a solid foundation, newer models such as YOLO and Vision Transformers (ViTs) have expanded the possibilities further by delivering high accuracy and fast detection across a variety of settings. Even with these developments, integrating CNNs, ViTs, and YOLO into a coherent framework remains challenging because it requires balancing computational cost, speed, and accuracy, especially in dynamic environments. Real-time applications such as surveillance and autonomous driving require improvements that exploit the strengths of each model type. |
---|---|
ISSN: | 2321-9653 |
DOI: | 10.22214/ijraset.2024.63293 |