Loading…
Benchmarking the speed-accuracy tradeoff in object recognition by humans and neural networks
Active object recognition, fundamental to tasks like reading and driving, relies on the ability to make time-sensitive decisions. People exhibit a flexible tradeoff between speed and accuracy, a crucial human skill. However, current computational models struggle to incorporate time. To address this...
Saved in:
Published in: | Journal of vision (Charlottesville, Va.) Va.), 2025-01, Vol.25 (1), p.4 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Active object recognition, fundamental to tasks like reading and driving, relies on the ability to make time-sensitive decisions. People exhibit a flexible tradeoff between speed and accuracy, a crucial human skill. However, current computational models struggle to incorporate time. To address this gap, we present the first dataset (with 148 observers) exploring the speed-accuracy tradeoff (SAT) in ImageNet object recognition. Participants performed a 16-way ImageNet categorization task where their responses counted only if they occurred near the time of a fixed-delay beep. Each block of trials allowed one reaction time. As expected, human accuracy increases with reaction time. We compare human performance with that of dynamic neural networks that adapt their computation to the available inference time. Time is a scarce resource for human object recognition, and finding an appropriate analog in neural networks is challenging. Networks can repeat operations by using layers, recurrent cycles, or early exits. We use the repetition count as a network's analog for time. In our analysis, the number of layers, recurrent cycles, and early exits correlates strongly with floating-point operations, making them suitable time analogs. Comparing networks and humans on SAT-fit error, category-wise correlation, and SAT-curve steepness, we find cascaded dynamic neural networks most promising in modeling human speed and accuracy. Surprisingly, convolutional recurrent networks, typically favored in human object recognition modeling, perform the worst on our benchmark. |
---|---|
ISSN: | 1534-7362 1534-7362 |
DOI: | 10.1167/jov.25.1.4 |