Using Transfer Learning to Refine Object Detection Models for Blind and Low Vision Users

Bibliographic Details
Main Authors: Bhandari, Aradhita; Batutis, Gail S.; Jain, Aryan; Sico, Mallory C.; Hamilton-Fletcher, Giles; Feng, Chen; Hudson, Todd E.; Rizzo, John-Ross; Chan, Kevin C.
Format: Conference Proceeding
Language: English
Description
Summary: Object detection models available on smartphones such as YOLOv8 can potentially help identify and locate objects of interest to people who are blind or low vision (pBLV). However, current models may miss crucial objects for pBLV. Here, we compared 5 transfer learning methods for adding new classes of interest to pBLV navigation that are absent from the Common Objects in Context (COCO) training dataset. Using a rebalanced COCO dataset with these new classes, we revised public YOLOv8s models via the following methods: revising all pretrained weights; freezing 22, 21 or 15 layers; and few-shot learning. These approaches achieved overall mean average precision (mAP-50) from 0.342 (20 min training time; few-shot learning) to 0.420 (9.2 hrs; revising all pretrained weights). Among the frozen layer models, the 15 frozen layer model had the best mAP-50 performance of 0.419 (7.7 hrs); hyperparameter tuning on this model increased mAP-50 to 0.423. When applied to a larger YOLOv8xl model, mAP-50 reached 0.511 after 50 epochs. Our results highlight how object detection models can be adapted for the benefit of pBLV users even when developers have limited training data or computational resources.
ISSN: 2694-0604
DOI: 10.1109/EMBC53108.2024.10782343
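
Note: The summary compares revising all pretrained weights with freezing the first 22, 21 or 15 layers. The sketch below is not the authors' code. It only illustrates which parts of a network stay trainable under each setting. It assumes that parameters are named `model.<index>.<rest>` and that the network has 23 indexed modules. Both are illustrative assumptions, and every function name is hypothetical. Few-shot learning is not covered.

```python
from typing import Dict, List

# Assumption: parameter names look like "model.<index>.<rest>",
# with one index per top-level module of the detector.


def frozen_prefixes(n_frozen: int) -> List[str]:
    """Name prefixes of the first n_frozen modules (hypothetical helper)."""
    # The trailing dot keeps "model.1." from matching "model.15.".
    return [f"model.{i}." for i in range(n_frozen)]


def is_frozen(param_name: str, n_frozen: int) -> bool:
    """True if the parameter belongs to one of the first n_frozen modules."""
    return param_name.startswith(tuple(frozen_prefixes(n_frozen)))


def trainable_modules(n_modules: int, n_frozen: int) -> List[int]:
    """Indices of the modules whose weights would still be updated."""
    return list(range(min(n_frozen, n_modules), n_modules))


# Frozen-layer settings named in the summary; 0 means all weights are revised.
STRATEGIES: Dict[str, int] = {
    "revise all weights": 0,
    "freeze 15 layers": 15,
    "freeze 21 layers": 21,
    "freeze 22 layers": 22,
}

if __name__ == "__main__":
    N_MODULES = 23  # assumed module count, for illustration only
    for name, n_frozen in STRATEGIES.items():
        print(f"{name}: trainable modules {trainable_modules(N_MODULES, n_frozen)}")
```

Under these assumptions, freezing 22 layers leaves only the final module trainable, while freezing 15 leaves the last eight. This is consistent with the shorter training time the summary reports for the frozen models. In the Ultralytics package, a `freeze` training argument is the usual way to request this behaviour; the summary does not state whether the authors used it.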