Loading…

GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM

Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models which have great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications, including healthcare, industrial, and ac...

Full description

Saved in:

Bibliographic Details
Published in:	arXiv.org 2024-07
Main Authors:	Bimbraw, Keshav, Wang, Ye, Liu, Jing, Koike-Akino, Toshiaki
Format:	Article
Language:	English
Subjects:	Decoding Forearm
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models which have great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications, including healthcare, industrial, and academic sectors. Although such foundation models perform well in a wide range of general tasks, their capability without fine-tuning is often limited in specialized tasks. However, full fine-tuning of large foundation models is challenging due to enormous computation/memory/dataset requirements. We show that GPT-4o can decode hand gestures from forearm ultrasound data even with no fine-tuning, and improves with few-shot, in-context learning.
ISSN:	2331-8422