Using artificial intelligence to estimate the nutritional content of meal photos: an evaluation of ChatGPT-4
Published in: Proceedings of the Nutrition Society 2024-11, Vol. 83 (OCE4)
Main Authors:
Format: Article
Language: English
Summary:

Dietary intake assessment is an essential part of nutrition research and practice, with the use of digital technology now well established(1) and artificial intelligence (AI) in the form of image recognition readily available in research and commercial settings(2). Recent advances in large language models (LLMs), such as ChatGPT, allow computers to converse in a human-like way, providing text responses to typed queries. No studies, however, have utilised both the LLM and image recognition components of ChatGPT-4 to evaluate its accuracy in estimating the nutritional content of meals. The aim of this study was to evaluate the accuracy of the ChatGPT-4 LLM and image recognition model in estimating the nutritional content of meals.

Thirty-eight meal photographs with known nutritional content (from McCance and Widdowson's Composition of Foods) were uploaded to ChatGPT, which was asked to provide a point estimate for each meal for each of the following: energy (kcal), protein (g), total carbohydrate (g), dietary fibre (g), total sugar (g), total fat (g), saturated fat (g), monounsaturated fat (g), polyunsaturated fat (g), calcium (mg), iron (mg), sodium (mg), potassium (mg), vitamin D (mcg), folate (mcg), and vitamin C (mg). Comparisons were made between the ChatGPT estimates and the McCance and Widdowson values using the Wilcoxon signed rank test, percent difference, Spearman's correlation, and cross-classification of quartiles. Interpretation of statistical measures was based on Lombard et al.(3).

For estimating the content of meals, differences (p < 0.05) existed between the methods for 11 of the 16 nutrients, and 12 nutrients had a percent difference of >10%, indicating poor agreement for most nutrients. ChatGPT underestimated 15 of the 16 nutrients. Conversely, when considering the ranking of meals, all nutrients had correlation coefficients indicating good (rs ≥ 0.50; 11 of 16) or acceptable (0.20 < rs < 0.49; 5 of 16) agreement. In the cross-classification of quartiles, ≥50% of meals were classified into the same quartile by both methods for 9 nutrients, and ≤10% of meals were classified into opposite quartiles for 14 nutrients, indicating good agreement. ChatGPT also provided caveats regarding its estimations, such as "the caloric estimate assumes the butter is spread thinly" and "cornflakes can often be fortified with vitamins and minerals […] and exact content could also vary based on the brand of cornflakes". ChatGPT showed poor agreement for estimating the absolute nutritional content of meals, but acceptable to good agreement for ranking meals by nutrient content.
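As an illustrative sketch only (not the study's code, and with hypothetical meal values), the two ranking-agreement measures described above — Spearman's correlation and cross-classification of quartiles — can be computed for a single nutrient across a set of meals like this:

```python
# Sketch of the ranking-agreement measures for one nutrient across meals.
# The meal values below are hypothetical; the study used 38 real meals.

def ranks(values):
    """Rank values 1..n, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def quartiles(values):
    """Assign each value a quartile 0..3 by rank position."""
    n = len(values)
    return [min(3, int(4 * (r - 1) / n)) for r in ranks(values)]

def cross_classification(reference, estimate):
    """Fractions of meals placed in the same / in opposite quartiles."""
    qr, qe = quartiles(reference), quartiles(estimate)
    same = sum(a == b for a, b in zip(qr, qe)) / len(qr)
    opposite = sum(abs(a - b) == 3 for a, b in zip(qr, qe)) / len(qr)
    return same, opposite

# Hypothetical energy values (kcal) for 8 meals: reference vs. AI estimate
# (the estimate systematically underestimates, as the study observed).
ref = [520, 310, 700, 450, 230, 610, 390, 560]
est = [480, 300, 640, 400, 210, 505, 370, 550]

rho = spearman(ref, est)
same, opp = cross_classification(ref, est)
pct_diff = 100 * (sum(est) - sum(ref)) / sum(ref)
```

With these made-up numbers, absolute agreement is poor (a negative percent difference reflecting underestimation) while ranking agreement is high — the same pattern the abstract reports across nutrients.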
ISSN: 0029-6651, 1475-2719
DOI: 10.1017/S0029665124005743