Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations
Studies of the functional role of the primate ventral visual stream have traditionally focused on object categorization, often ignoring -- despite much prior evidence -- its role in estimating "spatial" latents such as object position and pose. Most leading ventral stream models are derived by optimizing networks for object categorization, which seems to imply that the ventral stream is also derived under such an objective. Here, we explore an alternative hypothesis: Might the ventral stream be optimized for estimating spatial latents? And a closely related question: How different -- if at all -- are representations learned from spatial latent estimation compared to categorization? To ask these questions, we leveraged synthetic image datasets generated by a 3D graphic engine and trained convolutional neural networks (CNNs) to estimate different combinations of spatial and category latents. We found that models trained to estimate just a few spatial latents achieve neural alignment scores comparable to those trained on hundreds of categories, and the spatial latent performance of models strongly correlates with their neural alignment. Spatial latent and category-trained models have very similar -- but not identical -- internal representations, especially in their early and middle layers. We provide evidence that this convergence is partly driven by non-target latent variability in the training data, which facilitates the implicit learning of representations of those non-target latents. Taken together, these results suggest that many training objectives, such as spatial latents, can lead to similar models aligned neurally with the ventral stream. Thus, one should not assume that the ventral stream is optimized for object categorization only. As a field, we need to continue to sharpen our measures of comparing models to brains to better understand the functional roles of the ventral stream.
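As a concrete illustration of the training setup the abstract describes -- a CNN optimized to estimate spatial latents, category labels, or both from synthetic images -- here is a minimal PyTorch-style sketch. The backbone, the six spatial dimensions, and the loss weighting are illustrative assumptions, not the authors' reported configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class LatentEstimator(nn.Module):
    """CNN with a regression head for spatial latents and a
    classification head for category. The ResNet-18 backbone, the
    six spatial dimensions, and the loss weight below are assumptions
    for illustration, not the paper's reported setup."""

    def __init__(self, n_spatial: int = 6, n_categories: int = 100):
        super().__init__()
        backbone = models.resnet18(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # expose pooled features instead of logits
        self.backbone = backbone
        # e.g. x/y/z position plus three pose angles
        self.spatial_head = nn.Linear(feat_dim, n_spatial)
        self.category_head = nn.Linear(feat_dim, n_categories)

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)
        return self.spatial_head(feats), self.category_head(feats)

def multitask_loss(spatial_pred, cat_logits, spatial_true, cat_true, w_cat=1.0):
    """MSE on spatial latents plus cross-entropy on category;
    dropping either term recovers a single-objective model."""
    reg = nn.functional.mse_loss(spatial_pred, spatial_true)
    cls = nn.functional.cross_entropy(cat_logits, cat_true)
    return reg + w_cat * cls
```

Training on spatial latents alone corresponds to optimizing the regression term by itself; category-only training uses just the cross-entropy term, which makes the two model families directly comparable.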
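"Neural alignment" in this literature is commonly operationalized as cross-validated linear predictivity of recorded neural responses from model activations. The record does not name the exact benchmark used, so the sketch below assumes a simple ridge-regression variant of that idea.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def neural_predictivity(features: np.ndarray, neural: np.ndarray,
                        n_splits: int = 5, alpha: float = 1.0) -> float:
    """Mean cross-validated Pearson r between ridge-predicted and
    recorded responses. `features` is (n_stimuli, n_units) model
    activations; `neural` is (n_stimuli, n_sites) recordings."""
    rs = []
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(features):
        fit = Ridge(alpha=alpha).fit(features[train], neural[train])
        pred = fit.predict(features[test])
        for site in range(neural.shape[1]):
            # correlate predicted and held-out responses per site
            rs.append(np.corrcoef(pred[:, site], neural[test, site])[0, 1])
    return float(np.mean(rs))
```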
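One standard way to compare the internal representations of spatial-trained and category-trained models layer by layer is linear centered kernel alignment (CKA; Kornblith et al., 2019); the record does not state the paper's actual similarity metric, so CKA is an assumed stand-in here.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between activation matrices X (n, d1) and Y (n, d2)
    recorded on the same n stimuli. Returns a value in [0, 1]."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, ord="fro")
                    * np.linalg.norm(Y.T @ Y, ord="fro"))
```

Computing this per layer for a spatial-trained versus a category-trained model on a shared stimulus set would support the kind of early- and middle-layer comparison the abstract reports.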
Published in: | arXiv.org 2024-12 |
---|---|
Main Authors: | Xie, Yudi; Huang, Weichen; Alter, Esther; Schwartz, Jeremy; Tenenbaum, Joshua B; DiCarlo, James J |
Format: | Article |
Language: | English |
Identifier: | EISSN: 2331-8422 |
Subjects: | Alignment; Artificial neural networks; Classification; Estimation; Graphical representations; Questions; Synthetic data |
Source: | Publicly Available Content Database |
Online Access: | Get full text |