Robust Semantic Interpretability: Revisiting Concept Activation Vectors
Interpretability methods for image classification assess model trustworthiness by attempting to expose whether the model is systematically biased or attending to the same cues as a human would. Saliency methods for feature attribution dominate the interpretability literature, but these methods do not address semantic concepts such as the textures, colors, or genders of objects within an image. Our proposed Robust Concept Activation Vectors (RCAV) quantifies the effects of semantic concepts on individual model predictions and on model behavior as a whole. RCAV calculates a concept gradient and takes a gradient ascent step to assess model sensitivity to the given concept. By generalizing previous work on concept activation vectors to account for model non-linearity, and by introducing stricter hypothesis testing, we show that RCAV yields interpretations which are both more accurate at the image level and robust at the dataset level. RCAV, like saliency methods, supports the interpretation of individual predictions. To evaluate the practical use of interpretability methods as debugging tools, and the scientific use of interpretability methods for identifying inductive biases (e.g. texture over shape), we construct two datasets and accompanying metrics for realistic benchmarking of semantic interpretability methods. Our benchmarks expose the importance of counterfactual augmentation and negative controls for quantifying the practical usability of interpretability methods.
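For orientation, the sketch below illustrates the mechanism the abstract describes: learn a concept activation vector from intermediate-layer activations, step along it, and re-run the non-linear remainder of the network to see how the target class score changes. This is a minimal, hypothetical sketch assuming PyTorch activations and a scikit-learn linear probe; the names and interfaces (`learn_concept_vector`, `concept_sensitivity`, `head`) are illustrative assumptions, not the released RCAV implementation.

```python
# Hypothetical sketch of the concept-sensitivity idea, not the authors' code.
import torch
from sklearn.linear_model import LogisticRegression


def learn_concept_vector(acts_pos, acts_neg):
    # acts_pos / acts_neg: (N, D) tensors of layer-l activations for images
    # that do / do not show the concept (e.g. a texture or color).
    X = torch.cat([acts_pos, acts_neg]).numpy()
    y = [1] * len(acts_pos) + [0] * len(acts_neg)
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    v = torch.tensor(probe.coef_[0], dtype=torch.float32)
    return v / v.norm()  # unit-norm concept activation vector


@torch.no_grad()
def concept_sensitivity(head, acts, concept_vec, target_class, step=1.0):
    # head: callable applying the layers after layer l to (N, D) activations and
    # returning class logits. Because the head is non-linear, the concept's effect
    # is measured by re-evaluating it on perturbed activations rather than by a
    # single dot product with the gradient.
    p_base = torch.softmax(head(acts), dim=-1)[:, target_class]
    p_step = torch.softmax(head(acts + step * concept_vec), dim=-1)[:, target_class]
    return (p_step - p_base).mean().item()
```

The "stricter hypothesis testing" the abstract mentions would then compare such sensitivity scores against a null distribution (e.g. from random or permuted concept vectors); the details of that test are left to the paper itself.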
Published in: | arXiv.org, 2021-04 |
---|---|
Main Authors: | Pfau, Jacob; Young, Albert T; Wei, Jerome; Wei, Maria L; Keiser, Michael J |
Format: | Article |
Language: | English |
Subjects: | Datasets; Hypothesis testing; Image classification; Linearity; Robustness; Salience; Semantics |
container_title | arXiv.org |
---|---|
creator | Pfau, Jacob; Young, Albert T; Wei, Jerome; Wei, Maria L; Keiser, Michael J |
description | Interpretability methods for image classification assess model trustworthiness by attempting to expose whether the model is systematically biased or attending to the same cues as a human would. Saliency methods for feature attribution dominate the interpretability literature, but these methods do not address semantic concepts such as the textures, colors, or genders of objects within an image. Our proposed Robust Concept Activation Vectors (RCAV) quantifies the effects of semantic concepts on individual model predictions and on model behavior as a whole. RCAV calculates a concept gradient and takes a gradient ascent step to assess model sensitivity to the given concept. By generalizing previous work on concept activation vectors to account for model non-linearity, and by introducing stricter hypothesis testing, we show that RCAV yields interpretations which are both more accurate at the image level and robust at the dataset level. RCAV, like saliency methods, supports the interpretation of individual predictions. To evaluate the practical use of interpretability methods as debugging tools, and the scientific use of interpretability methods for identifying inductive biases (e.g. texture over shape), we construct two datasets and accompanying metrics for realistic benchmarking of semantic interpretability methods. Our benchmarks expose the importance of counterfactual augmentation and negative controls for quantifying the practical usability of interpretability methods. |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2021-04 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2509912025 |
source | Publicly Available Content (ProQuest) |
subjects | Datasets; Hypothesis testing; Image classification; Linearity; Robustness; Salience; Semantics |
title | Robust Semantic Interpretability: Revisiting Concept Activation Vectors |