GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". Those tokens, once included in the input, may induce the model to produce incorrect, irrelevant, or even harmful results, drastically undermining the reliability and practicality of LLMs.

In this work, we aim to enhance the understanding of glitch tokens and propose techniques for their detection and mitigation. We first reveal the characteristic features induced by glitch tokens on LLMs, which are evidenced by significant deviations in the distributions of attention patterns and dynamic information from intermediate model layers. Based on the insights, we develop GlitchProber, a tool for efficient glitch token detection and mitigation. GlitchProber utilizes small-scale sampling, principal component analysis for accelerated feature extraction, and a simple classifier for efficient vocabulary screening. Taking one step further, GlitchProber rectifies abnormal model intermediate layer values to mitigate the destructive effects of glitch tokens. Evaluated on five mainstream open-source LLMs, GlitchProber demonstrates higher efficiency, precision, and recall compared to existing approaches, with an average F1 score of 0.86 and an average repair rate of 50.06%. GlitchProber unveils a novel path to address the challenges posed by glitch tokens and inspires future research toward more robust and interpretable LLMs. Our code is available at https://github.com/LLM-Integrity-Guard/GlitchProber.
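The detection pipeline described above (intermediate-layer features, small-scale sampling, PCA, and a simple classifier for vocabulary screening) can be roughly illustrated with the sketch below. This is a minimal sketch, not the authors' implementation: the model name, probe prompt, layer index, sample size, and the repetition-based labelling heuristic are all assumptions chosen for illustration, and the mitigation step (rectifying abnormal intermediate-layer values) is omitted.

```python
# Minimal illustrative sketch of a GlitchProber-style screening pipeline.
# NOT the paper's implementation; model, prompt, layer, and sample size are placeholders.
import numpy as np
import torch
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"            # placeholder open-source causal LM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

PROBE = 'Please repeat this string exactly: "{}"'  # illustrative repetition prompt
LAYER = 16                                         # assumed intermediate layer index

@torch.no_grad()
def hidden_feature(token_id: int) -> np.ndarray:
    """Intermediate-layer hidden state at the last position of the probe prompt."""
    text = tokenizer.decode([token_id])
    inputs = tokenizer(PROBE.format(text), return_tensors="pt")
    out = model(**inputs)
    return out.hidden_states[LAYER][0, -1].float().numpy()

@torch.no_grad()
def fails_repetition(token_id: int) -> bool:
    """Crude labelling heuristic: treat a token as a glitch token if the model
    cannot echo it back when asked to repeat it."""
    text = tokenizer.decode([token_id])
    inputs = tokenizer(PROBE.format(text), return_tensors="pt")
    gen = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    reply = tokenizer.decode(gen[0, inputs["input_ids"].shape[1]:])
    return text.strip() not in reply

# 1) Small-scale sampling: label only a few hundred tokens with the slow repetition test.
#    (Assumes the sample happens to contain both glitch and normal tokens.)
rng = np.random.default_rng(0)
sample_ids = rng.choice(len(tokenizer), size=500, replace=False)
X = np.stack([hidden_feature(int(t)) for t in sample_ids])
y = np.array([fails_repetition(int(t)) for t in sample_ids], dtype=int)

# 2) PCA compresses the high-dimensional activations, then a simple SVM learns
#    the glitch/normal boundary in the reduced feature space.
pca = PCA(n_components=50).fit(X)
clf = SVC(kernel="rbf").fit(pca.transform(X), y)

# 3) Vocabulary screening: the cheap classifier flags candidates across the whole
#    vocabulary; only flagged tokens need the expensive generation-based check afterwards.
candidates = [
    t for t in range(len(tokenizer))
    if clf.predict(pca.transform(hidden_feature(t).reshape(1, -1)))[0] == 1
]
print(f"{len(candidates)} glitch-token candidates flagged for verification")
```

The design idea this sketch tries to capture is that the expensive generation-based check is run only on the small labelled sample and on the classifier's flagged candidates, rather than on every token in the vocabulary.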
Main Authors: | Zhang, Zhibo; Bai, Wuxia; Li, Yuxi; Meng, Mark Huasong; Wang, Kailong; Shi, Ling; Li, Li; Wang, Jun; Wang, Haoyu |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Glitch token; Large language models; LLM analysis; LLM security; Feature extraction; Principal component analysis; Support vector machines; Prevention and mitigation; Reliability; Software engineering |
cited_by | |
---|---|
cites | |
container_end_page | 655 |
container_issue | |
container_start_page | 643 |
container_title | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] |
container_volume | |
creator | Zhang, Zhibo; Bai, Wuxia; Li, Yuxi; Meng, Mark Huasong; Wang, Kailong; Shi, Ling; Li, Li; Wang, Jun; Wang, Haoyu |
description | Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". Those tokens, once included in the input, may induce the model to produce incorrect, irrelevant, or even harmful results, drastically undermining the reliability and practicality of LLMs.
In this work, we aim to enhance the understanding of glitch tokens and propose techniques for their detection and mitigation. We first reveal the characteristic features induced by glitch tokens on LLMs, which are evidenced by significant deviations in the distributions of attention patterns and dynamic information from intermediate model layers. Based on the insights, we develop GlitchProber, a tool for efficient glitch token detection and mitigation. GlitchProber utilizes small-scale sampling, principal component analysis for accelerated feature extraction, and a simple classifier for efficient vocabulary screening. Taking one step further, GlitchProber rectifies abnormal model intermediate layer values to mitigate the destructive effects of glitch tokens. Evaluated on five mainstream open-source LLMs, GlitchProber demonstrates higher efficiency, precision, and recall compared to existing approaches, with an average F1 score of 0.86 and an average repair rate of 50.06%. GlitchProber unveils a novel path to address the challenges posed by glitch tokens and inspires future research toward more robust and interpretable LLMs. Our code is available at https://github.com/LLM-Integrity-Guard/GlitchProber. |
doi_str_mv | 10.1145/3691620.3695060 |
format | conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9798400712487 |
ispartof | IEEE/ACM International Conference on Automated Software Engineering : [proceedings], 2024, p.643-655 |
issn | 2643-1572 |
language | eng |
recordid | cdi_ieee_primary_10765056 |
source | IEEE Xplore All Conference Series |
subjects | Computing methodologies -- Artificial intelligence -- Knowledge representation and reasoning; Feature extraction; Glitch token; Large language models; LLM analysis; LLM security; Maintenance engineering; Prevention and mitigation; Principal component analysis; Reliability; Software engineering; Support vector machines; Systematics; Vocabulary |
title | GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models |