
GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models

Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens".

Bibliographic Details
Main Authors: Zhang, Zhibo, Bai, Wuxia, Li, Yuxi, Meng, Mark Huasong, Wang, Kailong, Shi, Ling, Li, Li, Wang, Jun, Wang, Haoyu
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
container_start_page 643
container_end_page 655
description Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". Those tokens, once included in the input, may induce the model to produce incorrect, irrelevant, or even harmful results, drastically undermining the reliability and practicality of LLMs. In this work, we aim to enhance the understanding of glitch tokens and propose techniques for their detection and mitigation. We first reveal the characteristic features induced by glitch tokens on LLMs, which are evidenced by significant deviations in the distributions of attention patterns and dynamic information from intermediate model layers. Based on the insights, we develop GlitchProber, a tool for efficient glitch token detection and mitigation. GlitchProber utilizes small-scale sampling, principal component analysis for accelerated feature extraction, and a simple classifier for efficient vocabulary screening. Taking one step further, GlitchProber rectifies abnormal model intermediate layer values to mitigate the destructive effects of glitch tokens. Evaluated on five mainstream open-source LLMs, GlitchProber demonstrates higher efficiency, precision, and recall compared to existing approaches, with an average F1 score of 0.86 and an average repair rate of 50.06%. GlitchProber unveils a novel path to address the challenges posed by glitch tokens and inspires future research toward more robust and interpretable LLMs. Our code is available at https://github.com/LLM-Integrity-Guard/GlitchProber.
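The description above outlines a concrete pipeline: sample a small set of tokens, extract features from intermediate-layer activations (and attention patterns), compress them with principal component analysis, train a simple classifier to screen the full vocabulary, and finally rectify abnormal intermediate-layer values to mitigate detected glitch tokens. The sketch below illustrates the general shape of that pipeline with a Hugging Face causal LM and scikit-learn's PCA and SVC (the record's subject terms mention support vector machines); the model name, probe layer, labelling step, and the clipping-based mitigation hook are illustrative assumptions, not the authors' exact implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.decomposition import PCA
from sklearn.svm import SVC

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # any open-source causal LM (placeholder)
PROBE_LAYER = 16                         # intermediate layer to inspect (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


@torch.no_grad()
def token_features(token_id: int) -> torch.Tensor:
    """Intermediate-layer hidden state produced when the model reads one token."""
    out = model(torch.tensor([[token_id]]), output_hidden_states=True)
    # hidden_states[0] is the embedding output; PROBE_LAYER picks a middle block.
    return out.hidden_states[PROBE_LAYER][0, -1].float()


# 1) Small-scale sampling: label a handful of token ids as glitch (1) or normal (0),
#    e.g. via a behavioural probe such as asking the model to repeat the token.
#    The ids below are hypothetical placeholders.
labelled = {1234: 1, 2345: 1, 3456: 0, 4567: 0}

X = torch.stack([token_features(t) for t in labelled]).numpy()
y = list(labelled.values())

# 2) PCA for accelerated feature extraction, then a simple SVM classifier.
pca = PCA(n_components=min(32, len(labelled)))
clf = SVC(kernel="rbf")
clf.fit(pca.fit_transform(X), y)

# 3) Vocabulary screening: flag every token the classifier marks as suspicious.
#    (A real run would batch this; one forward pass per token is kept for clarity.)
suspects = [
    tok_id for tok_id in range(tokenizer.vocab_size)
    if clf.predict(pca.transform(token_features(tok_id).numpy()[None, :]))[0] == 1
]
print(f"{len(suspects)} suspected glitch tokens")

# 4) Mitigation (illustrative only): clamp extreme activations at the probe layer
#    with a forward hook, standing in for GlitchProber's rectification of abnormal
#    intermediate-layer values. The threshold is an arbitrary placeholder.
LIMIT = 50.0


def clamp_hook(module, inputs, output):
    hidden = output[0].clamp(min=-LIMIT, max=LIMIT)
    return (hidden,) + output[1:]


# `model.model.layers` matches Llama-style architectures; other models name
# their decoder blocks differently.
handle = model.model.layers[PROBE_LAYER - 1].register_forward_hook(clamp_hook)
```

In practice the screening loop would be batched and the rectification thresholds derived from the feature distributions observed on normal tokens; the paper reports an average detection F1 of 0.86 and an average repair rate of 50.06% across five open-source LLMs.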
doi_str_mv 10.1145/3691620.3695060
format conference_proceeding
publisher New York, NY, USA: ACM
date 2024-10-27
identifier ISBN: 9798400712487
ispartof IEEE/ACM International Conference on Automated Software Engineering : [proceedings], 2024, p.643-655
issn 2643-1572
language eng
recordid cdi_ieee_primary_10765056
source IEEE Xplore All Conference Series
subjects Computing methodologies -- Artificial intelligence -- Knowledge representation and reasoning
Feature extraction
Glitch token
Large language models
LLM analysis
LLM security
Maintenance engineering
Prevention and mitigation
Principal component analysis
Reliability
Software engineering
Support vector machines
Systematics
Vocabulary
title GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models