GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". Those tokens, once included in the input, may induce the model to produce incorrect, irrelevant, or even harmful results, drastically undermining the reliability and practicality of LLMs.

In this work, we aim to enhance the understanding of glitch tokens and propose techniques for their detection and mitigation. We first reveal the characteristic features induced by glitch tokens on LLMs, which are evidenced by significant deviations in the distributions of attention patterns and dynamic information from intermediate model layers. Based on the insights, we develop GlitchProber, a tool for efficient glitch token detection and mitigation. GlitchProber utilizes small-scale sampling, principal component analysis for accelerated feature extraction, and a simple classifier for efficient vocabulary screening. Taking one step further, GlitchProber rectifies abnormal model intermediate layer values to mitigate the destructive effects of glitch tokens. Evaluated on five mainstream open-source LLMs, GlitchProber demonstrates higher efficiency, precision, and recall compared to existing approaches, with an average F1 score of 0.86 and an average repair rate of 50.06%. GlitchProber unveils a novel path to address the challenges posed by glitch tokens and inspires future research toward more robust and interpretable LLMs. Our code is available at https://github.com/LLM-Integrity-Guard/GlitchProber.
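The detection pipeline described above (intermediate-layer features, small-scale sampling, PCA, and a simple classifier for vocabulary screening) can be roughly illustrated with the sketch below. This is a minimal sketch, not the authors' implementation: the model name, probe prompt, layer index, sample size, and the repetition-based labelling heuristic are all assumptions chosen for illustration, and the mitigation step (rectifying abnormal intermediate-layer values) is omitted.

```python
# Minimal illustrative sketch of a GlitchProber-style screening pipeline.
# NOT the paper's implementation; model, prompt, layer, and sample size are placeholders.
import numpy as np
import torch
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"            # placeholder open-source causal LM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

PROBE = 'Please repeat this string exactly: "{}"'  # illustrative repetition prompt
LAYER = 16                                         # assumed intermediate layer index

@torch.no_grad()
def hidden_feature(token_id: int) -> np.ndarray:
    """Intermediate-layer hidden state at the last position of the probe prompt."""
    text = tokenizer.decode([token_id])
    inputs = tokenizer(PROBE.format(text), return_tensors="pt")
    out = model(**inputs)
    return out.hidden_states[LAYER][0, -1].float().numpy()

@torch.no_grad()
def fails_repetition(token_id: int) -> bool:
    """Crude labelling heuristic: treat a token as a glitch token if the model
    cannot echo it back when asked to repeat it."""
    text = tokenizer.decode([token_id])
    inputs = tokenizer(PROBE.format(text), return_tensors="pt")
    gen = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    reply = tokenizer.decode(gen[0, inputs["input_ids"].shape[1]:])
    return text.strip() not in reply

# 1) Small-scale sampling: label only a few hundred tokens with the slow repetition test.
#    (Assumes the sample happens to contain both glitch and normal tokens.)
rng = np.random.default_rng(0)
sample_ids = rng.choice(len(tokenizer), size=500, replace=False)
X = np.stack([hidden_feature(int(t)) for t in sample_ids])
y = np.array([fails_repetition(int(t)) for t in sample_ids], dtype=int)

# 2) PCA compresses the high-dimensional activations, then a simple SVM learns
#    the glitch/normal boundary in the reduced feature space.
pca = PCA(n_components=50).fit(X)
clf = SVC(kernel="rbf").fit(pca.transform(X), y)

# 3) Vocabulary screening: the cheap classifier flags candidates across the whole
#    vocabulary; only flagged tokens need the expensive generation-based check afterwards.
candidates = [
    t for t in range(len(tokenizer))
    if clf.predict(pca.transform(hidden_feature(t).reshape(1, -1)))[0] == 1
]
print(f"{len(candidates)} glitch-token candidates flagged for verification")
```

The design idea this sketch tries to capture is that the expensive generation-based check is run only on the small labelled sample and on the classifier's flagged candidates, rather than on every token in the vocabulary.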
Main Authors: | Zhang, Zhibo; Bai, Wuxia; Li, Yuxi; Meng, Mark Huasong; Wang, Kailong; Shi, Ling; Li, Li; Wang, Jun; Wang, Haoyu |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Glitch token; Large language models; LLM analysis; LLM security; Feature extraction; Principal component analysis; Support vector machines; Prevention and mitigation; Reliability; Software engineering |
cited_by | |
---|---|
cites | |
container_end_page | 655 |
container_issue | |
container_start_page | 643 |
container_title | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] |
container_volume | |
creator | Zhang, Zhibo; Bai, Wuxia; Li, Yuxi; Meng, Mark Huasong; Wang, Kailong; Shi, Ling; Li, Li; Wang, Jun; Wang, Haoyu |
description | Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". Those tokens, once included in the input, may induce the model to produce incorrect, irrelevant, or even harmful results, drastically undermining the reliability and practicality of LLMs.
In this work, we aim to enhance the understanding of glitch tokens and propose techniques for their detection and mitigation. We first reveal the characteristic features induced by glitch tokens on LLMs, which are evidenced by significant deviations in the distributions of attention patterns and dynamic information from intermediate model layers. Based on the insights, we develop GlitchProber, a tool for efficient glitch token detection and mitigation. GlitchProber utilizes small-scale sampling, principal component analysis for accelerated feature extraction, and a simple classifier for efficient vocabulary screening. Taking one step further, GlitchProber rectifies abnormal model intermediate layer values to mitigate the destructive effects of glitch tokens. Evaluated on five mainstream open-source LLMs, GlitchProber demonstrates higher efficiency, precision, and recall compared to existing approaches, with an average F1 score of 0.86 and an average repair rate of 50.06%. GlitchProber unveils a novel path to address the challenges posed by glitch tokens and inspires future research toward more robust and interpretable LLMs. Our code is available at https://github.com/LLM-Integrity-Guard/GlitchProber. |
doi_str_mv | 10.1145/3691620.3695060 |
format | conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9798400712487 |
ispartof | IEEE/ACM International Conference on Automated Software Engineering : [proceedings], 2024, p.643-655 |
issn | 2643-1572 |
language | eng |
recordid | cdi_ieee_primary_10765056 |
source | IEEE Xplore All Conference Series |
subjects | Computing methodologies -- Artificial intelligence -- Knowledge representation and reasoning; Feature extraction; Glitch token; Large language models; LLM analysis; LLM security; Maintenance engineering; Prevention and mitigation; Principal component analysis; Reliability; Software engineering; Support vector machines; Systematics; Vocabulary |
title | GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models |