LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Over the past decade, Artificial Intelligence (AI) has had great success and is now used in a wide range of academic and industrial fields. More recently, Large Language Models (LLMs) have made rapid advancements that have propelled AI to a new level, enabling and empowering even more div...
Published in: | IEEE transactions on software engineering 2024-07, Vol.50 (7), p.1921-1948 |
---|---|
Main Authors: | Song, Da ; Xie, Xuan ; Song, Jiayang ; Zhu, Derui ; Huang, Yuheng ; Juefei-Xu, Felix ; Ma, Lei |
Format: | Article |
Language: | English |
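Subjects: | Analytical models ; Artificial intelligence ; Artificial neural networks ; Codes ; Data analysis ; deep neural networks ; Demand analysis ; Hidden Markov models ; Large language models ; Measurement ; model-based analysis ; Natural language processing ; Neural networks ; Performance evaluation ; Quality assurance ; Recurrent neural networks ; Semantics ; Software ; Software engineering ; Task analysis ; Transformers ; Trustworthiness |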
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c175t-2d9ca8e9b8ccf7f4c23bf00ffb1170e8f5be6c0e0deea67d15c166b7ac2927413 |
container_end_page | 1948 |
container_issue | 7 |
container_start_page | 1921 |
container_title | IEEE transactions on software engineering |
container_volume | 50 |
creator | Song, Da ; Xie, Xuan ; Song, Jiayang ; Zhu, Derui ; Huang, Yuheng ; Juefei-Xu, Felix ; Ma, Lei |
description | Over the past decade, Artificial Intelligence (AI) has had great success and is now used in a wide range of academic and industrial fields. More recently, Large Language Models (LLMs) have made rapid advancements that have propelled AI to a new level, enabling even more diverse applications and industrial domains with intelligence, particularly in areas like software engineering and natural language processing. Nevertheless, a number of emerging trustworthiness concerns exhibited in LLMs, e.g., robustness and hallucination, have recently received much attention; unless they are properly solved, the widespread adoption of LLMs could be greatly hindered in practice. The distinctive characteristics of LLMs, such as the self-attention mechanism, extremely large neural network scale, and autoregressive generation usage contexts, differ from classic AI software based on Convolutional Neural Networks and Recurrent Neural Networks and present new challenges for quality analysis. To date, universal and systematic analysis techniques for LLMs are still lacking despite the urgent industrial demand across diverse domains. Towards bridging this gap, in this paper we initiate an early exploratory study and propose a universal analysis framework for LLMs, named LUNA, which is designed to be general and extensible and enables versatile analysis of LLMs from multiple quality perspectives in a human-interpretable manner. In particular, we first leverage the data from desired trustworthiness perspectives to construct an abstract model as an auxiliary analysis asset and proxy, which is empowered by various abstract model construction methods built into LUNA. To assess the quality of the abstract model, we collect and define a number of evaluation metrics, aiming at both the abstract model level and the semantics level. Then the semantics, which is the degree of satisfaction of the LLM w.r.t. the trustworthiness perspective, is bound to the abstract model and enriches it, which enables more detailed analysis applications for diverse purposes, e.g., abnormal behavior detection. To better understand the potential usefulness of our analysis framework LUNA, we conduct a large-scale evaluation, the results of which demonstrate that 1) the abstract model has the potential to distinguish normal and abnormal behavior in LLMs, 2) LUNA is effective for the real-world analysis of LLMs in practice, and the hyperparameter settings influence the performance, 3) different evaluation metrics correlate differently with the analysis performance. In order to encourage further studies in the quality assurance of LLMs, we made all of the code and more detailed experimental results data available on the supplementary website of this paper: https://sites.google.com/view/llm-luna |
doi_str_mv | 10.1109/TSE.2024.3411928 |
format | article |
coden | IESEDJ |
date | 2024-07-01 |
eissn | 1939-3520 |
ieee_id | 10562221 |
linktohtml | https://ieeexplore.ieee.org/document/10562221 |
orcidid | 0000-0002-9552-0097 ; 0009-0008-7093-9781 ; 0000-0003-3981-8515 ; 0000-0003-3666-4020 ; 0000-0001-9267-4229 ; 0000-0002-8621-2420 ; 0000-0002-0857-8611 |
publisher | New York: IEEE |
rights | Copyright IEEE Computer Society 2024 |
toplevel | peer_reviewed ; online_resources |
tpages | 28 |
fulltext | fulltext |
identifier | ISSN: 0098-5589 |
ispartof | IEEE transactions on software engineering, 2024-07, Vol.50 (7), p.1921-1948 |
issn | 0098-5589 ; 1939-3520 |
language | eng |
recordid | cdi_proquest_journals_3081870448 |
source | IEEE Xplore (Online service) |
subjects | Analytical models ; Artificial intelligence ; Artificial neural networks ; Codes ; Data analysis ; deep neural networks ; Demand analysis ; Hidden Markov models ; Large language models ; Measurement ; model-based analysis ; Natural language processing ; Neural networks ; Performance evaluation ; Quality assurance ; Recurrent neural networks ; Semantics ; Software ; Software engineering ; Task analysis ; Transformers ; Trustworthiness |
title | LUNA: A Model-Based Universal Analysis Framework for Large Language Models |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T10%3A55%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=LUNA:%20A%20Model-Based%20Universal%20Analysis%20Framework%20for%20Large%20Language%20Models&rft.jtitle=IEEE%20transactions%20on%20software%20engineering&rft.au=Song,%20Da&rft.date=2024-07-01&rft.volume=50&rft.issue=7&rft.spage=1921&rft.epage=1948&rft.pages=1921-1948&rft.issn=0098-5589&rft.eissn=1939-3520&rft.coden=IESEDJ&rft_id=info:doi/10.1109/TSE.2024.3411928&rft_dat=%3Cproquest_cross%3E3081870448%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c175t-2d9ca8e9b8ccf7f4c23bf00ffb1170e8f5be6c0e0deea67d15c166b7ac2927413%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3081870448&rft_id=info:pmid/&rft_ieee_id=10562221&rfr_iscdi=true |