Loading…

LUNA: A Model-Based Universal Analysis Framework for Large Language Models

Over the past decade, Artificial Intelligence (AI) has had great success recently and is being used in a wide range of academic and industrial fields. More recently, Large Language Models (LLMs) have made rapid advancements that have propelled AI to a new level, enabling and empowering even more div...

Full description

Saved in:

Bibliographic Details
Published in:	IEEE transactions on software engineering 2024-07, Vol.50 (7), p.1921-1948
Main Authors:	Song, Da, Xie, Xuan, Song, Jiayang, Zhu, Derui, Huang, Yuheng, Juefei-Xu, Felix, Ma, Lei
Format:	Article
Language:	English
Subjects:	Analytical models Artificial intelligence Artificial neural networks Codes Data analysis deep neural networks Demand analysis Hidden Markov models Large language models Measurement model-based analysis Natural language processing Neural networks Performance evaluation Quality assurance Recurrent neural networks Semantics Software Software engineering Task analysis Transformers Trustworthiness
Citations:	Items that this one cites
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Over the past decade, Artificial Intelligence (AI) has had great success recently and is being used in a wide range of academic and industrial fields. More recently, Large Language Models (LLMs) have made rapid advancements that have propelled AI to a new level, enabling and empowering even more diverse applications and industrial domains with intelligence, particularly in areas like software engineering and natural language processing. Nevertheless, a number of emerging trustworthiness concerns and issues exhibited in LLMs, e.g., robustness and hallucination, have already recently received much attention, without properly solving which the widespread adoption of LLMs could be greatly hindered in practice. The distinctive characteristics of LLMs, such as the self-attention mechanism, extremely large neural network scale, and autoregressive generation usage contexts, differ from classic AI software based on Convolutional Neural Networks and Recurrent Neural Networks and present new challenges for quality analysis. Up to the present, it still lacks universal and systematic analysis techniques for LLMs despite the urgent industrial demand across diverse domains. Towards bridging such a gap, in this paper, we initiate an early exploratory study and propose a universal analysis framework for LLMs, named LUNA , which is designed to be general and extensible and enables versatile analysis of LLMs from multiple quality perspectives in a human-interpretable manner. In particular, we first leverage the data from desired trustworthiness perspectives to construct an abstract model as an auxiliary analysis asset and proxy, which is empowered by various abstract model construction methods built-in LUNA . To assess the quality of the abstract model, we collect and define a number of evaluation metrics, aiming at both the abstract model level and the semantics level. Then, the semantics, which is the degree of satisfaction of the LLM w.r.t. the trustworthiness perspective, is bound to and enriches the abstract model with semantics, which enables more detailed analysis applications for diverse purposes, e.g., abnormal behavior detection. To better understand the potential usefulness of our analysis framework LUNA , we conduct a large-scale evaluation, the results of which demonstrate that 1) the abstract model has the potential to distinguish normal and abnormal behavior in LLM, 2) LUNA is effective for the real-world analysis of LLMs in practice, and the hyperparameter se
ISSN:	0098-5589 1939-3520
DOI:	10.1109/TSE.2024.3411928