Loading…
Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance
The ability to communicate uncertainty, risk, and limitation is crucial for the safety of large language models. However, current evaluations of these abilities rely on simple calibration, asking whether the language generated by the model matches appropriate probabilities. Instead, evaluation of th...
Saved in:
Published in: | arXiv.org 2024-10 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The ability to communicate uncertainty, risk, and limitation is crucial for the safety of large language models. However, current evaluations of these abilities rely on simple calibration, asking whether the language generated by the model matches appropriate probabilities. Instead, evaluation of this aspect of LLM communication should focus on the behaviors of their human interlocutors: how much do they rely on what the LLM says? Here we introduce an interaction-centered evaluation framework called Rel-A.I. (pronounced "rely"}) that measures whether humans rely on LLM generations. We use this framework to study how reliance is affected by contextual features of the interaction (e.g, the knowledge domain that is being discussed), or the use of greetings communicating warmth or competence (e.g., "I'm happy to help!"). We find that contextual characteristics significantly affect human reliance behavior. For example, people rely 10% more on LMs when responding to questions involving calculations and rely 30% more on LMs that are perceived as more competent. Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions, and illustrate the need to consider features of the interactional context. |
---|---|
ISSN: | 2331-8422 |