Zero-Shot Information Extraction with Community-Fine-Tuned Large Language Models From Open-Ended Interview Transcripts

Bibliographic Details
Main Authors:	Kazi, Nazmul; Kahanda, Indika; Rupassara, S. Indu; Kindt, John W.
Format:	Conference Proceeding
Language:	English
Subjects:	Analytical models; Data analysis; Data Extraction; Data models; Large Language Models; Machine learning; Open-Ended Survey Interviews; Semantics; Sentiment analysis; Surveys; Text Data Analysis Efficiency; Text Mining; Zero-shot

Description
Summary:	Machine learning holds significant promise for automating and optimizing text data analysis. However, resource-intensive steps such as data annotation, model training, and parameter tuning often make it impractical for one-time data extraction, medium-sized datasets, or short-term projects. Many community-fine-tuned large language models (CLLMs) have already been fine-tuned on task-specific datasets and can achieve impressive performance on unseen data without further fine-tuning. A hybrid approach, in which CLLMs perform rapid text data extraction and their inaccurate outputs are then hand-curated, can yield high-quality results, a balanced workload, and improved efficiency. This project applies CLLMs to three tasks involving the analysis of open-ended survey responses: semantic text matching, exact answer extraction, and sentiment analysis. We present our overall process and discuss several seemingly simple yet effective techniques that we use to improve model performance without fine-tuning the CLLMs on our own data. Our results show high precision in semantic text matching (0.92) and exact answer extraction (0.90), while the sentiment analysis model leaves room for improvement (precision: 0.65, recall: 0.94, F1: 0.77). This study demonstrates the potential of CLLMs for analyzing open-ended survey text data, particularly in scenarios with limited resources and scarce labeled data.
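The sentiment-analysis figures in the summary are internally consistent: F1 is the harmonic mean of precision and recall, and 2 × 0.65 × 0.94 / (0.65 + 0.94) ≈ 0.77. The Python sketch below checks that arithmetic and also illustrates one hypothetical way to organize the hybrid "model first, hand-curate afterwards" workflow. The record does not say how inaccurate outputs were identified, so the confidence `score` field, the `route_for_review` helper, and the 0.8 threshold are all assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


@dataclass
class Prediction:
    text: str    # the survey response being analysed
    label: str   # label proposed by the CLLM
    score: float # hypothetical model confidence in [0, 1]


def route_for_review(
    preds: List[Prediction], threshold: float = 0.8
) -> Tuple[List[Prediction], List[Prediction]]:
    """Hypothetical triage: accept confident outputs, queue the rest for hand-curation."""
    accepted = [p for p in preds if p.score >= threshold]
    needs_review = [p for p in preds if p.score < threshold]
    return accepted, needs_review


if __name__ == "__main__":
    # Sentiment-analysis precision and recall as reported in the summary.
    print(round(f1_score(0.65, 0.94), 2))  # 0.77

    # Toy predictions; labels and scores are invented for illustration only.
    preds = [
        Prediction("The program was very helpful.", "positive", 0.97),
        Prediction("It was fine, I guess.", "positive", 0.55),
    ]
    accepted, needs_review = route_for_review(preds)
    print(len(accepted), len(needs_review))  # 1 1
```

A threshold-based split is only one possible design: it trades some reviewer effort for accuracy, and the threshold would have to be tuned against a small hand-labeled sample.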
ISSN:	1946-0759
DOI:	10.1109/ICMLA58977.2023.00138