Zero-Shot Information Extraction with Community-Fine-Tuned Large Language Models From Open-Ended Interview Transcripts
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Summary: Machine learning holds significant promise for automating and optimizing text data analysis. However, resource-intensive tasks like data annotation, model training, and parameter tuning often limit its practicality for one-time data extraction, medium-sized datasets, or short-term projects. Many community-fine-tuned large language models (CLLMs), fine-tuned on task-specific datasets, demonstrate impressive performance on unseen data without further fine-tuning. Adopting a hybrid approach that leverages CLLMs for rapid text data extraction and then hand-curates the inaccurate outputs can yield high-quality results, a balanced workload, and improved efficiency. This project applies CLLMs to three tasks involving the analysis of open-ended survey responses: semantic text matching, exact answer extraction, and sentiment analysis. We present our overall process and discuss several seemingly simple yet effective techniques that improve model performance without fine-tuning the CLLMs on our own data. Our results demonstrate high precision in semantic text matching (0.92) and exact answer extraction (0.90), while the sentiment analysis model shows room for improvement (precision: 0.65, recall: 0.94, F1: 0.77). This study showcases the potential of CLLMs for open-ended survey text data analysis, particularly in scenarios with limited resources and scarce labeled data.
ISSN: 1946-0759
DOI: 10.1109/ICMLA58977.2023.00138
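
For readers who want to experiment with the approach the abstract outlines, the sketch below shows one way to run community-fine-tuned checkpoints zero-shot on the three described tasks (semantic text matching, exact answer extraction, sentiment analysis) using the Hugging Face `transformers` and `sentence-transformers` libraries. The model names and example texts are illustrative assumptions for demonstration only; the record does not identify which CLLMs the authors actually used.

```python
# Minimal sketch: zero-shot use of community-fine-tuned models for the three
# tasks named in the abstract. Checkpoints below are public examples, not
# necessarily the ones used in the paper.
from transformers import pipeline
from sentence_transformers import CrossEncoder

# 1) Exact answer extraction: extractive QA over a free-text response.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")
response = "I started the new medication in March and stopped it after six weeks."
print(qa(question="When did the respondent start the medication?", context=response))

# 2) Sentiment analysis on an open-ended response.
sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(sentiment("The staff were helpful, but the wait times were frustrating."))

# 3) Semantic text matching: score the response against candidate answer
#    categories with a cross-encoder fine-tuned for sentence similarity.
matcher = CrossEncoder("cross-encoder/stsb-roberta-base")
candidates = ["Cost of treatment", "Side effects", "Access to care"]
scores = matcher.predict([(response, c) for c in candidates])
best = max(zip(candidates, scores), key=lambda x: x[1])
print(best)  # low-scoring matches could be routed to hand-curation, per the hybrid approach
```

In line with the hybrid workflow the abstract describes, model confidence scores like those above could be used to decide which extracted items are accepted automatically and which are flagged for manual review.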