Zero-Shot Information Extraction with Community-Fine-Tuned Large Language Models From Open-Ended Interview Transcripts

Machine learning holds significant promise for automating and optimizing text data analysis. However, resource-intensive tasks like data annotation, model training, and parameter tuning often limit its practicality for one-time data extraction, medium-sized datasets, or short-term projects. There are many community-fine-tuned large language models (CLLMs) that are fine-tuned on task-specific datasets and can demonstrate impressive performance on unseen data without further fine-tuning. Adopting a hybrid approach of leveraging CLLMs for rapid text data extraction and subsequently hand-curating the inaccurate outputs can yield high-quality results, workload balance, and improved efficiency. This project applies CLLMs to three tasks involving the analysis of open-ended survey responses: semantic text matching, exact answer extraction, and sentiment analysis. We present our overall process and discuss several seemingly simple yet effective techniques that we employ to improve model performance without fine-tuning the CLLMs on our own data. Our results demonstrate high precision in semantic text matching (0.92) and exact answer extraction (0.90), while the sentiment analysis model shows room for improvement (precision: 0.65, recall: 0.94, F1: 0.77). This study showcases the potential of CLLMs in open-ended survey text data analysis, particularly in scenarios with limited resources and scarce labeled data.
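As a concrete illustration of the zero-shot workflow sketched in the abstract, the snippet below shows one way community-fine-tuned checkpoints from the Hugging Face Hub could be applied to the three tasks without any further training. This is a minimal sketch under assumptions: the model names, the sample survey response, the candidate themes, and the question are illustrative placeholders, not the configuration reported in the paper.

# Hedged sketch: model checkpoints, sample text, themes, and question are
# illustrative assumptions; the record does not specify the CLLMs the authors used.
from transformers import pipeline

# Semantic text matching framed as zero-shot NLI-style classification.
matcher = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Exact answer extraction framed as extractive question answering.
extractor = pipeline("question-answering", model="deepset/roberta-base-squad2")

# Sentiment analysis of the free-text response.
sentiment = pipeline("sentiment-analysis",
                     model="cardiffnlp/twitter-roberta-base-sentiment-latest")

response = "We mostly grow corn, but last season the drought cut our yield badly."

# 1) Match the response against survey themes of interest.
themes = ["crop production", "water scarcity", "financial stress"]
match = matcher(response, candidate_labels=themes, multi_label=True)

# 2) Pull an exact answer span for a targeted follow-up question.
answer = extractor(question="What crop do they grow?", context=response)

# 3) Score the overall tone of the response.
tone = sentiment(response)[0]

print(match["labels"][0], answer["answer"], tone["label"])

Outputs with low model confidence would then be routed to hand-curation, consistent with the hybrid extract-then-curate workflow the abstract describes.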

Bibliographic Details
Main Authors: Kazi, Nazmul; Kahanda, Indika; Rupassara, S. Indu; Kindt, John W.
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
container_end_page 937
container_start_page 932
creator Kazi, Nazmul
Kahanda, Indika
Rupassara, S. Indu
Kindt, John W.
description Machine learning holds significant promise for automating and optimizing text data analysis. However, resource-intensive tasks like data annotation, model training, and parameter tuning often limit its practicality for one-time data extraction, medium-sized datasets, or short-term projects. There are many community-fine-tuned large language models (CLLMs) that are fine-tuned on task-specific datasets and can demonstrate impressive performance on unseen data without further fine-tuning. Adopting a hybrid approach of leveraging CLLMs for rapid text data extraction and subsequently hand-curating the inaccurate outputs can yield high-quality results, workload balance, and improved efficiency. This project applies CLLMs to three tasks involving the analysis of open-ended survey responses: semantic text matching, exact answer extraction, and sentiment analysis. We present our overall process and discuss several seemingly simple yet effective techniques that we employ to improve model performance without fine-tuning the CLLMs on our own data. Our results demonstrate high precision in semantic text matching (0.92) and exact answer extraction (0.90), while the sentiment analysis model shows room for improvement (precision: 0.65, recall: 0.94, F1: 0.77). This study showcases the potential of CLLMs in open-ended survey text data analysis, particularly in scenarios with limited resources and scarce labeled data.
doi_str_mv 10.1109/ICMLA58977.2023.00138
format conference_proceeding
identifier EISSN: 1946-0759
ispartof 2023 International Conference on Machine Learning and Applications (ICMLA), 2023, p.932-937
issn 1946-0759
language eng
recordid cdi_ieee_primary_10459999
source IEEE Xplore All Conference Series
subjects Analytical models
Data analysis
Data Extraction
Data models
Large Language Models
Machine learning
Open-Ended Survey Interviews
Semantics
Sentiment analysis
Surveys
Text Data Analysis Efficiency
Text Mining
Zero-shot
title Zero-Shot Information Extraction with Community-Fine-Tuned Large Language Models From Open-Ended Interview Transcripts