Zero-Shot Information Extraction with Community-Fine-Tuned Large Language Models From Open-Ended Interview Transcripts

Machine learning holds significant promise for automating and optimizing text data analysis. However, resource-intensive tasks like data annotation, model training, and parameter tuning often limit its practicality for one-time data extraction, medium-sized datasets, or short-term projects. There are many community-fine-tuned large language models (CLLMs) that are fine-tuned on task-specific datasets and can demonstrate impressive performance on unseen data without further fine-tuning. Adopting a hybrid approach of leveraging CLLMs for rapid text data extraction and subsequently hand-curating the inaccurate outputs can yield high-quality results, workload balance, and improved efficiency. This project applies CLLMs to three tasks involving the analysis of open-ended survey responses: semantic text matching, exact answer extraction, and sentiment analysis. We present our overall process and discuss several seemingly simple yet effective techniques that we employ to improve model performance without fine-tuning the CLLMs on our own data. Our results demonstrate high precision in semantic text matching (0.92) and exact answer extraction (0.90), while the sentiment analysis model shows room for improvement (precision: 0.65, recall: 0.94, F1: 0.77). This study showcases the potential of CLLMs in open-ended survey text data analysis, particularly in scenarios with limited resources and scarce labeled data.
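As a concrete illustration of the zero-shot workflow sketched in the abstract, the snippet below shows one way community-fine-tuned checkpoints from the Hugging Face Hub could be applied to the three tasks without any further training. This is a minimal sketch under assumptions: the model names, the sample survey response, the candidate themes, and the question are illustrative placeholders, not the configuration reported in the paper.

# Hedged sketch: model checkpoints, sample text, themes, and question are
# illustrative assumptions; the record does not specify the CLLMs the authors used.
from transformers import pipeline

# Semantic text matching framed as zero-shot NLI-style classification.
matcher = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Exact answer extraction framed as extractive question answering.
extractor = pipeline("question-answering", model="deepset/roberta-base-squad2")

# Sentiment analysis of the free-text response.
sentiment = pipeline("sentiment-analysis",
                     model="cardiffnlp/twitter-roberta-base-sentiment-latest")

response = "We mostly grow corn, but last season the drought cut our yield badly."

# 1) Match the response against survey themes of interest.
themes = ["crop production", "water scarcity", "financial stress"]
match = matcher(response, candidate_labels=themes, multi_label=True)

# 2) Pull an exact answer span for a targeted follow-up question.
answer = extractor(question="What crop do they grow?", context=response)

# 3) Score the overall tone of the response.
tone = sentiment(response)[0]

print(match["labels"][0], answer["answer"], tone["label"])

Outputs with low model confidence would then be routed to hand-curation, consistent with the hybrid extract-then-curate workflow the abstract describes.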

Bibliographic Details
Main Authors: Kazi, Nazmul; Kahanda, Indika; Rupassara, S. Indu; Kindt, John W.
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
container_end_page 937
container_start_page 932
creator Kazi, Nazmul
Kahanda, Indika
Rupassara, S. Indu
Kindt, John W.
description Machine learning holds significant promise for automating and optimizing text data analysis. However, resource-intensive tasks like data annotation, model training, and parameter tuning often limit its practicality for one-time data extraction, medium-sized datasets, or short-term projects. There are many community-fine-tuned large language models (CLLMs) that are fine-tuned on task-specific datasets and can demonstrate impressive performance on unseen data without further fine-tuning. Adopting a hybrid approach of leveraging CLLMs for rapid text data extraction and subsequently hand-curating the inaccurate outputs can yield high-quality results, workload balance, and improved efficiency. This project applies CLLMs to three tasks involving the analysis of open-ended survey responses: semantic text matching, exact answer extraction, and sentiment analysis. We present our overall process and discuss several seemingly simple yet effective techniques that we employ to improve model performance without fine-tuning the CLLMs on our own data. Our results demonstrate high precision in semantic text matching (0.92) and exact answer extraction (0.90), while the sentiment analysis model shows room for improvement (precision: 0.65, recall: 0.94, F1: 0.77). This study showcases the potential of CLLMs in open-ended survey text data analysis, particularly in scenarios with limited resources and scarce labeled data.
doi_str_mv 10.1109/ICMLA58977.2023.00138
format conference_proceeding
identifier EISSN: 1946-0759
ispartof 2023 International Conference on Machine Learning and Applications (ICMLA), 2023, p.932-937
issn 1946-0759
language eng
recordid cdi_ieee_primary_10459999
source IEEE Xplore All Conference Series
subjects Analytical models
Data analysis
Data Extraction
Data models
Large Language Models
Machine learning
Open-Ended Survey Interviews
Semantics
Sentiment analysis
Surveys
Text Data Analysis Efficiency
Text Mining
Zero-shot
title Zero-Shot Information Extraction with Community-Fine-Tuned Large Language Models From Open-Ended Interview Transcripts