Zero-Shot Information Extraction with Community-Fine-Tuned Large Language Models From Open-Ended Interview Transcripts
Machine learning holds significant promise for automating and optimizing text data analysis. However, resource-intensive tasks like data annotation, model training, and parameter tuning often limit its practicality for one-time data extraction, medium-sized datasets, or short-term projects. There are many community-fine-tuned large language models (CLLMs) that are fine-tuned on task-specific datasets and can demonstrate impressive performance on unseen data without further fine-tuning. Adopting a hybrid approach of leveraging CLLMs for rapid text data extraction and subsequently hand-curating the inaccurate outputs can yield high-quality results, workload balance, and improved efficiency. This project applies CLLMs to three tasks involving the analysis of open-ended survey responses: semantic text matching, exact answer extraction, and sentiment analysis. We present our overall process and discuss several seemingly simple yet effective techniques that we employ to improve model performance without fine-tuning the CLLMs on our own data. Our results demonstrate high precision in semantic text matching (0.92) and exact answer extraction (0.90), while the sentiment analysis model shows room for improvement (precision: 0.65, recall: 0.94, F1: 0.77). This study showcases the potential of CLLMs in open-ended survey text data analysis, particularly in scenarios with limited resources and scarce labeled data.
Main Authors: | Kazi, Nazmul; Kahanda, Indika; Rupassara, S. Indu; Kindt, John W. |
---|---|
Format: | Conference Proceeding |
Language: | English |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 937 |
container_issue | |
container_start_page | 932 |
container_title | |
container_volume | |
creator | Kazi, Nazmul; Kahanda, Indika; Rupassara, S. Indu; Kindt, John W. |
description | Machine learning holds significant promise for automating and optimizing text data analysis. However, resource-intensive tasks like data annotation, model training, and parameter tuning often limit its practicality for one-time data extraction, medium-sized datasets, or short-term projects. There are many community-fine-tuned large language models (CLLMs) that are fine-tuned on task-specific datasets and can demonstrate impressive performance on unseen data without further fine-tuning. Adopting a hybrid approach of leveraging CLLMs for rapid text data extraction and subsequently hand-curating the inaccurate outputs can yield high-quality results, workload balance, and improved efficiency. This project applies CLLMs to three tasks involving the analysis of open-ended survey responses: semantic text matching, exact answer extraction, and sentiment analysis. We present our overall process and discuss several seemingly simple yet effective techniques that we employ to improve model performance without fine-tuning the CLLMs on our own data. Our results demonstrate high precision in semantic text matching (0.92) and exact answer extraction (0.90), while the sentiment analysis model shows room for improvement (precision: 0.65, recall: 0.94, F1: 0.77). This study showcases the potential of CLLMs in open-ended survey text data analysis, particularly in scenarios with limited resources and scarce labeled data. |
doi_str_mv | 10.1109/ICMLA58977.2023.00138 |
format | conference_proceeding |
fullrecord | Zero-Shot Information Extraction with Community-Fine-Tuned Large Language Models From Open-Ended Interview Transcripts. Kazi, Nazmul; Kahanda, Indika; Rupassara, S. Indu; Kindt, John W. 2023 International Conference on Machine Learning and Applications (ICMLA), IEEE, 2023-12-15, pp. 932-937. EISSN: 1946-0759; EISBN: 9798350345346; DOI: 10.1109/ICMLA58977.2023.00138; CODEN: IEEPAD. |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 1946-0759 |
ispartof | 2023 International Conference on Machine Learning and Applications (ICMLA), 2023, p.932-937 |
issn | 1946-0759 |
language | eng |
recordid | cdi_ieee_primary_10459999 |
source | IEEE Xplore All Conference Series |
subjects | Analytical models; Data analysis; Data Extraction; Data models; Large Language Models; Machine learning; Open-Ended Survey Interviews; Semantics; Sentiment analysis; Surveys; Text Data Analysis Efficiency; Text Mining; Zero-shot |
title | Zero-Shot Information Extraction with Community-Fine-Tuned Large Language Models From Open-Ended Interview Transcripts |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T01%3A18%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Zero-Shot%20Information%20Extraction%20with%20Community-Fine-Tuned%20Large%20Language%20Models%20From%20Open-Ended%20Interview%20Transcripts&rft.btitle=2023%20International%20Conference%20on%20Machine%20Learning%20and%20Applications%20(ICMLA)&rft.au=Kazi,%20Nazmul&rft.date=2023-12-15&rft.spage=932&rft.epage=937&rft.pages=932-937&rft.eissn=1946-0759&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICMLA58977.2023.00138&rft.eisbn=9798350345346&rft_dat=%3Cieee_CHZPO%3E10459999%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i119t-3d5fe2f967a28dbbf157b16f15e6206a363908268ba81cda264acdd76615f16e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10459999&rfr_iscdi=true |
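The abstract describes a hybrid workflow: run a community-fine-tuned model zero-shot for rapid extraction, then hand-curate the inaccurate outputs. A minimal sketch of that triage step follows; the paper does not publish its code, so the classifier below is a hypothetical stub standing in for a real CLLM call, and the 0.8 confidence threshold is illustrative, not a value from the paper.

```python
def classify_sentiment(text: str) -> tuple[str, float]:
    """Stub standing in for a zero-shot CLLM call; returns (label, confidence).

    A real pipeline would call a community-fine-tuned sentiment model here;
    this keyword lookup only exists to make the triage logic runnable.
    """
    positive = {"great", "love", "helpful"}
    hits = sum(word.strip(".,!") in positive for word in text.lower().split())
    confidence = min(0.5 + 0.25 * hits, 0.99)
    label = "positive" if hits else "negative"
    return label, confidence


def triage(responses: list[str], threshold: float = 0.8):
    """Split model outputs into auto-accepted results and a hand-curation queue.

    High-confidence predictions are kept as-is; low-confidence ones are routed
    to human review, which is the workload balance the abstract refers to.
    """
    accepted, review = [], []
    for text in responses:
        label, conf = classify_sentiment(text)
        (accepted if conf >= threshold else review).append((text, label, conf))
    return accepted, review


accepted, review = triage([
    "The program was great, I love the new tools.",
    "It was fine I guess.",
])
```

The design point is that the model's confidence score, not its label, drives the split: only the uncertain fraction of a medium-sized dataset reaches a human annotator.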