
Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation

Bibliographic Details
Published in: Applied intelligence (Dordrecht, Netherlands), 2023-08, Vol. 53 (16), pp. 19610-19628
Main Authors: García-Méndez, Silvia; de Arriba-Pérez, Francisco; Barros-Vila, Ana; González-Castaño, Francisco J.; Costa-Montenegro, Enrique
Format: Article
Language: English
container_end_page 19628
container_issue 16
container_start_page 19610
container_title Applied intelligence (Dordrecht, Netherlands)
container_volume 53
creator García-Méndez, Silvia
de Arriba-Pérez, Francisco
Barros-Vila, Ana
González-Castaño, Francisco J.
Costa-Montenegro, Enrique
description Financial news items are unstructured sources of information that can be mined to extract knowledge for market screening applications. They are typically written by market experts who describe stock market events within the context of social, economic and political change. Manual extraction of relevant information from the continuous stream of finance-related news is cumbersome and beyond the skills of many investors, who, at most, can follow a few sources and authors. Accordingly, we focus on the analysis of financial news to identify relevant text and, within that text, forecasts and predictions. We propose a novel Natural Language Processing (NLP) system to assist investors in the detection of relevant financial events in unstructured textual sources by considering both relevance and temporality at the discursive level. Firstly, we segment the text so that closely related passages are grouped together. Secondly, we apply co-reference resolution to discover internal dependencies within segments. Finally, we perform topic modelling with Latent Dirichlet Allocation (LDA) to separate relevant from less relevant text, and then analyse the relevant text using a Machine Learning-oriented temporal approach to identify predictions and speculative statements. Our solution outperformed a rule-based baseline system. To evaluate it, we created an experimental data set of 2,158 financial news items that were manually labelled by NLP researchers. Inter-agreement alpha-reliability, accuracy and ROUGE-L results support its potential as a valuable tool for busy investors. The ROUGE-L values for the identification of relevant text and of predictions/forecasts were 0.662 and 0.982, respectively. To our knowledge, this is the first work to jointly consider relevance and temporality at the discursive level. It contributes to the transfer of human associative discourse capabilities to expert systems by combining multi-paragraph topic segmentation and co-reference resolution to separate author expression patterns, topic modelling with LDA to detect relevant text, and discursive temporality analysis to identify forecasts and predictions within this text. Our solution may have compelling applications in the financial field, such as extracting relevant statements on investment strategies to analyse authors’ reputations.
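The abstract reports its results as ROUGE-L scores (0.662 for relevant text, 0.982 for predictions/forecasts). ROUGE-L compares a system output with a reference through the longest common subsequence (LCS) of their tokens: precision is the LCS length divided by the candidate length, recall is the LCS length divided by the reference length, and the score is their F-measure. The Python sketch below illustrates that standard definition only; it is not the authors' evaluation code. The record does not state which ROUGE-L variant, tokenisation or β weighting was used, so lowercase whitespace tokenisation and β = 1 (balanced F1) are assumptions, and `lcs_length`, `rouge_l` and `RougeL` are hypothetical names.

```python
from typing import List, NamedTuple


class RougeL(NamedTuple):
    precision: float
    recall: float
    f_measure: float


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the prefix of `a` processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1  # extend the match on the diagonal
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token from either side
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> RougeL:
    """Sentence-level ROUGE-L; lowercase whitespace tokenisation is an assumption."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return RougeL(0.0, 0.0, 0.0)
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return RougeL(0.0, 0.0, 0.0)
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    # F = (1 + beta^2) * P * R / (R + beta^2 * P); beta = 1 gives the harmonic mean
    f_measure = (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
    return RougeL(precision, recall, f_measure)
```

For example, `rouge_l("the company expects revenue growth next year", "the company expects revenue to grow next year")` finds an in-order match of 6 tokens, giving precision 6/7, recall 6/8 and an F1 of about 0.8. Keeping only two rows of the dynamic-programming table makes memory linear in the reference length.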
doi_str_mv 10.1007/s10489-023-04452-4
format article
addtitle Appl Intell
date 2023-08-01
tpages 19
publisher Springer US, New York; Springer Nature B.V.
rights The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
lds50 peer_reviewed
oa free_for_read
orcidid https://orcid.org/0000-0003-0533-1303
identifier ISSN: 0924-669X
ispartof Applied intelligence (Dordrecht, Netherlands), 2023-08, Vol.53 (16), p.19610-19628
issn 0924-669X
1573-7497 (EISSN)
language eng
recordid cdi_proquest_journals_2858086071
source ABI/INFORM Global; Springer Nature
subjects Artificial Intelligence
Computer Science
Dirichlet problem
Expert systems
Financing
Investment strategy
Investments
Machine learning
Machines
Manufacturing
Mathematical models
Mechanical Engineering
Modelling
Natural language processing
News
Processes
Segments
Unstructured data
title Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T02%3A01%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatic%20detection%20of%20relevant%20information,%20predictions%20and%20forecasts%20in%20financial%20news%20through%20topic%20modelling%20with%20Latent%20Dirichlet%20Allocation&rft.jtitle=Applied%20intelligence%20(Dordrecht,%20Netherlands)&rft.au=Garc%C3%ADa-M%C3%A9ndez,%20Silvia&rft.date=2023-08-01&rft.volume=53&rft.issue=16&rft.spage=19610&rft.epage=19628&rft.pages=19610-19628&rft.issn=0924-669X&rft.eissn=1573-7497&rft_id=info:doi/10.1007/s10489-023-04452-4&rft_dat=%3Cproquest_cross%3E2858086071%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c363t-a49c452af85a842794b714b796251a67b2f1c7798831e2d1af3a64a370017f4e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2858086071&rft_id=info:pmid/&rfr_iscdi=true