Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation
Financial news items are unstructured sources of information that can be mined to extract knowledge for market screening applications. They are typically written by market experts who describe stock market events within the context of social, economic and political change. Manual extraction of relev...
Published in: | Applied intelligence (Dordrecht, Netherlands), 2023-08, Vol.53 (16), p.19610-19628 |
---|---|
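Main Authors: | García-Méndez, Silvia; de Arriba-Pérez, Francisco; Barros-Vila, Ana; González-Castaño, Francisco J.; Costa-Montenegro, Enrique |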
Format: | Article |
Language: | English |
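Subjects: | Artificial Intelligence; Computer Science; Dirichlet problem; Expert systems; Financing; Investment strategy; Investments; Machine learning; Machines; Manufacturing; Mathematical models; Mechanical Engineering; Modelling; Natural language processing; News; Processes; Segments; Unstructured data |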
cited_by | cdi_FETCH-LOGICAL-c363t-a49c452af85a842794b714b796251a67b2f1c7798831e2d1af3a64a370017f4e3 |
---|---|
cites | cdi_FETCH-LOGICAL-c363t-a49c452af85a842794b714b796251a67b2f1c7798831e2d1af3a64a370017f4e3 |
container_end_page | 19628 |
container_issue | 16 |
container_start_page | 19610 |
container_title | Applied intelligence (Dordrecht, Netherlands) |
container_volume | 53 |
creator | García-Méndez, Silvia; de Arriba-Pérez, Francisco; Barros-Vila, Ana; González-Castaño, Francisco J.; Costa-Montenegro, Enrique |
description | Financial news items are unstructured sources of information that can be mined to extract knowledge for market screening applications. They are typically written by market experts who describe stock market events within the context of social, economic and political change. Manual extraction of relevant information from the continuous stream of finance-related news is cumbersome and beyond the skills of many investors, who, at most, can follow a few sources and authors. Accordingly, we focus on the analysis of financial news to identify relevant text and, within that text, forecasts and predictions. We propose a novel Natural Language Processing (NLP) system to assist investors in the detection of relevant financial events in unstructured textual sources by considering both relevance and temporality at the discursive level. Firstly, we segment the text to group together closely related text. Secondly, we apply co-reference resolution to discover internal dependencies within segments. Finally, we perform relevant topic modelling with Latent Dirichlet Allocation (LDA) to separate relevant from less relevant text and then analyse the relevant text using a Machine Learning-oriented temporal approach to identify predictions and speculative statements. Our solution outperformed a rule-based baseline system. We created an experimental data set composed of 2,158 financial news items that were manually labelled by NLP researchers to evaluate our solution. Inter-agreement Alpha-reliability and accuracy values, and ROUGE-L results endorse its potential as a valuable tool for busy investors. The ROUGE-L values for the identification of relevant text and predictions/forecasts were 0.662 and 0.982, respectively. To our knowledge, this is the first work to jointly consider relevance and temporality at the discursive level. It contributes to the transfer of human associative discourse capabilities to expert systems through the combination of multi-paragraph topic segmentation and co-reference resolution to separate author expression patterns, topic modelling with LDA to detect relevant text, and discursive temporality analysis to identify forecasts and predictions within this text. Our solution may have compelling applications in the financial field, including the possibility of extracting relevant statements on investment strategies to analyse authors’ reputations. |
doi_str_mv | 10.1007/s10489-023-04452-4 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2858086071</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2858086071</sourcerecordid><originalsourceid>FETCH-LOGICAL-c363t-a49c452af85a842794b714b796251a67b2f1c7798831e2d1af3a64a370017f4e3</originalsourceid><addsrcrecordid>eNp9kM1KAzEUhYMoWKsv4Crg1tH8zWRmWeovFNwouAtpJumkTJMxyVh8DN_YtBXcubjcxf3OOdwDwCVGNxghfhsxYnVTIEILxFhJCnYEJrjktOCs4cdgghrCiqpq3k_BWYxrhBClCE_A92xMfiOTVbDVSatkvYPewKB7_SldgtYZH3aAd9dwCLq1eyZC6VqYT1rJmGLGoLFOOmVlD53eRpi64MdVB5MfsvnGt7rvrVvBrU0dXMiks_mdDVZ1vU5w1vde7VPOwYmRfdQXv3sK3h7uX-dPxeLl8Xk-WxSKVjQVkjUqPypNXcqaEd6wJcd5moqUWFZ8SQxWnDd1TbEmLZaGyopJyhHC3DBNp-Dq4DsE_zHqmMTaj8HlSEHqskZ1hTjOFDlQKvgYgzZiCHYjw5fASOyqF4fqRa5e7KsXLIvoQRQz7FY6_Fn_o_oBjdmJfA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2858086071</pqid></control><display><type>article</type><title>Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation</title><source>ABI/INFORM Global</source><source>Springer Nature</source><creator>García-Méndez, Silvia ; de Arriba-Pérez, Francisco ; Barros-Vila, Ana ; González-Castaño, Francisco J. ; Costa-Montenegro, Enrique</creator><creatorcontrib>García-Méndez, Silvia ; de Arriba-Pérez, Francisco ; Barros-Vila, Ana ; González-Castaño, Francisco J. ; Costa-Montenegro, Enrique</creatorcontrib><description>Financial news items are unstructured sources of information that can be mined to extract knowledge for market screening applications. They are typically written by market experts who describe stock market events within the context of social, economic and political change. 
Manual extraction of relevant information from the continuous stream of finance-related news is cumbersome and beyond the skills of many investors, who, at most, can follow a few sources and authors. Accordingly, we focus on the analysis of financial news to identify relevant text and, within that text, forecasts and predictions. We propose a novel Natural Language Processing (NLP) system to assist investors in the detection of relevant financial events in unstructured textual sources by considering both relevance and temporality at the discursive level. Firstly, we segment the text to group together closely related text. Secondly, we apply co-reference resolution to discover internal dependencies within segments. Finally, we perform relevant topic modelling with Latent Dirichlet Allocation (LDA) to separate relevant from less relevant text and then analyse the relevant text using a Machine Learning-oriented temporal approach to identify predictions and speculative statements. Our solution outperformed a rule-based baseline system. We created an experimental data set composed of 2,158 financial news items that were manually labelled by NLP researchers to evaluate our solution. Inter-agreement Alpha-reliability and accuracy values, and ROUGE-L results endorse its potential as a valuable tool for busy investors. The ROUGE-L values for the identification of relevant text and predictions/forecasts were 0.662 and 0.982, respectively. To our knowledge, this is the first work to jointly consider relevance and temporality at the discursive level. It contributes to the transfer of human associative discourse capabilities to expert systems through the combination of multi-paragraph topic segmentation and co-reference resolution to separate author expression patterns, topic modelling with LDA
to detect relevant text, and discursive temporality analysis to identify forecasts and predictions within this text. Our solution may have compelling applications in the financial field, including the possibility of extracting relevant statements on investment strategies to analyse authors’ reputations.</description><identifier>ISSN: 0924-669X</identifier><identifier>EISSN: 1573-7497</identifier><identifier>DOI: 10.1007/s10489-023-04452-4</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Artificial Intelligence ; Computer Science ; Dirichlet problem ; Expert systems ; Financing ; Investment strategy ; Investments ; Machine learning ; Machines ; Manufacturing ; Mathematical models ; Mechanical Engineering ; Modelling ; Natural language processing ; News ; Processes ; Segments ; Unstructured data</subject><ispartof>Applied intelligence (Dordrecht, Netherlands), 2023-08, Vol.53 (16), p.19610-19628</ispartof><rights>The Author(s) 2023</rights><rights>The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c363t-a49c452af85a842794b714b796251a67b2f1c7798831e2d1af3a64a370017f4e3</citedby><cites>FETCH-LOGICAL-c363t-a49c452af85a842794b714b796251a67b2f1c7798831e2d1af3a64a370017f4e3</cites><orcidid>0000-0003-0533-1303</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2858086071/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2858086071?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>314,780,784,11688,27924,27925,36060,44363,74895</link.rule.ids></links><search><creatorcontrib>García-Méndez, Silvia</creatorcontrib><creatorcontrib>de Arriba-Pérez, Francisco</creatorcontrib><creatorcontrib>Barros-Vila, Ana</creatorcontrib><creatorcontrib>González-Castaño, Francisco J.</creatorcontrib><creatorcontrib>Costa-Montenegro, Enrique</creatorcontrib><title>Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation</title><title>Applied intelligence (Dordrecht, Netherlands)</title><addtitle>Appl Intell</addtitle><description>Financial news items are unstructured sources of information that can be mined to extract knowledge for market screening applications. They are typically written by market experts who describe stock market events within the context of social, economic and political change. Manual extraction of relevant information from the continuous stream of finance-related news is cumbersome and beyond the skills of many investors, who, at most, can follow a few sources and authors. 
Accordingly, we focus on the analysis of financial news to identify relevant text and, within that text, forecasts and predictions. We propose a novel Natural Language Processing (NLP) system to assist investors in the detection of relevant financial events in unstructured textual sources by considering both relevance and temporality at the discursive level. Firstly, we segment the text to group together closely related text. Secondly, we apply co-reference resolution to discover internal dependencies within segments. Finally, we perform relevant topic modelling with Latent Dirichlet Allocation (LDA) to separate relevant from less relevant text and then analyse the relevant text using a Machine Learning-oriented temporal approach to identify predictions and speculative statements. Our solution outperformed a rule-based baseline system. We created an experimental data set composed of 2,158 financial news items that were manually labelled by NLP researchers to evaluate our solution. Inter-agreement Alpha-reliability and accuracy values, and ROUGE-L results endorse its potential as a valuable tool for busy investors. The ROUGE-L values for the identification of relevant text and predictions/forecasts were 0.662 and 0.982, respectively. To our knowledge, this is the first work to jointly consider relevance and temporality at the discursive level. It contributes to the transfer of human associative discourse capabilities to expert systems through the combination of multi-paragraph topic segmentation and co-reference resolution to separate author expression patterns, topic modelling with LDA
to detect relevant text, and discursive temporality analysis to identify forecasts and predictions within this text. Our solution may have compelling applications in the financial field, including the possibility of extracting relevant statements on investment strategies to analyse authors’ reputations.</description><subject>Artificial Intelligence</subject><subject>Computer Science</subject><subject>Dirichlet problem</subject><subject>Expert systems</subject><subject>Financing</subject><subject>Investment strategy</subject><subject>Investments</subject><subject>Machine learning</subject><subject>Machines</subject><subject>Manufacturing</subject><subject>Mathematical models</subject><subject>Mechanical Engineering</subject><subject>Modelling</subject><subject>Natural language processing</subject><subject>News</subject><subject>Processes</subject><subject>Segments</subject><subject>Unstructured data</subject><issn>0924-669X</issn><issn>1573-7497</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><recordid>eNp9kM1KAzEUhYMoWKsv4Crg1tH8zWRmWeovFNwouAtpJumkTJMxyVh8DN_YtBXcubjcxf3OOdwDwCVGNxghfhsxYnVTIEILxFhJCnYEJrjktOCs4cdgghrCiqpq3k_BWYxrhBClCE_A92xMfiOTVbDVSatkvYPewKB7_SldgtYZH3aAd9dwCLq1eyZC6VqYT1rJmGLGoLFOOmVlD53eRpi64MdVB5MfsvnGt7rvrVvBrU0dXMiks_mdDVZ1vU5w1vde7VPOwYmRfdQXv3sK3h7uX-dPxeLl8Xk-WxSKVjQVkjUqPypNXcqaEd6wJcd5moqUWFZ8SQxWnDd1TbEmLZaGyopJyhHC3DBNp-Dq4DsE_zHqmMTaj8HlSEHqskZ1hTjOFDlQKvgYgzZiCHYjw5fASOyqF4fqRa5e7KsXLIvoQRQz7FY6_Fn_o_oBjdmJfA</recordid><startdate>20230801</startdate><enddate>20230801</enddate><creator>García-Méndez, Silvia</creator><creator>de Arriba-Pérez, Francisco</creator><creator>Barros-Vila, Ana</creator><creator>González-Castaño, Francisco J.</creator><creator>Costa-Montenegro, Enrique</creator><general>Springer US</general><general>Springer Nature 
B.V</general><scope>C6C</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L6V</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PSYQQ</scope><scope>PTHSS</scope><scope>Q9U</scope><orcidid>https://orcid.org/0000-0003-0533-1303</orcidid></search><sort><creationdate>20230801</creationdate><title>Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation</title><author>García-Méndez, Silvia ; de Arriba-Pérez, Francisco ; Barros-Vila, Ana ; González-Castaño, Francisco J. 
; Costa-Montenegro, Enrique</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c363t-a49c452af85a842794b714b796251a67b2f1c7798831e2d1af3a64a370017f4e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Artificial Intelligence</topic><topic>Computer Science</topic><topic>Dirichlet problem</topic><topic>Expert systems</topic><topic>Financing</topic><topic>Investment strategy</topic><topic>Investments</topic><topic>Machine learning</topic><topic>Machines</topic><topic>Manufacturing</topic><topic>Mathematical models</topic><topic>Mechanical Engineering</topic><topic>Modelling</topic><topic>Natural language processing</topic><topic>News</topic><topic>Processes</topic><topic>Segments</topic><topic>Unstructured data</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>García-Méndez, Silvia</creatorcontrib><creatorcontrib>de Arriba-Pérez, Francisco</creatorcontrib><creatorcontrib>Barros-Vila, Ana</creatorcontrib><creatorcontrib>González-Castaño, Francisco J.</creatorcontrib><creatorcontrib>Costa-Montenegro, Enrique</creatorcontrib><collection>SpringerOpen</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>Materials Science & Engineering 
Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer science database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ProQuest Engineering Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>Engineering Database</collection><collection>ProQuest advanced technologies & aerospace journals</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest One Psychology</collection><collection>Engineering collection</collection><collection>ProQuest Central Basic</collection><jtitle>Applied 
intelligence (Dordrecht, Netherlands)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>García-Méndez, Silvia</au><au>de Arriba-Pérez, Francisco</au><au>Barros-Vila, Ana</au><au>González-Castaño, Francisco J.</au><au>Costa-Montenegro, Enrique</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation</atitle><jtitle>Applied intelligence (Dordrecht, Netherlands)</jtitle><stitle>Appl Intell</stitle><date>2023-08-01</date><risdate>2023</risdate><volume>53</volume><issue>16</issue><spage>19610</spage><epage>19628</epage><pages>19610-19628</pages><issn>0924-669X</issn><eissn>1573-7497</eissn><abstract>Financial news items are unstructured sources of information that can be mined to extract knowledge for market screening applications. They are typically written by market experts who describe stock market events within the context of social, economic and political change. Manual extraction of relevant information from the continuous stream of finance-related news is cumbersome and beyond the skills of many investors, who, at most, can follow a few sources and authors. Accordingly, we focus on the analysis of financial news to identify relevant text and, within that text, forecasts and predictions. We propose a novel Natural Language Processing (NLP) system to assist investors in the detection of relevant financial events in unstructured textual sources by considering both relevance and temporality at the discursive level. Firstly, we segment the text to group together closely related text. Secondly, we apply co-reference resolution to discover internal dependencies within segments. Finally, we perform relevant topic modelling with Latent Dirichlet Allocation (LDA) to separate relevant from less relevant text and then analyse the relevant text using a Machine Learning-oriented temporal approach to identify predictions and speculative statements. Our solution outperformed a rule-based baseline system. We created an experimental data set composed of 2,158 financial news items that were manually labelled by NLP researchers to evaluate our solution. Inter-agreement Alpha-reliability and accuracy values, and ROUGE-L results endorse its potential as a valuable tool for busy investors. The ROUGE-L values for the identification of relevant text and predictions/forecasts were 0.662 and 0.982, respectively. To our knowledge, this is the first work to jointly consider relevance and temporality at the discursive level. It contributes to the transfer of human associative discourse capabilities to expert systems through the combination of multi-paragraph topic segmentation and co-reference resolution to separate author expression patterns, topic modelling with LDA
to detect relevant text, and discursive temporality analysis to identify forecasts and predictions within this text. Our solution may have compelling applications in the financial field, including the possibility of extracting relevant statements on investment strategies to analyse authors’ reputations.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10489-023-04452-4</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0003-0533-1303</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0924-669X |
ispartof | Applied intelligence (Dordrecht, Netherlands), 2023-08, Vol.53 (16), p.19610-19628 |
issn | 0924-669X; 1573-7497 |
language | eng |
recordid | cdi_proquest_journals_2858086071 |
source | ABI/INFORM Global; Springer Nature |
subjects | Artificial Intelligence; Computer Science; Dirichlet problem; Expert systems; Financing; Investment strategy; Investments; Machine learning; Machines; Manufacturing; Mathematical models; Mechanical Engineering; Modelling; Natural language processing; News; Processes; Segments; Unstructured data |
title | Automatic detection of relevant information, predictions and forecasts in financial news through topic modelling with Latent Dirichlet Allocation |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T02%3A01%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Automatic%20detection%20of%20relevant%20information,%20predictions%20and%20forecasts%20in%20financial%20news%20through%20topic%20modelling%20with%20Latent%20Dirichlet%20Allocation&rft.jtitle=Applied%20intelligence%20(Dordrecht,%20Netherlands)&rft.au=Garc%C3%ADa-M%C3%A9ndez,%20Silvia&rft.date=2023-08-01&rft.volume=53&rft.issue=16&rft.spage=19610&rft.epage=19628&rft.pages=19610-19628&rft.issn=0924-669X&rft.eissn=1573-7497&rft_id=info:doi/10.1007/s10489-023-04452-4&rft_dat=%3Cproquest_cross%3E2858086071%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c363t-a49c452af85a842794b714b796251a67b2f1c7798831e2d1af3a64a370017f4e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2858086071&rft_id=info:pmid/&rfr_iscdi=true |
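
The abstract in this record reports ROUGE-L values (0.662 for relevant text, 0.982 for predictions/forecasts) without saying how they were computed. As a rough illustration only, the sketch below shows the commonly used sentence-level ROUGE-L definition: an F-measure over the longest common subsequence (LCS) of reference and candidate tokens. The whitespace tokenization, the lowercasing, the `beta=1.0` weighting, the function names and the example strings are all assumptions made for this sketch; they are not taken from the article, and the authors' evaluation setup may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the DP table
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """Sentence-level ROUGE-L F-measure (assumed setup: lowercase, whitespace tokens)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)      # share of the reference covered by the LCS
    precision = lcs / len(cand)  # share of the candidate covered by the LCS
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    # Hypothetical strings, not drawn from the article's data set.
    gold = "the market will rise next year"
    predicted = "the market will rise"
    print(round(rouge_l(gold, predicted), 3))
```

With `beta=1.0` the score is the harmonic mean of LCS precision and recall, so a candidate that reproduces the labelled span exactly scores 1.0 and one with no shared tokens scores 0.0.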