Integrating Social and Auxiliary Semantics for Multifaceted Topic Modeling in Twitter

Microblogging platforms, such as Twitter, have already played an important role in recent cultural, social and political events. Discovering latent topics from social streams is therefore important for many downstream applications, such as clustering, classification or recommendation. However, tradi...

Bibliographic Details
Published in: ACM transactions on Internet technology, 2014-12, Vol.14 (4), p.1-24
Main Authors: Vosecky, Jan; Jiang, Di; Leung, Kenneth Wai-Ting; Xing, Kai; Ng, Wilfred
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c258t-9a2a03686cddbc781c216c733124531addc4400731eb5eb544fb2b99d8372dd03
cites cdi_FETCH-LOGICAL-c258t-9a2a03686cddbc781c216c733124531addc4400731eb5eb544fb2b99d8372dd03
container_end_page 24
container_issue 4
container_start_page 1
container_title ACM transactions on Internet technology
container_volume 14
creator Vosecky, Jan
Jiang, Di
Leung, Kenneth Wai-Ting
Xing, Kai
Ng, Wilfred
description Microblogging platforms, such as Twitter, have already played an important role in recent cultural, social and political events. Discovering latent topics from social streams is therefore important for many downstream applications, such as clustering, classification or recommendation. However, traditional topic models that rely on the bag-of-words assumption are insufficient to uncover the rich semantics and temporal aspects of topics in Twitter. In particular, microblog content is often influenced by external information sources, such as Web documents linked from Twitter posts, and often focuses on specific entities, such as people or organizations. These external sources provide useful semantics to understand microblogs and we generally refer to these semantics as auxiliary semantics. In this article, we address the mentioned issues and propose a unified framework for Multifaceted Topic Modeling from Twitter streams. We first extract social semantics from Twitter by modeling the social chatter associated with hashtags. We further extract terms and named entities from linked Web documents to serve as auxiliary semantics during topic modeling. The Multifaceted Topic Model (MfTM) is then proposed to jointly model latent semantics among the social terms from Twitter, auxiliary terms from the linked Web documents and named entities. Moreover, we capture the temporal characteristics of each topic. An efficient online inference method for MfTM is developed, which enables our model to be applied to large-scale and streaming data. Our experimental evaluation shows the effectiveness and efficiency of our model compared with state-of-the-art baselines. We evaluate each aspect of our framework and show its utility in the context of tweet clustering.
doi_str_mv 10.1145/2651403
format article
fulltext fulltext
identifier ISSN: 1533-5399
ispartof ACM transactions on Internet technology, 2014-12, Vol.14 (4), p.1-24
issn 1533-5399
1557-6051
language eng
recordid cdi_proquest_miscellaneous_1744697188
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
subjects Classification
Clustering
Platforms
Semantics
Streams
Temporal logic
Utilities
Vibration
title Integrating Social and Auxiliary Semantics for Multifaceted Topic Modeling in Twitter
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T17%3A23%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Integrating%20Social%20and%20Auxiliary%20Semantics%20for%20Multifaceted%20Topic%20Modeling%20in%20Twitter&rft.jtitle=ACM%20transactions%20on%20Internet%20technology&rft.au=Vosecky,%20Jan&rft.date=2014-12-01&rft.volume=14&rft.issue=4&rft.spage=1&rft.epage=24&rft.pages=1-24&rft.issn=1533-5399&rft.eissn=1557-6051&rft_id=info:doi/10.1145/2651403&rft_dat=%3Cproquest_cross%3E1744697188%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c258t-9a2a03686cddbc781c216c733124531addc4400731eb5eb544fb2b99d8372dd03%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1744697188&rft_id=info:pmid/&rfr_iscdi=true