Integrating Social and Auxiliary Semantics for Multifaceted Topic Modeling in Twitter
Microblogging platforms, such as Twitter, have already played an important role in recent cultural, social and political events. Discovering latent topics from social streams is therefore important for many downstream applications, such as clustering, classification or recommendation. However, tradi...
Published in: | ACM transactions on Internet technology 2014-12, Vol.14 (4), p.1-24 |
---|---|
Main Authors: | Vosecky, Jan; Jiang, Di; Leung, Kenneth Wai-Ting; Xing, Kai; Ng, Wilfred |
Format: | Article |
Language: | English |
Subjects: | Classification; Clustering; Platforms; Semantics; Streams; Temporal logic; Utilities; Vibration |
cited_by | cdi_FETCH-LOGICAL-c258t-9a2a03686cddbc781c216c733124531addc4400731eb5eb544fb2b99d8372dd03 |
---|---|
cites | cdi_FETCH-LOGICAL-c258t-9a2a03686cddbc781c216c733124531addc4400731eb5eb544fb2b99d8372dd03 |
container_end_page | 24 |
container_issue | 4 |
container_start_page | 1 |
container_title | ACM transactions on Internet technology |
container_volume | 14 |
creator | Vosecky, Jan; Jiang, Di; Leung, Kenneth Wai-Ting; Xing, Kai; Ng, Wilfred |
description | Microblogging platforms, such as Twitter, have already played an important role in recent cultural, social and political events. Discovering latent topics from social streams is therefore important for many downstream applications, such as clustering, classification or recommendation. However, traditional topic models that rely on the bag-of-words assumption are insufficient to uncover the rich semantics and temporal aspects of topics in Twitter. In particular, microblog content is often influenced by external information sources, such as Web documents linked from Twitter posts, and often focuses on specific entities, such as people or organizations. These external sources provide useful semantics to understand microblogs and we generally refer to these semantics as auxiliary semantics. In this article, we address the mentioned issues and propose a unified framework for Multifaceted Topic Modeling from Twitter streams. We first extract social semantics from Twitter by modeling the social chatter associated with hashtags. We further extract terms and named entities from linked Web documents to serve as auxiliary semantics during topic modeling. The Multifaceted Topic Model (MfTM) is then proposed to jointly model latent semantics among the social terms from Twitter, auxiliary terms from the linked Web documents and named entities. Moreover, we capture the temporal characteristics of each topic. An efficient online inference method for MfTM is developed, which enables our model to be applied to large-scale and streaming data. Our experimental evaluation shows the effectiveness and efficiency of our model compared with state-of-the-art baselines. We evaluate each aspect of our framework and show its utility in the context of tweet clustering. |
doi_str_mv | 10.1145/2651403 |
format | article |
fulltext | fulltext |
identifier | ISSN: 1533-5399 |
ispartof | ACM transactions on Internet technology, 2014-12, Vol.14 (4), p.1-24 |
issn | 1533-5399 (ISSN); 1557-6051 (EISSN) |
language | eng |
recordid | cdi_proquest_miscellaneous_1744697188 |
source | Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list) |
subjects | Classification; Clustering; Platforms; Semantics; Streams; Temporal logic; Utilities; Vibration |
title | Integrating Social and Auxiliary Semantics for Multifaceted Topic Modeling in Twitter |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T17%3A23%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Integrating%20Social%20and%20Auxiliary%20Semantics%20for%20Multifaceted%20Topic%20Modeling%20in%20Twitter&rft.jtitle=ACM%20transactions%20on%20Internet%20technology&rft.au=Vosecky,%20Jan&rft.date=2014-12-01&rft.volume=14&rft.issue=4&rft.spage=1&rft.epage=24&rft.pages=1-24&rft.issn=1533-5399&rft.eissn=1557-6051&rft_id=info:doi/10.1145/2651403&rft_dat=%3Cproquest_cross%3E1744697188%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c258t-9a2a03686cddbc781c216c733124531addc4400731eb5eb544fb2b99d8372dd03%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1744697188&rft_id=info:pmid/&rfr_iscdi=true |