Finding associations between natural and computer languages: A case-study of bilingual LDA applied to the Bleeping Computer forum posts

Bibliographic Details
Published in: The Journal of systems and software, 2023-07, Vol. 201, p. 111651, Article 111651
Main Authors: Yao, Kundi; Oliva, Gustavo A.; Hassan, Ahmed E.; Asaduzzaman, Muhammad; Malton, Andrew J.; Walenstein, Andrew
Format: Article
Language: English
Description
Summary: In the context of technical support, trails of technical discussions often contain a mixture of natural language (e.g., English) and software log excerpts. Uncovering latent links between certain problems and the log excerpts that are often requested during discussions of those problems enables the construction of a valuable knowledge base. Nevertheless, uncovering such latent links is challenging because English and software logs are two fundamentally different languages. In this paper, we investigate the suitability of multilingual LDA models to address the problem at hand. We study three models, namely: enriched LDA (M+), two-layer LDA (M2L), and off-the-shelf bilingual LDA (Mbi). We use approximately 8K discussion threads from a Bleeping Computer forum as our dataset. We observe that M2L performs the best overall, although it yields a substantially coarser-grained view of the discussed themes in the threads (20 topics, 0.3% of the documents). We also note that M+ outperforms Mbi, achieving higher coherence, lower perplexity, and a higher cross-lingual coverage ratio. We invite future studies to qualitatively assess the quality of the topics produced by the LDA models, such that the feasibility of employing such models in practice can be better determined.
• We expanded the selection of related studies from the bimodality/dual-channel area.
• We discussed alternative approaches that balance two languages in LDA models.
• We revised certain paragraphs of the paper to improve its soundness and clarity.
• We made proper changes including typo fixing, term rewording, and issue fixing.
ISSN: 0164-1212, 1873-1228
DOI: 10.1016/j.jss.2023.111651