Advanced Automated Tagging for Stack Overflow: A Multi-Stage Approach Using Deep Learning and NLP Techniques
Main Authors: | , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | This paper examines systematic question categorization on question-and-answer platforms, focusing on the role of tagging in organizing content. Precise tagging matters for content management: inaccurate tags lead to search inefficiencies and reduce platform effectiveness. The core of this study is an automated tagging system designed specifically for Stack Overflow. With its extensive collection of programming-related questions and answers, Stack Overflow serves as our principal case study for testing and refining the system. Our system uses the question's title, description, and embedded code snippets to recommend relevant tags, with the aim of refining and speeding up the tagging process. It starts with question preprocessing, followed by a two-step candidate tag extraction. The first step uses the YAKE algorithm for initial tag extraction; the second uses MPNet to embed the question, complemented by multi-label k-nearest neighbor, multi-label Random Forest, and cosine similarity for further tag extraction. The process then moves to tag selection and pruning, which eliminates overlaps, and concludes with tag sorting. We assess our system's performance using F1-score, recall, and precision. Our experimental results show a 3.4% improvement in performance over the most effective baseline method, indicating the potential of our system to advance tag-based categorization on question-and-answer platforms. |
---|---|
ISSN: | 2640-5768 |
DOI: | 10.1109/AISP61396.2024.10475258 |
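
The similarity-based part of the candidate tag extraction described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `embed` function is a toy bag-of-words stand-in for an MPNet sentence embedding, and the names `cosine`, `recommend_tags`, `k`, `top_n`, and the tiny `corpus` are all hypothetical. It embeds a new question, finds the most similar previously tagged questions by cosine similarity, lets them vote for their tags, and sorts the resulting candidates.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Toy stand-in for a sentence embedding (assumption: the paper uses MPNet).

    Returns a sparse bag-of-words count vector.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_tags(question, corpus, k=2, top_n=3):
    """Rank candidate tags by similarity-weighted votes from the k nearest questions.

    `corpus` is a list of (question_text, tags) pairs for already tagged questions.
    """
    q = embed(question)
    # Score every tagged question, then keep the k most similar ones.
    scored = sorted(
        ((cosine(q, embed(text)), tags) for text, tags in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Each neighbor votes for its tags, weighted by its similarity.
    votes = defaultdict(float)
    for sim, tags in scored:
        for tag in tags:
            votes[tag] += sim
    # Tag sorting: highest score first, ties broken alphabetically.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, score in ranked if score > 0][:top_n]


# Hypothetical, hand-made examples of previously tagged questions.
corpus = [
    ("how to merge two pandas dataframes in python", ["python", "pandas"]),
    ("segmentation fault when dereferencing pointer in c", ["c", "pointers"]),
    ("python list comprehension with condition", ["python", "list"]),
]

tags = recommend_tags("merge pandas dataframes on a column in python", corpus)
print(tags)
```

Using a dictionary of votes means a tag proposed by several neighbors appears only once in the output, which loosely mirrors the overlap-pruning step; a real system would replace `embed` with a learned sentence encoder and combine these candidates with keyword-based ones.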