İrem Selin Deniz, An Investigation of Issue Labeling in Open Source Software Projects Using Large Language Models

M.S. Candidate: İrem Selin Deniz
Program: Information Systems
Date: 06.09.2024 / 11:00
Place: A-108

Abstract: In the evolving landscape of open source software projects, effective issue management remains pivotal to sustaining project success. Issue reports provide valuable information, as they are created to report bugs, request new features, or ask questions about a software product. The high number of issue reports, which vary widely in quality, requires accurate issue classification mechanisms to prioritize work and manage resources effectively. Properly assigned issue labels are crucial both for effective project management and for the reliability of research on improving issue management, since such research often treats the assigned labels as ground truth. This study aims to assess the reliability of the assigned issue labels in open source software development projects in order to improve issue management processes. The research involved collecting two datasets of issue reports from open source software development projects hosted on GitHub. Experiments were conducted with state-of-the-art large language models for issue label classification. Furthermore, a qualitative analysis was performed to evaluate how well the assigned issue labels match the content of the issues. The empirical study revealed a significant mismatch between the assigned labels and the actual content of the issues. The study also demonstrated the effectiveness of state-of-the-art large language models in predicting issue labels, and its findings raise concerns about the reliability of issue labels in open source software development projects.