Refining Software Clustering: The Impact of Code Co-Changes on Architectural Reconstruction
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Online Access: | https://ieeexplore.ieee.org/document/11096604/ |
Summary: | Version control systems are essential for tracking and managing changes in software code. They also provide information about the relationships between software entities: when multiple entities (e.g., classes or interfaces) are frequently modified together, it may indicate an underlying connection between them. These code co-changes can be analyzed and filtered to uncover logical dependencies, which complement structural or lexical dependencies. While structural dependencies are extracted using code analysis techniques, logical dependencies can be extracted solely from version control system logs. This makes their extraction independent of the programming language. Our work investigates filtering techniques to eliminate co-change situations that are not meaningful as logical dependencies. The effectiveness of these filtering techniques is investigated by using the obtained logical dependencies in architectural reconstruction, a reverse engineering activity aimed at recovering a system’s modular structure. This paper explores the use of logical dependencies as input for architectural reconstruction. The main goal is to investigate whether logical dependencies can complement or even fully replace structural dependencies in this process. We use clustering based on software dependencies to group related entities into modules. We conduct experiments on four open-source Java projects using three clustering algorithms (Louvain, Leiden, and DBSCAN) and two evaluation metrics (Modularization Quality and MoJoFM). We consider three different dependency configurations: 1) only structural dependencies, 2) only logical dependencies, and 3) their combination. Our goal is to assess which configuration performs best and to examine the advantages and limitations of using logical dependencies. |
ISSN: | 2169-3536 |
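The summary describes two computational steps that a short sketch can make concrete. The first turns co-change records from a version-control log into filtered logical dependencies. The second scores a candidate modularization with Modularization Quality (MQ). The Python below is a minimal sketch under stated assumptions, not the article's method. The filter names and thresholds (`max_commit_size`, `min_support`, `min_confidence`) are hypothetical, and commits are assumed to be already parsed into sets of changed entity names. MQ follows the TurboMQ-style formulation, in which each cluster contributes 2μ/(2μ+ε). Here μ is the intra-cluster edge weight and ε is the weight of edges crossing the cluster's boundary. This may differ from the variant used in the article.

```python
from collections import Counter
from itertools import combinations


def logical_dependencies(commits, max_commit_size=20, min_support=2, min_confidence=0.5):
    """Derive weighted logical dependencies from co-changes (illustrative sketch).

    commits: iterable of sets of entity names modified in the same commit.
    Returns {(a, b): support} with a < b, for pairs that survive all filters.
    The three thresholds are hypothetical, not taken from the article.
    """
    entity_count = Counter()  # commits touching each entity (after size filter)
    pair_count = Counter()    # commits touching each unordered pair of entities
    for changed in commits:
        # Assumed filter: very large commits (bulk renames, reformatting)
        # rarely reflect a real coupling, so they are skipped entirely.
        if len(changed) > max_commit_size:
            continue
        entity_count.update(changed)
        pair_count.update(combinations(sorted(changed), 2))

    deps = {}
    for (a, b), support in pair_count.items():
        # Confidence of the stronger direction: the fraction of one entity's
        # changes that were accompanied by a change to the other entity.
        confidence = max(support / entity_count[a], support / entity_count[b])
        if support >= min_support and confidence >= min_confidence:
            deps[(a, b)] = support
    return deps


def modularization_quality(deps, clusters):
    """TurboMQ-style MQ for an undirected, weighted dependency graph.

    clusters: list of disjoint sets of entities; every entity appearing
    in deps must belong to exactly one cluster.
    """
    owner = {e: i for i, cluster in enumerate(clusters) for e in cluster}
    intra = [0.0] * len(clusters)  # total edge weight inside each cluster
    inter = [0.0] * len(clusters)  # total edge weight crossing each cluster's boundary
    for (a, b), weight in deps.items():
        ca, cb = owner[a], owner[b]
        if ca == cb:
            intra[ca] += weight
        else:
            inter[ca] += weight
            inter[cb] += weight
    # A cluster with no internal edges contributes nothing.
    return sum(
        2 * mu / (2 * mu + eps) if mu > 0 else 0.0
        for mu, eps in zip(intra, inter)
    )


if __name__ == "__main__":
    commits = [
        {"A", "B"},
        {"A", "B", "C"},
        {"C", "D"},
        {"C", "D"},
        {"A", "B", "C", "D", "E"},  # exceeds max_commit_size=3, so it is ignored
    ]
    deps = logical_dependencies(commits, max_commit_size=3)
    print(deps)  # {('A', 'B'): 2, ('C', 'D'): 2}
    print(modularization_quality(deps, [{"A", "B"}, {"C", "D"}]))  # 2.0
```

The resulting dependency dictionary is a weighted graph that a community-detection algorithm such as Louvain or Leiden could partition. The MQ function could then compare the resulting partitions. A combined configuration could, for example, sum edge weights from the structural and logical dependency sets before clustering. How the article actually merges them is not stated in this record.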