Abstract: Statistical properties of un-weighted software networks have been extensively studied. However, software networks are by nature weighted, and understanding the properties of weighted software networks can lead to better software engineering practices. In this paper, we construct a set of weighted software networks from real-world Java software systems and empirically investigate their topological properties using weighted k-core decomposition. First, we investigate the static topological properties of the weighted k-core structure and find that a small graph coreness is a property shared by many software systems, that the distribution of weighted coreness follows a power law with an exponential cutoff, and that weighted coreness and node degree are closely correlated, with Spearman correlation coefficients larger than 0.94. Second, we analyze the evolving topological properties of the weighted k-core structure, including the graph coreness, the size of the main core, and the new and vanishing members of the main core. Empirical results show that the graph coreness remains relatively stable unless the system undergoes major changes, that the size of the main core remains stable during evolution, and that new members of a main core come from, and vanishing members go to, the shells very near that main core. Finally, we apply the weighted k-core decomposition method to identify key classes and find that, compared with nine other approaches, our approach performs best over the whole set of subject systems according to the average ranking of the Friedman test; it identifies a majority of the classes deemed important. This work could help developers improve software understanding, propose new metrics for software measurement, and evaluate the quality of a system under development.
1. Introduction
The objective of this paper is to explore the characteristics of the k-core structure in weighted software networks extracted from Java software systems. First, we formally represent the topological structure of Java software at the class level of granularity using a weighted software network, which takes the coupling frequencies between classes as weights. Second, we introduce the k-core decomposition method for weighted complex networks proposed in [17] (hereinafter referred to as Wk-core) and use it to calculate the k-core structure of the weighted software network. Wk-core partitions the weighted software network into a layered structure, which is then characterized by a number of relevant properties via statistical parameters. Our approach could potentially uncover characteristics enclosed in the topological structure of software systems, which can help developers improve software understanding, propose new metrics for software measurement, and evaluate the quality of a system under development.
The rest of this paper is structured as follows. Section 2 gives a brief overview of related work on the k-core structures of software networks. Section 3 describes our approach in detail, with a focus on the definition of the weighted software network and Wk-core. In Section 4, we use Wk-core to partition the weighted software network into a layered structure and use statistical parameters to uncover characteristics enclosed in the topological structure of software systems. In Section 5, we discuss the implications of our results for software engineering. We conclude the paper in Section 6.
2. Related work
To the best of our knowledge, only a few studies have investigated the k-core structures of software networks, all of them published before 2016.
Zhang et al. [12,14] investigated the topological properties of a set of un-weighted software networks extracted from software systems at the class level, and found some noticeable properties such as small software coreness, a high-core connecting tendency of classes, and the evolutionary stability of software coreness. Li et al. [13] also employed the k-core decomposition method to analyze the hierarchy of un-weighted software networks at the class level, and found properties similar to those reported by Zhang et al. in [12,14]. In [15], Li et al. further analyzed the crucial fractions of software networks using the k-core decomposition method, and found that the crucial fractions of different software networks share the same universal topological properties, such as scale-free behavior, small-world behavior, and strong connectivity. Although Zhang et al. and Li et al. performed their studies on different sets of software systems, they followed a similar line of thought and obtained similar results.
However, one major limitation of these methods is that the software networks they used are un-weighted, which does not reflect the reality of a piece of software, since software networks are by nature weighted [9,16]. The weights describe the coupling strength between the entities. Another limitation of the existing methods is that the software systems they analyzed are mainly written in C++. Whether their conclusions extend to software systems written in Java, one of the most widely used programming languages, is still an open problem. To the best of our knowledge, little attention has been paid to the analysis of the k-core structure of weighted software networks extracted from Java software systems.
3. Method
Our approach works as follows. First, we parse the .java files of a Java software system to extract meaningful structural information from the source code, and propose a weighted software network to formally represent the extracted information. Second, we employ Wk-core to obtain the k-core structure of the weighted software network. Finally, the k-core structure is characterized by a number of relevant properties via statistical parameters. The following subsections discuss the main steps of our approach in detail.
3.1. Software network definition
As human-made complex systems, modern software systems are usually built out of many classes/interfaces, which interact with one another through different kinds of couplings [18–21]. To represent software as a weighted software network, the structural information in the source code must be extracted first. We therefore perform static analysis to extract the classes/interfaces and the couplings between them. When collecting structural information, we only analyze the classes/interfaces actually encountered in the code, neglecting those that are only referenced [16]. In our approach, we introduce a weighted software network, WCCN (weighted class coupling network), to formally represent the extracted structural information.
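To make the pipeline concrete, the following is a minimal Python sketch of building a WCCN from coupling records and peeling it into weighted shells. It rests on several assumptions that are not taken from this paper: the network is treated as undirected, each coupling occurrence adds 1 to the edge weight, and the weighted degree used for peeling is the node strength (the sum of incident edge weights), which may differ from the weighted degree defined in [17]. The function names `build_wccn` and `weighted_coreness` and the class names are hypothetical.

```python
from collections import defaultdict


def build_wccn(couplings):
    """Build an undirected weighted class coupling network.

    `couplings` is an iterable of (source_class, target_class) pairs, one per
    coupling occurrence. Assumption: the edge weight is the number of
    occurrences between the two classes, regardless of direction.
    """
    weights = defaultdict(int)
    for src, dst in couplings:
        if src == dst:
            continue  # ignore couplings of a class with itself
        weights[frozenset((src, dst))] += 1

    graph = defaultdict(dict)
    for pair, weight in weights.items():
        a, b = tuple(pair)
        graph[a][b] = weight
        graph[b][a] = weight
    return dict(graph)


def weighted_coreness(graph):
    """Assign a weighted coreness to every node by iterative peeling.

    Assumption: the weighted degree is the node strength (sum of incident
    edge weights); [17] may define it differently. The node with the smallest
    current strength is removed repeatedly, and its coreness is the largest
    minimum strength seen so far.
    """
    strength = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    remaining = set(graph)
    coreness = {}
    level = 0
    while remaining:
        node = min(remaining, key=lambda n: strength[n])
        level = max(level, strength[node])
        coreness[node] = level
        remaining.remove(node)
        # Removing the node lowers the strength of its remaining neighbours.
        for nbr, weight in graph[node].items():
            if nbr in remaining:
                strength[nbr] -= weight
    return coreness


# Hypothetical coupling records for four classes.
records = [("A", "B")] * 3 + [("B", "C")] + [("A", "C")] * 2 + [("D", "A")]
wccn = build_wccn(records)
shells = weighted_coreness(wccn)
graph_coreness = max(shells.values())
main_core = {node for node, c in shells.items() if c == graph_coreness}
```

In this toy network, classes A, B, and C end up in the main core with weighted coreness 3, while D sits in the shell with coreness 1. The quadratic-time peeling loop is chosen for readability; a bucket-based implementation would be preferable for networks with thousands of classes.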
4. Empirical study
We designed and conducted a set of experiments to investigate the topological structure of real-world software systems, and its evolution, using the weighted k-core decomposition method. Our experiments were carried out on a PC with a 2.6 GHz processor and 8 GB of RAM. In the following sections, we describe in detail the objects of study (Section 4.1) and our analysis of the results (Section 4.2).
4.1. Objects of study
In the current work, we chose a total of 16 nontrivial Java software systems as objects of study (see Table 1 for their names and domains). When selecting specific software systems, we kept three requirements in mind:
• They should be open source and publicly available, ensuring that the results obtained can be replicated.
• They should have more than 5 consecutive versions and still be active, representing new development trends of techniques.
• They should originate from different application domains and have different sizes, allowing, to some extent, the generalization of the conclusions.
In Table 1, we provide an overview of the size characteristics of these software systems, measured in KLOC (thousand lines of code), and some detailed statistical properties of the WCCNs constructed from the source code of selected directories of each system, measured in |N| (number of nodes), |E| (number of edges), ⟨k⟩ (average degree of network nodes), d (diameter), C (clustering coefficient), and l (average path length). The definitions of these parameters can be found in [24]. Note that KLOC counts effective lines of code, excluding comment lines and blank lines. When calculating ⟨k⟩, d, C, and l, we ignore the isolated nodes in the WCCNs. We also provide l_rand, the l of the corresponding random network, which can be approximately calculated.
From Table 1, we can see that the WCCNs share some topological properties of complex networks: their l is similar to the l_rand of the corresponding random network with the same |N| and ⟨k⟩, and their C is much larger than C_rand.
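As an illustration of this comparison, the sketch below computes random-network baselines from |N| and ⟨k⟩. The formulas are the standard approximations for Erdős–Rényi random graphs, l_rand ≈ ln|N| / ln⟨k⟩ and C_rand ≈ ⟨k⟩ / |N|; that these are the exact expressions used for Table 1 is an assumption. The function name `random_baselines` and the input values are hypothetical and are not taken from Table 1.

```python
import math


def random_baselines(n_nodes, avg_degree):
    """Return (l_rand, C_rand) for a random network with the same size.

    Assumption: standard Erdos-Renyi approximations,
    l_rand ~ ln(N) / ln(<k>) and C_rand ~ <k> / N.
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    if avg_degree <= 1:
        raise ValueError("average degree must exceed 1 for ln(<k>) > 0")
    l_rand = math.log(n_nodes) / math.log(avg_degree)
    c_rand = avg_degree / n_nodes
    return l_rand, c_rand


# Hypothetical network: 1000 classes with an average degree of 10.
l_rand, c_rand = random_baselines(1000, 10)
print(f"l_rand = {l_rand:.2f}, C_rand = {c_rand:.4f}")
```

A measured l close to l_rand together with a measured C far above C_rand is the usual signature of a small-world network.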
5. Implications for software engineering
Complex systems and complexity science are viewed as the ‘21st Century Science’ [40]. Their basic view is that topological structure determines function, emphasizing the view of the system as a whole. Software networks represent another important class of complex networks that can also be studied using complex network theory. This theory adds a different dimension to our understanding of software by viewing software as a whole and ignoring the microscopic details. Research on software from the perspective of complex networks is emerging and mainly covers four distinct aspects [40]: characterizing the shared topological features of software networks, modeling the growth of software networks, measuring software networks, and applying them in software practice. In the current work, we mainly focus on the investigation of topological properties in weighted software networks using the weighted k-core decomposition from complex network theory. The results we obtained may have the following theoretical and practical value.
An obvious application of our approach to software engineering is to improve software understanding. Our empirical results on a set of 16 Java software systems show that a small graph coreness is a property shared by many software systems, that the distribution of weighted coreness follows a power law with an exponential cutoff, that weighted coreness and node degree are closely correlated, with Spearman correlation coefficients larger than 0.94, that the graph coreness remains relatively stable unless the system undergoes major changes, that the size of the main core remains stable during evolution, and that new members of a main core come from, and vanishing members go to, the shells very near that main core. These observations uncover new properties enclosed in the topological structure of a piece of software and its evolution, providing new insight into our understanding of a software system. For example, we should develop a software system with a relatively small value of kWmax, and when maintaining a software system, we should keep kWmax stable. Any software evolution model [41–43] proposed to model the growth of software networks should also take all these newly revealed properties into consideration. Further, if the k-core structure of a software system drifts from these properties, it may indicate that the software has been overdeveloped and that a major cleanup or antiregressive process is required.
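For a given system, the reported correlation between weighted coreness and node degree can be checked with Spearman's rank correlation. The following is a minimal sketch using only the Python standard library; it assigns average ranks to ties and then computes the Pearson correlation of the ranks. The function names and the sample values are hypothetical and are not data from the subject systems.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-class values: weighted coreness and node degree.
coreness = [1, 3, 3, 5, 8]
degree = [1, 2, 4, 6, 9]
rho = spearman(coreness, degree)
```

A value of `rho` close to 1 indicates that classes with a higher degree also tend to sit in deeper weighted shells.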
Another obvious application of our approach to software engineering is to propose new metrics for software. To date, a significant number of OO metrics have been proposed in the literature, such as the metrics proposed by Abreu and Carapuca [47], the CK metrics [48], and the metrics proposed by Li and Henry [49]. These traditional metrics mainly focus on the local features of a specific software system, e.g., the number of classes or the number of methods, but they fail to explore the rich information in the software's topological structure. Complex networks emphasize the view of the system as a whole. Software metrics based on complex network theory view the software topology from a global perspective, complementing the traditional object-oriented metrics. Many new metrics based on complex network theory have been proposed [13,50,51], but these metrics are mainly based on un-weighted software networks. In the current work, we propose a weighted version of software networks that can be used to derive new metrics, and the weighted k-core decomposition can also be used to quantify the hierarchy of a software system [13].
The current work represents seminal work in the area of the analysis and application of topological properties revealed from the weighted software networks. However, we still need further work to design more sophisticated applications of our revealed results that would be of considerable benefit in practice.
6. Conclusions
In this work, we propose an approach to uncover the properties enclosed in weighted software networks to help developers improve software understanding, propose new metrics for software measurement, and evaluate the quality of a system under development. To analyze the topological properties of software, we first propose a weighted class coupling network (WCCN) to represent a piece of software at the class level of granularity, using the coupling frequency to assign weights to the edges. Then, a weighted k-core decomposition method is introduced to partition the WCCN into a layered structure. Several statistical parameters (i.e., graph coreness and its evolution, weighted coreness distribution, the correlation between weighted coreness and degree, the size of the main core, and the new and vanishing members of the main core) are used to investigate the weighted k-core structure, and many topological properties have been found. We also use the weighted k-core decomposition method to identify key classes, and its effectiveness has been demonstrated. Our approach is demonstrated on a collection of 16 open-source Java software systems.