CLC number: TP311
On-line Access: 2017-09-08
Received: 2015-10-30
Revision Accepted: 2016-04-12
Crosschecked: 2017-08-04
Rashid Naseem, Mustafa Bin Mat Deris, Onaiza Maqbool, Jing-peng Li, Sara Shahzad, Habib Shah. Improved binary similarity measures for software modularization[J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18(8): 1082-1107.
@article{FITEE.1500373,
title="Improved binary similarity measures for software modularization",
author="Rashid Naseem and Mustafa Bin Mat Deris and Onaiza Maqbool and Jing-peng Li and Sara Shahzad and Habib Shah",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="18",
number="8",
pages="1082--1107",
year="2017",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.1500373"
}
%0 Journal Article
%T Improved binary similarity measures for software modularization
%A Rashid Naseem
%A Mustafa Bin Mat Deris
%A Onaiza Maqbool
%A Jing-peng Li
%A Sara Shahzad
%A Habib Shah
%J Frontiers of Information Technology & Electronic Engineering
%V 18
%N 8
%P 1082-1107
%@ 2095-9184
%D 2017
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1500373
TY - JOUR
T1 - Improved binary similarity measures for software modularization
A1 - Rashid Naseem
A1 - Mustafa Bin Mat Deris
A1 - Onaiza Maqbool
A1 - Jing-peng Li
A1 - Sara Shahzad
A1 - Habib Shah
JO - Frontiers of Information Technology & Electronic Engineering
VL - 18
IS - 8
SP - 1082
EP - 1107
SN - 2095-9184
Y1 - 2017
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1500373
ER -
Abstract: Various binary similarity measures have been employed in clustering approaches to form homogeneous groups of similar entities in data. These similarity measures are mostly based only on the presence or absence of features. Binary similarity measures have also been explored with different clustering approaches (e.g., agglomerative hierarchical clustering) for software modularization, to make software systems understandable and manageable. Each similarity measure has its own strengths and weaknesses, which improve and degrade the clustering results, respectively. We highlight the strengths of some well-known existing binary similarity measures for software modularization. Furthermore, based on these existing similarity measures, we introduce several improved binary similarity measures. Proofs of correctness with illustrations and a series of experiments are presented to evaluate the effectiveness of the new measures.
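To make the notion of a presence/absence-based measure concrete, the following is a minimal sketch, not the measures proposed in the paper. The abstract does not name specific measures, so the two used here (the Jaccard coefficient and the simple matching coefficient) are assumed as well-known examples, and the entities and feature vectors are hypothetical. Each entity is a 0/1 vector over features; a counts features present in both entities, b and c count features present in only one, and d counts features absent from both.

```python
from typing import Sequence, Tuple


def contingency(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two binary feature vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)          # present in both
    b = sum(1 for p, q in zip(x, y) if p and not q)      # present in x only
    c = sum(1 for p, q in zip(x, y) if not p and q)      # present in y only
    d = sum(1 for p, q in zip(x, y) if not p and not q)  # absent from both
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard coefficient a / (a + b + c); joint absences (d) are ignored."""
    a, b, c, _ = contingency(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0  # assumption: 0.0 when no feature is present


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient (a + d) / (a + b + c + d); counts joint absences."""
    a, b, c, d = contingency(x, y)
    total = a + b + c + d
    return (a + d) / total if total else 0.0


# Hypothetical entities (e.g., source files) described by six binary features.
file_a = [1, 1, 0, 0, 1, 0]
file_b = [1, 0, 0, 0, 1, 0]

print(contingency(file_a, file_b))
print(jaccard(file_a, file_b))
print(simple_matching(file_a, file_b))
```

The two coefficients differ only in whether joint absences count as evidence of similarity, which is one way a measure's design can help or hurt a clustering result.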