CLC number: TP309.5

On-line Access: 2018-08-06

Received: 2016-08-22

Revision Accepted: 2017-03-15

Crosschecked: 2018-06-08




Frontiers of Information Technology & Electronic Engineering  2018 Vol.19 No.6 P.712-736


Discovering optimal features using static analysis and a genetic search based method for Android malware detection

Author(s):  Ahmad Firdaus, Nor Badrul Anuar, Ahmad Karim, Mohd Faizal Ab Razak

Affiliation(s):  Department of Computer System and Technology, University of Malaya, Kuala Lumpur 50603, Malaysia

Corresponding email(s):   ahmadfirdaus@um.edu.my, badrul@um.edu.my

Key Words:  Genetic algorithm, Static analysis, Android, Malware, Machine learning

Ahmad Firdaus, Nor Badrul Anuar, Ahmad Karim, Mohd Faizal Ab Razak. Discovering optimal features using static analysis and a genetic search based method for Android malware detection[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(6): 712-736.

Mobile device manufacturers worldwide are rapidly releasing a wide variety of Android versions. At the same time, cyber criminals are carrying out malicious actions such as tracking user activities, stealing personal data, and committing bank fraud. These criminals profit considerably because so many people rely on Android for their daily routines, including important communications. Security practitioners therefore apply static and dynamic analyses to identify malware. This study used static analysis because of its full code coverage, low resource consumption, and fast processing. However, static analysis needs a minimal set of features to classify malware efficiently. We therefore used genetic search (GS), a search method based on a genetic algorithm (GA), to select features from among 106 strings. To evaluate the best features determined by GS, we used five machine learning classifiers: Naïve Bayes (NB), functional trees (FT), J48, random forest (RF), and multilayer perceptron (MLP). Among these classifiers, FT achieved the highest accuracy (95%) and the highest true positive rate (TPR, 96.7%) using only six features.
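The abstract describes genetic search only at a high level. As an illustration, a GA-based feature selector can encode each candidate subset as a bit mask with one bit per string feature (106 bits here) and evolve a population of masks toward higher fitness. The sketch below is not the authors' implementation: the names `genetic_feature_search` and `accuracy_and_tpr`, the tournament selection, single-point crossover, bit-flip mutation, elitism, and all parameter defaults are illustrative assumptions, and the fitness function is left to the caller. It also shows how accuracy and TPR follow from confusion-matrix counts.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def genetic_feature_search(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 20,
    crossover_prob: float = 0.6,
    mutation_prob: float = 0.033,
    seed: int = 1,
) -> Tuple[Mask, float]:
    """Hypothetical GA search; returns the best feature mask found and its fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    scored = [(fitness(m), m) for m in population]
    best_score, best_mask = max(scored, key=lambda s: s[0])

    def pick(pool):
        # Binary tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(pool, 2)
        return (a if a[0] >= b[0] else b)[1]

    for _ in range(generations):
        next_pop = [best_mask[:]]  # elitism: carry the best mask seen so far
        while len(next_pop) < pop_size:
            p1, p2 = pick(scored), pick(scored)
            if rng.random() < crossover_prob:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: each gene flips independently with mutation_prob.
            child = [1 - g if rng.random() < mutation_prob else g for g in child]
            next_pop.append(child)
        scored = [(fitness(m), m) for m in next_pop]
        gen_score, gen_mask = max(scored, key=lambda s: s[0])
        if gen_score > best_score:
            best_score, best_mask = gen_score, gen_mask[:]
    return best_mask, best_score


def accuracy_and_tpr(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Accuracy = (TP + TN) / total; TPR = TP / (TP + FN), i.e., recall on malware."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    tpr = tp / (tp + fn)
    return accuracy, tpr


if __name__ == "__main__":
    informative = {3, 17, 42, 58, 77, 101}  # hypothetical "useful" feature indices

    def toy_fitness(mask: Mask) -> float:
        # Reward hypothetical useful features, penalize large subsets.
        return sum(mask[i] for i in informative) - 0.05 * sum(mask)

    mask, score = genetic_feature_search(106, toy_fitness)
    print("features kept:", sum(mask), "fitness:", score)
```

In a real pipeline, the fitness would score a subset by evaluating a classifier or a subset evaluator on only the masked features. The toy fitness merely rewards a few made-up indices and penalizes subset size, reflecting the stated aim of keeping the feature count small.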




