CLC number: TP309
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2023-07-04
Citations: Bibtex RefMan EndNote GB/T7714
Bin LI, Yijie WANG, Li CHENG. Adaptive and augmented active anomaly detection on dynamic network traffic streams[J]. Frontiers of Information Technology & Electronic Engineering, 2024, 25(3): 446-460.
@article{Li2024,
title="Adaptive and augmented active anomaly detection on dynamic network traffic streams",
author="Bin LI and Yijie WANG and Li CHENG",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="25",
number="3",
pages="446--460",
year="2024",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2300244"
}
%0 Journal Article
%T Adaptive and augmented active anomaly detection on dynamic network traffic streams
%A Bin LI
%A Yijie WANG
%A Li CHENG
%J Frontiers of Information Technology & Electronic Engineering
%V 25
%N 3
%P 446-460
%@ 2095-9184
%D 2024
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2300244
TY - JOUR
T1 - Adaptive and augmented active anomaly detection on dynamic network traffic streams
A1 - Bin LI
A1 - Yijie WANG
A1 - Li CHENG
JO - Frontiers of Information Technology & Electronic Engineering
VL - 25
IS - 3
SP - 446
EP - 460
SN - 2095-9184
Y1 - 2024
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2300244
ER -
Abstract: Active anomaly detection queries labels of sampled instances and uses them to incrementally update the detection model; it has been widely adopted for detecting network attacks. However, existing methods cannot achieve desirable performance on dynamic network traffic streams because (1) their query strategies fail to sample informative instances that would let the detection model adapt to the evolving stream and (2) their model updates rely only on the limited queried instances and fail to leverage the enormous number of unlabeled instances in the stream. To address these issues, we propose an active tree-based model, adaptive and augmented active prior-knowledge forest (A3PF), for anomaly detection on network traffic streams. A prior-knowledge forest is constructed using prior knowledge of network attacks to find feature subspaces that better distinguish network anomalies from normal traffic. On the one hand, to make the model adapt to the evolving stream, a novel adaptive query strategy samples informative instances from two aspects: changes in the dynamic data distribution and the uncertainty of anomalies. On the other hand, based on the similarity of instances within a neighborhood, we devise an augmented update method that generates pseudo labels for the unlabeled neighbors of queried instances, which allows the enormous number of unlabeled instances to be used during model updating. Extensive experiments on two benchmarks, CIC-IDS2017 and UNSW-NB15, demonstrate that A3PF achieves significant improvements over previous active methods in terms of the area under the receiver operating characteristic curve (AUC-ROC) (20.9% and 21.5%) and the area under the precision-recall curve (AUC-PR) (44.6% and 64.1%).
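As a rough illustration of the augmented update idea described in the abstract, the sketch below propagates the label of each queried instance to its nearest unlabeled neighbors. It is not the authors' A3PF implementation: the function name `pseudo_label_neighbors`, the use of Euclidean distance, and the parameters `k` and `radius` are assumptions introduced here for illustration, and the paper's actual neighborhood definition may differ.

```python
import numpy as np

def pseudo_label_neighbors(X, queried_idx, queried_labels, k=3, radius=np.inf):
    """Give each queried label to its k nearest unlabeled neighbors.

    Hypothetical helper: Euclidean distance, k, and radius are illustrative
    assumptions, not the neighborhood rule used by A3PF.
    Returns a dict {unlabeled_index: pseudo_label}.
    """
    X = np.asarray(X, dtype=float)
    queried_idx = np.asarray(queried_idx)
    # Every instance that was not queried is a candidate for a pseudo label.
    unlabeled = np.setdiff1d(np.arange(len(X)), queried_idx)
    best = {}  # unlabeled index -> (distance, label) of its closest queried instance
    for q, label in zip(queried_idx, queried_labels):
        dists = np.linalg.norm(X[unlabeled] - X[q], axis=1)
        for j in np.argsort(dists)[:k]:
            d = dists[j]
            if d > radius:
                break  # neighbors are sorted by distance, so the rest are farther
            u = int(unlabeled[j])
            # If two queried instances claim the same neighbor, the closer one wins.
            if u not in best or d < best[u][0]:
                best[u] = (d, label)
    return {u: lab for u, (_, lab) in best.items()}

# Toy stream window: two tight clusters and one isolated point.
X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [9, 9]]
pseudo = pseudo_label_neighbors(X, queried_idx=[0, 3], queried_labels=[0, 1],
                                k=2, radius=1.0)
print(pseudo)
```

In this toy window, the instances close to a queried point inherit its label, while the isolated point at `[9, 9]` stays unlabeled because it lies outside the assumed radius. A detector could then be updated on the queried and pseudo-labeled instances together.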