CLC number: TP393.08
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2010-08-02
Long-zheng Cai, Jian Chen, Yun Ke, Tao Chen, Zhi-gang Li. A new data normalization method for unsupervised anomaly intrusion detection[J]. Journal of Zhejiang University Science C, 2010, 11(10): 778-784.
@article{title="A new data normalization method for unsupervised anomaly intrusion detection",
author="Long-zheng Cai, Jian Chen, Yun Ke, Tao Chen, Zhi-gang Li",
journal="Journal of Zhejiang University Science C",
volume="11",
number="10",
pages="778-784",
year="2010",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C0910625"
}
%0 Journal Article
%T A new data normalization method for unsupervised anomaly intrusion detection
%A Long-zheng Cai
%A Jian Chen
%A Yun Ke
%A Tao Chen
%A Zhi-gang Li
%J Journal of Zhejiang University SCIENCE C
%V 11
%N 10
%P 778-784
%@ 1869-1951
%D 2010
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C0910625
TY - JOUR
T1 - A new data normalization method for unsupervised anomaly intrusion detection
A1 - Long-zheng Cai
A1 - Jian Chen
A1 - Yun Ke
A1 - Tao Chen
A1 - Zhi-gang Li
JO - Journal of Zhejiang University Science C
VL - 11
IS - 10
SP - 778
EP - 784
SN - 1869-1951
Y1 - 2010
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.C0910625
ER -
Abstract: Unsupervised anomaly detection can detect attacks without the need for clean or labeled training data. This paper studies the application of clustering to unsupervised anomaly detection (ACUAD). Data records are mapped to a feature space, and anomalies are detected by determining which points lie in sparse regions of that space. A critical element for this method to be effective is the definition of the distance function between data records. We propose a unified normalization distance framework for records with mixed numeric and nominal features. A heuristic method for computing the distance between nominal features is proposed, which takes advantage of an important characteristic of nominal features: their probability distribution. Robust methods are then proposed for mapping numeric features and computing their distance; these tolerate differences in scale and diversification among features, as well as outliers introduced by intrusions. Empirical experiments with the KDD 1999 dataset showed that ACUAD can detect intrusions with relatively low false alarm rates compared with other approaches.
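The abstract does not give the formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes median/MAD scaling as the robust numeric normalization, a hypothetical frequency-based mismatch penalty `1 - p(a) * p(b)` as the nominal distance, and a mean k-nearest-neighbour distance as a simple stand-in for the clustering step that finds sparse regions. All function names and the toy records are illustrative.

```python
from collections import Counter
from statistics import median


def robust_scale(column):
    """Centre on the median and divide by the median absolute deviation (MAD).

    Assumption: median/MAD stands in for the paper's robust normalization.
    """
    med = median(column)
    mad = median([abs(v - med) for v in column])
    scale = mad if mad > 0 else 1.0  # guard against constant columns
    return [(v - med) / scale for v in column]


def nominal_probabilities(column):
    """Empirical probability of each value of a nominal feature."""
    counts = Counter(column)
    return {value: n / len(column) for value, n in counts.items()}


def nominal_distance(a, b, probs):
    """Hypothetical heuristic: 0 if equal, else larger when either value is rare."""
    if a == b:
        return 0.0
    return 1.0 - probs[a] * probs[b]


def mixed_distance(x, y, numeric_idx, nominal_idx, probs):
    """Euclidean-style distance over scaled numeric and nominal features."""
    total = sum((x[i] - y[i]) ** 2 for i in numeric_idx)
    total += sum(nominal_distance(x[i], y[i], probs[i]) ** 2 for i in nominal_idx)
    return total ** 0.5


def sparsity_scores(records, numeric_idx, nominal_idx, k=2):
    """Mean distance to the k nearest records; high scores mark sparse regions.

    Assumption: this k-NN score replaces the clustering used in the paper.
    """
    columns = list(zip(*records))
    scaled_cols = {i: robust_scale(list(columns[i])) for i in numeric_idx}
    probs = {i: nominal_probabilities(columns[i]) for i in nominal_idx}
    scaled = [
        tuple(scaled_cols[i][r] if i in scaled_cols else v for i, v in enumerate(row))
        for r, row in enumerate(records)
    ]
    scores = []
    for a, x in enumerate(scaled):
        dists = sorted(
            mixed_distance(x, y, numeric_idx, nominal_idx, probs)
            for b, y in enumerate(scaled)
            if a != b
        )
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy records: (duration, bytes, protocol); the last one is the odd one out.
records = [
    (1.0, 100.0, "tcp"),
    (2.0, 110.0, "tcp"),
    (1.5, 105.0, "tcp"),
    (2.5, 95.0, "tcp"),
    (1.0, 90.0, "tcp"),
    (50.0, 9000.0, "icmp"),
]
scores = sparsity_scores(records, numeric_idx=[0, 1], nominal_idx=[2])
```

Because the scaling uses the median and MAD rather than the mean and standard deviation, the single extreme record barely shifts the scale of the other five, so it remains far from them and receives the largest score.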