CLC number: TP393.098
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2012-02-08
Xi-chuan Zhou, Hai-bin Shen, Zhi-yong Huang, Guo-jun Li. Large margin classification for combating disguise attacks on spam filters[J]. Journal of Zhejiang University Science C, 2012, 13(3): 187-195.
@article{title="Large margin classification for combating disguise attacks on spam filters",
author="Xi-chuan Zhou and Hai-bin Shen and Zhi-yong Huang and Guo-jun Li",
journal="Journal of Zhejiang University Science C",
volume="13",
number="3",
pages="187-195",
year="2012",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1100259"
}
%0 Journal Article
%T Large margin classification for combating disguise attacks on spam filters
%A Xi-chuan Zhou
%A Hai-bin Shen
%A Zhi-yong Huang
%A Guo-jun Li
%J Journal of Zhejiang University SCIENCE C
%V 13
%N 3
%P 187-195
%@ 1869-1951
%D 2012
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C1100259
TY - JOUR
T1 - Large margin classification for combating disguise attacks on spam filters
A1 - Xi-chuan Zhou
A1 - Hai-bin Shen
A1 - Zhi-yong Huang
A1 - Guo-jun Li
JO - Journal of Zhejiang University Science C
VL - 13
IS - 3
SP - 187
EP - 195
SN - 1869-1951
Y1 - 2012
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.C1100259
ER -
Abstract: This paper addresses large margin classification for spam filtering in the presence of an adversary who disguises spam messages to avoid detection. In practice, the adversary may strategically add good words indicative of a legitimate message or remove bad words indicative of spam. We assume that the adversary can afford to modify a spam message only to a certain extent without damaging its utility for the spammer. Under this assumption, we present a large margin approach for classifying spam messages that may be disguised. The proposed classifier is formulated as a second-order cone programming (SOCP) optimization problem. We performed a set of experiments using the TREC 2006 Spam Corpus. The results show that the performance of the standard support vector machine (SVM) degrades rapidly as more words are injected or removed by the adversary, while the proposed approach is more stable under the disguise attack.
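The disguise attack described in the abstract can be illustrated with a small sketch. It is not the paper's SOCP classifier: it only shows how a budget-limited adversary who adds good words or removes bad words can push a spam message across the decision boundary of an ordinary linear filter. The vocabulary, weights, bias, and the function names score and disguise are all hypothetical, chosen for illustration.

```python
# Illustrative only: the vocabulary, weights, and bias below are made up.
# They are not taken from the paper or from the TREC 2006 Spam Corpus.
WEIGHTS = {
    "viagra": 2.0,    # positive weight: indicative of spam ("bad word")
    "free": 1.5,
    "winner": 1.0,
    "meeting": -1.2,  # negative weight: indicative of legitimate mail ("good word")
    "agenda": -1.0,
    "thanks": -0.8,
}
BIAS = -0.5


def score(words):
    """Linear score over binary word-presence features; > 0 means spam."""
    return BIAS + sum(WEIGHTS.get(w, 0.0) for w in set(words))


def disguise(words, budget):
    """Greedy adversary limited to `budget` word edits.

    Each edit either removes a spam-indicative word that is present or
    adds a ham-indicative word that is absent. With a linear score and
    binary features the edits are independent, so taking the `budget`
    edits with the largest score reduction is the adversary's best move.
    """
    present = set(words)
    edits = []  # (score reduction, action, word)
    for word, weight in WEIGHTS.items():
        if weight > 0 and word in present:
            edits.append((weight, "remove", word))
        elif weight < 0 and word not in present:
            edits.append((-weight, "add", word))
    edits.sort(reverse=True)
    for _, action, word in edits[:budget]:
        if action == "remove":
            present.discard(word)
        else:
            present.add(word)
    return present


if __name__ == "__main__":
    spam = ["viagra", "free", "winner"]
    for budget in range(4):
        disguised = disguise(spam, budget)
        label = "spam" if score(disguised) > 0 else "legitimate"
        print(budget, sorted(disguised), round(score(disguised), 2), label)
```

In this toy setting the filter's score falls with every edit the adversary is allowed, which is the kind of degradation the abstract reports for the standard SVM. A classifier trained to keep a margin against such bounded modifications, as the paper proposes, would aim to keep the score positive for every disguise within the budget.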