CLC number: TP391

On-line Access: 2015-12-07

Received: 2014-11-19

Revision Accepted: 2015-04-15

Crosschecked: 2015-11-11

 ORCID:

Omid Abbaszadeh

http://orcid.org/0000-0002-8923-940X

Frontiers of Information Technology & Electronic Engineering  2015 Vol.16 No.12 P.1059-1068

http://doi.org/10.1631/FITEE.1400398


An ensemble method for data stream classification in the presence of concept drift


Author(s):  Omid Abbaszadeh, Ali Amiri, Ali Reza Khanteymoori

Affiliation(s):  Department of Computer Engineering, University of Zanjan, Zanjan 45371-38791, Iran

Corresponding email(s):   o.abbaszadeh@znu.ac.ir, a_amiri@znu.ac.ir, khanteymoori@znu.ac.ir

Key Words:  Data stream, Classification, Ensemble classifiers, Concept drift


Omid Abbaszadeh, Ali Amiri, Ali Reza Khanteymoori. An ensemble method for data stream classification in the presence of concept drift[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(12): 1059-1068.

@article{Abbaszadeh2015,
title="An ensemble method for data stream classification in the presence of concept drift",
author="Omid Abbaszadeh and Ali Amiri and Ali Reza Khanteymoori",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="16",
number="12",
pages="1059--1068",
year="2015",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.1400398"
}

%0 Journal Article
%T An ensemble method for data stream classification in the presence of concept drift
%A Omid Abbaszadeh
%A Ali Amiri
%A Ali Reza Khanteymoori
%J Frontiers of Information Technology & Electronic Engineering
%V 16
%N 12
%P 1059-1068
%@ 2095-9184
%D 2015
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1400398

TY  - JOUR
T1  - An ensemble method for data stream classification in the presence of concept drift
A1  - Omid Abbaszadeh
A1  - Ali Amiri
A1  - Ali Reza Khanteymoori
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 16
IS  - 12
SP  - 1059
EP  - 1068
SN  - 2095-9184
Y1  - 2015
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1400398
ER  -


Abstract: 
Data stream management and processing is a recent area of interest in computer science. By ‘data stream’, we refer to continuously and rapidly generated packets of data. Data streams are distinguished from standard types of data by their immense volume, high production rate, limited processing time, and concept drift. A central problem for data streams is the classification of incoming data. This paper proposes a novel ensemble classifier. The classifier applies two weighting functions to its base classifiers under different data input conditions. In addition, a new method is used to detect drift, which improves the precision of the algorithm. The method also removes a variable number of base classifiers according to their quality, and it weights the base classifiers at the decision-making stage. This makes the ensemble more adaptable when drifts occur, which leads to more efficient classifiers. The proposed method is tested on a set of standard data, and the results confirm higher accuracy than available ensemble classifiers and single classifiers. In some cases the proposed classifier is also faster and needs less storage space.

The paper addresses the problem of concept drift in data streams. It uses ensemble models to handle continuous data streams and a new weighting scheme to drop outdated classifiers.
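The decision-stage weighting and quality-based removal described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's two weighting functions or its removal rule, so everything here is an assumption made for illustration: the function names `weighted_vote` and `prune`, the idea that each base classifier already carries a numeric weight, and the fixed `threshold` used to drop low-quality members.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.

    predictions[i] is the label predicted by base classifier i and
    weights[i] is its current weight (how the weight is computed is
    not specified here; it is an input to this sketch).
    """
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def prune(members, weights, threshold):
    """Drop every member whose weight is below `threshold`.

    The number of removed members varies with their quality, which is
    the behaviour the abstract describes; the threshold rule itself is
    an assumption, not the paper's rule.
    """
    kept = [(m, w) for m, w in zip(members, weights) if w >= threshold]
    return [m for m, _ in kept], [w for _, w in kept]


if __name__ == "__main__":
    members = ["c1", "c2", "c3"]   # hypothetical base classifier names
    weights = [0.9, 0.3, 0.4]
    print(weighted_vote(["a", "b", "b"], weights))
    print(prune(members, weights, threshold=0.35))
```

In this sketch a single highly weighted classifier can outvote two weaker ones, which is how a weighting scheme lets an ensemble favour members that fit the current concept.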

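An ensemble method for data stream classification in the presence of concept drift

Objective: Data stream management and processing is a topic of strong current interest in computer science. In this paper, "data stream" refers to continuously and rapidly generated packets of data. The distinctive features of data streams include immense volume, high generation rate, limited processing time, and concept drift. These features distinguish data streams from other standard forms of data. An important problem for data streams is the classification of input data. This paper proposes a novel ensemble classifier.
Innovation: Building on data stream classifiers, a method is proposed that combines concept drift detection, base classifier removal, and a dynamic weighting mechanism.
Method: (1) Two weighting functions are applied to the base classifiers under different data input conditions; (2) the Kappa coefficient is used to determine concept drift, improving the precision of the algorithm; (3) different numbers of base classifiers are removed based on their quality; (4) a weighting mechanism is applied to the base classifiers at the decision stage, improving the algorithm's adaptability to drift and the efficiency of the classifier.
Conclusion: Tested on standard data sets, the proposed method achieves higher accuracy than existing ensemble classifiers and single classifiers; in some cases it also reduces running time and memory usage.

Keywords: Data stream; Classification; Ensemble classifier; Concept drift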
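The method summary names the Kappa coefficient as the basis for detecting concept drift. As a rough illustration, the sketch below computes Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between true and predicted labels and p_e is the agreement expected by chance. The summary does not say how the paper turns Kappa into a drift decision, so the `drift_suspected` rule (flag drift when Kappa falls by more than a fixed `drop` between consecutive data chunks) and its default value are assumptions made only for this example.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between true labels and predicted labels."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Observed agreement: fraction of positions where the labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both sequences contain one identical class.
        # Kappa is undefined; this sketch treats it as full agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def drift_suspected(prev_kappa, curr_kappa, drop=0.2):
    """Hypothetical rule: flag drift when kappa falls by more than `drop`."""
    return prev_kappa - curr_kappa > drop


if __name__ == "__main__":
    k_prev = cohen_kappa([1, 0, 1, 0], [1, 0, 1, 0])
    k_curr = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
    print(k_prev, k_curr, drift_suspected(k_prev, k_curr))
```

Unlike raw accuracy, Kappa discounts agreement that would occur by chance, which makes it less misleading on streams with imbalanced classes.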



