CLC number: TP311

On-line Access: 2015-09-06

Received: 2014-11-02

Revision Accepted: 2015-05-27

Crosschecked: 2015-08-10

 ORCID:

Hong Yin

http://orcid.org/0000-0002-0682-6781

Frontiers of Information Technology & Electronic Engineering  2015 Vol.16 No.9 P.744-758

http://doi.org/10.1631/FITEE.1400376


Symbolic representation based on trend features for knowledge discovery in long time series


Author(s):  Hong Yin, Shu-qiang Yang, Xiao-qian Zhu, Shao-dong Ma, Lu-min Zhang

Affiliation(s):  College of Computer, National University of Defense Technology, Changsha 410073, China

Corresponding email(s):   yinhonggfkd@aliyun.com

Key Words:  Long time series, Segmentation, Trend features, Symbolic, Knowledge discovery


Hong Yin, Shu-qiang Yang, Xiao-qian Zhu, Shao-dong Ma, Lu-min Zhang. Symbolic representation based on trend features for knowledge discovery in long time series[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(9): 744-758.

@article{Yin2015,
title="Symbolic representation based on trend features for knowledge discovery in long time series",
author="Hong Yin and Shu-qiang Yang and Xiao-qian Zhu and Shao-dong Ma and Lu-min Zhang",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="16",
number="9",
pages="744-758",
year="2015",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1400376"
}

%0 Journal Article
%T Symbolic representation based on trend features for knowledge discovery in long time series
%A Hong Yin
%A Shu-qiang Yang
%A Xiao-qian Zhu
%A Shao-dong Ma
%A Lu-min Zhang
%J Frontiers of Information Technology & Electronic Engineering
%V 16
%N 9
%P 744-758
%@ 2095-9184
%D 2015
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1400376

TY  - JOUR
T1  - Symbolic representation based on trend features for knowledge discovery in long time series
A1  - Hong Yin
A1  - Shu-qiang Yang
A1  - Xiao-qian Zhu
A1  - Shao-dong Ma
A1  - Lu-min Zhang
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 16
IS  - 9
SP  - 744
EP  - 758
SN  - 2095-9184
Y1  - 2015
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1400376
ER  -


Abstract: 
The symbolic representation of time series has attracted much research interest recently. The high dimensionality typical of such data is challenging, especially as time series become longer, and the wide deployment of sensors collecting ever more data exacerbates the problem. Representing a time series effectively is essential for decision-making tasks such as classification, prediction, and knowledge discovery. In this paper, we propose a new symbolic representation method for long time series based on trend features, called trend feature symbolic approximation (TFSA). The method uses a two-step mechanism to segment long time series rapidly. Unlike some previous symbolic methods, it focuses on retaining most of the trend features and patterns of the original series. A time series is represented by trend symbols, which are also suitable for knowledge discovery tasks such as association rule mining. TFSA provides a lower-bounding guarantee. Experimental results show that, compared with some previous methods, it not only achieves better segmentation efficiency and classification accuracy, but is also applicable to knowledge discovery from time series.
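As a rough illustration of what representing a series "by trend symbols" can look like, the sketch below maps each segment of a series to one of three symbols according to its least-squares slope. The three-letter alphabet (`U`, `D`, `S`), the `flat_tol` threshold, and the function names are assumptions made for this example; the abstract does not specify the actual TFSA alphabet or trend features, so this should not be read as the authors' method.

```python
from typing import List, Sequence


def segment_slope(values: Sequence[float]) -> float:
    """Least-squares slope of values plotted against indices 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0  # a single point has no trend
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def trend_symbols(series: Sequence[float],
                  breakpoints: List[int],
                  flat_tol: float = 0.1) -> str:
    """Map each segment to 'U' (up), 'D' (down) or 'S' (stable).

    breakpoints are the start indices of every segment after the first.
    The alphabet and flat_tol are illustrative assumptions, not TFSA's.
    """
    bounds = [0, *breakpoints, len(series)]
    symbols = []
    for start, end in zip(bounds, bounds[1:]):
        slope = segment_slope(series[start:end])
        if slope > flat_tol:
            symbols.append("U")
        elif slope < -flat_tol:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


series = [0, 1, 2, 3, 3, 3, 3, 3, 2, 1, 0]
word = trend_symbols(series, breakpoints=[4, 7])
```

Here the eleven-point series is reduced to a three-symbol word, one symbol per unequal-length segment, which is the kind of compact input an association rule miner could consume.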

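Symbolic representation of time series based on trend features

Objective: To propose a general method for the knowledge discovery process in long time series.
Innovation: We propose a symbolic representation method for time series based on parallel segmentation, called trend feature symbolic approximation (TFSA). It segments long time series rapidly while retaining most of the trend features of the original series, and represents the resulting subsequences with feature symbols. The contribution of this paper is improved segmentation efficiency for long time series; in addition, TFSA focuses on retaining most of the trend features of the original series, which makes the mined rules easier to understand and interpret.
Method: First, a two-step segmentation mechanism divides the time series into a sequence of unequal-length subsequences. Then, TFSA symbolizes the subsequences to obtain a set of symbolic items. Finally, an Apriori-based association rule algorithm performs knowledge discovery on the time series data.
Conclusions: For long time series, we study a parallel segmentation mechanism for massive-data environments based on the cumulative sum (CUSUM) control chart method. It can be implemented on distributed nodes, and its efficiency improves further as the number of nodes increases. Unlike traditional methods, TFSA aims to retain most of the trend features and patterns of the original time series, represents the series with prescribed trend symbols, and uses a representation that also takes subsequent time series mining into account. Experiments show that the proposed method improves on existing methods in both segmentation efficiency and classification accuracy.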
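The conclusions name the cumulative sum (CUSUM) control chart as the basis for segmentation. The sketch below is a generic two-sided CUSUM mean-shift detector, not the paper's two-step or parallel mechanism: the reference mean is estimated from a short warm-up window, and a breakpoint is reported when either cumulative statistic exceeds a threshold. The function name and the `drift`, `threshold`, and `warmup` parameters are assumptions chosen for illustration.

```python
from typing import List, Sequence


def cusum_breakpoints(series: Sequence[float],
                      drift: float = 0.5,
                      threshold: float = 4.0,
                      warmup: int = 3) -> List[int]:
    """Return indices where a two-sided CUSUM signals a shift in the mean.

    After each detection the detector restarts at the breakpoint and
    re-estimates the reference mean from the next `warmup` samples.
    All parameter values are illustrative assumptions.
    """
    breakpoints: List[int] = []
    n = len(series)
    start = 0
    while start < n:
        warm_end = min(start + warmup, n)
        ref = sum(series[start:warm_end]) / (warm_end - start)
        pos = neg = 0.0  # upward and downward cumulative statistics
        change = None
        for i in range(warm_end, n):
            dev = series[i] - ref
            pos = max(0.0, pos + dev - drift)
            neg = max(0.0, neg - dev - drift)
            if pos > threshold or neg > threshold:
                change = i
                break
        if change is None:
            break  # no further shift detected
        breakpoints.append(change)
        start = change
    return breakpoints


step_series = [0.0] * 10 + [5.0] * 10
found = cusum_breakpoints(step_series)
```

Because each detector run depends only on its own window, a long series could in principle be split into chunks and scanned on separate nodes, which matches the distributed setting the summary describes; how the paper merges results at chunk boundaries is not stated here.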

Key words: Long time series; Segmentation; Trend features; Symbolic representation; Knowledge discovery

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE