Data Stream Classification Using Classifier Ensemble
Publication date: 14.04.2015
Schedae Informaticae, 2014, Volume 23, pp. 21 - 32
https://doi.org/10.4467/20838476SI.14.002.3019
For contemporary business, the crucial factor is making smart decisions on the basis of the knowledge hidden in stored data. Unfortunately, traditional simple methods of data analysis are not sufficient for the efficient management of modern enterprises, because they are not suited to the huge and growing amount of data those enterprises store. Additionally, data usually arrive continuously in the form of a so-called data stream. The great disadvantage of traditional classification methods is that they assume the statistical properties of the discovered concept remain unchanged, whereas in real situations we can observe so-called concept drift, which may be caused by changes in the class prior probabilities and/or in the class-conditional probability distributions. The ability to incorporate new training data is an important feature of machine learning methods used in security applications (spam filtering or intrusion detection) or in decision support systems for marketing departments, which need to follow changing client behavior. Unfortunately, the occurrence of concept drift dramatically decreases classification accuracy. This work presents a comprehensive study of the ensemble classifier approach applied to the problem of drifting data streams. In particular, it reports research on modifications of the previously developed Weighted Aging Classifier Ensemble (WAE) algorithm, which is able to construct a valuable classifier ensemble for classifying stream data affected by incremental concept drift. We generalize the WAE method and propose a general framework for this approach. Such a framework can prune a classifier ensemble before or after assigning weights to individual classifiers. Additionally, we propose new classifier pruning criteria, weight calculation methods, and aging operators.
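The two sources of concept drift named above can be stated more formally. The following is a minimal formulation, assuming the standard decomposition of the joint distribution of features x and class label y at time t; the notation is ours and is not taken from the paper.

```latex
p_t(x, y) = p_t(y)\, p_t(x \mid y)

\text{drift between } t \text{ and } t+1:\quad p_t(x, y) \neq p_{t+1}(x, y)
\;\Longleftarrow\;
p_t(y) \neq p_{t+1}(y) \;\; \text{and/or} \;\; p_t(x \mid y) \neq p_{t+1}(x \mid y)

p_t(y \mid x) \propto p_t(y)\, p_t(x \mid y)
```

Because the Bayes-optimal decision at time t depends on the posterior p_t(y | x), a change in either the class priors or the class-conditional distributions can move the decision boundary, which is why a classifier trained on older data can lose accuracy on newer data.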
We also propose a rejuvenating operator, which is able to soften the aging effect. This can be useful especially when quite "old" classifiers are high-quality models, i.e., their presence increases ensemble accuracy, which may occur, e.g., in the case of recurring concept drift. Selected characteristics of the proposed framework were evaluated through a wide range of computer experiments carried out on two benchmark data streams. The obtained results confirmed the usefulness of the proposed method for data stream classification in the presence of incremental concept drift.
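To make the moving parts of the framework described above concrete, here is a minimal Python sketch of a chunk-based weighted ensemble with aging, rejuvenation and pruning. It is not the authors' WAE implementation. The weighting rule (accuracy on the newest chunk), the aging operator (multiplying by decay**age), the rejuvenating operator (halving the age of members whose accuracy reaches a threshold) and the pruning criterion (keep the max_size best members) are illustrative assumptions, as are the names AgingEnsemble, decay, rejuvenate_at and prune_first.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical: a base classifier is any function mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], int]


@dataclass
class Member:
    model: Classifier
    age: int = 0         # number of chunks seen since this member was added
    weight: float = 0.0


def accuracy(model: Classifier, X: List[Sequence[float]], y: List[int]) -> float:
    """Fraction of samples in the chunk that the model labels correctly."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


class AgingEnsemble:
    """Illustrative chunk-based ensemble (assumed design, not the paper's WAE)."""

    def __init__(self, max_size: int = 5, decay: float = 0.8,
                 rejuvenate_at: Optional[float] = None, prune_first: bool = False):
        self.max_size = max_size
        self.decay = decay                  # aging operator: weight = acc * decay ** age
        self.rejuvenate_at = rejuvenate_at  # accuracy threshold for rejuvenation
        self.prune_first = prune_first      # prune before (True) or after (False) weighting
        self.members: List[Member] = []

    def update(self, new_model: Classifier, X: List[Sequence[float]], y: List[int]) -> None:
        """Process one labelled chunk (X, y) together with the model trained on it."""
        for m in self.members:
            m.age += 1
        self.members.append(Member(new_model))
        acc = {id(m): accuracy(m.model, X, y) for m in self.members}
        # Hypothetical rejuvenating operator: halve the age of members that still do well.
        if self.rejuvenate_at is not None:
            for m in self.members:
                if acc[id(m)] >= self.rejuvenate_at:
                    m.age //= 2
        if self.prune_first:  # prune on raw chunk accuracy, then assign weights
            self.members.sort(key=lambda m: acc[id(m)], reverse=True)
            self.members = self.members[: self.max_size]
        for m in self.members:
            m.weight = acc[id(m)] * self.decay ** m.age
        if not self.prune_first:  # assign weights, then prune on aged weights
            self.members.sort(key=lambda m: m.weight, reverse=True)
            self.members = self.members[: self.max_size]
        total = sum(m.weight for m in self.members) or 1.0
        for m in self.members:
            m.weight /= total  # normalise so the weights sum to 1

    def predict(self, x: Sequence[float]) -> int:
        """Weighted majority vote over the current members."""
        votes = defaultdict(float)
        for m in self.members:
            votes[m.model(x)] += m.weight
        return max(votes, key=votes.get)


# Toy stream: fixed threshold rules stand in for models trained on each chunk.
def old_rule(x):
    return int(x[0] > 0.5)  # decision boundary before the drift


def new_rule(x):
    return int(x[0] > 0.2)  # decision boundary after the drift


X = [[0.1], [0.3], [0.4], [0.6], [0.9]]
y_old = [old_rule(x) for x in X]
y_new = [new_rule(x) for x in X]
ens = AgingEnsemble(max_size=2, decay=0.5)
ens.update(old_rule, X, y_old)  # chunk 1 follows the old concept
ens.update(new_rule, X, y_new)  # chunk 2: the concept has drifted
print(ens.predict([0.3]))       # 1
```

With prune_first=True, members are discarded on raw accuracy before aging is applied, so an old but accurate classifier can survive pruning even if aging later gives it a small weight; with prune_first=False the aged weights decide which members are removed.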
Article type: Original article
Department of Systems and Computer Networks, Wroclaw University of Technology, Wyb. Wyspiańskiego 27, 50-370 Wrocław, Poland
Article status: Open
Licence: None
Publication languages: English
View count: 3055
Number of downloads: 1880