An Empirical Study of Online Learning in Non-stationary Data Streams Using Ensemble of Ensembles

Radhika V. Kulkarni, S. Revathy, Suhas H. Patil

Abstract


Numerous information-system applications produce huge volumes of non-stationary streaming data that demand real-time analytics. Data stream classification uses supervised models that learn from a continuous, potentially infinite flow of labeled observations. The critical issue for such learning models is handling the dynamic nature of data streams, in which the underlying data distribution changes over time, a phenomenon known as concept drift. Online learning is essential in the streaming setting because the model must be built and remain functional without access to the complete training data in advance. Ensemble learning has also proven successful in responding to evolving data streams: a multiple-learner scheme improves on a single learner's predictions by integrating several base learners whose combination outperforms each learner in isolation. The proposed algorithm, EoE (Ensemble of Ensembles), integrates ten seminal ensemble methods and employs online learning with majority voting for the binary classification of non-stationary data streams. By exploiting the learning capabilities of its individual sub-ensembles while compensating for their limitations as individual learners, EoE makes better predictions than any of its sub-ensembles. This paper empirically and statistically analyses the performance of EoE on several figures of merit, namely accuracy, sensitivity, specificity, G-mean, precision, F1-measure, balanced accuracy, and an overall performance measure, on a variety of real and synthetic datasets. The experimental results show that EoE outperforms its state-of-the-art independent sub-ensembles in classifying non-stationary data streams.
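
As a rough illustration of the scheme described above, the following minimal Python sketch combines a few incremental classifiers by online majority voting under a prequential (test-then-train) protocol. The class name EoESketch, the choice of scikit-learn base learners, and the toy drifting stream are illustrative assumptions only; they stand in for the paper's actual integration of ten seminal stream ensembles, not a reproduction of it.

    from collections import Counter

    import numpy as np
    from sklearn.exceptions import NotFittedError
    from sklearn.linear_model import PassiveAggressiveClassifier, Perceptron, SGDClassifier


    class EoESketch:
        """Majority vote over independently updated online sub-learners."""

        def __init__(self, classes):
            self.classes = np.asarray(classes)
            # Stand-ins for the paper's ten stream ensembles (illustrative only).
            self.members = [SGDClassifier(), Perceptron(), PassiveAggressiveClassifier()]

        def predict_one(self, x):
            x = np.asarray(x).reshape(1, -1)
            votes = []
            for m in self.members:
                try:
                    votes.append(int(m.predict(x)[0]))
                except NotFittedError:       # member has not seen any data yet
                    votes.append(int(self.classes[0]))
            return Counter(votes).most_common(1)[0][0]   # simple majority vote

        def learn_one(self, x, y):
            x = np.asarray(x).reshape(1, -1)
            for m in self.members:
                m.partial_fit(x, [y], classes=self.classes)


    # Prequential (test-then-train) run on a toy binary stream with one abrupt drift.
    rng = np.random.default_rng(0)
    model, correct, n = EoESketch(classes=[0, 1]), 0, 2000
    for t in range(n):
        x = rng.normal(size=4)
        y = int(x[0] + x[1] > 0) if t < n // 2 else int(x[0] - x[1] > 0)  # concept changes at t = n/2
        correct += int(model.predict_one(x) == y)   # test first ...
        model.learn_one(x, y)                       # ... then train
    print("prequential accuracy:", correct / n)

From the same prequential loop, a running confusion matrix can be accumulated to compute the reported figures of merit, for example G-mean as the square root of the product of sensitivity and specificity, and balanced accuracy as their arithmetic mean.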

Keywords


Concept drift; data stream; ensemble; non-stationary data classification; online learning.

DOI: http://dx.doi.org/10.18517/ijaseit.11.5.13299



