Unsupervised Learning Techniques for Anomaly Detection in High-Dimensional Data Streams Using Clustering and Autoencoders

Sandeep Kumar Rathore; Thathineni Jagadeesh; Rajashri CK; Kottu Santosh Kumar; Dr. A. Vanathi; Pawan Wawage; Vijay Kumar; Dr. C S Pavan Kumar

Authors

Sandeep Kumar Rathore Department of Computer Engineering & Applications, GLA University, Mathura.
Thathineni Jagadeesh Assistant Professor, Department of CSE (Artificial Intelligence), Pragati Engineering College, ADB Road, Surampalem, Near Peddapuram, Kakinada District, Andhra Pradesh, India - 533437.
Rajashri CK Assistant Professor, Computer Science, Meenakshi College of Arts and Science, Meenakshi Academy of Higher Education and Research.
Kottu Santosh Kumar Assistant Professor, Department of Information Technology, Vardhaman College of Engineering, Shamshabad, Hyderabad, India - 501 218.
Dr. A. Vanathi Associate Professor, Department of Computer Science and Engineering, Aditya University, Surampalem, Andhra Pradesh, Pin 533437.
Pawan Wawage Assistant Professor, Information Technology, Vishwakarma Institute of Technology, Pune, Maharashtra, 411037.
Vijay Kumar School of Engineering & Technology, Noida International University, Uttar Pradesh 203201, India, Email: vijay.kumar@niu.edu.in Sr. Assistant Professor, Department of AI, Siddhartha Academy of Higher Education (Deemed to be University).
Dr. C S Pavan Kumar Sr. Assistant Professor, Department of AI, Siddhartha Academy of Higher Education (Deemed to be University).

Keywords:

Unsupervised Learning, Anomaly Detection, Autoencoder, Clustering, High-Dimensional Data Streams, Deep Learning, Cybersecurity Analytics, Machine Learning

Abstract

The rapid growth of high-dimensional streaming data produced by industrial automation systems, Internet of Things (IoT) devices, cybersecurity systems, healthcare monitoring systems, and public cloud computing environments has created a need for intelligent and scalable anomaly detection systems. Conventional supervised learning methods depend heavily on labeled data and adapt poorly to dynamic streams in which new anomalous behaviours continually emerge. To overcome these limitations, this paper proposes an unsupervised anomaly detection framework that combines clustering with deep autoencoder architectures to identify abnormal patterns in high-dimensional data streams. The proposed methodology uses data preprocessing, feature normalization, K-Means clustering to organize the data, and a deep autoencoder for latent feature learning, so that anomalies are detected without any labeled training data. Anomalous cases are identified through reconstruction error analysis, which measures the error between the original and reconstructed data representations. The framework was evaluated experimentally on benchmark intrusion detection datasets such as NSL-KDD and UNSW-NB15. Performance was measured using accuracy, precision, recall, F1-score, ROC-AUC, and false positive rate. The experimental results showed that the hybrid framework achieved an average detection accuracy of 97.1%, precision of 96.5%, recall of 95.8%, and a ROC-AUC of 0.981, outperforming traditional unsupervised methods such as Isolation Forest and One-Class SVM. The framework was statistically validated using 10-fold cross-validation to demonstrate its robustness, scalability, and reliability in high-dimensional streaming environments.
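
The pipeline described in the abstract (feature normalization, K-Means clustering, autoencoder training, and thresholding of the reconstruction error) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several choices are assumptions made only for illustration: a small single-hidden-layer autoencoder stands in for the deep autoencoder, the anomaly threshold is taken as a fixed quantile of the reconstruction errors, and the cluster labels are returned alongside the scores because the abstract does not specify how the clustering output enters the anomaly decision. All function names and parameter values are hypothetical.

```python
import numpy as np

def standardize(X):
    """Scale each feature to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    return (X - mu) / sigma

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment against the last centroid update.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

def train_autoencoder(X, hidden=2, epochs=500, lr=0.02, seed=0):
    """Tiny tanh autoencoder trained by full-batch gradient descent.

    Assumption: a single hidden layer stands in for the deep autoencoder.
    The loss is sum of squared errors divided by 2n.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoder (latent features)
        err = (H @ W2 + b2) - X           # decoder output minus input
        dH = (err @ W2.T) * (1.0 - H**2)  # backpropagate through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def reconstruction_error(X, params):
    """Per-sample mean squared error between input and reconstruction."""
    W1, b1, W2, b2 = params
    X_hat = np.tanh(X @ W1 + b1) @ W2 + b2
    return ((X_hat - X) ** 2).mean(axis=1)

def detect_anomalies(X, k=3, quantile=0.95):
    """Return (cluster labels, reconstruction errors, boolean anomaly flags)."""
    Xs = standardize(np.asarray(X, dtype=float))
    _, labels = kmeans(Xs, k)
    params = train_autoencoder(Xs)
    errors = reconstruction_error(Xs, params)
    threshold = np.quantile(errors, quantile)  # assumed thresholding rule
    return labels, errors, errors > threshold
```

Calling `detect_anomalies(X)` on an array of shape (samples, features) flags roughly the top 5% of samples by reconstruction error. In a streaming setting the threshold would more plausibly be fitted on a reference window and then applied to incoming batches, but that design is not described in the abstract.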
