Capstone project

Enhanced Random Forest Classifier with K-Means Clustering (ERF-KMC) for Detecting and Preventing Distributed-Denial-of-Service and Man-in-the-Middle Attacks in Internet-of-Medical-Things Networks.

Advances in medical science have led to sophisticated automatic patient monitoring systems. These systems collect patient vital signs through the connected devices and sensors that make up the Internet of Medical Things (IoMT) and wireless body sensor networks (WBSNs). The devices are typically wearable or implantable sensors that capture personal health data in real time and transmit it to servers or collection units for processing. A key value of WBSNs in healthcare lies in their ability to detect health issues early, enabling timely and remote diagnosis. However, the interconnected nature of these devices increases the number of nodes and, with it, the potential access points for attackers. Furthermore, the transmitted data is highly confidential: a breach would violate HIPAA standards and could have devastating effects on patients and institutions. Enhancing IoMT security is therefore crucial if the healthcare industry is to guarantee patients’ privacy and safety.

The study proposes the use of the random forest algorithm, combined with K-means clustering, to tackle the security problems in this type of connected network.

Previous admin-based approaches to IoMT security:

Identifying DDoS attacks by monitoring server usage patterns and setting specific thresholds.
Scrutinizing incoming requests for abnormal time-to-live (TTL) values.
Blocking IP addresses and rate limiting.
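
As a rough illustration of how such rule-based defenses fit together, the sketch below combines a TTL check, a sliding-window request threshold, and an IP block list. It is not taken from the study: the class name AdminFilter, the limits, and the IP addresses are all hypothetical values chosen for the example.

```python
from collections import defaultdict, deque

# Hypothetical limits, chosen only for illustration.
MAX_REQUESTS = 5        # allowed requests per source IP per window
WINDOW_SECONDS = 1.0    # sliding-window length
TTL_RANGE = (32, 255)   # TTL values treated as normal


class AdminFilter:
    """Toy rule-based filter: block list, TTL check, then rate limit."""

    def __init__(self):
        self.history = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked = set()

    def allow(self, ip, ttl, now):
        if ip in self.blocked:
            return False
        if not TTL_RANGE[0] <= ttl <= TTL_RANGE[1]:
            return False  # abnormal TTL: reject this request
        window = self.history[ip]
        # Drop timestamps that have fallen out of the sliding window.
        while window and now - window[0] >= WINDOW_SECONDS:
            window.popleft()
        window.append(now)
        if len(window) > MAX_REQUESTS:
            self.blocked.add(ip)  # threshold exceeded: block the address
            return False
        return True


flt = AdminFilter()
# Eight requests in 0.7 s from one address, then two single requests.
burst = [flt.allow("10.0.0.9", ttl=64, now=0.1 * i) for i in range(8)]
normal = flt.allow("10.0.0.7", ttl=64, now=0.0)
odd_ttl = flt.allow("10.0.0.8", ttl=3, now=0.0)
print(burst, normal, odd_ttl)
```

Fixed rules like these are cheap to evaluate, but the thresholds must be tuned by hand, which is part of the motivation for the learned approaches listed next.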

Previous ML-based approaches to IoMT security:

Integration of random forest, K-nearest neighbors, support vector machines, and artificial neural networks.
Combining a genetic algorithm with a random forest.
A machine-learning technique combining clustering and graph structural properties.
Deep learning.
And others.

Based on previous work, the authors note that random forest tends to outperform other algorithms for this task, but its longer processing time puts it at a disadvantage. To address this, they implement what they call an Enhanced Random Forest (ERF), which essentially consists of applying principal component analysis (PCA) for dimensionality reduction before training the classifier. By fine-tuning the RF settings and applying PCA, the execution time was reduced by 3.116 s, from 3.795 s for the default random forest classifier to 0.679 s, while accuracy and sensitivity remained at 99%. The study emphasizes the significance of parameter selection.

To keep execution time short, the number of decision trees was set to 20 (n_estimators = 20; default: 100). To reduce training time further, the remaining hyperparameters were set as follows:

max_features = 10 (number of features considered when looking for the best split. Default in scikit-learn's RandomForestClassifier: the square root of the number of features)

max_samples = 0.8, or 80% (fraction of the training samples drawn for each tree. Default: as many samples as the dataset contains)

max_depth = 25 (maximum depth of each tree. Default: no limit; each tree grows until its stopping criteria are met)
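
The configuration above could be expressed with scikit-learn roughly as follows. This is a minimal sketch, not the authors' code: the synthetic dataset, the scaling step, and the number of PCA components (15) are assumptions, since the summary does not state them. Only the four random forest hyperparameters come from the study.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for network-traffic features (assumption: 40 features).
X, y = make_classification(
    n_samples=2000, n_features=40, n_informative=12, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

erf = make_pipeline(
    StandardScaler(),                      # assumption: PCA is scale-sensitive
    PCA(n_components=15, random_state=0),  # assumption: count not given
    RandomForestClassifier(
        n_estimators=20,   # values reported in the study
        max_features=10,   # must not exceed the number of PCA components
        max_samples=0.8,   # relies on bootstrap=True (the default)
        max_depth=25,
        random_state=0,
    ),
)
erf.fit(X_train, y_train)
accuracy = erf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Because max_features = 10 is applied after PCA, the reduced feature space needs at least 10 components for the setting to be meaningful.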

The study compares the method with two other algorithms, AdaBoost and CatBoost. ERF outperformed both, with an accuracy of 99.053%, against 92.654% for AdaBoost and 98.855% for CatBoost. The execution time of ERF was 0.679 s, while AdaBoost and CatBoost took 1.533 s and 0.851 s, respectively.
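
For reference, the quantities being compared (accuracy, sensitivity, and execution time) can be computed as in the sketch below. The labels and predictions are made-up toy values, and the helper names accuracy_and_sensitivity and timed are hypothetical; the study does not describe its measurement code.

```python
import time


def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Accuracy = correct / total; sensitivity = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = correct / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, sensitivity


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Toy data: 1 = attack, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
(acc, sens), elapsed = timed(accuracy_and_sensitivity, y_true, y_pred)
print(acc, sens)  # 0.75 0.75
```

In an intrusion-detection setting, sensitivity matters alongside accuracy because it measures the share of actual attacks that are caught.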

To further evaluate the methodology, the algorithm was trained on a second dataset, BoTNeTIoT-L01, which contains data from IoT devices used in the detection of DDoS (distributed denial-of-service) attacks. The objective was to classify each row as DDoS or not and to compare the methodology with several other algorithms. The results are shown in the following table:

The study also mentions k-means as an intended component of the detection and prevention system. The authors aim to use k-means to classify intrusions, once the ERF has detected them, as either DDoS or MITM (man-in-the-middle) attacks. However, this use of k-means is not illustrated in the text. The intention itself is questionable, since k-means is an unsupervised clustering algorithm and is not designed for classification based on labeled data.
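
The point can be made concrete with a small sketch of k-means (Lloyd's algorithm) on made-up two-feature data. Everything here is an assumption for illustration: the points, the attack labels, and the deterministic initialization from the first k points, which is used only to keep the example reproducible. The algorithm returns anonymous cluster indices, and naming a cluster "DDoS" or "MITM" requires a separate step that consults labeled examples.

```python
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, assignments)."""
    centroids = list(points[:k])  # simplistic init: first k points
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignments = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, assignments


# Two well-separated groups of hypothetical intrusion records.
group_a = [(0.1 * i, 0.1 * (i % 3)) for i in range(10)]
group_b = [(10.0 + 0.1 * i, 10.0 + 0.1 * (i % 3)) for i in range(10)]
points = group_a + group_b
labels = ["DDoS"] * 10 + ["MITM"] * 10  # known only in a labeled setting

centroids, assignments = kmeans(points, k=2)

# k-means yields only indices 0 and 1; attaching attack names needs labels.
names = {}
for c in range(2):
    votes = Counter(lbl for lbl, a in zip(labels, assignments) if a == c)
    names[c] = votes.most_common(1)[0][0]
print(assignments, names)
```

The majority-vote step at the end is exactly where labeled data enters, which is why a supervised classifier is usually the more direct choice when labels are available.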

Conclusions

Despite the unclear role of the k-means clustering algorithm, the study illustrates a successful random-forest-based methodology for addressing the security challenges in IoMT networks. The model detects and identifies unauthorized access with increased efficiency by applying PCA as a preprocessing step. Moreover, the approach shows potential for handling a wide range of network attacks, ensuring accurate and timely transmission of medical data while safeguarding patient privacy. The results suggest that combining clustering techniques with ensemble detection models can substantially improve the robustness of IoMT security systems, paving the way for more resilient and precise healthcare monitoring solutions.