2025-10-30, Main Stage
As machine learning becomes a core component in malware detection, new risks emerge from adversarial manipulation. This talk explores how ML-based malware classifiers respond to targeted feature modifications. To assess their robustness experimentally, several models were trained to classify malicious and benign files and then tested against adversarially altered samples. The presentation focuses on data preparation, attack simulation, and a comparative analysis of model robustness under adversarial conditions.
This talk presents a technical exploration of adversarial robustness in machine learning-based malware detection systems. The research is grounded in the EMBER dataset, one of the largest publicly available datasets for static malware analysis, containing raw features and labels for over 3.2 million malicious and benign samples spanning six file types: Win32, Win64, .NET, APK, ELF, and PDF.
The first phase involved analyzing the dataset to identify which features are most relevant for binary classification of malware. After feature selection and preprocessing, multiple machine learning models (e.g., Random Forest, Gradient Boosting, and Neural Networks) were trained on a representative sample subset to distinguish between malicious and benign files. These trained models were saved for further testing.
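The training phase described above can be sketched as follows. This is a minimal illustration using scikit-learn, with a small synthetic feature matrix standing in for the EMBER feature vectors (real EMBER vectors are much higher-dimensional, and the talk's exact feature selection and model configurations are not specified here):

```python
# Sketch of the first phase: train several classifier families on
# feature vectors labeled malicious (1) / benign (0).
# The data below is a synthetic placeholder, NOT the EMBER dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))             # placeholder feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # placeholder labels: 1 = malicious

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# One instance of each model family mentioned in the abstract.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "neural_network": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                    random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))
```

In practice the fitted models would then be persisted (e.g. with `joblib.dump`) so the same trained classifiers can be reused in the adversarial testing phase.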
In the second phase, adversarial attacks were simulated by modifying key input features in the test samples — without altering the functional structure of the binaries — to observe how the predictions of each saved model changed. This process helped evaluate the models’ resilience to feature manipulation and adversarial evasion.
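One crude way to simulate such an attack is to overwrite a model's most influential features in malicious samples with values typical of benign files, then count how many predictions flip. The sketch below assumes a scikit-learn model and synthetic placeholder data; it is an illustration of the idea, not the talk's actual attack method:

```python
# Sketch of the second phase: perturb influential features of malicious
# samples toward the benign class mean and measure the evasion rate.
# Synthetic placeholder data, NOT the EMBER dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # placeholder labels: 1 = malicious
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

X_mal = X[y == 1]                  # malicious test samples to perturb
benign_mean = X[y == 0].mean(axis=0)

def evasion_rate(model, X_mal, feature_idx):
    """Fraction of malicious samples reclassified as benign after perturbation."""
    X_adv = X_mal.copy()
    X_adv[:, feature_idx] = benign_mean[feature_idx]  # overwrite chosen features
    before = model.predict(X_mal)
    after = model.predict(X_adv)
    return float(np.mean((before == 1) & (after == 0)))

# Attack the two features the model relies on most.
top2 = np.argsort(model.feature_importances_)[-2:]
print("evasion rate:", round(evasion_rate(model, X_mal, top2), 3))
```

A high evasion rate from modifying only a handful of features indicates a model that is easy to evade; a real attack on binaries would additionally be constrained to modifications that preserve the file's functionality.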
The presentation will detail the full pipeline: from dataset preparation and feature engineering, through model training, to adversarial evaluation. It concludes with a comparative analysis of each classifier’s robustness under attack, identifying which algorithms offer the highest resistance and reliability in adversarial scenarios. This work contributes to understanding how secure and dependable current ML-based malware detection methods are when faced with intelligent manipulation attempts.
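A comparative robustness analysis of this kind can be summarized with a simple metric: the drop from clean accuracy to accuracy on perturbed samples, ranked per model. The sketch below uses synthetic placeholder data and an illustrative perturbation (shifting two key features toward the benign mean); the models, metric, and attack are assumptions, not the talk's exact methodology:

```python
# Sketch of the comparative evaluation: rank models by how little their
# accuracy drops under the same feature perturbation.
# Synthetic placeholder data, NOT the EMBER dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1500, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # placeholder labels: 1 = malicious

# Illustrative attack: overwrite two influential features with benign means.
X_adv = X.copy()
X_adv[:, [0, 1]] = X[y == 0][:, [0, 1]].mean(axis=0)

results = {}
for name, model in {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}.items():
    model.fit(X, y)
    clean = model.score(X, y)
    adversarial = model.score(X_adv, y)
    results[name] = clean - adversarial     # accuracy drop: smaller = more robust

for name in sorted(results, key=results.get):
    print(f"{name}: accuracy drop {results[name]:.3f}")
```

Ranking by accuracy drop (or by evasion rate) across classifiers is one way to identify which algorithm families hold up best under the same manipulation budget.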
I am a Data Scientist at Siemens Healthineers, working in the Cybersecurity department and currently based in Bratislava. Alongside my professional role, I am pursuing a PhD at the Slovak University of Technology, Faculty of Electrical Engineering and Informatics, specializing in Mechatronic Systems with a focus on intelligent systems and technologies. In my free time, I enjoy reading, staying active through sports, and spending time with friends.