Malware Detection in Executable Files Using Random Forest Machine Learning

M. Cakra Adhana
Alde Alanda
Hidra Amnur
Febrian Kasmar

Abstract

The expansion of digital infrastructure has been accompanied by a sharp rise in cyber threats, with malicious software (malware) posing a major risk to information security systems. Traditional signature-based and heuristic detection methods are limited against zero-day exploits and obfuscated multi-variant malware because they depend on existing signature repositories and are prone to high false-positive rates. To address these limitations, this study introduces a static detection framework for Portable Executable (PE) files based on Random Forest, an ensemble machine learning technique. The dataset comprises PE files collected from public malware repositories, including Malware Bazaar, alongside verified benign applications. Static analysis was performed without executing any code, which keeps the analysis environment safe. A total of 75 structural features spanning COFF headers, section characteristics, data directories, and configuration markers were extracted using the Python pefile library. The model was trained with an 80:20 train-test split. It achieved an Out-of-Bag (OOB) score of 97.43%, an internal estimate of generalization performance. Independent validation on a test set of 332 unseen files yielded a balanced confusion matrix of 160 True Positives, 164 True Negatives, 5 False Positives, and 3 False Negatives, corresponding to precision, recall, and F1-score of approximately 98%. Feature importance analysis identified MajorOperatingSystemVersion, MajorSubsystemVersion, and DllCharacteristics as key discriminators. Finally, the optimized model was integrated into a web application built with Flask and MySQL that supports user file uploads and real-time inference reporting, offering a scalable complementary defense layer for modern cybersecurity ecosystems.
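
The reported evaluation metrics can be checked against the confusion-matrix counts given in the abstract. The following is a minimal sketch in Python, not the authors' evaluation code: the class name `ConfusionMatrix` and the variable `reported` are hypothetical, and it assumes malware is the positive class. The abstract does not say whether its figures are per-class or averaged, so the values here are for the malware class only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; malware is assumed to be the positive class."""

    tp: int  # malware files correctly flagged
    tn: int  # benign files correctly passed
    fp: int  # benign files wrongly flagged as malware
    fn: int  # malware files that were missed

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn

    def accuracy(self) -> float:
        # Share of all files that were classified correctly.
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        # Of the files flagged as malware, the share that really were malware.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Of the actual malware files, the share that were flagged.
        return self.tp / (self.tp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)


# Counts as stated in the abstract.
reported = ConfusionMatrix(tp=160, tn=164, fp=5, fn=3)

print(f"test files: {reported.total}")
print(f"accuracy  : {reported.accuracy():.4f}")
print(f"precision : {reported.precision():.4f}")
print(f"recall    : {reported.recall():.4f}")
print(f"F1-score  : {reported.f1():.4f}")
```

With these counts the four cells sum to the 332 test files, and the malware-class values come out at roughly 0.976 accuracy, 0.970 precision, 0.982 recall, and 0.976 F1, which is in line with the rounded figure of approximately 98%.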

Article Details

How to Cite
M. Cakra Adhana, Alanda, A., Amnur, H., & Kasmar, F. (2026). Deteksi Malware pada File Executable Menggunakan Machine Learning Random Forest. JITSI: Jurnal Ilmiah Teknologi Sistem Informasi, 7(2), 176–182. https://doi.org/10.62527/jitsi.7.2.602
