Deteksi Malware pada File Executable Menggunakan Machine Learning Random Forest (Malware Detection in Executable Files Using Random Forest Machine Learning)

M. Cakra Adhana; Alde Alanda; Hidra Amnur; Febrian Kasmar

doi:10.62527/jitsi.7.2.602

Published: Jun 30, 2026

Keywords:

Cybersecurity, Malware Detection, Portable Executable, Machine Learning, Random Forest, Static Analysis

M. Cakra Adhana

Politeknik Negeri Padang

Alde Alanda

Politeknik Negeri Padang

Hidra Amnur

Politeknik Negeri Padang

Febrian Kasmar

Politeknik Negeri Padang

Abstract

The rapid expansion of digital infrastructure has been accompanied by a sharp rise in cyber threats, with malicious software (malware) posing a major risk to information security systems. Traditional signature-based and heuristic detection methods are limited against zero-day exploits and obfuscated, multi-variant malware because they depend on existing signature repositories and are prone to high false-positive rates. To address these limitations, this study introduces a static detection framework for Portable Executable (PE) files based on the Random Forest ensemble learning technique. Using a structured dataset of PE files collected from public malware repositories, including Malware Bazaar, alongside verified benign applications, static analysis was performed without executing any code, which keeps the analysis environment safe. A total of 75 structural features spanning COFF headers, section characteristics, data directories, and configuration markers were extracted using the Python pefile library. The model was trained using an 80:20 data split. It achieved an Out-of-Bag (OOB) score of 97.43%, an internal estimate of generalization performance. Independent validation on a test set of 332 unseen files yielded a balanced confusion matrix of 160 True Positives, 164 True Negatives, 5 False Positives, and 3 False Negatives, corresponding to precision, recall, and F1-score of approximately 98%. Feature importance analysis showed that parameters such as MajorOperatingSystemVersion, MajorSubsystemVersion, and DllCharacteristics serve as critical discriminators. Finally, the optimized model was integrated into a web application built with Flask and MySQL that supports user file uploads and real-time inference reporting, offering a scalable, complementary defense layer for modern cybersecurity ecosystems.
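The evaluation figures quoted in the abstract can be checked directly from the reported confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' evaluation code: the function name `binary_metrics` is hypothetical, and malware is assumed to be the positive class.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive standard metrics from binary confusion-matrix counts.

    Assumption: the positive class is "malware", so tp counts malware
    files flagged as malware and fp counts benign files flagged as malware.
    """
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)   # share of flagged files that are malware
    recall = tp / (tp + fn)      # share of malware files that were flagged
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total
    return {
        "total": total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Counts as reported in the abstract.
metrics = binary_metrics(tp=160, tn=164, fp=5, fn=3)
for name, value in metrics.items():
    if name == "total":
        print(f"{name}: {value}")
    else:
        print(f"{name}: {value:.4f}")
```

Under these assumptions the counts sum to the 332 test files, malware-class precision is about 97.0%, recall about 98.2%, and F1 and accuracy about 97.6%, which is broadly in line with the "approximately 98%" stated in the abstract. The abstract does not say whether its figures are per-class or averaged across classes, so small differences in rounding are to be expected.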

How to Cite

M. Cakra Adhana, Alanda, A., Amnur, H., & Kasmar, F. (2026). Deteksi Malware pada File Executable Menggunakan Machine Learning Random Forest. JITSI: Jurnal Ilmiah Teknologi Sistem Informasi, 7(2), 176-182. https://doi.org/10.62527/jitsi.7.2.602

Issue

Vol. 7 No. 2 (2026)

Section

Articles

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
