Implementation of the Random Forest Algorithm for Phishing Detection on Websites

Muhammad Fahri

Abstract

Phishing attacks have become one of the fastest-growing cybersecurity threats in recent years. Phishing websites are designed to deceive users into divulging sensitive information such as login credentials, credit card data, and other personal details. This research proposes an implementation of the Random Forest algorithm for automated phishing website detection. The dataset used in this study comprises 10,000 labeled URL samples, from which 49 distinct features were extracted. The research methodology consists of data preprocessing, URL feature extraction, Random Forest model training, and performance evaluation. The evaluation results show that the developed Random Forest model achieved an accuracy of 98.20%, a precision of 98.22%, a recall of 98.22%, and an F1-score of 98.22%. These results indicate that the Random Forest algorithm is highly effective for phishing detection on this dataset and can be deployed as a preventive security measure during internet browsing.
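
As a rough illustration of two of the steps named above (URL feature extraction and performance evaluation), the following Python sketch uses only the standard library. It is a minimal sketch under stated assumptions, not the study's implementation: the function names `extract_url_features` and `classification_metrics` are hypothetical, and the nine lexical features shown are common illustrative choices rather than the 49 features used in the paper. Model training is omitted here; in practice the feature vectors would be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute a few lexical URL features (illustrative, not the paper's 49)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "hostname_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        # Raw IPv4 addresses in place of a domain name are a common phishing cue.
        "host_is_ip": int(bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host))),
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


def classification_metrics(y_true, y_pred, positive=1) -> dict:
    """Accuracy, precision, recall, and F1 for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical example URL and toy labels (1 = phishing, 0 = legitimate).
    print(extract_url_features("http://192.168.0.1/login/verify?id=1"))
    print(classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

Treating phishing as the positive class means that recall measures the share of phishing URLs that are caught, while precision measures how many flagged URLs are actually phishing.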

Article Details

How to Cite
Fahri, M. (2025). Implementation of the Random Forest Algorithm for Phishing Detection on Websites. JITSI: Jurnal Ilmiah Teknologi Sistem Informasi, 6(2), 186-194. https://doi.org/10.62527/jitsi.6.2.472
