Applying the Random Forest Algorithm to Phishing Website Detection
Abstract
Phishing is among the fastest-growing cybersecurity threats of recent years. Phishing websites are designed to trick users into handing over sensitive information such as login credentials, credit card data, and other personal details. This study proposes applying the Random Forest algorithm to detect phishing websites automatically. The dataset comprises 10,000 labeled URL samples, from which 49 distinct features are extracted. The methodology covers data preprocessing, URL feature extraction, Random Forest model training, and performance evaluation. In evaluation, the resulting model achieved 98.20% accuracy, 98.22% precision, 98.22% recall, and a 98.22% F1-score. These results show that Random Forest is highly effective for phishing detection and can be deployed as a preventive security layer for internet browsing.
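Two of the steps named in the abstract, URL feature extraction and performance evaluation, can be illustrated with a minimal sketch. The abstract does not list the 49 features or name a library, so everything here is an assumption: the six lexical features, the function names `extract_url_features` and `classification_metrics`, and the toy labels are hypothetical and chosen only for illustration. In a full pipeline the feature vectors would be passed to a Random Forest implementation (for example scikit-learn's `RandomForestClassifier`) before the metrics are computed.

```python
from urllib.parse import urlparse


def extract_url_features(url):
    """Return a few lexical URL features.

    Hypothetical subset for illustration; not the 49 features used in the study.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        # Crude check: the host consists only of digits and dots.
        "host_is_ip": int(host.replace(".", "").isdigit()),
    }


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy inputs, not data from the study.
    print(extract_url_features("http://paypal.com@evil-site.example/login"))
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = phishing, 0 = legitimate
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

Because F1 is the harmonic mean of precision and recall, it equals both whenever the two coincide, which is consistent with the identical 98.22% figures reported above.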