A Convolutional Neural Network (CNN) Classification Model for Web Page: A Tool for Improving Web Page Category Detection Accuracy


Siti Hawa Apandi
Jamaludin Sallim
Rozlina Mohamed

Abstract

Game and Online Video Streaming web pages are the most viewed web pages. Users who spend too much time on these types of web pages may suffer from internet addiction, so access to them should be restricted to combat it. This requires a tool that recognises the category of a web page from its text content. Because no matrix representation is available that can handle long web page text content, this study employs a document representation known as a word cloud image to visualise the words extracted from the web page text after data pre-processing. The most common words, which appear frequently in the text and describe what the web page is about, are drawn in a large size at the centre of the word cloud image. A Convolutional Neural Network (CNN) recognises the pattern of words in the central portion of the word cloud image to determine the category to which the web page belongs. The proposed model was compared with other web page classification models and achieved an accuracy of 85.6%. It can be used as a tool to identify the category of web pages more accurately.
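
The step from page text to word sizes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the function names `word_frequencies` and `font_sizes`, the linear scaling rule, and the size limits are all assumptions made for illustration. It only shows how more frequent words could end up with larger sizes before a word cloud image is rendered and passed to a CNN.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the abstract does not give one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
              "is", "are", "for", "with"}


def word_frequencies(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words, count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def font_sizes(freqs, min_size=10, max_size=60, top_n=50):
    """Map the top_n words to sizes scaled linearly between min_size and max_size.

    The linear rule and the size limits are assumptions, not values from the paper.
    """
    top = freqs.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    lowest = top[-1][1]
    span = highest - lowest
    sizes = {}
    for word, count in top:
        if span == 0:
            # All words are equally frequent, so draw them all at full size.
            sizes[word] = max_size
        else:
            scaled = (count - lowest) * (max_size - min_size) / span
            sizes[word] = round(min_size + scaled)
    return sizes


sample = "Play the game online. The game is a free game to play with friends."
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)
print(sizes)
```

In this sample, "game" occurs most often and therefore receives the largest size, which matches the abstract's description of frequent words being drawn large. Placing words at the centre of the image and training the CNN are outside the scope of this sketch.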


How to Cite
Apandi, S. H., Sallim, J., & Mohamed, R. (2023). A Convolutional Neural Network (CNN) Classification Model for Web Page: A Tool for Improving Web Page Category Detection Accuracy. JITSI : Jurnal Ilmiah Teknologi Sistem Informasi, 4(3), 110 - 121. https://doi.org/10.30630/jitsi.4.3.181
