An Enhanced Expectation Maximization Text Document Clustering Algorithm for E-Content Analysis
Abstract
Many types of digital materials can now be used in the classroom. Students and scholars are moving from printed textbooks to digital study materials because textbooks are bulky and expensive. Teachers and students can use and modify materials that are available freely, or under some restrictions, for learning and teaching. E-content can be designed, developed, used, re-used, and distributed electronically from anywhere at any time. Because it offers flexibility in the time, place, and pace of learning, e-content is becoming extremely popular. It can be shared instantly with a very large number of users across the globe. Document clustering is commonly used to group documents related to a specific topic. Text document clustering can group a collection of documents according to the information they contain, and it can be used to organize the results returned when a user searches the internet. This paper focuses on text document clustering as a way to cope with massive collections of e-content documents. An Enhanced Expectation Maximization Text Document Clustering (EEMTDC) algorithm is proposed and compared with the Expectation Maximization (EM), K-Means, and Hierarchical Clustering (HC) algorithms. The experiments show that the proposed EEMTDC algorithm achieves higher clustering accuracy than these existing algorithms.
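For orientation, the standard EM baseline mentioned above can be sketched for text as a mixture of multinomials over word counts. This is only an illustrative sketch of plain EM, not the proposed EEMTDC algorithm, whose enhancements are not described in this abstract. The function name `em_cluster`, the smoothing parameter `alpha`, the round-robin initialisation, and the toy documents are all assumptions made for the example.

```python
import math
from collections import Counter


def em_cluster(docs, k, iters=50, alpha=0.1):
    """Cluster raw text documents with plain EM on a mixture of multinomials.

    Hypothetical helper for illustration; returns one cluster label per document.
    """
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted({w for c in counts for w in c})
    n = len(docs)

    # Assumed initialisation: document i starts fully in cluster i mod k.
    resp = [[1.0 if i % k == j else 0.0 for j in range(k)] for i in range(n)]

    for _ in range(iters):
        # M-step: re-estimate cluster priors and per-cluster word
        # probabilities from the current responsibilities (with smoothing).
        prior = [
            (sum(r[j] for r in resp) + alpha) / (n + k * alpha)
            for j in range(k)
        ]
        word_p = []
        for j in range(k):
            weighted = {w: alpha for w in vocab}
            for r, c in zip(resp, counts):
                for w, x in c.items():
                    weighted[w] += r[j] * x
            total = sum(weighted.values())
            word_p.append({w: v / total for w, v in weighted.items()})

        # E-step: recompute each document's responsibilities in log space,
        # subtracting the maximum before exponentiating for stability.
        for i, c in enumerate(counts):
            logp = [
                math.log(prior[j])
                + sum(x * math.log(word_p[j][w]) for w, x in c.items())
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            norm = sum(unnorm)
            resp[i] = [v / norm for v in unnorm]

    # Hard assignment: the cluster with the highest responsibility.
    return [max(range(k), key=lambda j: r[j]) for r in resp]


# Toy e-content snippets (invented for this example): three on linear
# algebra and three on biology, interleaved so the start is partly wrong.
docs = [
    "matrix vector algebra matrix",
    "cell protein enzyme cell",
    "vector algebra matrix vector",
    "protein enzyme cell protein",
    "enzyme cell protein enzyme",
    "algebra matrix vector algebra",
]
labels = em_cluster(docs, k=2)
print(labels)
```

On this toy input the three linear-algebra snippets end up in one cluster and the three biology snippets in the other, even though the initial assignment mixes the two topics. Real e-content collections would need tokenisation, stop-word removal, and a more careful initialisation than this sketch uses.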