Comparative Performance of Machine Learning Techniques for Sentiment and Insight Analysis in Amazon Product Reviews

Authors

  • Usha Mohani kavirayani, MS in Computer Science, Kent State University
  • Krishna Bhardwaj Mylavarapu, MS in Computer Science, University of Illinois Springfield
  • Jenitha Pilli, MS in Computer Science, University of Louisiana at Lafayette
  • Prathik Kumar Jannu, Computer Science Engineering, JNTU Hyderabad
  • Javed Ali Mohammad, Masters in Telecommunications, Middlesex University
  • Sri Harsha Panchali, Information Systems Engineer, CrowdStrike Inc.

DOI:

https://doi.org/10.63282/3117-5481/AIJCST-V4I1P106

Keywords:

Sentiment Analysis, TF-IDF, Amazon Product Reviews, Consumer Behavior, Machine Learning

Abstract

Sentiment analysis of online product reviews has become important for understanding customer perception and for supporting sound decision-making based on the information available on e-commerce portals. This paper proposes a sentiment classification model for Amazon product reviews, built on a random forest (RF) classifier and a multi-stage preprocessing pipeline. The study begins with data collection from the Amazon Reviews (2018) dataset. The text is then preprocessed in several stages, including special-character removal, stop-word removal, tokenization, and part-of-speech (POS) tagging, to normalize and standardize it. TF-IDF is used for feature extraction, converting the text into meaningful numerical features. An RF model is then trained to predict customer sentiment; its ensemble structure is used to reduce overfitting and improve predictive stability. Model performance is measured by accuracy, precision, recall, F1-score, and loss. The RF classifier achieves an accuracy (ACC) of 98.83%, a precision (PRE) of 97.70%, a recall (REC) of 98.64%, and an F1-score (F1) of 98.17%, which compares favorably with other traditional and deep learning classifiers such as Logistic Regression, SVM, NB-SVM, and CNN-RNN. Overall, the RF-based sentiment analysis model is a trustworthy, scalable, and efficient way of extracting insights from large volumes of e-commerce review data.
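As a rough illustration of the preprocessing and TF-IDF steps described in the abstract, the following minimal Python sketch uses only the standard library. The stop-word list, the sample reviews, the function names, and the exact TF-IDF weighting (term frequency normalized by document length, idf = ln(N / df)) are assumptions made for illustration; the paper's own pipeline also includes POS tagging and trains an RF classifier on the resulting features, neither of which is shown here. The last function checks that the reported F1-score is consistent with the reported precision and recall.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; the abstract does not specify one.
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "and", "was", "i", "to", "of"}


def preprocess(review):
    """Lowercase, replace special characters with spaces, tokenize, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", review.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample reviews, not taken from the Amazon Reviews (2018) data.
reviews = [
    "This phone is great, great battery!",
    "The battery was terrible.",
    "Great value and fast shipping",
]
tokens = [preprocess(r) for r in reviews]
vectors = tfidf(tokens)

# Consistency check on the figures reported in the abstract.
reported_f1 = round(f1_score(97.70, 98.64), 2)
print(tokens[0])
print(reported_f1)
```

With the reported precision of 97.70 and recall of 98.64, the harmonic mean rounds to 98.17, which agrees with the F1-score stated in the abstract. In a fuller pipeline, the TF-IDF vectors would be passed to a random forest classifier for training.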

Published

2022-01-19

How to Cite

U. M. kavirayani, K. B. Mylavarapu, J. Pilli, P. K. Jannu, J. A. Mohammad, and S. H. Panchali, “Comparative Performance of Machine Learning Techniques for Sentiment and Insight Analysis in Amazon Product Reviews”, AIJCST, vol. 4, no. 1, pp. 54–64, Jan. 2022, doi: 10.63282/3117-5481/AIJCST-V4I1P106.
