
Machine Learning for Enhanced Churn Prediction in Banking: Leveraging Oversampling and Stacking Techniques

Omar Faruq
Department of Computer Science and Engineering, East West University, Dhaka-1212, Bangladesh
Fahad Ahammed
Department of Computer Science and Engineering, East West University, Dhaka-1212, Bangladesh
Arifa Sultana Mily
Department of Computer Science and Engineering, East West University, Dhaka-1212, Bangladesh
Ashraful Islam
Department of Computer Science and Engineering, East West University, Dhaka-1212, Bangladesh

Submitted to VIJ: 2024-09-13

Keywords

  • Customer Churn Prediction,
  • Machine Learning,
  • Random Oversampling,
  • Ensemble Model,
  • Oversampling,
  • Stacking Model,
  • k-Fold Validation

Abstract

Every sector of business grows more competitive as time passes, with more and more companies offering services to customers, and the banking sector is no different. Given the plethora of banking options available to customers, retaining them can prove difficult for banks. This research helps banks predict which customers are likely to churn, allowing them to take precautions to stop customers from leaving. In this study we used the K-Neighbors, Random Forest, XGBoost, and AdaBoost classifiers, as well as an ensemble model (stacking technique) that combines all of these models. The experiments were conducted on a dataset from Kaggle. Because this dataset was heavily imbalanced, oversampling methods such as Random Oversampling and SMOTE-ENN were applied. In data preprocessing, label encoding was performed, and k-fold cross-validation (k=5) was used for validation. The highest accuracy, 97.31% (std: 0.0033, k=5), was achieved by combining Random Oversampling with the stacking model.
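
As a rough illustration of the workflow the abstract describes, the sketch below combines Random Oversampling with a stacking ensemble of the four classifiers and scores it with stratified 5-fold cross-validation. It assumes scikit-learn, imbalanced-learn, and XGBoost; the synthetic stand-in data, default hyperparameters, and logistic-regression meta-learner are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: Random Oversampling + stacking ensemble, 5-fold CV.
from imblearn.over_sampling import RandomOverSampler
from imblearn.pipeline import Pipeline  # resamples only the training split of each fold
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

# Synthetic stand-in for the label-encoded Kaggle churn data (imbalanced classes).
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)

base_learners = [
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(random_state=42)),
    ("xgb", XGBClassifier(eval_metric="logloss", random_state=42)),
    ("ada", AdaBoostClassifier(random_state=42)),
]

# Stacking: out-of-fold predictions of the base learners train a meta-learner.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000))

# Keeping the oversampler inside the pipeline avoids leaking test-fold samples.
model = Pipeline([("ros", RandomOverSampler(random_state=42)), ("stack", stack)])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"Accuracy: {scores.mean():.4f} (std: {scores.std():.4f}, k=5)")
```

SMOTE-ENN can be tried the same way by swapping `RandomOverSampler` for `SMOTEENN` from `imblearn.combine`; placing the resampler inside the pipeline ensures each validation fold stays untouched.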

References

  1. S. Neslin, S. Gupta, W. Kamakura, J. Lu, and C. Mason, "Defection Detection: Improving Predictive Accuracy of Customer Churn Models," Working Paper, Teradata Center at Duke University, 2004.
  2. Zorić, A. B. (2016). Predicting Customer Churn in Banking Industry using Neural Networks. Interdisciplinary Description of Complex Systems, 14(2), 116–124. https://doi.org/10.7906/indecs.14.2.1
  3. A. Lemmens and C. Croux, "Bagging and Boosting Classification Trees to Predict Churn," Journal of Marketing Research, vol. 43, no. 2, pp. 276–286, 2006.
  4. M.-K. Kim, M.-C. Park, and D.-H. Jeong, “The effects of customer satisfaction and switching barrier on customer loyalty in Korean mobile telecommunication services,” Telecommunications Policy, vol. 28, no. 2, pp. 145–159, 2004.
  5. Y. Deng, D. Li, L. Yang, J. Tang and J. Zhao, "Analysis and prediction of bank user churn based on ensemble learning algorithm," 2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA), Shenyang, China, 2021, pp. 288-291, doi: 10.1109/ICPECA51329.2021.9362520.
  6. Verma, Prashant (2020) "Churn Prediction for Savings Bank Customers: A Machine Learning Approach," Journal of Statistics Applications & Probability: Vol. 9: Iss. 3, Article 10.
  7. S. Jinbo, Li Xiu and L. Wenhuang, "The Application of AdaBoost in Customer Churn Prediction," 2007 International Conference on Service Systems and Service Management, Chengdu, China, 2007, pp. 1-6, doi: 10.1109/ICSSSM.2007.4280172.
  8. M. Rahman and V. Kumar, "Machine Learning Based Customer Churn Prediction in Banking," 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 2020, pp. 1196-1201, doi: 10.1109/ICECA49313.2020.9297529.
  9. S. Cui and N. Ding, "Customer churn prediction using improved FCM algorithm," 2017 3rd International Conference on Information Management (ICIM), Chengdu, China, 2017, pp. 112-117, doi: 10.1109/INFOMAN.2017.7950357.
  10. Bharathi, S. V., Pramod, D., & Raman, R. (2022). An Ensemble Model for Predicting Retail Banking Churn in the Youth Segment of Customers. Data, 7(5), 61. https://doi.org/10.3390/data7050061
  11. Muneer, A., Ali, R. F., Alghamdi, A., Taib, S. M., Almaghthawi, A., & Ghaleb, E. a. A. (2022). Predicting customers churning in banking industry: A machine learning approach. Indonesian Journal of Electrical Engineering and Computer Science, 26(1), 539. https://doi.org/10.11591/ijeecs.v26.i1.pp539-549
  12. Iranmanesh, Seyed Hossein, et al. "Customer Churn Prediction Using Artificial Neural Network: An Analytical CRM Application." 3rd European International Conference on Industrial Engineering and Operations Management, 2019.
  13. Xie, Y., Li, X., Ngai, E. W., & Ying, W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3), 5445–5449. https://doi.org/10.1016/j.eswa.2008.06.121
  14. Dierckx, G. (2004). Logistic Regression model. Encyclopedia of Actuarial Science. https://doi.org/10.1002/9780470012505.tal017
  15. Slimani, C., Wu, C., Rubini, S., Chang, Y., & Boukhobza, J. (2023). Accelerating random forest on Memory-Constrained Devices through data storage optimization. IEEE Transactions on Computers, 72(6), 1595–1609. https://doi.org/10.1109/tc.2022.3215898
  16. Kutschenreiter-Praszkiewicz, I. (2023). Development of a neural network structure for identifying begin-end points in the assembly process. Journal of Machine Engineering. https://doi.org/10.36897/jme/163318
  17. Medvedev, A., Delvenne, J., & Lambiotte, R. (2018). Modelling structure and predicting dynamics of discussion threads in online boards. Journal of Complex Networks, 7(1), 67–82. https://doi.org/10.1093/comnet/cny010
  18. Zhang, Y., & Wang, L. (2023). An AdaBoost Method with K′K-Means Bayes Classifier for Imbalanced Data. Mathematics, 11(8), 1878. https://doi.org/10.3390/math11081878
  19. Singh, P. (2023). Interpretable Deep Gaussian Naive Bayes Algorithm (IDGNBA) based task offloading framework for Edge-Cloud computing. International Journal of Data Informatics and Intelligent Computing, 2(2), 1–10. https://doi.org/10.59461/ijdiic.v2i2.57
  20. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/A:1010933404324
  21. Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139. https://doi.org/10.1006/jcss.1997.1504
  22. Cover, T., & Hart, P. (1967). Nearest-neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27. https://doi.org/10.1109/TIT.1967.1053964
  23. Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). https://doi.org/10.1145/2939672.2939785
  24. Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241-259. https://doi.org/10.1016/S0893-6080(05)80023-1
  25. Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (pp. 1137-1143). https://dl.acm.org/doi/10.5555/1643031.1643047