Distributed Data Engineering Models for Real-Time Fraud Monitoring in FinTech Systems

Authors

  • Dileep Valiki, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3117-5481/AIJCST-V4I6P104

Keywords:

Fraud detection, Data flow, Data lake, Semantic segmentation, Architectural model, FinTech systems

Abstract

Fraud in FinTech systems, namely in banking and insurance, is a complex and multifactorial phenomenon that has been further amplified by the social distancing and confinement measures adopted to cope with the COVID-19 crisis. According to selected reports, fraud in Europe alone is estimated at 2.2 billion euros for 2022 and is expected to continue its upward trend across markets. To mitigate risks and minimize losses, private and public organizations invest continuously in fraud monitoring systems. However, fraudsters continuously adapt their methods to evade detection, so monitoring systems must keep evolving in the fraud investigation and detection area. To facilitate benchmarking and comparison of different approaches, an open-source, distributed, and scalable data engineering model is applied to build fraud monitoring systems in a FinTech environment. Fraud monitoring in distributed environments requires specific execution models within an integrated, open-source environment. Data engineering platforms allow different data sources and types to be integrated, and distributed data engineering tasks to be executed in batch, streaming, micro-batch, and real-time modes. An integrated analytical environment is used to support fraud monitoring in enterprises outside the banking sector. GDPR compliance and data anonymization measures are used to guarantee data privacy, and a climate change index is shown to have a clear impact on the fraud model. Detection of anti-money laundering events related to COVID-19 is also presented and discussed. Finally, a case study in the European Banking Federation area is analyzed.
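The abstract states that one data engineering platform can run the same task in streaming and micro-batch modes while applying privacy measures. The minimal Python sketch below illustrates that idea under stated assumptions; it is not the author's model. The velocity rule (`VelocityRule`: more than three transactions per account within 60 seconds), the helper names, the account IDs, and the demo key are all hypothetical. Account IDs are replaced with a keyed HMAC-SHA256 digest. Note that this is pseudonymization, which GDPR still treats as personal data, rather than full anonymization. The stream is assumed to arrive in event-time order.

```python
"""Sketch: one fraud rule run in streaming and micro-batch modes (hypothetical example)."""
import hashlib
import hmac
from collections import defaultdict, deque
from dataclasses import dataclass

# Hypothetical demo key; a real deployment would load it from a secrets manager.
PSEUDONYM_KEY = b"demo-key-not-for-production"


@dataclass(frozen=True)
class Txn:
    account: str   # pseudonymized account identifier
    ts: int        # event time in seconds
    amount: float  # transaction amount in EUR


def pseudonymize(account_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Map a raw account ID to a truncated keyed HMAC-SHA256 digest.

    This is pseudonymization: whoever holds the key can re-link records,
    so under GDPR the output is still personal data.
    """
    return hmac.new(key, account_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


class VelocityRule:
    """Hypothetical rule: flag an account with more than max_txns events in window seconds."""

    def __init__(self, max_txns: int = 3, window: int = 60) -> None:
        self.max_txns = max_txns
        self.window = window
        self.recent = defaultdict(deque)  # account -> event times in the current window

    def observe(self, txn: Txn) -> bool:
        times = self.recent[txn.account]
        times.append(txn.ts)
        # Evict events outside the sliding window (txn.ts - window, txn.ts].
        while times and times[0] <= txn.ts - self.window:
            times.popleft()
        return len(times) > self.max_txns


def run_streaming(events, rule):
    """Streaming mode: evaluate every event as soon as it arrives."""
    return [t for t in events if rule.observe(t)]


def run_micro_batch(events, rule, batch_size=4):
    """Micro-batch mode: buffer batch_size events, then evaluate the buffer."""
    alerts = []
    for start in range(0, len(events), batch_size):
        batch = events[start:start + batch_size]
        alerts.extend(t for t in batch if rule.observe(t))
    return alerts


if __name__ == "__main__":
    # Hypothetical raw transactions: (account ID, event time in s, amount in EUR).
    raw = [("DE-001", 0, 20.0), ("DE-001", 10, 15.0), ("DE-002", 12, 99.0),
           ("DE-001", 20, 5.0), ("DE-001", 30, 7.5), ("DE-001", 200, 3.0)]
    events = [Txn(pseudonymize(acc), ts, amt) for acc, ts, amt in raw]
    streamed = run_streaming(events, VelocityRule())
    batched = run_micro_batch(events, VelocityRule())
    print(len(streamed), streamed == batched)  # prints: 1 True
```

Because the rule keeps per-account state across calls, both modes raise the same single alert (the fourth DE-001 transaction within 60 seconds). With out-of-order events, a production streaming engine would additionally need event-time windows and watermark handling for the two modes to agree.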


Published

2022-11-15

Issue

Vol. 4 No. 6 (2022)

Section

Articles

How to Cite

[1]
D. Valiki, “Distributed Data Engineering Models for Real-Time Fraud Monitoring in FinTech Systems”, AIJCST, vol. 4, no. 6, pp. 33–43, Nov. 2022, doi: 10.63282/3117-5481/AIJCST-V4I6P104.
