Self-Healing REST Services Using Artificial Intelligence in Multi-Cloud Environments
DOI:
https://doi.org/10.63345/sjaibt.v1.i3.201
Keywords:
Self-Healing Systems, RESTful Services, Artificial Intelligence, Multi-Cloud Computing, Cloud-Native Architecture, Fault Detection, Autonomous Systems, Service Reliability, Machine Learning Monitoring, Distributed Systems.
Abstract
Modern digital applications rely on RESTful services and run on distributed cloud infrastructures. Service reliability becomes much harder to ensure as organizations move to multi-cloud architectures to improve scalability and reliability and to avoid dependence on a single vendor. Conventional monitoring and incident-response methods often react slowly to failures such as API latency spikes, service failures, container crashes, and configuration errors. These limitations lead to downtime, degraded performance, and higher operational costs.
Self-healing systems have emerged as a promising way to address these challenges. A self-healing architecture allows software systems to detect, diagnose, and recover from failures automatically, without human intervention. Combined with Artificial Intelligence (AI), self-healing systems can forecast possible failures, optimise system behaviour, and apply corrective measures automatically. AI-powered monitoring systems can process large volumes of system telemetry, logs, and performance metrics to find anomalies and trigger automated remediation.
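As a rough illustration of the anomaly-finding step described above, the following minimal sketch flags latency samples that deviate strongly from a rolling baseline using a z-score. It is not the paper's detector: the class name `LatencyAnomalyDetector`, the window size, the threshold of three standard deviations, and the sample readings are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


class LatencyAnomalyDetector:
    """Flags latency samples that deviate strongly from a rolling baseline.

    Hypothetical helper for illustration; window, threshold and min_samples
    are assumed values, not parameters taken from the paper.
    """

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 5):
        self.samples = deque(maxlen=window)  # recent healthy latencies (ms)
        self.threshold = threshold           # z-score above which a sample is flagged
        self.min_samples = min_samples       # baseline size needed before flagging

    def observe(self, latency_ms: float) -> bool:
        """Return True if the sample is anomalous relative to the recent window."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            # A zero-variance baseline gives no usable z-score, so nothing is flagged.
            if sigma > 0 and abs(latency_ms - mu) / sigma > self.threshold:
                anomalous = True
        # Only non-flagged samples update the baseline, so a spike does not
        # immediately become part of what counts as normal.
        if not anomalous:
            self.samples.append(latency_ms)
        return anomalous


detector = LatencyAnomalyDetector()
readings = [100, 102, 98, 101, 99, 103, 100, 450, 101]  # made-up latencies in ms
flags = [detector.observe(r) for r in readings]
print(flags)
```

In a real deployment the flagged samples would feed a diagnosis and remediation stage rather than a print statement; a rolling z-score is only one of many possible detectors and would likely be replaced by a learned model.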
This paper proposes an AI-based self-healing framework for REST services in multi-cloud environments. The framework combines machine-learning-based outlier detection, automated fault diagnosis, predictive analytics, and smart orchestration. Using cloud monitoring data, distributed tracing, and decision engines based on reinforcement learning, the proposed system monitors the performance of REST services and takes automatic recovery measures such as restarting containers, rerouting services, auto-scaling, and reconfiguring API gateways.
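To make the idea of a learning-based decision engine more concrete, the sketch below uses an epsilon-greedy bandit that learns which recovery action tends to resolve which fault type. This is a deliberately simplified stand-in for reinforcement learning, not the framework's actual engine. The action and fault identifiers mirror the failures and recovery measures named in the abstract, but the identifiers themselves, the `SIMULATED_FIX` table, and the reward function are hypothetical and exist only so the example can run without any cloud infrastructure.

```python
import random
from collections import defaultdict

# Recovery measures named in the abstract; the identifiers are hypothetical.
ACTIONS = ["restart_container", "reroute_service", "auto_scale", "reconfigure_gateway"]

# Toy ground truth used only to simulate outcomes. A real system would instead
# observe whether the service actually recovered after the action was applied.
SIMULATED_FIX = {
    "container_crash": "restart_container",
    "latency_spike": "auto_scale",
    "service_failure": "reroute_service",
    "config_error": "reconfigure_gateway",
}


class RemediationPolicy:
    """Epsilon-greedy value table: one estimated reward per (fault, action) pair."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # (fault, action) -> mean observed reward
        self.count = defaultdict(int)    # (fault, action) -> number of trials

    def choose(self, fault: str) -> str:
        # Try every action once before trusting the estimates.
        untried = [a for a in ACTIONS if self.count[(fault, a)] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)  # explore
        return self.best(fault)              # exploit

    def best(self, fault: str) -> str:
        return max(ACTIONS, key=lambda a: self.value[(fault, a)])

    def update(self, fault: str, action: str, reward: float) -> None:
        key = (fault, action)
        self.count[key] += 1
        # Incremental mean of the rewards seen for this pair.
        self.value[key] += (reward - self.value[key]) / self.count[key]


def simulate_recovery(fault: str, action: str) -> float:
    """Stand-in for real feedback: reward 1.0 if the action resolves the fault."""
    return 1.0 if SIMULATED_FIX[fault] == action else 0.0


policy = RemediationPolicy()
for _ in range(50):
    for fault in SIMULATED_FIX:
        action = policy.choose(fault)
        policy.update(fault, action, simulate_recovery(fault, action))

learned = {fault: policy.best(fault) for fault in SIMULATED_FIX}
print(learned)
```

With the deterministic toy rewards used here, the learned mapping ends up matching the simulated ground truth. In practice, rewards would be noisy and delayed, and each chosen action would have to be dispatched to an orchestrator or API gateway, which this sketch does not attempt.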
License
Copyright (c) 2024 Scientific Journal of Artificial Intelligence and Blockchain Technologies

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The license allows re-users to share and adapt the work, as long as credit is given to the author and the work is not used for commercial purposes.