Integrated Anomaly Detection in Automated Cloud Data Engineering Frameworks
DOI: https://doi.org/10.63345/
Abstract
This paper presents an automated data engineering framework for large-scale cloud analytics that addresses high data velocity, variable workloads, and many heterogeneous data sources. The framework aims to maximize performance and reliability while reducing manual steps. It does this with a single closed-loop pipeline that combines automated ingestion control, adaptive data transformation, and resource optimization. In tests against traditional cloud analytics platforms, the framework achieved a data ingestion throughput of 6.7 units, improved analytics throughput by up to 28 percent, and reduced end-to-end pipeline latency to 360 units. Under variable workloads, it achieved an average scalability of 94 percent and an elasticity of 9 units, where elasticity is defined as the framework's responsiveness to changing demand. Confidence-based filtering, early validation, and schema alignment yielded a data processing consistency of 97.9 percent and an accuracy of 97.2 percent, and reduced automation overhead to 11 percent, indicating a high degree of operational autonomy. System reliability and availability reached 99.5 percent, and fault recovery time was 16 units, improving system robustness. Resource utilization reached 91 percent, reflecting a balanced allocation of computing resources to workload and an overall reduction in operational resource expenditure. Overall, the findings suggest that the proposed framework provides a reliable, cost-effective, and scalable foundation for next-generation cloud analytics systems in dynamic environments.
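The abstract attributes the consistency and accuracy figures to confidence-based filtering, early validation, and schema alignment at ingestion, but it does not describe how these steps work. The following is a minimal sketch of how the three steps might be chained, under stated assumptions. It is not the authors' implementation. The target schema, the alias map, the 0.8 threshold, the `confidence` field (assumed to be attached by some upstream scorer), and the function names `align_schema` and `ingest` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical target schema: canonical field name -> expected Python type.
EXPECTED_SCHEMA: dict[str, type] = {"event_id": str, "value": float, "confidence": float}

# Hypothetical alias map for schema alignment (source field -> canonical field).
FIELD_ALIASES: dict[str, str] = {"id": "event_id", "val": "value", "score": "confidence"}

# Hypothetical threshold: records whose confidence falls below it are filtered out.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class IngestResult:
    accepted: list[dict[str, Any]] = field(default_factory=list)
    # Each rejected entry keeps the original record and a human-readable reason.
    rejected: list[tuple[dict[str, Any], str]] = field(default_factory=list)


def align_schema(record: dict[str, Any]) -> dict[str, Any]:
    """Rename aliased fields, drop unknown ones, and cast values to the expected types."""
    aligned: dict[str, Any] = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key, key)
        if canonical in EXPECTED_SCHEMA:
            # The cast raises ValueError/TypeError if the value cannot be converted.
            aligned[canonical] = EXPECTED_SCHEMA[canonical](value)
    return aligned


def ingest(records: list[dict[str, Any]]) -> IngestResult:
    """Validate each record early, before it reaches downstream transformations."""
    result = IngestResult()
    for record in records:
        # Step 1: schema alignment with type casting.
        try:
            aligned = align_schema(record)
        except (ValueError, TypeError) as exc:
            result.rejected.append((record, f"cast error: {exc}"))
            continue
        # Step 2: early validation, where every expected field must be present.
        missing = EXPECTED_SCHEMA.keys() - aligned.keys()
        if missing:
            result.rejected.append((record, f"missing fields: {sorted(missing)}"))
            continue
        # Step 3: confidence-based filtering.
        if aligned["confidence"] < CONFIDENCE_THRESHOLD:
            result.rejected.append((record, "low confidence"))
            continue
        result.accepted.append(aligned)
    return result


if __name__ == "__main__":
    batch = [
        {"id": "a1", "val": "3.5", "score": 0.95},  # aliased fields, castable value
        {"id": "a2", "val": 1.0, "score": 0.40},    # below the confidence threshold
        {"id": "a3", "val": "n/a", "score": 0.99},  # value cannot be cast to float
        {"event_id": "a4", "value": 2.0},           # missing the confidence field
    ]
    out = ingest(batch)
    print("accepted:", out.accepted)
    for rec, reason in out.rejected:
        print("rejected:", rec, "->", reason)
```

Rejected records keep their original form and a reason. This design choice assumes that a downstream anomaly-handling or quarantine step would want to inspect them instead of silently discarding them.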
License
Copyright (c) 2026 Scientific Journal of Artificial Intelligence and Blockchain Technologies

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The license allows re-users to share and adapt the work, provided that they give credit to the author and do not use it for commercial purposes.