DEVELOPMENT OF AN INFORMATION TECHNOLOGY FOR AUTOMATED GENERATION AND TESTING OF SOFTWARE FOR MACHINE LEARNING TASKS USING LARGE LANGUAGE MODELS
DOI: 10.31673/2412-4338.2026.019011
Abstract
The article addresses the automation of software development and testing for machine learning tasks in the context of the rapid development of large language models and intelligent programming tools. The increasing complexity of artificial intelligence systems and the growing volume of data call for new approaches to organizing the lifecycle of ML systems that combine automated code generation, data analysis, and software quality assurance. The aim of this study is to improve the efficiency of developing and validating software for machine learning tasks by designing an information technology for automated generation and testing of software solutions based on multi-agent systems and large language models.
The article analyzes recent research in the fields of AI-augmented software engineering, testing of machine learning systems, and the application of agent-based architectures for programming automation. A conceptual architecture of a multi-agent information system is proposed that implements the complete pipeline for developing machine learning programs, ranging from infrastructure preparation and data analysis to program code generation, testing, and scientific interpretation of results. The system consists of several specialized agents, including an execution coordinator, a data analysis agent, a machine learning model development agent, a testing agent, and a results interpretation agent. To ensure the reliability of the developed technology, a three-level testing subsystem is proposed, which includes validation of input data quality, testing of the generated program code, and evaluation of the quality of the obtained machine learning models using statistical metrics.
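The three testing levels named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy dataset, the `predict` entry point, and the use of accuracy with a 0.75 threshold are all assumptions made for illustration, and the code is kept to the Python standard library so that it runs without pandas or scikit-learn.

```python
import ast
from statistics import mean


def validate_data(rows, n_features):
    """Level 1 (input data quality): return a list of problems found."""
    problems = []
    if not rows:
        problems.append("dataset is empty")
    for i, (features, label) in enumerate(rows):
        if len(features) != n_features:
            problems.append(f"row {i}: expected {n_features} features, got {len(features)}")
        if label is None or any(v is None for v in features):
            problems.append(f"row {i}: missing value")
    return problems


def check_generated_code(source, required_function):
    """Level 2 (generated code): parse without executing, then look for the entry point."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg} (line {exc.lineno})"]
    defined = {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}
    if required_function not in defined:
        return [f"missing function: {required_function}"]
    return []


def evaluate_model(predict, rows, min_accuracy):
    """Level 3 (model quality): accuracy compared against a threshold."""
    accuracy = mean(1.0 if predict(features) == label else 0.0 for features, label in rows)
    return accuracy, accuracy >= min_accuracy


def run_three_level_tests(rows, source, n_features=2, min_accuracy=0.75):
    """Run the levels in order; the model is evaluated only if levels 1 and 2 pass."""
    report = {
        "data": validate_data(rows, n_features),
        "code": check_generated_code(source, "predict"),
        "model": None,
    }
    if report["data"] or report["code"]:
        return report
    namespace = {}
    # Assumption: the generated code is trusted here; a real system should sandbox it.
    exec(source, namespace)
    accuracy, passed = evaluate_model(namespace["predict"], rows, min_accuracy)
    report["model"] = {"accuracy": accuracy, "passed": passed}
    return report


# Hypothetical generated program and toy dataset, for illustration only.
GENERATED = """
def predict(features):
    return 1 if features[0] + features[1] > 1.0 else 0
"""
DATA = [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.4, 0.3], 0), ([0.7, 0.9], 1)]

if __name__ == "__main__":
    print(run_three_level_tests(DATA, GENERATED))
```

Running the levels in sequence and stopping before model evaluation when an earlier level fails is one possible design choice; it avoids executing code that does not parse or training on data already known to be defective.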
The system runs in the Google Colab environment or on a local computer with a local Ollama server. It relies on Python libraries for data analysis and machine learning, including scikit-learn, pandas, and matplotlib, and supports large language models deployed locally or in cloud environments. An experimental evaluation of the system was conducted using standard datasets for classification and regression tasks. The obtained results demonstrate the effectiveness of the proposed approach and confirm the feasibility of using multi-agent systems and large language models to automate the development and testing of software solutions in the field of machine learning.
Keywords: machine learning, multi-agent systems, large language models, programming automation, software testing, AI-augmented software engineering.
References
[1] Itransition. (2026, January 27). Machine learning statistics for 2026: The ultimate list. Itransition. https://www.itransition.com/machine-learning/statistics
[2] Riccio, V., Jahangirova, G., Stocco, A., Humbatova, N., Weiss, M., & Tonella, P. (2020). Testing machine learning–based systems: A systematic mapping. Empirical Software Engineering, 25(6), 5193–5254. https://doi.org/10.1007/s10664-020-09881-0
[3] Al Alamin, M. A., & Uddin, G. (2021). Quality assurance challenges for machine learning software applications during software development life cycle phases. In 2021 IEEE International Conference on Autonomous Systems (ICAS). IEEE. https://doi.org/10.1109/ICAS49788.2021.9551151
[4] Breck, E., Cai, S., Nielsen, E., Salib, M., & Sculley, D. (2017). The ML test score: A rubric for ML production readiness and technical debt reduction. In 2017 IEEE International Conference on Big Data (Big Data) (pp. 1123–1132). https://doi.org/10.1109/BigData.2017.8258038
[5] Hutter, F., Kotthoff, L., & Vanschoren, J. (Eds.). (2019). Automated machine learning: Methods, systems, challenges. Springer. https://doi.org/10.1007/978-3-030-05318-5
[6] Schieferdecker, I. K. (2024). Augmenting software engineering with AI and developing it further towards AI-assisted model-driven software engineering. arXiv. https://arxiv.org/abs/2409.18048
[7] Akhtar, S., & Aftab, S. (2025). Towards AI-augmented software engineering: A theoretical framework. ICCK Journal of Software Engineering, 1, 124. https://doi.org/10.62762/JSE.2025.407864
[8] Nyaga, F. (2025). AI-driven software engineering: A systematic review of machine learning’s impact and future directions. Preprints. https://doi.org/10.20944/preprints202504.0174.v1
[9] Yang, Y., Xia, X., Lo, D., & Grundy, J. (2022). A survey on deep learning for software engineering. ACM Computing Surveys, 54(10s), Article 206. https://doi.org/10.1145/3505243
[10] Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Kamar, E., Nagappan, N., Nushi, B., & Zimmermann, T. (2019). Software engineering for machine learning: A case study. In Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP '19) (pp. 291–300). IEEE. https://doi.org/10.1109/ICSE-SEIP.2019.00042
[11] Liu, Y., Wang, Z., & Zhang, L. (2025). A survey on code generation with LLM-based agents. arXiv. https://arxiv.org/abs/2508.00083
[12] Alenezi, M., & Akour, M. (2025). AI-driven innovations in software engineering: A review of current practices and future directions. Applied Sciences, 15. https://doi.org/10.3390/app15031344
[13] Wang, L., Ma, C., Feng, X., et al. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science, 18, 186345. https://doi.org/10.1007/s11704-024-40231-1
[14] Wang, Z., Su, K., Zhang, J., Jia, H., Ye, Q., Xie, X., & Lu, Z. (2023). Multi-agent automated machine learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 11960–11969).
[15] Guo, T., Chen, X., Wang, Y., Chang, R., Peng, S., Chawla, N. V., Wawro, P., & Zhang, C. (2024). Large language model based multi-agent systems: A survey of progress and challenges. arXiv preprint arXiv:2408.11903. https://arxiv.org/abs/2408.11903
[16] Coleman, S., & Wilson, D. (2026). A comprehensive evaluation of privacy-preserving mechanisms in cloud-based big data analytics: Challenges and future research directions. Preprints, 2026011025. https://doi.org/10.20944/preprints202601.1025.v1
[17] Huang, D., & Wang, Z. (2025). LLMs at the edge: Performance and efficiency evaluation with Ollama on diverse hardware. In Proceedings of the International Joint Conference on Neural Networks (IJCNN). https://doi.org/10.1109/IJCNN64981.2025.11228317
[18] Palma, G., Cecchi, G., Caronna, M., & Rizzo, A. (2025). Leveraging large language models for scalable and explainable cybersecurity log analysis. Journal of Cybersecurity and Privacy, 5(3), 55. https://doi.org/10.3390/jcp5030055
[19] Jiang, N., Liu, K., Chen, T., & Liang, J. (2025). LLM-based multi-agent systems for software engineering: Literature review, vision, and the road ahead. ACM Transactions on Software Engineering and Methodology. https://doi.org/10.1145/3712003
[20] Wang, J., Huang, Y., Chen, C., Liu, Z., Wang, S., & Wang, Q. (2024). Software testing with large language models: Survey, landscape, and vision. IEEE Transactions on Software Engineering. https://doi.org/10.1109/TSE.2024.3368208
[21] Chandrasekaran, A., & Mahmood, Q. H. (2025). A review of large language models for automated test case generation. Systems, 7(3), 97. https://doi.org/10.3390/systems7030097
[22] Liu, D., Upadhyay, K., Chhetri, V., Siddique, A. B., & Farooq, U. (2026). A large-scale study on the development and issues of multi-agent AI systems. arXiv. https://arxiv.org/abs/2601.07136