ANALYSIS OF ML MODELS LIFECYCLE MANAGEMENT EVOLUTION IN ANALYTICAL DATA PLATFORMS
DOI: 10.31673/2412-4338.2024.026874
Abstract
This article analyzes how the principles and processes for operationalizing the ML model lifecycle have changed as analytical data platforms evolved, up to Data Mesh. First, it reviews ML operations in OLTP, OLAP, and data lake platforms. For Data Mesh, it analyzes how MLOps recommendations align with the core Data Mesh principles and identifies the resulting challenges and pitfalls. One of the most critical challenges is data accessibility for ML models across business domains, given the principle of domain-oriented decentralized data ownership and architecture. To address this challenge, the article analyzes the federated learning and feature mesh approaches and compares the prospects of each. Finally, it presents conclusions, an analysis of the remaining and consequential challenges, and topics for further research.
Keywords: data mesh, MLOps, ML product, OLTP, OLAP, federated learning.