INFERENCE-TIME COMPUTATIONAL SCALING AS A UNIVERSAL PRINCIPLE FOR GENERATIVE MODELS
DOI: 10.31673/2412-4338.2025.048926
Abstract
Over the past decade, improvements in neural network performance have been achieved primarily by scaling computational resources during training. However, the exhaustion of available training data and rising energy costs impose fundamental limits on this paradigm. This study identifies the common principles of an alternative approach – scaling computation directly during generation, where additional resources are allocated at the deployment stage. The work analyzes conceptual analogies between iterative refinement processes across different generative model architectures. In large language models, scaling is realized through chain-of-thought techniques, in which intermediate tokens sequentially refine the representation of the task. Diffusion models achieve an analogous effect through multiple denoising steps that transform noise into structured data. Flow matching models exploit control over the integration precision of trajectories between distributions. All three approaches share a common principle: allocating additional computation to the sequential refinement of probability distribution transformations. The study establishes that, under a fixed computational budget, compact models with additional inference-time computation can outperform architectures an order of magnitude larger. This strategy enables adaptive resource allocation according to query complexity – a property unattainable with static scaling. Analysis of the computational graphs reveals that cost scales linearly with iteration count while quality grows according to a power law. These findings form a theoretical foundation for understanding inference-time scaling as a universal mechanism for enhancing the performance of generative systems. The practical significance lies in shifting the optimization paradigm from increasing model size to the intelligent distribution of computation, opening pathways to more efficient systems under resource constraints.
Keywords: machine learning, generative models, chain-of-thought reasoning, diffusion models, flow matching, large language models, scaling, optimal resource allocation, computational efficiency.
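To make the shared mechanism summarized in the abstract concrete, the following is a minimal illustrative sketch, not taken from the paper itself. The names generate, init_state, refine_step, decode, and adaptive_budget are hypothetical; the assumption is only that one refinement step corresponds to emitting a chain-of-thought token (language models), performing one denoising step (diffusion models), or taking one integration step along the learned trajectory (flow matching).

def generate(model, query, n_steps):
    """Spend n_steps units of inference-time compute on a single query.

    Total cost grows linearly with n_steps; each step sequentially
    refines the intermediate state toward the target distribution.
    """
    state = model.init_state(query)          # prompt encoding, pure noise, etc.
    for t in range(n_steps):                 # linear cost in the iteration count
        state = model.refine_step(state, t)  # one refinement of the transformation
    return model.decode(state)

def adaptive_budget(difficulty, base_steps=8, max_steps=256):
    # Adaptive allocation: harder queries receive more iterations, easier
    # ones fewer (the property unattainable with static scaling). The
    # linear mapping here is an illustrative placeholder, not a result.
    return min(max_steps, int(base_steps * (1 + difficulty)))

The design point of the sketch is that n_steps is a deployment-time knob: the same fixed-size model trades latency for quality on a per-query basis, rather than requiring a larger model for harder inputs.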
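The cost–quality relationship stated in the abstract can also be written in a hedged form. Assuming, purely for illustration, a fixed overhead C_0, a per-iteration cost c, and a power-law quality exponent alpha (the abstract does not fix the exact functional form):

\[
C(n) = C_0 + c\,n, \qquad Q(n) \propto n^{\alpha}, \quad 0 < \alpha < 1,
\qquad \Rightarrow \qquad
\frac{dQ}{dC} = \frac{Q'(n)}{C'(n)} \propto \frac{\alpha\, n^{\alpha-1}}{c} \to 0 .
\]

Linear cost with diminishing power-law returns is precisely what motivates adaptive allocation: once the marginal quality per unit cost of further iterations on one query falls below that attainable on another, the remaining budget is better spent elsewhere.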