Convergence of the loss function surface in transformer neural network architectures


Author: Egor Petrov
Consultant: Nikita Kiselev
Advisor: Andrey Grabovoy, PhD

Assets

Abstract

Training a neural network involves searching for a minimum of the loss function, which defines a surface in the space of model parameters. The properties of this surface are determined by the chosen architecture, the loss function, and the training data. Existing studies show that, as the number of objects in the sample increases, the loss surface ceases to change significantly. In this paper, we derive a convergence estimate for the loss surface of a transformer neural network with attention layers and propose a theoretical estimate of the minimum sample size required to train a model within any predetermined acceptable error. We also conduct computational experiments that confirm the obtained theoretical bounds.
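
The sketch below is our own illustration of the quantity behind this convergence statement, not the repository's code; the toy attention model, synthetic data, and sample sizes are hypothetical stand-ins. It evaluates the empirical loss surface L_k(w) = (1/k) * sum_{i<=k} loss(w; x_i, y_i) at a fixed parameter point w and tracks how the difference |L_{k+1}(w) - L_k(w)| shrinks as the sample size k grows; since L_{k+1}(w) - L_k(w) = (loss_{k+1}(w) - L_k(w)) / (k + 1), this difference decays roughly like O(1/k) for bounded losses, which is the behaviour the theoretical estimate quantifies.

# Minimal sketch (assumes PyTorch is available); model, data, and sizes are toy stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical attention-based toy model: one transformer encoder layer + linear classifier.
model = nn.Sequential(
    nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True),
    nn.Flatten(),
    nn.Linear(16 * 8, 2),
)
criterion = nn.CrossEntropyLoss(reduction="none")

# Synthetic dataset: N sequences of length 8 with 16 features, binary labels.
N = 512
X = torch.randn(N, 8, 16)
y = torch.randint(0, 2, (N,))

with torch.no_grad():
    per_object_loss = criterion(model(X), y)  # loss of each object at the fixed weights w

# Empirical loss surface L_k(w) at the fixed point w for k = 1..N, and its increments.
cumulative = torch.cumsum(per_object_loss, dim=0)
L_k = cumulative / torch.arange(1, N + 1)
deltas = (L_k[1:] - L_k[:-1]).abs()  # deltas[k-1] = |L_{k+1}(w) - L_k(w)|

for k in (10, 50, 100, 250, 500):
    print(f"k={k:4d}  |L_(k+1) - L_k| = {deltas[k - 1].item():.2e}")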

Citation

If you find our work helpful, please cite us.

@article{petrov2025transformerlandscape,
    title={Convergence of the loss function surface in transformer neural network architectures},
    author={Petrov, Egor and Kiselev, Nikita and Meshkov, Vladislav and Grabovoy, Andrey},
    year={2025}
}

License

Our project is MIT licensed. See LICENSE for details.
