Convergence of the loss function surface in transformer neural network architectures


Author: Egor Petrov
Consultant: Nikita Kiselev
Advisor: Andrey Grabovoy, PhD

Assets

Abstract

Training a neural network involves searching for a minimum of the loss function, which defines a surface in the space of model parameters. The properties of this surface are determined by the chosen architecture, the loss function, and the training data. Existing studies show that, as the number of objects in the sample increases, the loss surface ceases to change significantly. In this paper, we derive a convergence estimate for the loss surface of a transformer neural network with attention layers and propose a theoretical estimate of the minimum sample size required to train a model within any predetermined acceptable error. We also conduct computational experiments that confirm the obtained theoretical bounds.
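
The sketch below is our own illustration of the quantity behind this convergence statement, not the repository's code; the toy attention model, synthetic data, and sample sizes are hypothetical stand-ins. It evaluates the empirical loss surface L_k(w) = (1/k) * sum_{i<=k} loss(w; x_i, y_i) at a fixed parameter point w and tracks how the difference |L_{k+1}(w) - L_k(w)| shrinks as the sample size k grows; since L_{k+1}(w) - L_k(w) = (loss_{k+1}(w) - L_k(w)) / (k + 1), this difference decays roughly like O(1/k) for bounded losses, which is the behaviour the theoretical estimate quantifies.

# Minimal sketch (assumes PyTorch is available); model, data, and sizes are toy stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical attention-based toy model: one transformer encoder layer + linear classifier.
model = nn.Sequential(
    nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True),
    nn.Flatten(),
    nn.Linear(16 * 8, 2),
)
criterion = nn.CrossEntropyLoss(reduction="none")

# Synthetic dataset: N sequences of length 8 with 16 features, binary labels.
N = 512
X = torch.randn(N, 8, 16)
y = torch.randint(0, 2, (N,))

with torch.no_grad():
    per_object_loss = criterion(model(X), y)  # loss of each object at the fixed weights w

# Empirical loss surface L_k(w) at the fixed point w for k = 1..N, and its increments.
cumulative = torch.cumsum(per_object_loss, dim=0)
L_k = cumulative / torch.arange(1, N + 1)
deltas = (L_k[1:] - L_k[:-1]).abs()  # deltas[k-1] = |L_{k+1}(w) - L_k(w)|

for k in (10, 50, 100, 250, 500):
    print(f"k={k:4d}  |L_(k+1) - L_k| = {deltas[k - 1].item():.2e}")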

Citation

If you find our work helpful, please cite us.

@article{petrov2025transformerlandscape,
    title={Convergence of the loss function surface in transformer neural network architectures},
    author={Petrov, Egor and Kiselev, Nikita and Meshkov, Vladislav and Grabovoy, Andrey},
    year={2025}
}

License

Our project is MIT licensed. See LICENSE for details.
