PyTorch implementation of LambdaNetworks: Modeling Long-Range Interactions Without Attention.
Lambda Networks apply the associative law of matrix multiplication to reverse the computation order of self-attention, which makes the cost of content interactions linear in the number of positions.
Similar techniques were used previously in A2-Net and CGNL. For a collection of self-attention modules, see the separate repository dot-product-attention.
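
The reordering described above can be illustrated with a minimal sketch. It covers content interactions only (no position lambdas) and assumes the keys are normalized with a softmax over the context positions, so that no nonlinearity couples queries and keys. The tensor names and sizes are arbitrary choices for illustration and are not taken from this repository's code.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes (assumptions): n queries, m context positions,
# key depth k, value depth v.
n, m, k, v = 6, 10, 4, 5
Q = torch.randn(n, k, dtype=torch.float64)
K = torch.randn(m, k, dtype=torch.float64)
V = torch.randn(m, v, dtype=torch.float64)

# Normalize the keys over the context positions only.
K_norm = torch.softmax(K, dim=0)

# Attention-style order: build an (n x m) map first, then apply it to the values.
attention_order = (Q @ K_norm.T) @ V

# Lambda-style order: summarize the context into a (k x v) matrix first,
# then apply that matrix to every query.
content_lambda = K_norm.T @ V
lambda_order = Q @ content_lambda

# Both orders give the same result by associativity of matrix multiplication.
assert torch.allclose(attention_order, lambda_order)
```

The second order never materializes the (n x m) map, so its cost grows with n and m separately rather than with their product.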
✓ SGD optimizer, initial learning rate 0.1, momentum 0.9, weight decay 0.0001
✓ 130 epochs, batch size 256, 8x Tesla V100 GPUs, cosine LR decay
✓ label smoothing 0.1
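
The settings listed above could be wired up in PyTorch roughly as follows. This is a sketch, not the repository's training script: the `nn.Linear` model is a hypothetical stand-in, stepping the cosine schedule once per epoch is an assumption, and `label_smoothing` on `nn.CrossEntropyLoss` requires PyTorch 1.10 or later.

```python
import torch
from torch import nn

# Hypothetical stand-in model; the actual network would be Lambda-ResNet-50.
model = nn.Linear(8, 10)
epochs = 130

# SGD with the hyperparameters listed above.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4
)

# Cosine LR decay; stepping once per epoch is an assumption.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

# Cross-entropy loss with label smoothing 0.1.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# One dummy optimization step on random data to show the call order.
inputs = torch.randn(4, 8)
targets = torch.randint(0, 10, (4,))
loss = criterion(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()
```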
Architecture | Parameters | FLOPs | Top-1 / Top-5 Acc. (%) | Download |
---|---|---|---|---|
Lambda-ResNet-50 | 14.995M | 6.576G | 78.208 / 93.820 | model / log |