Why is there no activation function applied to the 1x1 conv that produces the dense output? #404

@chasep255

Description

I have been trying to understand why there is no activation function applied to the 1x1 convolution used between the residual connections. From what I understand, a linear layer with no activation function does not really add to the expressive power of the model. The skip connections eventually have a ReLU applied, so that part makes sense to me. However, the linear output of the residual connection has no activation applied as far as I can tell; it is just added to the residual bus and fed into the next layer. What is the point of the 1x1 convolution in this case? Why not skip the 1x1 convolution and add filter * gate directly to the layer's input to create the dense output?
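To make the question concrete, here is a rough sketch of the residual block as I understand it (Keras-style, illustrative only; the layer names and channel sizes are my own assumptions, not the repository's actual code). The dense 1x1 conv has no activation and is added straight onto the residual bus:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, dilation, residual_channels=32, skip_channels=64):
    # Gated activation unit: tanh(filter) * sigmoid(gate)
    filt = layers.Conv1D(residual_channels, 2, dilation_rate=dilation,
                         padding="causal", activation="tanh")(x)
    gate = layers.Conv1D(residual_channels, 2, dilation_rate=dilation,
                         padding="causal", activation="sigmoid")(x)
    z = layers.Multiply()([filt, gate])

    # 1x1 conv producing the "dense" output -- note: no activation here.
    dense = layers.Conv1D(residual_channels, 1)(z)
    # 1x1 conv feeding the skip connection; a ReLU is applied later,
    # after all the skip outputs are summed.
    skip = layers.Conv1D(skip_channels, 1)(z)

    # The linear dense output is simply added back onto the residual "bus".
    return layers.Add()([x, dense]), skip
```

In other words, my question is about the `dense` 1x1 conv above: since it is linear and immediately summed with `x`, why not drop it and add `z` to `x` directly?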
