You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi,
Awesome repo. I have a question regarding the architecture are token interaction. Don't you think the way HTSAT creates (256, 256) tensor from (1024,64) spectrogram causes problematic token interaction when cyclic window shifting?
What I understood is that you cut the spectrogram (1024,64) into 4 PIECES along dim=0 (256,64) each. Later these 4 are concatenated along dim=1 resulting in the final tensor of shape (256,256). On this when you do cyclic window shift, results in window comprises of tokens from two different PIECES, that is some from low-frequency region of PIECE 1 and some from high frequency region from PIECE 2.
The text was updated successfully, but these errors were encountered:
Sorry for the late reply, I was busy with other stuff this quarter.
You ask a good question that if the cyclic window shifting will cause the overlapping or wrong information loss of the feature during the downsampling process.
My answer is not. And it is not because we downsample it by 2 each time and with 2 x 2 x 2 = 8 three times in total. In this case, all features "at the edge" of each (256, 64) piece will not share any wrong information into other piece because the downsample rate 8 is not big enough to compress (256, 64) into (1,1). This is considered when I did the project and that is why I think it would be not a problem.
But if you increase your downsample rate, it should be a problem. In this case, I think giving up converting from (1024, 64) to (256, 256) and directly process (1024, 64) shape will be an option. In all, the reason why I do the conversion is because we need to use the swin-transformer checkpoint.
Hi,
Awesome repo. I have a question regarding the architecture are token interaction. Don't you think the way HTSAT creates (256, 256) tensor from (1024,64) spectrogram causes problematic token interaction when cyclic window shifting?
What I understood is that you cut the spectrogram (1024,64) into 4 PIECES along dim=0 (256,64) each. Later these 4 are concatenated along dim=1 resulting in the final tensor of shape (256,256). On this when you do cyclic window shift, results in window comprises of tokens from two different PIECES, that is some from low-frequency region of PIECE 1 and some from high frequency region from PIECE 2.
The text was updated successfully, but these errors were encountered: