Question about temporal dependencies and feature correlations captured by DMSA #11
Comments
Hi there,
I have a time series dataset where each column represents a different time series. I am also puzzled about how a single attention operation can capture both temporal dependence and feature correlation. Looking forward to your reply.
Hi, first of all, thank you both @Stinger-Wiz @Will-Hor for raising this discussion.

In [1], BRITS uses an LSTM to produce a history-based estimation and builds a separate component to produce a feature-based estimation (please refer to Section 4.3 in [1]), then combines the two to form the final imputation. We claim DMSA can capture the temporal dependencies and feature correlations between time steps with only one attention operation because, unlike BRITS, a single DMSA operation is enough. The attention map already embeds the temporal dependencies between time steps. With diagonal masks applied, as introduced in Section 3.2.1 of [2], input values at the t-th step cannot see themselves and are prohibited from contributing to their own estimations. Consequently, estimations of the t-th step depend only on input values from other steps.

It's worth mentioning that the component in BRITS that produces the feature-based estimation is built specifically to consider correlations between the features of each time step, and its input is the imputed data of the current step from the LSTM cell. In other words, that component works on the feature dimension. DMSA, by contrast, works on the time dimension (this is why the captured temporal dependencies and feature correlations are both between time steps). Because DMSA's input has already been projected into a high-dimensional space (the features are fused) and SAITS does not make the imputation at this stage, DMSA does not need BRITS' component.

If you have new findings, you're welcome to share them with me 😊 Many thanks!
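For readers who prefer code, here is a minimal NumPy sketch of the idea described in the reply above. It is an illustration under simplifying assumptions, not the SAITS implementation: it uses a single head, omits positional encoding and the missing-value mask input, and all names (`diagonal_masked_self_attention`, `w_in`, `w_q`, `w_k`, `w_v`) are hypothetical. The point it tries to show is that the attention map has shape (T, T), so it is indexed by time steps, while every score is a dot product of vectors in which the original features have already been mixed by the input projection.

```python
import numpy as np

def diagonal_masked_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention with a diagonal mask (illustrative sketch).

    x: array of shape (T, d_model), one fused feature vector per time step.
    Returns (output, attention_map) with shapes (T, d_v) and (T, T).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d_k)      # (T, T): one score per pair of time steps
    np.fill_diagonal(scores, -np.inf)      # diagonal mask: step t cannot attend to itself
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)               # exp(-inf) = 0 on the diagonal
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, n_features, d_model = 6, 4, 8           # assumed toy sizes

raw = rng.normal(size=(T, n_features))     # T time steps, n_features series
w_in = rng.normal(size=(n_features, d_model))
x = raw @ w_in                             # projection mixes (fuses) the features

w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, attn = diagonal_masked_self_attention(x, w_q, w_k, w_v)
```

In this sketch `attn[t, t]` is zero for every `t`, so the value vectors mixed into the output at step `t` come only from other time steps, and each off-diagonal weight `attn[t, s]` depends on all features of steps `t` and `s` through the fused vectors.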
How should we understand the phrase 'DMSA works in the time dimension'? If attention maps represent the attention between each time step, where is the correlation between features reflected? Looking forward to your reply 🌹
Hi, thank you for your patience. The input of DMSA is fused information from the features in a |
Hi @Stinger-Wiz @Will-Hor, does my previous reply sound reasonable to you? If you have any other questions about this issue, feel free to tell me :-)
Thank you for your detailed explanation. I have no further questions about this research. Thank you for your assistance :-)
@Stinger-Wiz My pleasure. Thank you very much for your attention to SAITS as well! If you find it inspiring or helpful to your work, please star🌟 the repo to help more people notice this work. Please also take a look at our new work PyPOTS, which may be useful. 😃 Many thanks again for your contribution!
Thank you very much. Your reply really helped me a lot.
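Hello, I have a question about the self-attention in your paper. A self-attention matrix Q·Kᵀ of dimension N×N represents the attention relationships along a single dimension of length N. However, your paper states: “Such a mechanism makes DMSA able to capture the temporal dependencies and feature correlations between time steps in the high dimensional space with only one attention operation”. That is, a single DMSA attention matrix captures attention along two kinds of dimensions at once. How does one attention operation manage to capture both types of attention?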