Question of tracking the untrackable #1

litingfeng · 2017-10-17T07:32:20Z

Hi,

I just read the paper "Tracking The Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies" and have a question that I hope you can give me some hints on:

What is the dimension of the similarity score (vector or number)? Say $t_i$ is connected to $d_j$: is the score $\phi(t_i,d_j)$ itself, or is it the output of some further processing of $\phi(t_i,d_j)$?

I am looking forward to your answer. Thank you very much.

abhineet123 · 2017-10-17T13:06:55Z

The similarity score is a single number that is produced by the target RNN using the feature vector $\phi(t_i,d_j)$ as input.

litingfeng · 2017-10-17T13:13:18Z

@abhineet123 Did you mean that $\phi(t_i,d_j)$ is the input of the target RNN (O) and the score is its output? But in Figure 2, $\phi(t_i,d_j)$ is the output of the FC layer following RNN (O).

abhineet123 · 2017-10-17T13:27:28Z

Yes, that is correct.
The target RNN applies a softmax classifier, trained with cross-entropy loss, to the feature vector $\phi(t_i,d_j)$ to produce the similarity score.

This is mentioned in the last line of 3.5 (ii) (first para of column 2 on page 5):

"Our target RNN is also trained to perform the task of data association – outputs the score of whether a detection (d) corresponds to a target (t) from $\phi(t_i,d_j)$ using a Softmax classifier and cross-entropy loss."

It seems that the target RNN produces both $\phi(t_i,d_j)$ and the similarity score.
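
As a rough illustration of how a feature vector could be turned into a single number, the sketch below applies a two-class linear layer plus softmax to a stand-in for $\phi(t_i,d_j)$ and takes the "match" probability as the score. The two-class layout, the 8-D toy feature, and the names `similarity_score`, `cross_entropy`, `W` and `b` are all assumptions made for the example; the paper does not give these details.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_score(phi, weights, biases):
    """Map a feature vector phi to a scalar match probability.

    weights has one row per class (0 = no match, 1 = match); this
    two-class layout is an assumption, not something the paper states.
    """
    logits = [sum(w * x for w, x in zip(row, phi)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)[1]

def cross_entropy(score, label):
    """Training loss for one (target, detection) pair; label is 0 or 1."""
    p = score if label == 1 else 1.0 - score
    return -math.log(max(p, 1e-12))

random.seed(0)
phi = [random.gauss(0, 1) for _ in range(8)]   # toy stand-in for phi(t_i, d_j)
W = [[random.gauss(0, 0.1) for _ in range(8)] for _ in range(2)]
b = [0.0, 0.0]

score = similarity_score(phi, W, b)   # a single number in (0, 1)
loss = cross_entropy(score, label=1)  # only needed during training
```

The loss is used only for training; at test time the softmax output alone is the similarity score.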

litingfeng · 2017-10-17T13:37:35Z

@abhineet123 I think I can understand it now. Thank you very much for your elaborate explanation.

litingfeng · 2017-10-18T03:10:09Z

Hi,

I have another question, about Figure 3. In the second paragraph on page 4 (section 3.2), $\phi_{t}^{A}$ is a 500-D vector, while $\phi_{j}$ is an H-D vector, with H = 128 according to the Implementation Details (2nd paragraph, page 6). However, both of them are outputs of the same CNN, so why do their dimensions differ?

abhineet123 · 2017-10-18T04:02:07Z

The paper does not clearly state how the same CNN outputs both 500-D and 128-D feature vectors, but Fig. 3 does show what looks like an extra layer on top of the CNNs that produce the 500-D outputs, which might be the layer that performs this conversion.
This seems to be confirmed by the last para of 3.2, which mentions that they used a pre-trained VGGNet as the appearance feature extractor after replacing its last FC layer with one of their own that produces 500-D vectors.

The reason behind this difference seems simple enough.
The 500-D vectors $\phi_{t}^{A}$ correspond to the appearance history of the target and are all passed through the LSTM to generate a single H dimensional vector $\phi_{i}$ that represents the overall target appearance by fusing information from all of these vectors.

This vector is directly comparable to the H-D feature vector $\phi_{j}$ corresponding to the candidate detection which is probably produced by the CNN without this last FC layer that they added.
The two vectors are thus concatenated to generate the 2H-D vector that is finally processed by the Siamese classification network whose output is also a 500-D vector.
Sec. 4.3 mentions that the Siamese classification network is constructed using the same CNN as used for appearance feature extraction.
This suggests that the FC layer of the Siamese network that produces this final 500-D output is similar to the FC layer that converts the H-D CNN output to the 500-D vectors $\phi_{t}^{A}$. Since it is a Siamese network, however, it contains a pair of these feature extractor CNNs, and its FC layer has been trained to distinguish between two of these H-D vectors instead of simply mapping one into a 500-D vector.
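
To make the dimensions in this explanation concrete, here is a shape-tracing sketch in NumPy. It is only a guess at the data flow described above: the weights are random, the history length `T`, the gate ordering, and the single FC layer standing in for the Siamese classification network are all assumptions, and the names `lstm_last_hidden`, `W_fc` and `out_500` are made up for the example.

```python
import numpy as np

H, D_IN, D_OUT, T = 128, 500, 500, 6   # T (history length) is arbitrary here

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(xs, W, U, b):
    """Run a single-layer LSTM over xs (T x D_IN); return the last hidden state (H,)."""
    h = np.zeros(H)
    c = np.zeros(H)
    for x in xs:
        z = W @ x + U @ h + b            # (4H,) pre-activations for all gates
        i, f, o, g = np.split(z, 4)      # gate order is an assumption
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

# Randomly initialised parameters; only the shapes matter in this sketch.
W = rng.normal(0, 0.01, (4 * H, D_IN))
U = rng.normal(0, 0.01, (4 * H, H))
b = np.zeros(4 * H)
W_fc = rng.normal(0, 0.01, (D_OUT, 2 * H))

phi_A_history = rng.normal(size=(T, D_IN))  # 500-D appearance features of the target
phi_j = rng.normal(size=H)                  # H-D feature of the candidate detection

phi_i = lstm_last_hidden(phi_A_history, W, U, b)  # history fused into one H-D vector
pair = np.concatenate([phi_i, phi_j])             # 2H-D vector
out_500 = W_fc @ pair                             # final 500-D output

print(phi_i.shape, pair.shape, out_500.shape)     # (128,) (256,) (500,)
```

The point is only that the LSTM collapses any number of 500-D vectors into one H-D vector, which is what makes it directly comparable to the H-D detection feature.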

litingfeng · 2017-10-18T09:11:27Z

In summary, you mean that the 500-D vector $\phi_{t}^{A}$ is the output of the feature extractor CNN with a 500-unit FC layer following a 128-unit FC layer, and that $\phi_{j}$ is the output of a truncated version of the same CNN that lacks the 500-unit FC layer. I'm not sure whether I have understood your point correctly. In addition, I don't quite understand the last sentence or the architecture of the Siamese network. Does it look like the figure?

abhineet123 · 2017-10-18T14:38:30Z

Yes, that is what I mean.
It is impossible to say exactly how the Siamese network is designed until they release their code but, yes, this is roughly what I had in mind.

litingfeng · 2017-10-18T14:44:58Z

@abhineet123 I am really grateful for your reply. Later I will ask the author for more details.

abhineet123 · 2017-10-18T14:50:28Z

Glad to be of assistance and please let me know what the authors have to say about this.

litingfeng · 2017-10-28T15:51:42Z

Hi,

I asked the author but haven't received a response yet. I have another question: do you know how the LSTM in this appearance model is trained? Is the LSTM included in the Siamese CNN? I'm still confused about the training procedure. Thank you very much.

abhineet123 · 2017-10-28T17:05:46Z

No, I am also waiting for the authors to release their code to get the details of the training procedure.

swamika001 · 2018-02-12T18:58:10Z

Hi,
In 3.3 Motion, the authors wrote that velocities are extracted by their motion feature extractor. Does anyone have a clue as to what algorithm it could be?

abhineet123 · 2018-02-12T19:20:37Z

Probably some kind of optical flow algorithm like cvCalcOpticalFlowPyrLK.
This is what they used in an earlier version of this paper.
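
Whatever extractor is used, the result is a per-frame 2-D velocity for each target. As a deliberately simpler stand-in (not optical flow, and not confirmed as the authors' method), the sketch below differences bounding-box centres between consecutive frames. The `(x, y, width, height)` box format and the function names are assumptions for illustration.

```python
def box_centre(box):
    """Centre (cx, cy) of a box given as (x, y, width, height)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def velocities(track):
    """Per-frame displacement (vx, vy) between consecutive box centres."""
    centres = [box_centre(b) for b in track]
    return [(cx1 - cx0, cy1 - cy0)
            for (cx0, cy0), (cx1, cy1) in zip(centres, centres[1:])]

# Three consecutive boxes of one hypothetical target.
track = [(10, 20, 40, 80), (12, 21, 40, 80), (15, 21, 42, 80)]
print(velocities(track))  # [(2.0, 1.0), (4.0, 0.0)]
```

A track of N boxes yields N - 1 velocity vectors, which could then be fed to a motion model as a sequence.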

nidhinkrishnanv · 2018-04-01T09:55:27Z

Hi,

In 3.5 Target, any idea what the input sequence to the Target RNN is? The authors mention that the outputs of the appearance, motion and interaction models are concatenated and passed to the Target RNN. But then how does that result in a sequence?

tonmoyborah · 2018-04-11T07:43:38Z

Has the code been released yet, or is any implementation available?

abhineet123 · 2018-04-11T12:29:26Z

Not as far as I know.

behappyZheng · 2018-10-10T05:59:22Z

Hi, has any code been released for the paper Tracking The Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies?

abhineet123 · 2018-10-10T13:16:07Z

Not that I am aware of.

icesohelrana · 2018-11-18T01:25:31Z

Do you have any intuition about the input image size fed into the CNN? VGG16 takes a 224*224 input, which gives a 25088-D (7*7*512) input to its first FC layer. But a person's bounding box (height and width) would not be square. So how did they crop the image? If the input image size is different, then the first FC layer will be different.

abhineet123 · 2018-11-18T01:44:23Z

In their earlier paper, they extract the patch and then resize it to a fixed size (224*224 in your case) without preserving the aspect ratio. Though the patch becomes distorted to human eyes, it probably doesn't make any difference to the CNN as long as test patches are distorted in the same way as training ones.
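
A minimal sketch of that crop-and-stretch step is shown below. It uses nearest-neighbour sampling in NumPy purely to stay dependency-free; the interpolation method, the `(x, y, w, h)` box format and the function name `crop_and_resize` are assumptions, not details taken from either paper.

```python
import numpy as np

def crop_and_resize(image, box, size=224):
    """Crop box = (x, y, w, h) from image (H x W x C) and stretch it to size x size.

    Nearest-neighbour sampling keeps the sketch dependency-free; the
    aspect ratio of the crop is deliberately not preserved.
    """
    x, y, w, h = box
    patch = image[y:y + h, x:x + w]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return patch[rows[:, None], cols[None, :]]

# Random stand-in for a video frame and a tall 60x160 pedestrian box.
frame = np.random.default_rng(0).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
patch = crop_and_resize(frame, box=(100, 50, 60, 160))
print(patch.shape)  # (224, 224, 3)
```

Because every patch ends up 224*224 regardless of the box shape, the input size of the first FC layer stays fixed.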

tianzhihen · 2020-02-19T06:36:06Z

Hi, has any code been released for the paper Tracking The Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies?

abhineet123 · 2020-02-19T13:18:33Z

Not that I'm aware of.

tianzhihen · 2020-02-21T04:06:03Z

Is there a recent tracking method that uses self-attention (such as Transformer or BERT)?
