@@ -91,6 +91,34 @@ <h2>Abstract</h2>
</section>
<br>

+ <section>
+ <div class="container">
+ <div class="row">
+ <div class="col-12 text-center">
+ <h2>Preliminaries</h2>
+ <!-- <hr style="margin-top:0px"> -->
+ </div>
+ <br>
+ <div class="col-12">
+ <p class="text-left">
+ <center>
+ <img src="images/preliminary.png" style="width:75%; margin-bottom:20px;">
+ </center>
+ <!-- <h4>Mean-Shift</h4> -->
+ <!-- We revisit the mean-shift and introduce contrastive mean-shift (CMS) learning for generalized category discovery. -->
+ Mean-shift is a classic, powerful technique for mode seeking and cluster analysis. It assigns each data point a
+ corresponding mode by iteratively shifting the point through kernel-weighted aggregation of its neighboring points. The set of
+ data points that converge to the same mode defines the basin of attraction of that mode, and this naturally relates to
+ clustering: the points in the same basin of attraction are associated with the same cluster.
+ <br>
+ </p>
+ <br>
+
+ </div>
+ </div>
+ </section>
+ <br>
+
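As a rough illustration of the single mean-shift step described above, the sketch below shifts every embedding toward the kernel-weighted mean of its k nearest neighbors; the Gaussian kernel, k, and bandwidth are placeholder choices, not the paper's exact configuration.

# Minimal numpy sketch of one kernel-weighted mean-shift step; kernel, k,
# and bandwidth below are illustrative assumptions.
import numpy as np

def mean_shift_step(X, k=5, bandwidth=0.5):
    """Shift each point toward the kernel-weighted mean of its k nearest neighbors."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)    # pairwise squared distances
    shifted = np.empty_like(X)
    for i in range(len(X)):
        nn = np.argsort(d2[i])[:k]                          # k nearest neighbors (including the point itself)
        w = np.exp(-d2[i, nn] / (2 * bandwidth ** 2))       # kernel weights
        shifted[i] = (w[:, None] * X[nn]).sum(0) / w.sum()  # weighted aggregation of neighbors
    return shifted

# Repeating the step until points stop moving sends each point to a mode;
# points that reach the same mode share a basin of attraction, i.e. a cluster.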
<section>
<div class="container">
<div class="row">
@@ -101,16 +129,33 @@ <h2>Methods</h2>
<br>
<div class="col-12">
<p class="text-left">
- < img src ="images/overview.png " style ="width:100%; margin-top:10px; margin-bottom:10px; "">
105
- < h4 > Contrastive Mean-Shift learning</ h4 >
132
+ < h4 > Learning framework: Contrastive Mean-Shift learning (CMS)</ h4 >
133
+ < center >
134
+ < img src ="images/overview.png " style ="width:100%; margin-top:10px; margin-bottom:20px; ">
135
+ </ center >
106
136
<!-- We revisit the mean-shift and introduce contrastive mean-shift (CMS) learning for generalized category discovery. -->
Given a collection of images, each initial image embedding $\boldsymbol{v}_i$ from an image encoder takes a single step of
mean shift to be $\boldsymbol{z}_i$ by aggregating its $k$ nearest neighbors with a weight kernel $\varphi(\cdot)$. The
encoder network is then updated by contrastive learning with the mean-shifted embeddings, which draws a mean-shifted embedding
of image $x_{i}$ and that of its augmented image $x_{i}^{+}$ closer and pushes those of distinct images apart from each other.
<br>
<br>
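To make this training step concrete, here is a rough PyTorch sketch of one CMS update, assuming the nearest neighbors are drawn from the current batch and an InfoNCE-style loss; the encoder, k, bandwidth, and temperature are illustrative placeholders rather than the paper's exact settings.

# Rough PyTorch sketch of one contrastive mean-shift (CMS) update.
import torch
import torch.nn.functional as F

def single_step_mean_shift(v, k=5, bandwidth=0.5):
    # v: (n, d) L2-normalized embeddings; one kernel-weighted mean-shift step.
    d2 = torch.cdist(v, v) ** 2                        # pairwise squared distances
    knn_d2, knn_idx = d2.topk(k, largest=False)        # k nearest neighbors per embedding
    w = torch.exp(-knn_d2 / (2 * bandwidth ** 2))      # weight kernel phi(.)
    z = (w.unsqueeze(-1) * v[knn_idx]).sum(1) / w.sum(1, keepdim=True)
    return F.normalize(z, dim=-1)

def cms_training_step(encoder, x, x_aug, optimizer, temperature=0.1):
    v = F.normalize(encoder(x), dim=-1)                # embeddings v_i of the images
    v_aug = F.normalize(encoder(x_aug), dim=-1)        # embeddings of their augmentations
    z = single_step_mean_shift(torch.cat([v, v_aug]))  # mean-shifted embeddings z_i
    z1, z2 = z.chunk(2)
    logits = z1 @ z2.t() / temperature                 # pairwise similarities between the two views
    labels = torch.arange(len(z1), device=z1.device)   # the matching augmented view is the positive
    loss = F.cross_entropy(logits, labels)             # pull positives together, push other images apart
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()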
- <h4>Iterative Mean-Shift at inference</h4>
+ <br>
+ <h4>Validation: Estimating the number of clusters</h4>
+ <center>
+ <img src="images/validation.png" style="width:80%; margin-top:10px; margin-bottom:20px;">
+ </center>
+ During training, we estimate the number of clusters K at the end of every epoch for fairer and more efficient validation. We apply
+ agglomerative clustering on the validation set to obtain clustering results for different numbers of clusters. Among them, the
+ highest clustering accuracy on the labeled images is recorded as the validation performance, and the corresponding number of
+ clusters is taken as the estimated number of clusters.
+ <br>
+ <br>
+ <br>
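This validation step can be sketched roughly as follows, using scikit-learn's AgglomerativeClustering and Hungarian matching to score clustering accuracy; the candidate range of K and the helper names are assumptions for illustration.

# Rough sketch of estimating K by sweeping agglomerative clusterings and
# scoring accuracy on the labeled validation images.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import AgglomerativeClustering

def clustering_accuracy(y_true, y_pred):
    # Hungarian matching between predicted cluster ids and ground-truth labels.
    n = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((n, n), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    row, col = linear_sum_assignment(cost, maximize=True)
    return cost[row, col].sum() / len(y_true)

def estimate_num_clusters(embeddings, labels, labeled_mask, candidates=range(2, 50)):
    best_k, best_acc = None, -1.0
    for k in candidates:
        pred = AgglomerativeClustering(n_clusters=k).fit_predict(embeddings)
        acc = clustering_accuracy(labels[labeled_mask], pred[labeled_mask])  # accuracy on labeled images only
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc  # estimated K and the recorded validation performance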
+ <h4>Inference: Iterative Mean-Shift (IMS)</h4>
+ <center>
+ <img src="images/inference.png" style="width:80%; margin-top:10px; margin-bottom:20px;">
+ </center>
To improve the final clustering property of the embeddings, we perform multi-step mean shift on the embeddings before
agglomerative clustering. Starting from the initial embeddings from the learned encoder, we update them to $t$-step
mean-shifted embeddings until the clustering accuracy on the labeled data converges. The final cluster assignment is obtained
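A rough sketch of this iterative procedure, reusing the mean_shift_step, clustering_accuracy, and AgglomerativeClustering pieces from the sketches above; the step cap and convergence tolerance are illustrative assumptions.

# Rough sketch of iterative mean shift (IMS) at inference.
def iterative_mean_shift(embeddings, labels, labeled_mask, n_clusters, max_steps=10, tol=1e-4):
    z, prev_acc, pred = embeddings, -1.0, None
    for _ in range(max_steps):
        z = mean_shift_step(z)                          # one more mean-shift step on the embeddings
        pred = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(z)
        acc = clustering_accuracy(labels[labeled_mask], pred[labeled_mask])
        if abs(acc - prev_acc) < tol:                   # clustering accuracy on labeled data has converged
            break
        prev_acc = acc
    return pred                                         # final cluster assignment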