While the authors claim to be doing a convolution, even going as far as naming their tool CellCnn, they are doing a stride of 1 which is essentially a dense layer. Further, you say they used the "most discriminative filters" but the tensor sort they do I believe will result in sorting by the first filter, then second, etc. So essentially the sorting is almost entirely driven by the first filter.