【consult🙋‍♂️】 some confusion about CustomVisionTransformer.forward_features #130

Answered by lukas-blecher
TITC asked this question in Q&A
Hi, I'll try to answer your question to your satisfaction.

  1. The cls token was first introduced in the BERT paper https://arxiv.org/abs/1810.04805.
    It's short for "classification token" and is placed at the front of the input, so it basically marks the start of a sequence. I just adopted that convention.
  2. Ok, what is your question here? The problem I was facing is that images of equations come in many different resolutions. The naive way to deal with this would be to pad all images to a fixed image size (here self.width x self.height). But then a lot of computation would be wasted on padding that carries no information.
    That's why I'm just using the smallest bounding box that encapsulates every relevant pixel. But this only creates a new problem: How…
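
To make the trade-off in point 2 concrete, here is a minimal sketch in plain Python. It is not the code from the repository: the helper names `crop_to_content` and `num_patches`, the background value of 255, and the patch size of 2 are all assumptions picked for illustration. It crops a tiny grayscale image to the smallest bounding box around its non-background pixels and compares how many patches a ViT-style model would have to process before and after.

```python
import math


def crop_to_content(image, background=255):
    """Return the smallest sub-image that contains every non-background pixel.

    `image` is a list of rows of grayscale values. Treating `background`
    (255 = white) as "irrelevant" is an assumption made for this sketch.
    """
    # Rows and columns that contain at least one relevant pixel.
    rows = [y for y, row in enumerate(image) if any(p != background for p in row)]
    if not rows:
        return []  # blank image: nothing relevant to keep
    cols = [x for x in range(len(image[0]))
            if any(row[x] != background for row in image)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def num_patches(height, width, patch_size):
    """Number of patch_size x patch_size patches needed to cover the image."""
    return math.ceil(height / patch_size) * math.ceil(width / patch_size)


# A 4 x 6 "padded" image with only two dark pixels in the middle.
image = [
    [255, 255, 255, 255, 255, 255],
    [255, 255,   0, 255, 255, 255],
    [255, 255, 255,   0, 255, 255],
    [255, 255, 255, 255, 255, 255],
]

cropped = crop_to_content(image)
padded_patches = num_patches(len(image), len(image[0]), patch_size=2)
cropped_patches = num_patches(len(cropped), len(cropped[0]), patch_size=2)
```

In this toy case the padded image needs 6 patches and the cropped one needs 1, which is the saving the bounding-box approach is after. The price is that the number of patches now differs from image to image.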

Replies: 1 comment 6 replies

@lukas-blecher
@TITC
TITC Apr 20, 2022
Collaborator Author

@TITC
TITC Apr 21, 2022
Collaborator Author

@lukas-blecher
@TITC
TITC Apr 22, 2022
Collaborator Author

Answer selected by TITC

This discussion was converted from issue #129 on April 19, 2022 17:42.