
Fix: some bugs when text input reaches max_tokens of language_model #11669

Open
jiangtann wants to merge 2 commits into dev-3.x

Conversation

@jiangtann commented Apr 26, 2024

Fix 1: RandomSamplingNegPos forgets to remove gt_ignore_flags

In https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/datasets/transforms/formatting.py#L109, `PackDetInputs` computes `valid_idx = np.where(results['gt_ignore_flags'] == 0)[0]` and uses `valid_idx` to select the valid bboxes from `gt_bboxes`.

In https://github.com/open-mmlab/mmdetection/blob/dev-3.x/mmdet/datasets/transforms/text_transformers.py#L62, if the total token count of the positive labels exceeds 256, some of `gt_bboxes` and `gt_labels` are randomly dropped, but `gt_ignore_flags` is not updated accordingly. As a result, selecting the valid bboxes from `gt_bboxes` via `valid_idx` in `PackDetInputs` raises an index-out-of-bounds error.

This PR modifies RandomSamplingNegPos to handle gt_ignore_flags the same way RandomCrop does; a sketch of the idea is below.
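
A minimal sketch of the fix (the helper name `keep_sampled_instances` and the toy `results` dict are illustrative, not the exact PR diff):

```python
import numpy as np

def keep_sampled_instances(results: dict, keep_idxs: np.ndarray) -> dict:
    """Retain only the sampled ground-truth entries (illustrative sketch)."""
    results['gt_bboxes'] = results['gt_bboxes'][keep_idxs]
    results['gt_labels'] = results['gt_labels'][keep_idxs]
    # The fix: trim gt_ignore_flags with the same indices. Without this,
    # PackDetInputs computes
    #     valid_idx = np.where(results['gt_ignore_flags'] == 0)[0]
    # against the untrimmed flag array and indexes gt_bboxes out of bounds.
    results['gt_ignore_flags'] = results['gt_ignore_flags'][keep_idxs]
    return results

results = {
    'gt_bboxes': np.zeros((5, 4), dtype=np.float32),
    'gt_labels': np.arange(5),
    'gt_ignore_flags': np.zeros(5, dtype=bool),
}
results = keep_sampled_instances(results, np.array([0, 2, 4]))
assert len(results['gt_ignore_flags']) == len(results['gt_bboxes'])
```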

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

```diff
@@ -252,4 +254,4 @@ def transform(self, results: dict) -> dict:
         else:
             text = results['text']
             results['text'] = list(text.values())
         return results
```
Collaborator

seems unnecessary change

Author

Deleting the trailing newline at the end of the file made git report a diff here; the newline has been restored.

@jiangtann changed the title from "Fix: RandomSamplingNegPos forget to remove gt_ignore_flags" to "Fix: some bugs when text input reaches max_tokens of language_model" on May 27, 2024
@jiangtann (Author) commented May 27, 2024

Fix 2: adding special tokens results in token overflow

After RandomSamplingNegPos, tokenizing the text with `self.tokenizer.tokenize` can yield as many as 256 tokens.

This is because at https://github.com/open-mmlab/mmdetection/blob/main/mmdet/datasets/transforms/text_transformers.py#L50, when the positive labels reach exactly 256 tokens the loop does not break, and the label is kept. But when the text is later processed with `self.language_model.forward` or `self.language_model.tokenizer.__call__`, the special tokens `[CLS]` and `[SEP]` are by default prepended and appended, so the text can reach up to 258 tokens.

When this goes through BERT, `max_tokens` is set, so the 258-token list is truncated and the trailing `'.'` (token id 1012) and `'[SEP]'` (token id 102) are lost. Then, at https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/language_models/bert.py#L59, the `attention_mask` and `position_ids` of the last class come out wrong: its `position_ids` are all 0, i.e. the tokens belonging to the last class do not attend to each other. The illustration below makes this concrete.
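
A simplified, self-contained illustration of the failure mode (not the mmdet implementation): position ids restart after every special token (`[CLS]`=101, `[SEP]`=102, `'.'`=1012), so tokens after the last surviving delimiter are never assigned a position and stay 0.

```python
import torch

def segment_position_ids(input_ids, special_ids=(101, 102, 1012)):
    """Toy version of per-sub-sentence position ids (illustrative only)."""
    pos = torch.zeros(len(input_ids), dtype=torch.long)
    prev = 0
    for i, tok in enumerate(input_ids):
        if tok in special_ids:
            # number the tokens of the segment ending at this delimiter
            pos[prev:i + 1] = torch.arange(i + 1 - prev)
            prev = i + 1
    return pos  # anything after the final delimiter stays all-zero

intact = [101, 7592, 1012, 2088, 2154, 1012, 102]  # [CLS] tok . tok tok . [SEP]
print(segment_position_ids(intact))       # tensor([0, 0, 1, 0, 1, 2, 0])
print(segment_position_ids(intact[:-2]))  # tensor([0, 0, 1, 0, 0]) <- last class broken
```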

There are two possible fixes. One is to cap the text at 254 tokens in RandomSamplingNegPos, so that adding the two special tokens cannot overflow.

The other is what this PR does: do not add the special tokens at all. Because `use_sub_sentence_represent` is used, adding or omitting the special tokens has no effect on the resulting text embedding, and omitting them saves two tokens. My change to bert.py keeps adding special tokens as the default behavior for backward compatibility. A sketch of the idea follows.
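
A minimal sketch of the second approach, assuming a stock HuggingFace `bert-base-uncased` tokenizer (the 128-"person" caption is a toy stand-in; `add_special_tokens` is the standard HF parameter, and the exact plumbing in mmdet's bert.py differs):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
caption = ' . '.join(['person'] * 128) + ' .'  # tokenizes to exactly 256 ids

# Default: [CLS] and [SEP] are prepended/appended, giving 258 ids; a hard
# cut at max_tokens=256 then drops the trailing '.' (1012) and '[SEP]' (102).
ids = tokenizer(caption)['input_ids']
print(len(ids))        # 258
print(ids[:256][-2:])  # no longer ends with 1012, 102

# The fix adopted here: skip the special tokens entirely. The caption now
# fits in 256 ids with its final '.' intact, and with
# use_sub_sentence_represent the text embedding is unchanged.
ids_no_special = tokenizer(caption, add_special_tokens=False)['input_ids']
print(len(ids_no_special))  # 256
print(ids_no_special[-1])   # 1012 ('.')
```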
