This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

File 'coco_flickr30k_googlecc_gqa_sbu_oi.lineidx' is Not Found #185

Open
lostnighter opened this issue Feb 14, 2022 · 2 comments

Comments

@lostnighter

Hi! This file is needed for pretraining on the large corpus, but I cannot find it. Could you share it?

Thanks!

@jontooy

jontooy commented Feb 16, 2022

Hi lostnighter,

I had the same problem when using OSCAR to fine-tune on image captioning with a custom dataset. I used this function to generate the '.lineidx' file.

I guess that in your case you have a 'coco_flickr30k_googlecc_gqa_sbu_oi.tsv' file. If so, try the function above with these parameters:

```
filein, idxout = 'coco_flickr30k_googlecc_gqa_sbu_oi.tsv', 'coco_flickr30k_googlecc_gqa_sbu_oi.lineidx'
```

Let me know if it works!
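
In case the function mentioned above is not at hand, here is a minimal sketch of how such an index could be generated. It assumes the '.lineidx' file simply lists the byte offset at which each line of the TSV starts, one offset per line. The name `generate_lineidx` and that layout are assumptions for illustration, not necessarily the exact function referred to above.

```python
import os


def generate_lineidx(filein, idxout):
    """Write the starting byte offset of every line in `filein` to `idxout`.

    Assumed index layout: one decimal byte offset per line.
    """
    idxout_tmp = idxout + ".tmp"
    # Open the TSV in binary mode so tell() reports exact byte offsets.
    with open(filein, "rb") as tsvin, open(idxout_tmp, "w") as tsvout:
        fsize = os.fstat(tsvin.fileno()).st_size
        fpos = 0
        while fpos != fsize:
            tsvout.write(str(fpos) + "\n")  # offset where this line starts
            tsvin.readline()                # skip to the end of the line
            fpos = tsvin.tell()
    # Move the finished index into place only after it is fully written.
    os.replace(idxout_tmp, idxout)
```

Calling `generate_lineidx(filein, idxout)` with the two file names given above should then write the index next to the TSV. The temporary file is there so that an interrupted run does not leave a truncated index behind.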

@lostnighter
Author

Hi jontooy,
I downloaded this file via azcopy as follows:
path/to/azcopy copy https://biglmdiag.blob.core.windows.net/vinvl/pretrain_corpus/coco_flickr30k_googlecc_gqa_sbu_oi.lineidx ./ --recursive

This URL is not given anywhere. I just tried it out.
