
Fails on large channels #71

Open
trifle opened this issue Jan 9, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@trifle

trifle commented Jan 9, 2023

Hi, thanks for this very nice project. It's really polished and takes a lot of complexity out of yt-dlp, which is great.

I tried running Yark on a couple of large-ish channels (tens of thousands of videos), and it seems to have some issues that yt-dlp also exhibits (if I recall correctly): the initial metadata download takes several hours and requires close to 10 GB of RAM, and the subsequent downloads then fail after only a handful of videos.

I haven't looked into the details, but this might be due to some download tokens expiring, or perhaps it's just insufficient retries. In any case, Yark would benefit enormously from keeping some sort of record of the videos that were already downloaded and then continuing archival in chunks, instead of trying to do it all in one go. yt-dlp has some of this functionality with --download-archive, but that doesn't have any "comfort features", i.e. no checking, pruning, displaying, or automatic management of that resume file.
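A minimal sketch of the resume-file idea, assuming a plain text file with one video ID per line (a hypothetical format, not yt-dlp's archive format) and a hypothetical `download_batch` callable that downloads one chunk (for example a single yt-dlp run) and returns the IDs that actually finished. The names `load_done`, `chunks` and `archive_in_chunks` are illustrative, not Yark's API.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Set


def load_done(archive: Path) -> Set[str]:
    """Return the video IDs already recorded in the resume file (one per line)."""
    if not archive.exists():
        return set()
    return {line.strip() for line in archive.read_text().splitlines() if line.strip()}


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def archive_in_chunks(
    video_ids: Iterable[str],
    archive: Path,
    download_batch: Callable[[List[str]], Iterable[str]],
    chunk_size: int = 50,
) -> List[str]:
    """Download not-yet-archived videos chunk by chunk; return the IDs that failed."""
    done = load_done(archive)
    pending = [vid for vid in video_ids if vid not in done]
    failed: List[str] = []
    for batch in chunks(pending, chunk_size):
        # download_batch is hypothetical: e.g. one yt-dlp run over this batch,
        # returning the IDs that actually finished downloading.
        succeeded = set(download_batch(batch))
        # Record successes right after each chunk so an interrupted run resumes here.
        with archive.open("a") as fh:
            for vid in batch:
                if vid in succeeded:
                    fh.write(vid + "\n")
                else:
                    failed.append(vid)
    return failed
```

Re-running with the same ID list only touches videos that are not yet in the resume file, which is the "continue in chunks" behaviour described above; pruning or displaying the file would then just be operations on that set of IDs.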

@Owez
Owner

Owez commented Jan 9, 2023

I've been looking for a way to make the metadata step smaller because it includes a lot of extra information which Yark's archive format doesn't use; I'll look into --download-archive. At least there's --skip-metadata, so you can do downloads in chunks if you get the metadata now :)

Can you send the error from the failed download? That might be a separate bug.

Owez added the bug (Something isn't working) label on Jan 9, 2023
Owez mentioned this issue on Jan 9, 2023
@trifle
Author

trifle commented Jan 9, 2023

Thanks, that sounds great! There was a yt-dlp update today that might have helped, as I'm not seeing anything since tonight. The last instance was:

  • Downloading jVhTmcfQgx4, at 0.7%..
  • Unknown error whilst downloading videos, details below:
[download] Got error: Downloaded 70656 bytes, expected 10820198 bytes, retrying in a few seconds..
  • Unknown error whilst downloading videos, details below:
ERROR: Did not get any data blocks, retrying in a few seconds..
  • Unknown error whilst downloading videos, details below:
ERROR: Did not get any data blocks, retrying in a few seconds..
  • Unknown error whilst downloading videos, details below:
ERROR: Did not get any data blocks, retrying in a few seconds..
  • Unknown error whilst downloading videos, details below:
ERROR: Did not get any data blocks
  • Sorry, failed to download {name}
Please file a bug report if you think this is a problem with Yark!

PS: {name} probably needs an f in front, to make it an f-string :)

@trifle
Author

trifle commented Jan 9, 2023

Yup, this is really not worth a PR, so here's the line:

_err_msg(" • Sorry, failed to download {name}", True)

It needs an f prefix.
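To illustrate why the message printed a literal `{name}`: without the `f` prefix, Python treats the braces as ordinary characters. The fixed call would presumably read `_err_msg(f" • Sorry, failed to download {name}", True)`; the sketch below only compares the two strings, using the video ID from the log above as a stand-in value for `name`.

```python
name = "jVhTmcfQgx4"  # stand-in value for the variable in Yark's code

# Plain string literal: the braces are kept as-is.
literal = " • Sorry, failed to download {name}"
# f-string: {name} is replaced by the variable's value.
formatted = f" • Sorry, failed to download {name}"

print(literal)    #  • Sorry, failed to download {name}
print(formatted)  #  • Sorry, failed to download jVhTmcfQgx4
```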

@Owez
Owner

Owez commented Jan 9, 2023

> There was a yt-dlp update today that might have helped, as I'm not seeing anything since tonight.

Are you on Yark v1.2.3? This should be fixed as of last night.

> _err_msg(" • Sorry, failed to download {name}", True)

Whoops, yep, will add.

@trifle
Author

trifle commented Jan 9, 2023

Yes, I updated yesterday but thought the error persisted, sorry if that was wrong! Regardless, some sort of chunked metadata + download stage would definitely be a nice addition to reduce memory consumption and make everything smoother.

BTW, I guess YouTube doesn't like parallel downloads. I don't know your stance on lots of external dependencies, but my experience with the fasteners library has been quite positive. It might be worth adding a file lock like

lock = fasteners.InterProcessLock('yark.lockfile')

and acquiring it (e.g. lock.acquire(blocking=False)) before running yt-dlp, to prevent multiple instances from running at once.
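If adding fasteners isn't wanted, roughly the same single-instance guard can be sketched with the standard library on POSIX systems using `fcntl.flock`. This is a swapped-in technique, not what was proposed above, and Windows would need something else (e.g. `msvcrt.locking`). `single_instance` is a hypothetical helper name; the lock file name is reused from the suggestion above.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def single_instance(lock_path: str = "yark.lockfile") -> Iterator[None]:
    """Hold an exclusive lock on lock_path for the duration of the with-block.

    Raises RuntimeError immediately if another process already holds it.
    """
    # O_CREAT so the lock file exists on first use; its contents don't matter.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Exclusive + non-blocking: fail fast instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise RuntimeError(f"another instance already holds {lock_path}") from None
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Usage would be wrapping the download step in `with single_instance(): ...`. Because `flock` locks belong to the open file description, the lock is also released automatically if the process dies, so a crashed run doesn't leave a stale lock behind.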

@Owez
Owner

Owez commented Jan 9, 2023

Yep, definitely. I've purposefully let yt-dlp download using default values so far to reduce complexity in these early versions, but chunking + parallelism (if YouTube can handle it) is needed.

I don't mind having extra dependencies as long as they're worth it compared to the extra download and the vulnerability risk. When downloads are being processed, Yark generates a full list of the videos to download and pipes it into yt-dlp, so hopefully it'll be easy to parallelise using yt-dlp's options or otherwise.
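One way the "or otherwise" route could look: split the full list of video IDs into a few order-preserving chunks and run one download per chunk on a small thread pool. This is a sketch under assumptions: `download_batch` is hypothetical (for instance, one yt-dlp invocation over the chunk) and returns a mapping of ID to success, and whether YouTube tolerates even two concurrent downloads is untested here, hence the low default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def split(ids: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Split ids into at most n_chunks roughly equal, order-preserving chunks."""
    n_chunks = max(1, min(n_chunks, len(ids)))
    size, extra = divmod(len(ids), n_chunks)
    out: List[List[str]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        out.append(list(ids[start:end]))
        start = end
    return out


def download_parallel(
    ids: Sequence[str],
    download_batch: Callable[[List[str]], Dict[str, bool]],
    workers: int = 2,
) -> Dict[str, bool]:
    """Run one download_batch call per chunk on a small thread pool."""
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_result in pool.map(download_batch, split(ids, workers)):
            results.update(batch_result)
    return results
```

Threads are enough here because the heavy lifting would happen in yt-dlp's network I/O (or in a subprocess), not in Python code holding the GIL.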

Downloading videos is safe to stop at any time, so I think metadata is the main concern when tackling this issue, because it's all or nothing and has that RAM issue.
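A sketch of how the metadata step could stop being all or nothing: write each video's metadata to a JSON Lines file as soon as it arrives, keeping only the fields the archive needs. This assumes the metadata source can be consumed lazily as an iterable of dicts with an `"id"` key; `checkpoint_metadata` and the `KEEP` field list are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Iterable

# Assumption: the only fields the archive format needs; everything else is dropped.
KEEP = ("id", "title", "upload_date")


def checkpoint_metadata(entries: Iterable[Dict], path: Path) -> int:
    """Append trimmed metadata for unseen videos to a JSON Lines file.

    Each entry is written as soon as it arrives, so an interrupted run keeps
    everything fetched so far and only video IDs are held in memory.
    Returns the number of new entries written.
    """
    seen = set()
    if path.exists():
        with path.open() as fh:
            seen = {json.loads(line)["id"] for line in fh if line.strip()}
    written = 0
    with path.open("a") as fh:
        for entry in entries:
            if entry["id"] in seen:
                continue  # already checkpointed by an earlier run
            fh.write(json.dumps({k: entry.get(k) for k in KEEP}) + "\n")
            fh.flush()
            seen.add(entry["id"])
            written += 1
    return written
```

The memory saving only materialises if `entries` is a lazy iterator; if the full list is built up front, this still fixes resumability but not the RAM usage.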

@Owez
Owner

Owez commented Jan 9, 2023

If I have time, this should hopefully be in v1.3 in a month's time :)

@Owez
Owner

Owez commented Jan 14, 2023

A good extension to this would be to make it only download metadata for a maximum based on --videos=x, so you can quickly archive the last important year or two of a massive channel.

I don't want only the downloaded videos to be archived, so I think an x + x/2 maximum would be good. For example, if you run yark refresh foobar --videos=20, it will download 20 videos and get the metadata for 30 videos, so you have some more history.
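The x + x/2 rule as a small sketch. It assumes the channel's video IDs are available newest first and that x/2 rounds down for odd x (the comment doesn't say); `metadata_limit` and `plan_refresh` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def metadata_limit(videos: int) -> int:
    """Metadata budget of x + x/2 for --videos=x (halving rounds down: an assumption)."""
    return videos + videos // 2


def plan_refresh(ids_newest_first: Sequence[str], videos: int) -> Tuple[List[str], List[str]]:
    """Return (IDs to download, IDs to fetch metadata for)."""
    limit = metadata_limit(videos)
    # Downloads are a prefix of the metadata set, so every downloaded video has metadata.
    return list(ids_newest_first[:videos]), list(ids_newest_first[:limit])


print(metadata_limit(20))  # 30
print(metadata_limit(7))   # 10
```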

Projects
No open projects
Status: Todo
Development

No branches or pull requests

2 participants