-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
examples: Add test run for lv2v #283
Conversation
Wasn't able to run due to go environment but it should be close to working
…vepeer/ai-worker into feat/test-live-video-to-video
…vepeer/ai-worker into feat/test-live-video-to-video
…t/test-live-video-to-video
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sending some questions from as far as i got, i'll finish this tmr!
- name: Build Docker images | ||
env: | ||
MODEL_ID: ${{ matrix.model_config.id }} | ||
run: | | ||
cd runner | ||
docker build -t livepeer/ai-runner:live-base -f docker/Dockerfile.live-base . | ||
if [ "${MODEL_ID}" = "noop" ]; then | ||
docker build -t livepeer/ai-runner:live-app-noop -f docker/Dockerfile.live-app-noop . | ||
else | ||
docker build -t livepeer/ai-runner:live-base-${MODEL_ID} -f docker/Dockerfile.live-base-${MODEL_ID} . | ||
docker build -t livepeer/ai-runner:live-app-${MODEL_ID} -f docker/Dockerfile.live-app__PIPELINE__ --build-arg PIPELINE=${MODEL_ID} . | ||
fi | ||
cd .. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think this will make the test take waaaay longer like 30 minutes.
Some ideas:
- Somehow run this test after the separate docker build workflow, which also has some optimizations to try only building what's necessary (like it skips the base images if no changes). Ideally we wouldn't have it in the same workflow file tho, but I'm not sure if it's possible to make cross-workflow dependencies.
- Simplify the build here always skipping the base. Meaning that we would only copy the app code into the specific pipeline base image that has been published to docker hub (docker would automatically pull the base). This is less perfect in the sense that, when we do change the base images, it won't pick up the change, but at least it won't take 30 minutes each to run.
- Maybe a little more sophisticated would be if we could start by first pulling the app image from dockerhub. Then all these builds would be optimized using the docker layer caches. Not sure if this would work tho, I've seen the cache not working even after having pulled base :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- if we use the same node each time, which is the case, the layers are cached anyway right? but yeah if docker build workflow already exists then can just try to use that
- i've added base because we have to account for changes in req packages -- maybe can only run base docker build only when dockerfiles or req files are updated
- i think its important that there is caching, otherwise any optimizations won't cut it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe can only run base docker build only when dockerfiles or req files are updated
Yeah we do that on the docker build workflow. But it's kinda of a pain TBH, pretty complex, and it hurts even more to repeat all of it here 💀
Would be great if we could reuse somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the self-hosted runners, we clean up disk frequently, so that we dont run out of disk space for other jobs.
this means the intermediate layers etc. are also gone. as victor mentioned above (and i mentioned in the discord thread), best would be to trigger this workflow after docker build has finished and pull those images for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hjpotter92 can we not wipe the models dir if possible? that would save a lot of time, can just link the persistent path to the req path
@victorges thoughts?
strategy: | ||
max-parallel: 1 | ||
matrix: | ||
model_config: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use pipeline
instead of model
everywhere in this file. It is only called model on the legacy code since we reused the existing model_id
param to change which live pipeline to run, but ideally we keep that "misnaming" on the minimum places possible (and always with a comment disclaimer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw for more context: https://discord.com/channels/423160867534929930/1308107200522227712/1315767248710930465
- name: Clean up runner dockers | ||
run: | | ||
docker ps -aq --filter name=live-video-to-video_${MODEL_ID}* | xargs -r docker rm -f || true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these leak? 😳
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't deploy different dockers with the same open ports, live-video-to-video uses 8900
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah but why would there be a running docker when the workflow starts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same workflow runs for all the models (noop, livepotrait, etc) one after the other, after each model run have to delete the docker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah ok got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
elapsed := time.Since(startTime) | ||
sleepTime := interval - elapsed | ||
if sleepTime > 0 { | ||
time.Sleep(sleepTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to refactor, but just a tip for the future the idiomatic way of doing this in go would be with a time.Ticker
. You'd just have another select
case for it here and you'd get consistent iterations every interval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff!
image.png
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used? thought we only had the example_data
now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah my bad, was part of testing
optimizationFlags := worker.OptimizationFlags{ | ||
"STREAM_PROTOCOL": "zeromq", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was so confused. I always thought these "optimization flags" were something for docker not the app lol
@@ -18,7 +18,7 @@ RUN pyenv install $PYTHON_VERSION && \ | |||
ARG PIP_VERSION=23.3.2 | |||
|
|||
# Install any additional Python packages | |||
RUN pip install uvicorn==0.30.0 fastapi==0.111.0 numpy==1.26.4 torch==2.5.1 diffusers==0.30.0 transformers==4.43.3 aiohttp==3.10.9 pyzmq==26.2.0 | |||
RUN pip install uvicorn==0.30.0 fastapi==0.111.0 numpy==1.26.4 torch==2.5.1 diffusers==0.30.0 transformers==4.43.3 aiohttp==3.10.9 pyzmq==26.2.0 pynvml==12.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already fixed this on main, you can revert this here!
(other changes are from `prettierd` hook)
…t/test-live-video-to-video
…t/test-live-video-to-video
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.