Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

examples: Add test run for lv2v #283

Merged
merged 31 commits into from
Jan 14, 2025
Merged

Conversation

victorges
Copy link
Member

@victorges victorges commented Nov 20, 2024

No description provided.

Wasn't able to run due to go environment but
it should be close to working
@victorges victorges changed the title examples: Add possible example for lv2v examples: Add test run for lv2v Nov 20, 2024
@varshith15 varshith15 marked this pull request as ready for review December 5, 2024 20:39
@varshith15 varshith15 requested a review from rickstaa as a code owner December 5, 2024 20:39
Copy link
Member Author

@victorges victorges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sending some questions from as far as i got, i'll finish this tmr!

Comment on lines 40 to 52
- name: Build Docker images
env:
MODEL_ID: ${{ matrix.model_config.id }}
run: |
cd runner
docker build -t livepeer/ai-runner:live-base -f docker/Dockerfile.live-base .
if [ "${MODEL_ID}" = "noop" ]; then
docker build -t livepeer/ai-runner:live-app-noop -f docker/Dockerfile.live-app-noop .
else
docker build -t livepeer/ai-runner:live-base-${MODEL_ID} -f docker/Dockerfile.live-base-${MODEL_ID} .
docker build -t livepeer/ai-runner:live-app-${MODEL_ID} -f docker/Dockerfile.live-app__PIPELINE__ --build-arg PIPELINE=${MODEL_ID} .
fi
cd ..
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think this will make the test take waaaay longer like 30 minutes.

Some ideas:

  • Somehow run this test after the separate docker build workflow, which also has some optimizations to try only building what's necessary (like it skips the base images if no changes). Ideally we wouldn't have it in the same workflow file tho, but I'm not sure if it's possible to make cross-workflow dependencies.
  • Simplify the build here always skipping the base. Meaning that we would only copy the app code into the specific pipeline base image that has been published to docker hub (docker would automatically pull the base). This is less perfect in the sense that, when we do change the base images, it won't pick up the change, but at least it won't take 30 minutes each to run.
  • Maybe a little more sophisticated would be if we could start by first pulling the app image from dockerhub. Then all these builds would be optimized using the docker layer caches. Not sure if this would work tho, I've seen the cache not working even after having pulled base :(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • if we use the same node each time, which is the case, the layers are cached anyway right? but yeah if docker build workflow already exists then can just try to use that
  • i've added base because we have to account for changes in req packages -- maybe can only run base docker build only when dockerfiles or req files are updated
  • i think its important that there is caching, otherwise any optimizations won't cut it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe can only run base docker build only when dockerfiles or req files are updated

Yeah we do that on the docker build workflow. But it's kinda of a pain TBH, pretty complex, and it hurts even more to repeat all of it here 💀
Would be great if we could reuse somehow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the self-hosted runners, we clean up disk frequently, so that we dont run out of disk space for other jobs.

this means the intermediate layers etc. are also gone. as victor mentioned above (and i mentioned in the discord thread), best would be to trigger this workflow after docker build has finished and pull those images for testing.

Copy link
Collaborator

@varshith15 varshith15 Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hjpotter92 can we not wipe the models dir if possible? that would save a lot of time, can just link the persistent path to the req path
@victorges thoughts?

strategy:
max-parallel: 1
matrix:
model_config:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use pipeline instead of model everywhere in this file. It is only called model on the legacy code since we reused the existing model_id param to change which live pipeline to run, but ideally we keep that "misnaming" on the minimum places possible (and always with a comment disclaimer)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 59 to 61
- name: Clean up runner dockers
run: |
docker ps -aq --filter name=live-video-to-video_${MODEL_ID}* | xargs -r docker rm -f || true
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these leak? 😳

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't deploy different dockers with the same open ports, live-video-to-video uses 8900

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah but why would there be a running docker when the workflow starts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same workflow runs for all the models (noop, livepotrait, etc) one after the other, after each model run have to delete the docker

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok got it

Copy link
Member Author

@victorges victorges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

elapsed := time.Since(startTime)
sleepTime := interval - elapsed
if sleepTime > 0 {
time.Sleep(sleepTime)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to refactor, but just a tip for the future the idiomatic way of doing this in go would be with a time.Ticker. You'd just have another select case for it here and you'd get consistent iterations every interval

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff!

image.png Outdated
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used? thought we only had the example_data now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah my bad, was part of testing

Comment on lines +301 to +303
optimizationFlags := worker.OptimizationFlags{
"STREAM_PROTOCOL": "zeromq",
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was so confused. I always thought these "optimization flags" were something for docker not the app lol

@@ -18,7 +18,7 @@ RUN pyenv install $PYTHON_VERSION && \
ARG PIP_VERSION=23.3.2

# Install any additional Python packages
RUN pip install uvicorn==0.30.0 fastapi==0.111.0 numpy==1.26.4 torch==2.5.1 diffusers==0.30.0 transformers==4.43.3 aiohttp==3.10.9 pyzmq==26.2.0
RUN pip install uvicorn==0.30.0 fastapi==0.111.0 numpy==1.26.4 torch==2.5.1 diffusers==0.30.0 transformers==4.43.3 aiohttp==3.10.9 pyzmq==26.2.0 pynvml==12.0.0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already fixed this on main, you can revert this here!

Copy link
Member Author

@victorges victorges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@varshith15 varshith15 merged commit 9485d75 into main Jan 14, 2025
11 checks passed
@varshith15 varshith15 deleted the feat/test-live-video-to-video branch January 14, 2025 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants