Add Conv2d for CPU #14388
Conversation
Force-pushed from dd75e85 to 29b85f1
Now works for f16 operations also; the speedup is smaller. Additional memory is consistent with
Tested further by changing all
Just tested it with stable-diffusion.cpp. I couldn't notice any significant difference in inference speed, but the memory savings during the VAE phase are huge, about 2.6x less memory usage: 640 MB vs 1664 MB for 512x512, and 2560 MB vs 6656 MB for 1024x1024.
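As a quick sanity check of the reported numbers, the ratio is the same at both resolutions: 1664 / 640 = 2.6 and 6656 / 2560 = 2.6, so the relative saving holds as resolution grows.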
@rmatif thank you for testing!
Looks great! One thing to consider is to reuse/merge with the existing im2col code for the stand-alone OP_IM2COL, so there aren't 2 implementations doing essentially the same thing. However, when I had a brief go at this, it turned out to require quite a lot of boilerplate code and made things somewhat less readable, so it might not be worth it.
Yes, I also thought of doing that and came to the same conclusion: it would mean replacing just those for loops (sketched below) with a lot of boilerplate.
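For illustration, a minimal sketch of the kind of gather loop being discussed, assuming F32 and WHCN layout (W contiguous); the function name, signature, and index order are hypothetical, not the actual ggml code. Factoring something like this out so both ops share it is the reuse being weighed against the extra boilerplate:

```c
// Hypothetical helper: gather one KW*KH*IC im2col column of the input for a
// single output position. Both the stand-alone OP_IM2COL and the fused
// conv2d need essentially this loop nest.
static void im2col_column_f32(
        const float * src, float * dst,   // src: W x H x IC, dst: KW*KH*IC
        int W, int H, int IC,
        int ox, int oy,                   // output pixel position
        int KW, int KH,
        int s0, int s1, int p0, int p1,   // stride, padding
        int d0, int d1) {                 // dilation
    for (int ic = 0; ic < IC; ic++) {
        for (int ky = 0; ky < KH; ky++) {
            for (int kx = 0; kx < KW; kx++) {
                const int ix = ox*s0 + kx*d0 - p0;
                const int iy = oy*s1 + ky*d1 - p1;
                const int  k = (ic*KH + ky)*KW + kx;
                // out-of-bounds taps read as zero padding
                dst[k] = (ix < 0 || ix >= W || iy < 0 || iy >= H)
                       ? 0.0f
                       : src[(ic*H + iy)*W + ix];
            }
        }
    }
}
```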
You're correct, at high resolution the attention blocks dominate the compute time, but I guess the speedup is not big enough to show up at low resolution, or even in the VAE phase, where we have fewer attention blocks.
@@ -984,7 +985,7 @@ static const char * GGML_OP_NAME[GGML_OP_COUNT] = {
     "OPT_STEP_ADAMW",
 };

-static_assert(GGML_OP_COUNT == 84, "GGML_OP_COUNT != 84");
+static_assert(GGML_OP_COUNT == 85, "GGML_OP_COUNT != 85");
Heads up, even though GitHub gives no indication of it (since it's identical), this now conflicts with master.
I see! I ran into the same issue on a prior rebase of this PR. I will do a rebase once this is ready to merge.
@@ -1081,7 +1083,7 @@ static const char * GGML_OP_SYMBOL[GGML_OP_COUNT] = {
     "adamw(x)",
 };

-static_assert(GGML_OP_COUNT == 84, "GGML_OP_COUNT != 84");
+static_assert(GGML_OP_COUNT == 85, "GGML_OP_COUNT != 85");
This too.
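For context, a minimal sketch of the registration pattern both hunks follow; the `"CONV_2D"` and `"conv_2d(x)"` entries below are assumed from the PR title rather than copied from the diff. Each table indexed by GGML_OP_COUNT gains one entry, and the adjacent static_assert is bumped in lockstep so a missing entry fails at compile time:

```c
static const char * GGML_OP_NAME[GGML_OP_COUNT] = {
    // ... existing ops ...
    "OPT_STEP_ADAMW",
    "CONV_2D",      // assumed name entry for the new fused conv2d op
};

static const char * GGML_OP_SYMBOL[GGML_OP_COUNT] = {
    // ... existing ops ...
    "adamw(x)",
    "conv_2d(x)",   // assumed symbol string for the new op
};

// Bumped together with the enum. If master independently adds an op, it
// ends up asserting the same count, so this line is textually identical on
// both branches (no git conflict), yet a rebase must bump it once more.
static_assert(GGML_OP_COUNT == 85, "GGML_OP_COUNT != 85");
```

This is why the reviewer flags a conflict with master despite the diff looking clean: the conflict is semantic, not textual.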
Follow-up to #14320: added a tiled im2col + GEMM approach for the conv2d kernels, following @Acly's excellent suggestion. This is inspired by his branch, which does the CHWN conv; this one does the WHCN conv, which is a bit different.
Only works for F32, but with sizeable speedups. I will work on the f16 version as well, but putting this up for review.
Performance
EDIT: added the f16 version also
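To make the description concrete, here is a minimal sketch of the tiled im2col + GEMM idea, assuming F32, WHCN layout, a single image, and the hypothetical `im2col_column_f32` helper sketched earlier in the thread; the tile size, the naive inner GEMM loop, and all names are illustrative, not the PR's actual code. The point of the tiling is that only a TILE x (KW*KH*IC) scratch buffer is ever materialized, instead of the full (OW*OH) x (KW*KH*IC) im2col matrix, which is where the memory savings reported above come from:

```c
#define TILE 64  // output pixels gathered per pass (illustrative value)

// src:     W x H x IC input (W contiguous)
// kernel:  OC filters, each flattened to K = KW*KH*IC floats, matching the
//          im2col column order used by im2col_column_f32
// dst:     OW x OH x OC output (W contiguous)
// scratch: TILE*K floats, provided by the caller
static void conv2d_tiled_f32(
        const float * src, const float * kernel, float * dst, float * scratch,
        int W, int H, int IC, int OC, int OW, int OH, int KW, int KH,
        int s0, int s1, int p0, int p1, int d0, int d1) {
    const int K = KW*KH*IC;
    for (int start = 0; start < OW*OH; start += TILE) {
        const int n = (start + TILE <= OW*OH) ? TILE : OW*OH - start;
        // 1. im2col for this tile only: n columns of K floats each
        for (int i = 0; i < n; i++) {
            const int ox = (start + i) % OW;
            const int oy = (start + i) / OW;
            im2col_column_f32(src, scratch + (size_t)i*K, W, H, IC,
                              ox, oy, KW, KH, s0, s1, p0, p1, d0, d1);
        }
        // 2. GEMM: the [n x K] tile against the [OC x K] kernel matrix.
        //    A naive dot-product loop here; the real code would run an
        //    optimized GEMM over the same tile.
        for (int oc = 0; oc < OC; oc++) {
            for (int i = 0; i < n; i++) {
                float acc = 0.0f;
                for (int k = 0; k < K; k++) {
                    acc += kernel[(size_t)oc*K + k] * scratch[(size_t)i*K + k];
                }
                // pixel start+i is the linear offset within channel oc's
                // OW x OH plane in WHCN layout
                dst[(size_t)oc*OW*OH + start + i] = acc;
            }
        }
    }
}
```

Beyond the smaller footprint, the tile-sized scratch buffer likely stays cache-resident across the gather and the multiply, which would account for the speedups over materializing the full im2col matrix.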