Add a local storage of uniform constants values to the OpenGL renderer #8167
base: dev
Test project - https://github.com/aglitchman/defold-drawcalls-test
Before PR: 2994 calls.
What is optimized is shown in the following pseudo-code from RenderDoc. I took it from the test project; it is a single draw call of a 3D model. The calls that are optimized away are marked as commented-out lines:
glUseProgram(Program 112)
glUniform4fv(Program 112, { 1.00, 1.00, 1.00, 1.00 })
glUniformMatrix4fv(Program 112, False, float[16])
glUniformMatrix4fv(Program 112, False, float[16])
// glUniformMatrix4fv(Program 112, False, float[16])
// glUniformMatrix4fv(Program 112, False, float[16])
// glUniform4fv(Program 112, { 1.00, 1.00, 1.00, 1.00 })
// glUniform4fv(Program 112, { 1.00, 1.00, 1.00, 1.00 })
glActiveTexture(GL_TEXTURE0)
glBindTexture(GL_TEXTURE_2D, Texture 113)
glTexParameteri(Texture 113, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_NEAREST)
glTexParameteri(Texture 113, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
glTexParameteri(Texture 113, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE)
glTexParameteri(Texture 113, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE)
glUniform1i(Program 112, { 0 })
glTexParameteri(Texture 113, GL_TEXTURE_MIN_FILTER, GL_LINEAR)
glTexParameteri(Texture 113, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
glTexParameteri(Texture 113, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE)
glTexParameteri(Texture 113, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE)
glBindBuffer(GL_ARRAY_BUFFER, Buffer 108)
glEnableVertexAttribArray(Vertex Array 68, 0)
glVertexAttribPointer(Vertex Array 68, Buffer 108, 0, 3, GL_FLOAT, False, 68, 0)
glEnableVertexAttribArray(Vertex Array 68, 2)
glVertexAttribPointer(Vertex Array 68, Buffer 108, 2, 3, GL_FLOAT, False, 68, 12)
glEnableVertexAttribArray(Vertex Array 68, 1)
glVertexAttribPointer(Vertex Array 68, Buffer 108, 1, 2, GL_FLOAT, False, 68, 52)
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, Buffer 109)
glDrawElements(6)
glDisableVertexAttribArray(Vertex Array 68, 0)
glDisableVertexAttribArray(Vertex Array 68, 2)
glDisableVertexAttribArray(Vertex Array 68, 1)
glBindBuffer(GL_ARRAY_BUFFER, No Resource)
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, No Resource)
glActiveTexture(GL_TEXTURE0)
glBindTexture(GL_TEXTURE_2D, No Resource)
Is it worth it? It would be great to measure the performance. But how? Via ARB_timer_query?
The Xcode OpenGL Profiler (a separate tool) and its trace function were pretty nice for this, but it seems to have been fully broken since I last used it (at least on Sonoma). Maybe try https://apitrace.github.io instead.
Generally good, but left some feedback for minor changes.
@@ -2267,14 +2274,51 @@ static void LogFrameBufferError(GLenum status)
    CHECK_GL_ERROR;
}

static bool StoreConstantValue(HContext context, const void* data, const size_t size, HUniformLocation base_location)
{
    assert(size <= 64);
In general, we don't do defensive coding while the engine is running. While this file does contain such checks, I'd say it's more for things like init/exit of the engine and creation of the global state.
Numbers like this should generally be checked before this function. Do we even need to?
It helps us keep the engine size smaller too.
    assert(size <= 64);

    OpenGLProgram* program = (OpenGLProgram*)((OpenGLContext*)context)->m_ActiveProgram;
    assert(program);
Same here.
    OpenGLProgram* program = (OpenGLProgram*)((OpenGLContext*)context)->m_ActiveProgram;
    assert(program);

    dmArray<OpenGLConstantValue>& values = program->m_ConstantValues;
I think this should be reset to 0, when it is disabled.
Otherwise we'll hit the memcmp() code path every time.
Which could ofc be an option to always use memcmp() to find out if it changed.
E.g. something like this:
bool StoreConstantValue(in_value)
{
...
dst.changed = memcmp(dst.value, in_value, size) != 0;
memcpy(dst.value, in_value, size);
return dst.changed;
}
Also, in terms of struct size / complexity, I wonder if a simple hash32 would suffice?
E.g.:
bool StoreConstantValue(in_value)
{
...
uint32_t old_checksum = dst.checksum;
dst.checksum = dmHashBuffer32(in_value, size);
return old_checksum != dst.checksum;
}
It takes a few extra cycles I would think, but also simplifies the logic, and saves some storage (admittedly not much)
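To make the memcmp() variant above concrete, here is a minimal, self-contained sketch. Everything in it (UniformCache, CachedConstant, Store, Invalidate, the 64-byte slot) is hypothetical and chosen for illustration; it is not the code in this PR and it uses std::vector rather than dmArray or any other Defold type. It assumes the caller has already made sure that the location is in range and that size <= 64, in line with the earlier note about checking such numbers before the function.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical cache slot: large enough for one mat4 (16 floats = 64 bytes).
struct CachedConstant
{
    uint8_t  m_Value[64];
    uint32_t m_Size; // 0 means "nothing cached yet"
};

// Hypothetical per-program cache, indexed by uniform location.
struct UniformCache
{
    std::vector<CachedConstant> m_Values;

    // Slots are value-initialized, so every m_Size starts at 0.
    explicit UniformCache(size_t location_count) : m_Values(location_count) {}

    // Returns true when the value differs from the cached one, i.e. when
    // the caller should actually issue the glUniform* call.
    // Assumption: location < m_Values.size() and size <= 64 were checked
    // by the caller.
    bool Store(uint32_t location, const void* data, uint32_t size)
    {
        CachedConstant& dst = m_Values[location];
        if (dst.m_Size == size && std::memcmp(dst.m_Value, data, size) == 0)
            return false; // identical bytes: skip the upload

        std::memcpy(dst.m_Value, data, size);
        dst.m_Size = size;
        return true;
    }

    // Forget everything, so the next Store() per location returns true.
    void Invalidate()
    {
        for (CachedConstant& v : m_Values)
            v.m_Size = 0;
    }
};
```

A caller would issue the glUniform* call only when Store() returns true. Since uniform values are state of the program object in OpenGL, a cache like this would have to live per program, as m_ConstantValues does in the diff above.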
> I wonder if a simple hash32 would suffice?
I thought about that, but hash functions have a small chance of producing the same value for different inputs, right? It seems to me that users could be paranoid about this. So I'd have to add an option in game.project like "optimize uniform constants", so you can toggle it on and off if you are in doubt.
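As a rough way to size that risk, the sketch below estimates how long it would take, on average, before a changed value is mistaken for an unchanged one. It assumes an ideal 32-bit hash (each changed value collides with the previous checksum with probability 2^-32), which is an idealization and not a measured property of dmHashBuffer32. The function name and the workload numbers are hypothetical.

```cpp
#include <cmath>

// Expected number of seconds until the first false "unchanged" result,
// assuming an ideal 32-bit hash. The number of comparisons until the first
// collision is geometrically distributed with mean 1 / p, where p = 2^-32.
double ExpectedSecondsToFalseSkip(double changed_uploads_per_frame, double fps)
{
    const double collision_probability = std::ldexp(1.0, -32); // 2^-32
    const double changed_uploads_per_second = changed_uploads_per_frame * fps;
    return 1.0 / (collision_probability * changed_uploads_per_second);
}
```

With a hypothetical 1000 genuinely changed constants per frame at 60 fps, this gives about 71,583 seconds, or roughly 20 hours of continuous rendering per expected false skip. Under those assumptions, the memcmp() variant avoids the question entirely at the cost of storing the full value.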
@JCash Thanks for the comments! I'll take everything into consideration and rework it. But first I will test the rationale of this PR. I mean I need to see through apitrace or some similar app that the rendering time is reduced. Then I will finish the PR or cancel it.
PS: Unfortunately, I won't have time to do benchmarks and to finish the PR before the end of Hacktoberfest (my free time ran out), but that's okay.
@aglitchman have you had a chance to work on this?
Sadly, not yet 🙌
This PR adds a local storage of uniform constant values to the OpenGL renderer. The point is not to send the same constants unless they differ between draw calls.
Fixes #8166