Add a local storage of uniform constants values to the OpenGL renderer #8167

Open: wants to merge 1 commit into base: dev

@aglitchman (Contributor) commented Oct 20, 2023

This PR adds a local storage of uniform constant values to the OpenGL renderer. The point is not to send the same constants unless they differ between draw calls.
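A minimal sketch of the idea, not the PR's actual code: uniform values are state of the OpenGL program object, so a per-program cache can remember the last bytes sent to each location and skip the upload when they are unchanged. The names `UniformCache`, `CachedUniform` and `NeedsUpload` are hypothetical, and the 64-byte slot size is an assumption taken from the `size <= 64` check in the diff.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unordered_map>

// Hypothetical cache slot; names are illustrative, not Defold's.
struct CachedUniform
{
    uint8_t  m_Data[64];  // large enough for one mat4 (16 floats)
    uint32_t m_Size = 0;  // 0 means "nothing cached yet"
};

// One instance per shader program, keyed by uniform location.
struct UniformCache
{
    std::unordered_map<int32_t, CachedUniform> m_Values;

    // Returns true when the caller should issue the glUniform* call.
    bool NeedsUpload(int32_t location, const void* data, uint32_t size)
    {
        assert(size <= sizeof(CachedUniform::m_Data));
        CachedUniform& slot = m_Values[location];
        if (slot.m_Size == size && memcmp(slot.m_Data, data, size) == 0)
            return false; // same bytes as the last upload to this location
        memcpy(slot.m_Data, data, size);
        slot.m_Size = size;
        return true;
    }
};
```

A caller would wrap each upload as `if (cache.NeedsUpload(location, data, size)) { /* glUniform4fv(...) */ }`.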

Fixes #8166

PR checklist

  • Code
    • Add engine and/or editor unit tests.
    • New and changed code follows the overall code style of existing code
    • Add comments where needed
  • Documentation
    • Make sure that API documentation is updated in code comments
    • Make sure that manuals are updated (in github.com/defold/doc)
  • Prepare pull request and affected issue for automatic release notes generator
    • Pull request - Write a message that explains what this pull request does. What was the problem? How was it solved? What are the changes to APIs or the new APIs introduced? This message will be used in the generated release notes. Make sure it is well written and understandable for a user of Defold.
    • Pull request - Write a pull request title that in a sentence summarises what the pull request does. Do not include "Issue-1234 ..." in the title. This text will be used in the generated release notes.
    • Pull request - Link the pull request to the issue(s) it is closing. Use one of the approved closing keywords.
    • Affected issue - Assign the issue to a project. Do not assign the pull request to a project if there is an issue which the pull request closes.
    • Affected issue - Assign the "breaking change" label to the issue if introducing a breaking change.
    • Affected issue - Assign the "skip release notes" label if the issue should not be included in the generated release notes.

@aglitchman
Contributor Author

Test project - https://github.com/aglitchman/defold-drawcalls-test
It contains a simple scene with 50 models + 50 sprites + 1 label = 101 draw calls per frame.

Before PR: 2994 calls.
After PR: 2643 calls.
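To put those two figures in proportion, here is the arithmetic as a small sketch. It assumes both counts are totals for the same single frame of the 101-draw-call test scene, which the comment implies but does not state outright.

```cpp
#include <assert.h>

// Figures quoted above; assumed to be GL call totals for one frame.
constexpr int kCallsBefore = 2994;
constexpr int kCallsAfter  = 2643;
constexpr int kDrawCalls   = 101;

constexpr int    kCallsSaved    = kCallsBefore - kCallsAfter;         // 351 calls
constexpr double kSavedFraction = double(kCallsSaved) / kCallsBefore; // about 0.117, i.e. 11.7%
constexpr double kSavedPerDraw  = double(kCallsSaved) / kDrawCalls;   // about 3.5 calls per draw call

static_assert(kCallsSaved == 351, "2994 - 2643");
```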

What is optimized is shown in the following pseudo-code from RenderDoc. It comes from the test project and covers a single draw call of a 3D model. The calls that the PR skips are marked as commented-out lines:

glUseProgram(Program 112)                                                         
glUniform4fv(Program 112, { 1.00, 1.00, 1.00, 1.00 })                             
glUniformMatrix4fv(Program 112, False, float[16])                                 
glUniformMatrix4fv(Program 112, False, float[16])                                 
// glUniformMatrix4fv(Program 112, False, float[16])                              
// glUniformMatrix4fv(Program 112, False, float[16])                              
// glUniform4fv(Program 112, { 1.00, 1.00, 1.00, 1.00 })                          
// glUniform4fv(Program 112, { 1.00, 1.00, 1.00, 1.00 })                          
glActiveTexture(GL_TEXTURE0)                                                      
glBindTexture(GL_TEXTURE_2D, Texture 113)                                         
glTexParameteri(Texture 113, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_NEAREST)     
glTexParameteri(Texture 113, GL_TEXTURE_MAG_FILTER, GL_LINEAR)                    
glTexParameteri(Texture 113, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE)                 
glTexParameteri(Texture 113, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE)                 
glUniform1i(Program 112, { 0 })                                                   
glTexParameteri(Texture 113, GL_TEXTURE_MIN_FILTER, GL_LINEAR)                    
glTexParameteri(Texture 113, GL_TEXTURE_MAG_FILTER, GL_LINEAR)                    
glTexParameteri(Texture 113, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE)                 
glTexParameteri(Texture 113, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE)                 
glBindBuffer(GL_ARRAY_BUFFER, Buffer 108)                                         
glEnableVertexAttribArray(Vertex Array 68, 0)                                     
glVertexAttribPointer(Vertex Array 68, Buffer 108, 0, 3, GL_FLOAT, False, 68, 0)  
glEnableVertexAttribArray(Vertex Array 68, 2)                                     
glVertexAttribPointer(Vertex Array 68, Buffer 108, 2, 3, GL_FLOAT, False, 68, 12) 
glEnableVertexAttribArray(Vertex Array 68, 1)                                     
glVertexAttribPointer(Vertex Array 68, Buffer 108, 1, 2, GL_FLOAT, False, 68, 52) 
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, Buffer 109)                                 
glDrawElements(6)                                                                 
glDisableVertexAttribArray(Vertex Array 68, 0)                                    
glDisableVertexAttribArray(Vertex Array 68, 2)                                    
glDisableVertexAttribArray(Vertex Array 68, 1)                                    
glBindBuffer(GL_ARRAY_BUFFER, No Resource)                                        
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, No Resource)                                
glActiveTexture(GL_TEXTURE0)                                                      
glBindTexture(GL_TEXTURE_2D, No Resource)                                         

Is it worth it? In fact, it would be great to measure performance. But how? Via ARB_timer_query?..
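One possible complement, sketched under assumptions: ARB_timer_query measures GPU time, while skipping redundant `glUniform*` calls probably saves mostly CPU time in the driver. A plain wall-clock harness around the frame submission code may therefore be a reasonable first check. This is not ARB_timer_query; `AverageNanoseconds` is a hypothetical helper built on `std::chrono`, and what to pass as `fn` is up to the test project.

```cpp
#include <assert.h>
#include <stdint.h>
#include <chrono>

// Hypothetical helper: average CPU wall-clock nanoseconds per call of `fn`.
template <typename Fn>
double AverageNanoseconds(Fn&& fn, uint32_t iterations)
{
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (uint32_t i = 0; i < iterations; ++i)
        fn(); // e.g. the code that submits one frame's draw calls
    const Clock::time_point end = Clock::now();
    const double total_ns = std::chrono::duration<double, std::nano>(end - start).count();
    return iterations ? total_ns / iterations : 0.0;
}
```

Running it once on a build with the cache and once without, over a few thousand frames, would give two averages to compare.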

@AGulev
Contributor

AGulev commented Oct 20, 2023

The Xcode OpenGL Profiler (a separate tool) and its trace function were pretty nice for this, but it seems to have been completely broken since I last used it (at least on Sonoma):
https://developer.apple.com/library/archive/documentation/GraphicsImaging/Conceptual/OpenGLProfilerUserGuide/GettedStarted/GettingStarted.html#//apple_ref/doc/uid/TP40006475-CH20-SW1

Maybe try https://apitrace.github.io instead.

@britzl britzl requested review from Jhonnyg and JCash October 21, 2023 06:39
Contributor

@JCash JCash left a comment

Generally good, but left some feedback for minor changes.

@@ -2267,14 +2274,51 @@ static void LogFrameBufferError(GLenum status)
CHECK_GL_ERROR;
}

static bool StoreConstantValue(HContext context, const void* data, const size_t size, HUniformLocation base_location)
{
assert(size <= 64);

Contributor

In general, we don't do defensive coding when the engine runs. While this file contains such checks, I'd say it's more for things like init/exit of the engine and creation of the global state.

Numbers like this should generally be checked before this function. Do we even need to?

It helps us keep the engine size smaller too.
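One way to read "checked before this function" is to move the size limit to compile time at the call site, so nothing is checked while the engine runs. The sketch below is only an illustration of that reading: `StoreConstantValueRaw`, the single global slot and the templated wrapper are hypothetical and much simpler than the PR's code.

```cpp
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const size_t MAX_CONSTANT_SIZE = 64; // one 4x4 float matrix

// Hypothetical stand-in for the PR's function: no runtime size check inside.
static unsigned char g_Stored[MAX_CONSTANT_SIZE];

static bool StoreConstantValueRaw(const void* data, size_t size)
{
    const bool changed = memcmp(g_Stored, data, size) != 0;
    memcpy(g_Stored, data, size);
    return changed;
}

// Typed entry point: the size limit is enforced when the call site is compiled.
template <typename T>
static bool StoreConstantValue(const T& value)
{
    static_assert(sizeof(T) <= MAX_CONSTANT_SIZE, "constant does not fit in the cache slot");
    return StoreConstantValueRaw(&value, sizeof(T));
}
```

Passing a type larger than 64 bytes then fails to compile instead of asserting at runtime.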

assert(size <= 64);

OpenGLProgram* program = (OpenGLProgram*)((OpenGLContext*)context)->m_ActiveProgram;
assert(program);

Contributor

Same here.

OpenGLProgram* program = (OpenGLProgram*)((OpenGLContext*)context)->m_ActiveProgram;
assert(program);

dmArray<OpenGLConstantValue>& values = program->m_ConstantValues;

Contributor

I think this should be reset to 0 when it is disabled.
Otherwise we'll hit the memcmp() code path every time.

Of course, another option is to always use memcmp() to find out whether the value changed.

Contributor

E.g. something like this:

bool StoreConstantValue(in_value)
{
    ... 
    // memcmp() needs the byte count as its third argument
    dst.changed = memcmp(dst.value, in_value, size) != 0;
    memcpy(dst.value, in_value, size);
    return dst.changed;
}

Also, in terms of struct size / complexity, I wonder if a simple hash32 would suffice?

E.g.:

bool StoreConstantValue(in_value)
{
    ... 
    uint32_t old_checksum = dst.checksum;
    dst.checksum = dmHashBuffer32(in_value, size);
    return old_checksum != dst.checksum;
}

It takes a few extra cycles I would think, but also simplifies the logic, and saves some storage (admittedly not much)
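For comparison, the two variants sketched above could be fleshed out roughly as follows. This is a standalone illustration, not engine code: `HashBuffer32` is an FNV-1a stand-in used only so the example compiles without Defold's dlib (the real suggestion is `dmHashBuffer32`), and the struct and function names are hypothetical.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Variant 1: keep a copy of the last value and compare bytes (64 bytes per slot).
struct ConstantCopy
{
    uint8_t m_Value[64] = {};
};

static bool StoreByCompare(ConstantCopy& dst, const void* in_value, uint32_t size)
{
    const bool changed = memcmp(dst.m_Value, in_value, size) != 0;
    if (changed)
        memcpy(dst.m_Value, in_value, size);
    return changed;
}

// Stand-in for dmHashBuffer32 (assumption: any 32-bit hash works here); this is FNV-1a.
static uint32_t HashBuffer32(const void* buffer, uint32_t size)
{
    const uint8_t* bytes = (const uint8_t*)buffer;
    uint32_t hash = 2166136261u;
    for (uint32_t i = 0; i < size; ++i)
    {
        hash ^= bytes[i];
        hash *= 16777619u;
    }
    return hash;
}

// Variant 2: keep only a checksum of the last value (4 bytes per slot).
struct ConstantChecksum
{
    uint32_t m_Checksum = 0;
};

static bool StoreByChecksum(ConstantChecksum& dst, const void* in_value, uint32_t size)
{
    const uint32_t old_checksum = dst.m_Checksum;
    dst.m_Checksum = HashBuffer32(in_value, size);
    return old_checksum != dst.m_Checksum;
}
```

The trade-off is visible in the struct sizes: the copy variant never gives a wrong answer but stores the full value, while the checksum variant stores 4 bytes and hashes the input on every call.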

Contributor Author

> I wonder if a simple hash32 would suffice?

I thought about that, but hash functions have a small chance of giving the same values, right? It seems to me that users can be paranoid about this. So I'll have to make an option in game.project like "optimize uniform constants". So you can toggle it on and off if you are in doubt.
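A rough estimate of how small that chance is, under loudly stated assumptions that are not from this thread: an ideal 32-bit hash, and a scene that changes 1000 uniform values per frame at 60 frames per second.

```cpp
#include <assert.h>

// Assumptions (not from the thread): ideal 32-bit hash, 1000 changed
// uniform values per frame, 60 frames per second.
constexpr double kHashValues       = 4294967296.0;  // 2^32 possible checksums
constexpr double kUpdatesPerSecond = 1000.0 * 60.0;

// A changed value is wrongly skipped when its hash equals the stored one.
constexpr double kMissProbability = 1.0 / kHashValues; // about 2.3e-10 per update

// Expected time between wrongly skipped updates.
constexpr double kSecondsPerMiss = kHashValues / kUpdatesPerSecond;
constexpr double kHoursPerMiss   = kSecondsPerMiss / 3600.0; // roughly 20 hours
```

Under those assumptions a stale uniform would be expected about once per 20 hours of continuous rendering, which is rare but not impossible, so the caution about a toggle seems understandable.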

@aglitchman
Contributor Author

@JCash Thanks for the comments! I'll take everything into consideration and rework it. But first I will test the rationale of this PR. I mean I need to see through apitrace or some similar app that the rendering time is reduced. Then I will finish the PR or cancel it.

PS Unfortunately, I won't have time to do benchmarks and to finish the PR before the end of Hacktoberfest (my free time ran out), but that's okay.

@britzl
Contributor

britzl commented Dec 20, 2023

> PS Unfortunately, I won't have time to do benchmarks and to finish the PR before the end of Hacktoberfest (my free time ran out), but that's okay.

@aglitchman have you had a chance to work on this?

@aglitchman
Contributor Author

> @aglitchman have you had a chance to work on this?

Sadly, not yet 🙌

Successfully merging this pull request may close these issues.

Optimise OpenGL renderer by utilizing OpenGL's state machine
4 participants