Improve app rendering performance #156

Closed
mbooth101 wants to merge 1 commit

@mbooth101 commented Jun 17, 2024
Contributor

Description

Since firmware 1.7.0 I have noticed a performance regression. I wrote a simple app that does nothing except show the framerate and the time per frame on screen. Compare badge firmware 1.6.0 with 1.7.0 in the footage below.

As you can see from the video, I've experienced a drop from around 10 fps to around 8 fps, which is roughly a 20% regression and quite significant.

Whilst investigating this, I found one potentially very easy optimisation: avoid rendering every application in the foreground stack. Since the first thing every app does at the start of its render loop is clear the framebuffer, the top-most app in the stack overwrites everything rendered by the apps below it.

The change I'm proposing in this PR renders only the top-most app in the foreground apps stack. This completely eliminates the time spent rendering the launcher (and any other app that happens to be in the stack) and more or less halves the time taken for the end_frame step because the ctx drawlists are much shorter. No change is made to the rendering behaviour of the always-on-top stack of apps.
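
To illustrate the idea, here is a minimal, self-contained sketch in plain Python. It is not the firmware's scheduler code: `RecordingCtx`, `FakeApp`, `draw_all`, `draw_top_only` and `visible` are hypothetical names made up for this example, and the "context" only records commands instead of drawing. It assumes, as described above, that each foreground app clears the framebuffer before drawing and that always-on-top apps draw over the result without clearing.

```python
class RecordingCtx:
    """Hypothetical stand-in for a drawing context: it only records commands."""

    def __init__(self):
        self.commands = []


class FakeApp:
    """Hypothetical app; foreground apps clear first, overlay apps do not."""

    def __init__(self, name, clears=True):
        self.name = name
        self.clears = clears

    def draw(self, ctx):
        if self.clears:
            ctx.commands.append(("clear", self.name))
        ctx.commands.append(("content", self.name))


def draw_all(foreground_stack, on_top_stack, ctx):
    # Old behaviour in this sketch: every foreground app draws, bottom to top.
    for a in foreground_stack:
        a.draw(ctx)
    for a in on_top_stack:
        a.draw(ctx)


def draw_top_only(foreground_stack, on_top_stack, ctx):
    # Proposed behaviour: only the top-most foreground app draws.
    if foreground_stack:
        foreground_stack[-1].draw(ctx)
    # The always-on-top stack is rendered exactly as before.
    for a in on_top_stack:
        a.draw(ctx)


def visible(commands):
    """Return the commands that survive the last framebuffer clear."""
    last_clear = 0
    for i, (kind, _name) in enumerate(commands):
        if kind == "clear":
            last_clear = i
    return commands[last_clear:]


foreground = [FakeApp("launcher"), FakeApp("test_app")]
on_top = [FakeApp("notifications", clears=False)]

before = RecordingCtx()
draw_all(foreground, on_top, before)

after = RecordingCtx()
draw_top_only(foreground, on_top, after)
```

In this sketch both strategies leave the same visible result, but `draw_top_only` records a shorter command list, which is the effect on the ctx drawlists described above.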

This yields a noticeable improvement in the framerate of tildagon apps with realtime graphics. You can see from the Firmware 1.7.0-Patched footage below that the test app now consistently achieves around 14 fps.
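
For a rough sense of scale, the approximate framerates quoted above can be turned into relative changes and per-frame times. This is only back-of-the-envelope arithmetic on the rounded figures (10, 8 and 14 fps), not a new measurement; the helper names are made up for this example.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) * 100.0 / before


def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds, at the given framerate."""
    return 1000.0 / fps


# Rounded figures quoted in the description above.
FPS_1_6_0 = 10
FPS_1_7_0 = 8
FPS_PATCHED = 14

regression = percent_change(FPS_1_6_0, FPS_1_7_0)
improvement = percent_change(FPS_1_7_0, FPS_PATCHED)

for label, fps in (("1.6.0", FPS_1_6_0), ("1.7.0", FPS_1_7_0), ("1.7.0-Patched", FPS_PATCHED)):
    print(f"{label}: {fps} fps, {frame_time_ms(fps):.1f} ms per frame")
print(f"1.6.0 -> 1.7.0: {regression:+.0f}%")
print(f"1.7.0 -> patched: {improvement:+.0f}%")
```

On these rounded numbers the regression is about 20% and the patched build is about 75% faster than unpatched 1.7.0, with the frame time going from roughly 125 ms to roughly 71 ms.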

Firmware 1.6.0

Video of the app running:

framerate_one_six.mp4

Here's a screenshot in case the video doesn't play in your browser

image

Firmware 1.7.0

Video of the app running:

framerate_one_seven.mp4

Here's a screenshot in case the video doesn't play in your browser

image

Firmware 1.7.0-Patched

Video of the app running:

framerate_one_seven_patch.mp4

Here's a screenshot in case the video doesn't play in your browser

image

Test App Source

For reference, this is the source of the app I am using to show the framerate:

import app
import asyncio
import ota
import time

from app_components import tokens

from system.eventbus import eventbus
from system.patterndisplay.events import *
from system.scheduler.events import *

# Firmware version
FW_VER = ota.get_version()

# Font sizes
PERF_FONT = 6 * tokens.one_pt


class TestApp(app.App):

    def __init__(self):
        # Performance metrics
        self.current_t = 0
        self.last_t = 0
        self.accumulated_t = 0
        self.sample_idx = 0
        self.frametime_samples = [0,0,0,0,0,0,0,0]
        self.frametime = 0
        self.framerate_samples = [0,0,0,0,0,0,0,0]
        self.framerate = 0

        eventbus.on_async(RequestForegroundPushEvent, self._resume, self)
        eventbus.on_async(RequestForegroundPopEvent, self._pause, self)
        eventbus.emit(PatternDisable())

    async def _resume(self, event: RequestForegroundPushEvent):
        # Disable firmware led pattern when foregrounded
        eventbus.emit(PatternDisable())

    async def _pause(self, event: RequestForegroundPopEvent):
        # Re-enable firmware led pattern when backgrounded
        eventbus.emit(PatternEnable())

    async def run(self, render_update):
        self.last_t = time.ticks_us()
        while True:
            # Calculate time since last frame
            self.current_t = time.ticks_us()
            delta_t = time.ticks_diff(self.current_t, self.last_t)
            self.accumulated_t = self.accumulated_t + delta_t
            self.last_t = self.current_t

            # Calculate some performance metrics
            self.frametime_samples[self.sample_idx] = delta_t
            self.framerate_samples[self.sample_idx] = 1_000_000 / delta_t
            self.sample_idx = (self.sample_idx + 1) % 8
            if self.accumulated_t > 250_000:
                self.accumulated_t = self.accumulated_t - 250_000
                self.frametime = int(sum(self.frametime_samples) / 8)
                self.framerate = sum(self.framerate_samples) / 8

            # Perform the update
            if self.update(delta_t) is not False:
                await render_update()
            else:
                await asyncio.sleep(0.05)

    def update(self, delta_t):
        pass

    def draw(self, ctx):
        ctx.text_align = ctx.CENTER
        ctx.font_size = PERF_FONT
        ctx.rgb(0,0,0).rectangle(-120,-120,240,240).fill().rgb(1, 1, 1)
        ctx.move_to(0, -80).text(f"{self.framerate:.2f} fps")
        ctx.move_to(0, -60).text(f"{self.frametime} us")
        ctx.move_to(0, -40).text(f"FW: {FW_VER}")


# Set the entrypoint for the app launcher
__app_export__ = TestApp

Since the first thing every app does every frame is clear the
framebuffer, there seems to be no point in asking every foreground
application to render itself, because the top-most app in the stack
will overwrite everything rendered by all the other apps in the stack.

This change renders only the top-most app in the foreground apps stack,
completely eliminating the time spent rendering the launcher and more or
less halving the end_frame step because the ctx drawlists are much
shorter.

This should yield a noticeable improvement in the framerate of graphical
tildagon apps.
@MatthewWilkes
Member

Hi @mbooth101!

#147 does this also, as well as reducing the duplication with the on top stack. Would you be happy with that implementation, or is there a benefit to this one? I'm happy to merge either.

@MatthewWilkes
Member

Thank you for the test app, also. This is really handy :)

@mbooth101
Contributor Author

Hi, I didn't notice there was already a PR for this, sorry for the noise!

Yes, your implementation is fine. I didn't want to mess with the on_top stack in case there was a reason it was separate. Feel free to merge your change and close this one :-)

@MatthewWilkes
Member

Will do. I've also applied a change to the framebuffer RAM location that gets us another 18% or so.
