Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static object instances with C++ destructors depend on atexit and eat up RAM #72

Open
dpgeorge opened this issue Jan 13, 2016 · 19 comments

Comments

@dpgeorge
Copy link
Contributor

I recently upgraded my arm-none-eabi-gcc toolchain and now I have a new libc_nano.a library. With this new library the MicroPython port runs out of memory (again) because libc_nano.a includes a new 264 byte structure in the bss.

This new structure is for handling atexit behaviour. They (newlib devs) recently changed atexit so that it uses static RAM to register on-exit functions, instead of dynamically allocating with malloc. See here for background: https://sourceware.org/ml/newlib/2015/msg00936.html

atexit is used by the DAL because it has static object instances which have C++ destructors (eg ManagedString::EmptyString, MicroBitImage::EmtpyImage). When the program exits these destructors should be called (according to the C++ standard). Therefore when these static instances are initialised they register a function with atexit (cxa_atexit via __aeabi_atexit). The big 264 byte data structure in the bss is used to store these function pointers.

Note that even before this change there was 140 bytes used by atexit in the bss. So now there is 140+264 = way too much!

It seems this is a hard problem to solve/workaround. If possible it would be good to provide a dummy implementation of cxa_atexit, or __aeabi_atexit, which overrides the one from libc_nano.a.

Thoughts?

@dpgeorge
Copy link
Contributor Author

Surprisingly there is an easy fix for this: __register_exitproc is a weak symbol and if you override it then it doesn't pull in the big data structures. So I put the following definition into my main.cpp file and I reclaim back about 400 bytes RAM:

void __register_exitproc() {
}

@finneyj do you think that's the best approach? I can do it in the MicroPython code so it doesn't need to be part of the DAL.

@finneyj
Copy link
Contributor

finneyj commented Jan 13, 2016

thanks @dpgeorge - looks good. We should consider if there would be any gains by putting this in the DAL side for other languages when we have a moment, but this sounds like a good approach - suggest you go ahead wit this for now. We're running on minutes ot hit other deadlines at the moment (keep the bug reports coming though!)

@mmoskal
Copy link
Contributor

mmoskal commented Jan 13, 2016

Micro:bit programs never exit, so I'm sure all languages would be delighted to have the 400 bytes back. Seems like a no-brainer to me!

@remay
Copy link
Contributor

remay commented Jan 13, 2016

True that main() never returns. Technically I can write code in app_main() that calls exit(), but not sure I can see a reason to ever do this though. I'd certainly like the extra 400 bytes, as to squeeze in the enormous S130 BLE stack I'm down to a heap of only 512 bytes at the moment, and regularly hit out-of-memory conditions.

@dpgeorge, thanks for pointing this out. If I upgrade my toolchain does this result in overall more available memory? Is there any other compelling reason to upgrade my toolchain?

@finneyj
Copy link
Contributor

finneyj commented Jan 13, 2016

Absolutely - we definitely all would like to see more memory! Apologies for any confusion. I was just trying to say that James and I have a time critical backlog still to work through, so testing this through on micropython then upstreaming if all seems well felt like an easy win.

@mmoskal, @remay, if either of you guys have a few spare cycles and can try this out, measure how much memory it does free up and do a few basic smoke tests i'd love to see pull request.

@dpgeorge
Copy link
Contributor Author

You'll only get about 140 bytes back if you're using a non-bleeding-edge version of the toolchain. Still, that's relatively a lot.

@jamesadevine
Copy link
Contributor

jamesadevine commented May 11, 2016

@dpgeorge I've recently tried experimenting with this, and couldn't find the __register_exitproc symbol using arm-none-eabi-nm.

What are the steps to reproduce?

@dpgeorge
Copy link
Contributor Author

What are the steps to reproduce?

It depends heavily on your compiler and C++ library. The functions may be called something else. Look through the .map file and find everything that's in the BSS. They will be named and you can see if there is anything suspiciously large.

@jamesadevine
Copy link
Contributor

Hi @dpgeorge, thanks for that, I believe i have found my equivalent resident at exit overhead:

.bss. _on_exit_args_instance
                0x0000000000000000      0x108 .......gcc/arm-none-eabi/5.3.1/../../../../arm-none-eabi/lib/armv6-m/libc_nano.a(lib_a-on_exit_args.o)

This is on GCC 5.3.1, 4.9.x did not have this overhead as you stated.

The way I managed to remove this symbol was to provide a barebones __cxa_atexit function in my main.cpp:

extern "C"{
    void __cxa_atexit() {
    }
}

This overrides the usual symbol, and doesn't perform the step that generates _on_exit_args_instance. Overriding the __register_exitproc did not work.

I've never really delved this much into GCC before, so i'm unable to conclude whether my solution is correct...

I feel that having different solutions is a bad thing, and we should endeavour to find a solution that covers the general case.

@dpgeorge
Copy link
Contributor Author

I feel that having different solutions is a bad thing, and we should endeavour to find a solution that covers the general case.

I don't think that's possible, due to these functions (__cxa_atext etc) being internal and subject to change between compiler/library versions.

The alternative is to not use destructors, or have separate classes for objects that are statically allocated.

@jamesadevine
Copy link
Contributor

jamesadevine commented May 11, 2016

@dpgeorge is a simpler option to just override __aeabi_atexit?

I've been doing some investigation, and I noticed that the LPC platforms on mbed override this in their startup code.

I've replicated this in a main.cpp, and it removes the 264 byte RAM overhead:

extern "C"{
    int __aeabi_atexit(void *object, void (*destructor)(void *), void *dso_handle) {
        return 0;
    }
}

Does this work with your configuration?

@dpgeorge
Copy link
Contributor Author

Does this work with your configuration?

Nope.

@jamesadevine
Copy link
Contributor

@dpgeorge Can you verify that your .bss section actually changes size based on the application of your symbol definition?

Even though the symbol disappears from my .map file, the overall bss for the hex does not change.

Here is my sample program:

MicroBit uBit;


extern "C"{
    int __aeabi_atexit(void *object, void (*destructor)(void *), void *dso_handle) {
        return 0;
    }

    void __cxa_atexit() {
    }

    void __register_exitproc() {
    }
}

int main()
{
    // Initialise the micro:bit runtime.
    uBit.init();

    // Insert your code here!
    uBit.display.scroll("HELLO WORLD! :)");

    // If main exits, there may still be other fibers running or registered event handlers etc.
    // Simply release this fiber, which will mean we enter the scheduler. Worse case, we then
    // sit in the idle task forever, in a power efficient sleep.
    release_fiber();
}

With the extern'd symbols, my arm-none-eabi-size gives:

   text    data     bss     dec     hex filename
  73604     172    2148   75924   12894 build/bbc-microbit-classic-gcc/source/microbit-samples

And without:

   text    data     bss     dec     hex filename
  73636     172    2148   75956   128b4 build/bbc-microbit-classic-gcc/source/microbit-samples

@dpgeorge
Copy link
Contributor Author

Using the same example program that you have (from a fresh clone of microbit-samples) I get:

With dummy functions:
   text    data     bss     dec     hex filename
  71620     172    2144   73936   120d0 build/bbc-microbit-classic-gcc/source/microbit-samples

Without dummy functions:
   text    data     bss     dec     hex filename
  72112     172    2552   74836   12454 build/bbc-microbit-classic-gcc/source/microbit-samples

So, yes, there is a difference. A saving of nearly 500 bytes code and 308 bytes RAM.

gcc version: arm-none-eabi-gcc (Arch Repository) 5.3.0

@jamesadevine
Copy link
Contributor

@dpgeorge Thanks, sorry to come across pedantic. @jaustin and ARM have filed a bug with newlib, and I am just trying to get as much info as possible! 😄

@jaustin
Copy link
Contributor

jaustin commented May 17, 2016

@jamesadevine @dpgeorge I wonder if the 'Arch Repository' part is the key here? Is your newlib configured differently there?

I'm getting my toolchains from https://launchpad.net/gcc-arm-embedded - James, what about you?

@jamesadevine
Copy link
Contributor

@jaustin same!

@dpgeorge
Copy link
Contributor Author

The newlib package is automatically installed for me. Pacman reports: local/arm-none-eabi-newlib 2.4.0-1

@jamesadevine
Copy link
Contributor

@dpgeorge Here's a reply, courtesy of @jaustin

Sorry for the delay, this ticket got lost somewhere underneath the big
pile of "other stuff". I have been looking again at it today and I had
the same issue you did with overwriting '__register_exitproc'. However,
overwriting __on_exit_args did work save half the bss on a small example.

A mere global 'char __on_exit_args;' will do.

Let me know if you would prefer some other approach.

Cheers,
Andre

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants