Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dropping Screens and associated memory leaks #111

Open
nia-e opened this issue May 6, 2023 · 15 comments
Open

Dropping Screens and associated memory leaks #111

nia-e opened this issue May 6, 2023 · 15 comments
Labels
bug Something isn't working help wanted Extra attention is needed question Further information is requested

Comments

@nia-e
Copy link
Collaborator

nia-e commented May 6, 2023

Currently we provide no way to delete a Screen after it's been instantiated. Having the user manually call a deletion function also feels... un-rusty and generally unwise as it would be easy to forget. Also, making the screen be deleted on Drop carries the risk of e.g. a user calling get_scr_act() multiple times and double-deleting a screen -> a double free -> undefined behaviour. I'm honestly kind of stumped as to what to do here; we can't store some kind of is_active: bool field either as every call to get_scr_act() returns a different Screen and there's no way to dynamically check on drop if there are other Screens pointing to the same underlying object.

The other option is to have the Display store a copy of a pointer to the last Screen it returned and simply refuse to return multiple identical Screens thus allowing us to delete on drop, but this does make things slightly more inconvenient (and can be trivially circumvented by changing the Screen and then going back, unless we store a collection).

Also, how do we prevent accessing child objects of a Screen (or any object) and thus a use-after-free post-deletion?

@nia-e nia-e added bug Something isn't working help wanted Extra attention is needed question Further information is requested labels May 6, 2023
@nia-e
Copy link
Collaborator Author

nia-e commented May 6, 2023

Thinking about it, we can also modify the API so as to require a reference to the Screen on widget/object creation (and not get just the active screen) and then store a PhantomData<&'a T> where T: NativeObject such that they will automatically be invalidated on drop.

@cydergoth
Copy link
Contributor

This seems to be the case for any Widget created, unfortunately my understanding of Rust FFI isn't yet good enough to help with this :-(

@cydergoth
Copy link
Contributor

Some thoughts on lifecycle management for LVGL resources in rust

The problem

LVGL is a C language libray which uses either the global malloc/free
or a custom allocator. When an LVGL resource is created memory is
allocated on the heap and a constant mutable pointer is returned
(ignoring static styles for the purposes of this discussion).

Some LVGL resources hold pointers to other LVGL resource, e.g. a
widget to the associated set of styles and containers to their
children. In this case, the referenced memory must have a lifecycle
equal to or greater than the containing resource.

When binding to LVGL as a rust FFI library, then the binding needs to
provide appropriate support for the rust user to manage these
resources. This includes creation, lifecycle and then destruction.

Errors to mitigate include:

  • Use before allocation. Pointers should be null or valid.
  • Dereference null pointer. Null pointers should not be usable in a context
    where they may be dereferenced
  • Use after free. A referenced resource must not be freed whilst existing
    references still point to it. References which point to a freed resource
    must be unusable (or null-ed) after free.
  • Memory leak. A referenced object should have a defined lifecycle and
    should not exceed that lifecycle
  • Multiple concurrent access from different threads

Usage patterns for resources in LVGL

LVGL has a number of different ways in which resources are used.

Global lifespan

The most common usage pattern in an embedded systems context is to
allocate LVGL displays, callbacks, screens, styles and widgets early
in the startup sequence of the embedded application and never release
them. This is sufficient for many embedded system UI requirements. In
this scenario, styles may be statically allocated in read-only memory
to save ram. For resources which require dynamic allocation, the lower
level lvgl_sys:: API may be used.

Where this patten is usable, it does not require free-ing of certain
LVGL resources. The common exception is animations. Pagination, tab or
carousel type requirements may be satisfied by pre-allocated
Fragments and the FragmentManager.

Dynamic lifespan

In this usage pattern, which may be required either to low ram
necessitating more fine grained management of ram resource, or in
scenarios where the UI layout and components may not be known at
firmware build time, then LVGL resources including screens may be
dynamically allocated and deallocated during the runtime of the
application.

Mixed lifespan

This mode is a combination of the Global and Dynamic lifespans in
which some resources exist for the entire lifetime of the application
and some are created and destroyed dynamically.

Resource ownership

In LVGL, a widget is (with one exception) always parented by another
Widget. Parentage is a one-to-many relationship, and when the parent
widget is deallocated then the children are also deallocated in a
cascade delete pattern.

For styles, there is a many-to-many relationship as a Widget may
reference many styles for different scenarios, and a style may be
referenced by many widgets. Fonts also have a many-to-many
relationship, but are usually stored in read-only memory.

Images may be either dynamically allocated or stored in ROM.

Thread safety

AFAIK, LVGL is not thread safe. All mutation of LVGL resources
should be done in critical regions which lock the entire LVGL domain.

Potential approaches

When binding an FFI resource to a rust library, it is common to wrap
each external pointer in a corresponding rust struct. In the
lv_binding_rust current implementation, this is done via the
bindings rust library and via a code generator which uses proc
macros to generate the rust structs and trait impls.

When a rust struct is associated with an FFI pointer, then the
lifecycle of the rust struct must be tied to the lifecycle of the FFI
pointer. Usually that means that the rust struct exposes a ::new()
function which calls the FFI library to create the pointer, and a
Drop::drop() implementation which frees the FFI function. As such,
it is the rust struct lifetime which dominates the binding. When the
rust struct is initialized so is the FFI pointer and when the lifetime
of the rust struct expires due to it leaving scope, then the FFI
pointer resource is freed.

Rust programmers should expect to be aware of the lifetime management
rules of rust, and the borrow checker enforces many of them.

In the case of LVGL, there are two additional considerations. The
concept of LVGL widgets being managed in a single parent multi-child
tree, and the many-to-many relationship between widgets and styles.

The current implementation of Obj (the base class for a widget) is:

//! Native LVGL objects
//!
//! Objects are individual elements of a displayed surface, similar to widgets.
//! Specifically, an object can either be a widget or a screen. Screen objects
//! are special in that they do not have a parent object but do still implement
//! `NativeObject`.

use crate::lv_core::style::Style;
use crate::{Align, LvError, LvResult};
use core::fmt::{self, Debug};
use core::ptr;

/// Represents a native LVGL object. Note: This is a _trait_ not a struct! It does not _contain_ the pointer
pub trait NativeObject {
    /// Provide common way to access to the underlying native object pointer.
    fn raw(&self) -> LvResult<ptr::NonNull<lvgl_sys::lv_obj_t>>;
}

/// Generic LVGL object.
///
/// This is the parent object of all widget types. It stores the native LVGL raw pointer.
pub struct Obj {
    // We use a raw pointer here because we do not control this memory address, it is controlled
    // by LVGL's global state.
    raw: *mut lvgl_sys::lv_obj_t,
}

impl Debug for Obj {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.debug_struct("NativeObject")
            .field("raw", &"!! LVGL lv_obj_t ptr !!")
            .finish()
    }
}

// We need to manually impl methods on Obj since widget codegen is defined in
// terms of Obj
impl Obj {
    pub fn create(parent: &mut impl NativeObject) -> LvResult<Self> {
        unsafe {
            let ptr = lvgl_sys::lv_obj_create(parent.raw()?.as_mut());
            if ptr::NonNull::new(ptr).is_some() {
                //(*ptr).user_data = Box::new(UserDataObj::empty()).into_raw() as *mut _;
                Ok(Self { raw: ptr })
            } else {
                Err(LvError::InvalidReference)
            }
        }
    }

    pub fn new() -> crate::LvResult<Self> {
        let mut parent = crate::display::DefaultDisplay::get_scr_act()?;
        Self::create(&mut parent)
    }
}

impl NativeObject for Obj {
    fn raw(&self) -> LvResult<ptr::NonNull<lvgl_sys::lv_obj_t>> {
        if let Some(non_null_ptr) = ptr::NonNull::new(self.raw) {
            Ok(non_null_ptr)
        } else {
            Err(LvError::InvalidReference)
        }
    }
}

Note that:

  • this implementation does not implement Drop::drop(). As such, no
    structs of type Obj or their corresponding LVGL resources will ever
    be de-allocated. For the Global lifespan use case, this is not an
    issue. Any exceptions to the Global case may be handled using the
    low level bindings. As a general purpose bindings library, however,
    it is highly constraining for us to assume that use case.
  • as a consequence of not implementing Drop::drop() if the Obj
    goes out of scope then the rust struct will be released but not
    the memory referenced by the underlying C FFI pointer, thus
    resulting in a memory leak.
  • NativeObject for Obj implements our constraint on de-referencing
    null pointers.
  • many references are &mut in this code. This is due to LVGL
    performing a mutate-in-place on resources in many function calls as
    it is a C style library, which assumes mutability by default (as
    opposed to rust assuming immutability by default).
  • Note that there are two new() methods, one which assumes there is
    a current Screen in scope and one which takes the parent reference
    explicitly
  • Note that Obj is not Send or Sync by virtue of containing a
    raw pointer which implies !Send and !Sync trait restrictions. In
    the current implementation, an impl of Send trait would have
    unproven (and nearly impossible to prove) safety, precluding it as a
    sane option in a general purpose library. Without Send, however,
    mutation is restricted to the thread which created the rust struct
    and thus async is unusable.

In order to address the issues noted above, it is proposed that an
explict destructor be implemented for LVGL resources. Implementing
such a destructor requires some careful consideration of the issues of
parental ownership and many-to-many references. In the case of the
destructor being invoked then the resource on the C side will be
released and the Obj will become a null stub. In the correct context,
this means that the Obj will then be removed by the rust runtime.

Implementing destructor for Obj

It is proposed that the destructor be implemented as follows:

  • Allocate all Obj structs on the heap, using a constructor/builder
    method and Arc pointers. Arc pointers are reference counted and
    may be used with Mutex to ensure single thread access. A possible
    alternative is a single global mutex which locks all objects in the
    LVGL domain.
  • Provide a global mutable mapping table of raw pointers to rust Obj
    structs using weak pointers. This requires 32 or 64 bits per raw
    pointer, and sizeof(Arc) for the corresponding rust struct Id. The
    raw pointer shall be the key. Note that this does not require the
    complexity of a hash table as a linear scan or binary search will be
    sufficient for the number of entries. Weak pointers prevent the
    pointer locking the Obj as "in use".
  • On creation of an LVGL widget, assign it a new unique Id and insert
    it into the table, along with the raw pointer value. Note that this
    promotes the widget lifecycle to be 'static until explicit drop.
  • On drop() of a widget, determine if it has children. If it does not,
    then call lv_obj_del for the FFI pointer, set the pointer in the
    rust struct to null, and remove the Obj from the mapping table. If,
    however, it does have children then recurse the tree of children and
    delete them working backwards from the leaf nodes. Recursing the
    tree of children should be done in the LVGL C space and will
    return lv_obj_t pointers. The global lookup table shall be used to
    retrieve the correponding rust struct. This pattern moves the
    cascade delete from the C domain to the rust domain, and ensures
    that there will not be rust structs with dangling references to
    memory in the C domain. Similar strategies may be required for
    other resources on the rust side which hold FFI pointers.
  • Styles must also be implemented using ARC pointers, however they are
    not owned by widgets as they may have a many-to-many relations with
    widgets. This also applies to Fonts, although fonts are usually in
    read-only memory and would have a no-op drop function. To ensure
    that styles exist for the duration of the referencing resources, we
    should either keep a copy of the style references in the Obj, which
    involves handling potentially multiple styles per widget and thus an
    array of Arc references, or in a global table.
  • Care must be taken of the need to support #[no_std] for embedded
    systems and thus alloc needs to be handled internally in the
    bindings libary for that case.
  • Other resources may include associated CStrings such as Label text.
  • These capabilities may be placed behind feature gates to disable
    them in contexts such as Global lifespan.

Side effects of this proposal include:

  • Increased memory consumption on the rust side for additional fields in Object
  • Increased indirection for ARC pointers
  • Additional complexity and memory with ARC+Mutexes, thus requiring
    the embedded HAL to support mutexes.
  • This would be a breaking change for existing code using the
    lv_binding_rust library.

@nia-e
Copy link
Collaborator Author

nia-e commented Jun 17, 2023

Hey @cydergoth, thank you so much for the detailed writeup! It definitely echoes some concerns I've had also. I mostly agree that we need to tackle dropping somehow and it's been a big headscratcher for me; I considered something tangentially related in #107 (specifically, this) which vaguely correlates.

On Obj, I have tried the approach of holding it in a Box<_> instead of just the pointer, but we currently don't handle dropping children of dropped objects properly and it resulted in segfaults all over the place. However, I'm not entirely certain that wrapping everything in an Arc is a great idea simply for performance reasons; this actually sort of used to be how we did things (though only for examples, so it was up to the user to decide how and what to do). Keeping everything behind fat pointers and dynamically handling memory has performance and memory overhead, and LVGL is supposed to run on baremetal with minimal hardware. Reference counting and mutexes are not particularly taxing, but costs add up. Keeping a global mutex is probably less memory-intensive but might degrade actual performance by a lot.

I do like the idea of keeping a global list of live objects, though we are burdened by the fact that - as you pointed out - we don't have an allocator by default except for LVGL's (which has access to very little memory). I'm unsure thus if it's possible to implement it for all cases (i.e. without alloc).

I do think there's a happy compromise possible to allow dropping and also not tax resources or performance too much. Rust's lifetimes do allow us to automatically invalidate a lot of things; changing e.g. Obj to hold raw: Box<_>, parent: PhantomData<&'a Obj> which is recursively invalidated on parent deletion might get us most of the way towards dynamic dropping (and we can simply have users wrap objects they want to live globally as ManuallyDrop<Obj>).

This doesn't address threading though, so I am curious what can be done there (current approach being "nothing, just don't touch LVGL-space from multiple threads"). There's also the issue of "what do we do if something manifests an object pointer out of thin air" which can occur (see the animation PR I recently merged, where via witchcraft I need to turn a *mut lv_obj_t into a &mut Obj where the original Obj is then leaked - leaking is useless now, but I specifically did it that way to be resilient against the future possibility of implementing real dropping). Here a global pointer table could be useful, though maybe we could also just pretend any witchcraft-derived Obj is 'static or make it ManuallyDrop and never drop it since outside of special-case exemptions it can "only" happen if it's secretly a duplicate of an existing object; but then we must somehow ensure said object isn't dropped before this ghost one is, so we need to make sure these ghost objects only exist in their own very short scope.

Honestly I'm not sure what to do, but I worry about the cost involved in dynamic checking of anything. Though if you want to, I would be curious to see benchmarks.

@cydergoth
Copy link
Contributor

So the performance of rendering should be mostly on the C side, which would be using all native pointers. The extra overheard of Arc should only come into play when you need to make a call via the rust domain, which should be mostly during UI creation, right? After that it would just be small updates, with LVGL doing all the Rasterizing and rendering.

The mutex would need to be locked during rendering, then any events would have to wait for rendering to complete, but it would stop one thread like an event thread trying to update the LVGL model whilst another thread is deleting that object 😀

If you take a look at my repo, you'll see it uses an unsafe impl Send and async to update the UI. I don't think I could do that without async, unless I move completely to a command model and have the render thread poll a command queue for mutation requests - which whilst possible seems a little clunky as async effectively does the poll

@nia-e
Copy link
Collaborator Author

nia-e commented Jun 17, 2023

Send was actually implemented for Obj until a few days ago when I intentionally removed it cause I realised it could be dangerous; maybe creating a distinct LvThreaded<T> where T: Widget and allowing users to work with those behind mutexes etc might work?

@cydergoth
Copy link
Contributor

Do you think prototyping with Arc would be valuable to determine precisely what the tradeoffs are?

@nia-e
Copy link
Collaborator Author

nia-e commented Jun 19, 2023

Sure. Though if there is any noticeable impact on memory use or speed, we will probably need this as an explicit opt-in

@cydergoth
Copy link
Contributor

cydergoth commented Jun 19, 2023 via email

@nia-e
Copy link
Collaborator Author

nia-e commented Jun 20, 2023

Would you be interested in working on this with me? I can ask that you be added as a collaborator

@cydergoth
Copy link
Contributor

cydergoth commented Jun 20, 2023 via email

@nia-e
Copy link
Collaborator Author

nia-e commented Jun 20, 2023

That's fair. This probably won't be getting merged "soon", but I'll open a dedicated issue for threading and we can discuss it there.

@nia-e
Copy link
Collaborator Author

nia-e commented Jun 20, 2023

See #140. I'll get started on trying to figure things out, but it's probably going to be a while before it works. But in case you want to help out with code - @kisvegabor can you give @cydergoth write perms on branches? I can't manage access for people w/ my current perms.

@kisvegabor
Copy link
Member

@nia-e I've added Admin rights for you. Now you should be able to invite people and change permissions.

@nia-e
Copy link
Collaborator Author

nia-e commented Jun 21, 2023

Thanks! :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants
@cydergoth @kisvegabor @nia-e and others