Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"Rusty" API is NOT Idiomatic Rust #25

Open
4LT opened this issue Oct 26, 2023 · 14 comments
Open

"Rusty" API is NOT Idiomatic Rust #25

4LT opened this issue Oct 26, 2023 · 14 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@4LT
Copy link

4LT commented Oct 26, 2023

First, let me say that I am by no means a Rust expert. However, I have noticed several cases while reviewing the Unit WASM "Rusty" API where the code is not very conventional (lit. doesn't follow conventions) of typical Rust code.

Right off the bat, I see the use of the uwr namespace in the name of functions, like uwr_init_ctx. That makes the fully qualified name unit_wasm::rusty::uwr_init_ctx. This is an instance of smurf naming. Generally this would be avoided by importing the module instead of the function, as follows:

use wasm_unit::rusty;

// in the body of a function...
  rusty::init_ctx(...);

Otherwise* there are cases where ergonomic constructs could be used (avoiding raw pointers, not requiring writing code that is unsafe, use of higher-level types, etc.). I know that the WASM host-plugin interface is inherently low-level (basically limited to C-language constructs) but I believe the convention for FFI-centered APIs is to split into a low-level crate (usually suffixed -sys) interaction with C interfaces and a high-level crate that provides ergonomic and safe abstractions. As I see it, the current wasm_unit "ffi" module performs the lowest-level interactions, but the "rusty" module is still very low-level itself. "rusty" even suffers as a medium-level abstraction because it still requires writing unsafe code to use (e.g. manipulating raw pointers).

Thus I believe that the "rusty" API ought not be named "rusty" because the name suggests a more idiomatic interface, or the API should be redesigned to be more high-level and idiomatic, or at least medium-level providing safe abstractions to ease the development of a high-level API.

*there are more cases of naming breaking convention, but I feel it would be pedantic to list them here in full.

@4LT
Copy link
Author

4LT commented Oct 26, 2023

Rust API guidelines can be found here:

https://rust-lang.github.io/api-guidelines/

@ac000
Copy link
Member

ac000 commented Oct 26, 2023

Hi!

First, let me say that I am by no means a Rust expert. However, I have noticed several cases while reviewing the Unit WASM

That makes two of us! (me most likely even less so, this was my first rust code).

"Rusty" API where the code is not very conventional (lit. doesn't follow conventions) of typical Rust code.

Indeed. As you can probably tell, it was written by a C programmer!.

Right off the bat, I see the use of the uwr namespace in the name of functions, like uwr_init_ctx. That makes the fully qualified name unit_wasm::rusty::uwr_init_ctx. This is an instance of smurf naming. Generally this would be avoided by importing the module instead of the function, as follows:

Never thought of it like that, smells a bit like C++...

use wasm_unit::rusty;

// in the body of a function...
  rusty::init_ctx(...);

Hmm, given the choice between

rusty::init_ctx(...);

and

uwr_init_ctx(...);

I do much prefer the latter (maybe that's just my C brain...).

Otherwise* there are cases where ergonomic constructs could be used (avoiding raw pointers, not requiring writing code that is unsafe, use of higher-level types, etc.). I know that the WASM host-plugin interface is inherently low-level (basically limited to C-language constructs) but I believe the convention for FFI-centered APIs is to split into a low-level crate (usually suffixed -sys) interaction with C interfaces and a high-level crate that provides ergonomic and safe abstractions. As I see it, the current wasm_unit "ffi" module performs the lowest-level interactions, but the "rusty" module is still very low-level itself. "rusty" even suffers as a medium-level abstraction because it still requires writing unsafe code to use (e.g. manipulating raw pointers).

Right, 'rusty' is really just a more (for some value of more) rust like interface on top of the raw bindings.

Thus I believe that the "rusty" API ought not be named "rusty" because the name suggests a more idiomatic interface, or the API should be redesigned to be more high-level and idiomatic, or at least medium-level providing safe abstractions to ease the development of a high-level API.

Well, it's 'rusty' because it's more rusty than the raw bindings luw_*

If we do write an even more rust like interface, it can be called rusted!

Thanks for your comments.

@Nemo157
Copy link

Nemo157 commented Oct 28, 2023

More of an issue is that just looking at the function signatures I'm 99% sure it is unsound. If I were to call unit_wasm::rusty::uwr_get_http_path(0x12345 as *const unit_wasm::rusty::luw_ctx_t) does this not actually dereference that invalid pointer? (If so, why does it take the pointer as an input?)

@ac000
Copy link
Member

ac000 commented Oct 28, 2023

You think something like (from the rust echo-request example)

uwr_write_str!(ctx, "REQUEST_PATH = {}\n", uwr_get_http_path(ctx));

is unsafe?

uwr_get_http_path(ctx) simply returns a pointer to a nul-terminated string contained in memory (well, the rust version of that, but that's how it is underneath).

It needs ctx so it knows where to look in memory, ctx contains a pointer to the memory in question.

I'm not sure why you think ctx is invalid...

@A248
Copy link

A248 commented Oct 28, 2023

A safe function, invoked by safe code, must not cause undefined behavior, no matter which inputs are passed to that safe function. A violation of this rule is considered unsound. The idea of this rule is to enforce a clear separation between safe, free of undefined behavior, and unsafe code, which if written incorrectly may cause undefined behavior. See https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html.

Rather than working with raw pointers, it would make far more sense for a Rust-based API to take references. A reference in Rust is always guaranteed to be valid, which means safe code can't accidentally pass a broken reference. (Only unsafe code can break this semantic by creating references in some roundabout, errant fashion).

@ac000
Copy link
Member

ac000 commented Oct 29, 2023

Seeing as it's just a wrapper around the C API it needs to be 'unsafe' at somepoint AFAICT and to make it otherwise would require a native Rust API. Given various constraints the C API wrapper was the chosen method.

@Nemo157
Copy link

Nemo157 commented Oct 29, 2023

The key is that you should either:

  • expose the fact that the API is unsafe by declaring it unsafe fn (and then describe in the documentation what preconditions are required by users to avoid undefined behavior)
  • wrap the C API types in Rust types with a safe public API that does not allow users to cause undefined behavior

Currently, your public API claims to be safe, but it has undocumented preconditions required of the users (e.g. that they pass in a ctx pointing to a valid instance).

@tippexs tippexs added help wanted Extra attention is needed question Further information is requested labels Oct 29, 2023
@ac000
Copy link
Member

ac000 commented Oct 29, 2023 via email

@Nemo157
Copy link

Nemo157 commented Oct 29, 2023

Where is the undefined behaviour? Or do you really mean UB (where the compiler is free to do what it likes) or something else?

In my example:

unit_wasm::rusty::uwr_get_http_path(0x12345 as *const unit_wasm::rusty::luw_ctx_t)

This is safe code constructing a pointer to an invalid luw_ctx_t, then passing it into the function. This pointer is non-dereferencable for multiple reasons: it doesn't point to a valid object and is unaligned. If the luw_get_http_path call inside uwr_get_http_path then attempts to dereference it this results in undefined behavior.

Besides it's not undocumented, it's quite clear that you pass around an initialised ctx structure...

I don't see any documentation on uwr_get_http_path. It's part of Rust's design that each unsafe fn locally documents its required preconditions, or references out to somewhere else that defines them. And note that this is only for unsafe fn, a safe fn must not be able to cause UB even if the user reads the documentation about what is required and then explicitly does the opposite.

@ac000
Copy link
Member

ac000 commented Oct 29, 2023

Where is the undefined behaviour? Or do you really mean UB (where the compiler is free to do what it likes) or something else?

In my example:

unit_wasm::rusty::uwr_get_http_path(0x12345 as *const unit_wasm::rusty::luw_ctx_t)

This is safe code constructing a pointer to an invalid luw_ctx_t, then passing it into the function. This pointer is non-dereferencable for multiple reasons: it doesn't point to a valid object and is unaligned. If the luw_get_http_path call inside uwr_get_http_path then attempts to dereference it this results in undefined behavior.

That makes no sense.

At this point ctx is either initialised or not and it's quite clear that ctx should be previously initialised, even without reading the docs or even knowing the API, surely common sense dictates as much? The fact ctx is passed in as const must be a big clue...

@Nemo157
Copy link

Nemo157 commented Oct 29, 2023

Rust's safety rules do not require common sense of users, they are designed to guarantee memory safety in the face of bugs in safe code.

@ac000
Copy link
Member

ac000 commented Oct 29, 2023

Scary...

@A248
Copy link

A248 commented Oct 29, 2023

I think you are both missing the forest for the trees. Typical applications written in Rust will not interface with raw pointers. They may use unsafe rarely or not at all. Dangling pointers become a thing of the past. You can write entirely safe code and never have to debug a use-after-free or double-free again, except for the occasional bug in a library.

Since unit-wasm interfaces with a C API, it necessarily needs unsafe. A pattern you could use for FFI purposes would be to wrap a pointer -- which you internally guarantee to be valid -- in a proper struct. Then API users can pass around references to this struct safely. The API calls might utilize the pointer under the hood, but such details are hidden from the API user, which prevents mistakenly using de-initialized memory. E.g.:

use std::ptr;

/// Low-level FFI struct
pub struct luw_ctx_t {
    // TODO
}

/// Safe interface for UWR context
pub struct UwrContext {
    // Guarantees that fields are fully initialized
    inner: luw_ctx_t
}

impl UwrContext {
    // Creates a context
    pub fn new() -> Self {
        use ptr::null_mut;

        let mut inner = luw_ctx_t {
            addr: null_mut(),
            mem: null_mut(),
            req: null_mut(),
            resp: null_mut(),
            resp_hdr: null_mut(),
            resp_offset: 0,
            req_buf: null_mut(),
            hdrp: null_mut(),
            reqp: null_mut(),
            resp_hdr_idx: -1,
        };
        unsafe {
            // SAFETY
            // (Document why safety constraints are upheld)
            let ptr = &mut inner as *mut luw_ctx_t;
            // I don't know enough low-level C to fill out these details
            let addr = todo!();
            let offset = todo!();
            luw_init_ctx(ptr, addr, offset);
        }
        Self { inner }
    }

    pub fn http_send_response(&self) {
        unsafe {
            // SAFETY
            // self.inner is guaranteed to be valid and initialized
            let ptr = &self.inner as *const luw_ctx_t;
            luw_http_send_response(ptr);
        }
    }

    pub fn http_add_header(&mut self, name: &str, value: &str) {
        unsafe {
            // SAFETY
            // self.inner is guaranteed to be valid and initialized
            let ptr = &mut self.inner as *mut luw_ctx_t;
            luw_http_add_header(
                ptr,
                S2C!(name).as_ptr() as *const i8,
                S2C!(value).as_ptr() as *const i8,
            );
        }
    }
}

The act of wrapping an unsafe interface in a fully safe API with no runtime overhead is Rust's great innovation over existing native languages. In Rust culture, safe code is the norm; unsafety is borne of necessity where the compiler cannot check invariants. Instead, unsafe code will commonly use SAFETY comments describing why memory-safety invariants are upheld. Assuming unsafe code fulfills it promises, the result is a safe API that is equally runtime-efficient yet vastly more developer-friendly than using raw pointers.

@tippexs
Copy link
Contributor

tippexs commented Oct 29, 2023

Thanks for sharing! We are aware, that the current implementation of the C-API in Rust is not 100% ready! It was our first implementation of raw FFI bindings and is under objective to be changed over time.

The code you have shared is indeed a concept I saw while researching other FFI Implementations. We can use this issue to discuss the necessary changes OR create a new issue for some specific parts and link them together! I like the idea of Rust Wrappers for the low level FFI.
That's the reason why we have the sys crate and this as a wrapper implementation. Furthermore PRs are always welcome.

ac000 added a commit to ac000/unit-wasm that referenced this issue Nov 1, 2023
From discussions on GitHub, people would like to see (besides a more
idiomatic Rust API) the Rust functions more explicitly marked as
'unsafe'.

Uplift 'unsafe' from the function body into the function prototype.

Link: <nginx#25>
Signed-off-by: Andrew Clayton <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants