Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

derive macro for complex enums #166

Closed
jordens opened this issue Jul 31, 2023 · 28 comments · Fixed by #233
Closed

derive macro for complex enums #166

jordens opened this issue Jul 31, 2023 · 28 comments · Fixed by #233
Labels
enhancement New feature or request

Comments

@jordens
Copy link
Member

jordens commented Jul 31, 2023

would only change their variant at runtime outside Miniconf, i.e. be like option, untagged enum
They would implement exactly one depth: the max of the inner types.

@jordens jordens added enhancement New feature or request good first issue Good for newcomers labels Jul 31, 2023
@MathiasKoch
Copy link

MathiasKoch commented Jul 26, 2024

I started an implementation of general enum support, but I think we might need to discuss slightly how it should work :p

Basic enum keys

For instance, a simple unit test of the node iteration:

#[derive(PartialEq, Debug, Clone, Default, Tree)]
struct Inner {
    data: u32,
}

#[derive(Debug, Clone, Tree)]
enum Either {
    A
    B(Inner)
}

#[derive(Debug, Clone, Tree)]
enum EitherDeep {
    A
    #[tree(depth = 1)
    B(Inner)
}

#[test]
fn iterate_keys() {
    let mut expected_keys = HashSet::from(["/A", "/B"]);

    for node in Either::nodes::<Path<String, '/'>>() {
        let (path, node) = node.unwrap();
        assert!(node.is_leaf());

        assert!(expected_keys.remove(path.as_str()));
    }
    assert!(expected_keys.is_empty());
}


#[test]
fn iterate_keys_deep() {
    let mut expected_keys = HashSet::from(["/A", "/B/data"]);

    for node in EitherDeep::nodes::<Path<String, '/'>>() {
        let (path, node) = node.unwrap();
        assert!(node.is_leaf());

        assert!(expected_keys.remove(path.as_str()));
    }
    assert!(expected_keys.is_empty());
}

Intuitively, I would expect both of these tests to pass.

But what happens when we want to support all forms of enums (tree attr removed for simplicity):

#[derive(Debug, Clone, Tree)]
enum EitherDeep {
    A,
    B(Inner),
    C(Inner, Inner),
    D { value: usize }
}

Maybe the more correct way would be to have this result in keys: ["/A", "/B/0/data", "/C/0/data", "/C/1/data", "/D/value"], assuming that would be possible with the current traits?

That way it should also be super simple to reuse the same code for supporting tuples, in the same manner as arrays?

The issue is, that now enums of the format B(Inner) has a different key scheme than a newtype struct, and Option would have, as I would expect both of these to be "transparent" (as is already the case with Option)

It would also be possible to go with something in between, where single field variants (B(Inner)) are flattened to "/B/data", while keeping the multifield variants as ["/C/0/data", "/C/1/data"]. This might be intuitive with the argument that it fits with newtype wrapper syntax and behavior, and that B(Inner) == B { data: u32 } in keys

Obviously, another option is to only allow a single unnamed field in the B cases, and if users need more fields, they would need to wrap it in a struct or tuple? Though I would personally think it would be nicer to design for the full syntax support eventually?

Other issues or thoughts

Are there any other stuff I need to know about, or have not though about, before I venture down the rabbit hole of implementing this?

@jordens
Copy link
Member Author

jordens commented Jul 26, 2024

Thanks for the interest, Mathias!
Currently on vacation, I'll read your post fully later, but in any case here are my thoughts:

From previous discussions and existing behavior of Option, I'd think we want this (conceptually, modulo coding errors, not tested). I'll refine it later if I see issues.

#[derive(Serialize, Deserialize)]
enum En0 {
    Uni,
    Tup(f32),
    Var { field: f32 },
    Ar(
        [f32; 1],
    ),
}

#[derive(Tree)]
enum En1 {
    Uni,
    Tup(f32),
    Var { field: f32 },
    Ar(
        [f32; 1],
    ),
}

#[derive(Tree)]
enum En2 {
    Uni,
    Tup(f32),
    Var { field: f32 },
    Ar(
        [f32; 1],
    ),
}

#[derive(Tree)]
enum En3 {
    Uni,
    Tup(f32),
    Var { field: f32 },
    Ar(
        #[tree(depth=1)]
        [f32; 1],
    ),
}


#[derive(Tree)]
struct Stru {
    // atomic serialize/deserialize, no Tree API usage, ability to switch variant through Miniconf
    at: En0,
    
    // one tree API level, one path hierarchy level
    // atomic usage of variants,
    // inability to switch variant through Miniconf, only through other means
    #[tree(depth=1)]
    depth1: En1,
    
    #[tree(depth=2)]
    depth2: En2,

    #[tree(depth=3)]
    depth3: En3,

}

assert_eq!(
    Stru::nodes::<Path<String, '/'>>().map(Result::unwrap).collect::<Vec<_>>(),
    [
        "/at",
        "/depth1/Uni", "/depth1/Tup", "/depth1/Var", "/depth1/Ar",
        "/depth2/Uni", "/depth2/Tup/0", "/depth2/Var/field", "/depth2/Ar/0",
        "/depth3/Uni", "/depth3/Tup/0", "/depth3/Var/field", "/depth3/Ar/0/0",
    ]
);

// Paths in/on variants that are not present are `Traversal::Absent`

@jordens
Copy link
Member Author

jordens commented Jul 26, 2024

Maybe we can implement the flatten attribute to kill the /0 on single-field tuple variants. This is analogous to what we'd want for newtypes.

@jordens
Copy link
Member Author

jordens commented Jul 26, 2024

Switching between variants can later be implemented via some "tagging" field outside (adjacent/external/internal like serde) that then implements the switch in its handler.

@MathiasKoch
Copy link

Switching between variants can later be implemented via some "tagging" field outside (adjacent/external/internal like serde) that then implements the switch in its handler.

Yeah, I thought about these as well, but as I see it, anything but the default externally tagged, would result in multiple keys in order to get one field in the enum, which I think would be difficult in the current design?

@ryan-summers
Copy link
Member

ryan-summers commented Jul 26, 2024

The summary that @jordens has above for the examples makes the most sense with what already exists in miniconf for sure. While the extra /0 on newtypes is annoying, I believe this is what we are already doing for newtypes. We can spawn a separate issue to support #[tree(flatten)] later to get rid of it.

Yeah, I thought about these as well, but as I see it, anything but the default externally tagged, would result in multiple keys in order to get one field in the enum, which I think would be difficult in the current design?

I'm not certain it's actually possible with Miniconf to guarantee that variants could be swapped. I think this is only possible if there's a single leaf within the enum variant that you want to swap to. I think to do this we would have to allow mutable access to the root of a complex enum. For example:

enum Foo {
   Bar,
   #[tree(depth = 1)]
   Qux{value: f32, other: f32},
}

// Has paths ["/Bar", "/Qux/value", "/Qux/other"]

#[derive(Tree, Default)]
struct Settings {
    #[tree(depth = 1)]
    foo: Foo,
}

let mut settings = Settings::default();
settings.foo = Foo::Bar;

// Read the current value of `foo`.
// Note: This is pseudo-code.
assert!(settings.get_json("/foo/Bar") == "null"); // What's the serialized unit type? Might be "" or "()"
assert!(settings.get_json("/foo/Qux/value").is_err());

// We now want to switch `settings.foo` to `Foo::Qux`. This is a bit awkward because `/foo` isn't actually a valid path
// because its an internal node.
settings.set_json("/foo", {"type": "Qux", "value": "10", "other": 9});
assert(settings.get_json("/foo/Qux/value") == 10);
assert!(settings.get_json("/foo/Bar").is_err());

I think both adjacent and external tagging would work fine, but we may need to make some modifications to serde-json-core

@MathiasKoch
Copy link

MathiasKoch commented Jul 26, 2024

I'm not certain it's actually possible with Miniconf to guarantee that variants could be swapped. I think this is only possible if there's a single leaf within the enum variant that you want to swap to.

My use-case might be slightly off from miniconf's intended behaviour, but I am mostly interested in using miniconf to turn my json tree into individually flattened key-value pairs of all leaf values individually (with few exceptions around arrays)

I thought about this as well, and came to the conclusion that I would be okay with complex enums leaving "unused" keys in my key-value storage, as long as I know that they will be hit again next time the enum variant changes, such that there is a strict upper limit on this defined by the enum tree.

@MathiasKoch
Copy link

What's the serialized unit type? Might be "" or "()"

I would think this would work like serialized json, where unit types act as a string serialization on the parent key?:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cf0abd49efe1a0cd5cb24a4da251c5e8

Such that (using your snippet)
/foo would have serialized value Bar
/foo/Qux would have { "value": "10", "other": 9 }

@ryan-summers
Copy link
Member

ryan-summers commented Jul 26, 2024

Such that (using your snippet) /foo would have serialized value Bar

This seems reasonable to me, since there's no other value represented. So for the simple unit enum case, we treat them how we're already treating enums (i.e. they become a terminal leaf node immediately). @jordens Does that sound appropriate to you?

That would modify the above as follows:

assert_eq!(
    Stru::nodes::<Path<String, '/'>>().map(Result::unwrap).collect::<Vec<_>>(),
    [
        "/at",
-        "/depth1/Uni", "/depth1/Tup", "/depth1/Var", "/depth/1/Ar",
-        "/depth2/Uni", "/depth2/Tup/0", "/depth2/Var/field", "/depth2/Ar/0",
-        "/depth3/Uni", "/depth3/Tup/0", "/depth3/Var/field", "/depth3/Ar/0/0",
+        "/depth1", "/depth1/Tup", "/depth1/Var", "/depth/1/Ar",
+        "/depth2", "/depth2/Tup/0", "/depth2/Var/field", "/depth2/Ar/0",
+        "/depth3", "/depth3/Tup/0", "/depth3/Var/field", "/depth3/Ar/0/0",
    ]
);

Hmm but doesn't this introduce complexities, where /depth1 simultaneously would be a leaf and internal node? Is that a problem?

@ryan-summers
Copy link
Member

@MathiasKoch I unintentionally editted your comment above, adding my reply here and reverting it shortly:

My use-case might be slightly off from miniconf's intended behaviour, but I am mostly interested in using miniconf to turn my json tree into individually flattened key-value pairs of all leaf values individually (with few exceptions around arrays)

The goal of miniconf is simply to turn complex data structures into key-value pairs, so it sounds like you want the exact thing that we're trying to provide :)

I thought about this as well, and came to the conclusion that I would be okay with complex enums leaving "unused" keys in my key-value storage, as long as I know that they will be hit again next time the enum variant changes, such that there is a strict upper limit on this defined by the enum tree.

I'm not sure I'm completely following what you mean here. Do you have a small example? In general, when you iterate over the keys of the tree, you will get a Error::Absent for the keys that don't actually have any values associated with them, so you can omit saving them to your key-value storage entirely.

@ryan-summers
Copy link
Member

It's just occurred to me that there may be complications in loading complex enums that expose their members as subfields. When you're loading the values from some external key-value storage, you'd get something like ("/foo/Qux/inner", and "/foo/Qux/other"). How would miniconf behave if Settings::foo was currently Foo::Bar and not Foo::Qux? Is it the users responsibility to swap over the foo variant before iterating over the key-value storage and applying the values? If not, I'm unsure how miniconf could apply half of a Qux, (i.e. inner is specified, but other has not yet been defined)

@MathiasKoch
Copy link

MathiasKoch commented Jul 26, 2024

I'm not sure I'm completely following what you mean here. Do you have a small example?

I can try. Building on your example:

enum Foo {
   Bar,
   #[tree(depth = 1)]
   Qux{value: f32, other: f32},
}

// Using above notion on unit variants, this has paths ["/", "/Qux/value", "/Qux/other"]

#[derive(Tree, Default)]
struct Settings {
    #[tree(depth = 1)]
    foo: Foo,
}

let mut settings = Settings::default();
settings.foo = Foo::Bar;

assert!(settings.get_json("/foo") == "Bar"); 
assert!(settings.get_json("/foo/Qux/value").is_err());

// I would now persist these to e.g. `sequential-storage`, which persists following key-value pairs:
// [("/foo", "Bar")]

// We now want to switch `settings.foo` to `Foo::Qux`. 

settings.set_json("/foo", {"type": "Qux", "value": "10", "other": 9}); // Not sure about how this should be handled, but pseudo-code for now

assert(settings.get_json("/foo/Qux/value") == 10);
assert!(settings.get_json("/foo/Bar").is_err());

// I would now persist these which persists following key-value pairs:
// [("/foo/Qux/value", 10), ("/foo/Qux/other", 9)]

// But i still have the key from `("/foo", "Bar")` which is kinda stranded until we update the config back

@MathiasKoch
Copy link

MathiasKoch commented Jul 26, 2024

If not, I'm unsure how miniconf could apply half of a Qux, (i.e. inner is specified, but other has not yet been defined)

But wouldn't this be exactly the same in struct cases (existing implementation)?

@ryan-summers
Copy link
Member

Your code example makes sense, so you'd have the following key-values in storage [("/foo", "Bar"), ("/Qux/inner", "9"), ("/Qux/other", "10")]. There's really two possibilities:

  1. When attempting to set paths that don't currently exist (i.e. because of the existing Enum variant present), just ignore the errors
  2. Dynamically swap out the variant as they're received as keys

(2) doesn't quite seem possible, so simply ignoring the errors seems the most normal case. We actually do this with Stabilizer, since there's different firmware variants that can run on the same hardware and they segregate their settings using prefixes in sequential-storage. It's pretty normal and works well, although I think in your case you'd have to do a check for miniconf returning Error::Absent when trying to configure some paths.


But wouldn't this be exactly the same in struct cases (existing implementation)?

In the struct case, the tree already exists in memory when calling set_json or get_json. so the default state of unspecified values is known because they already exist in memory. However, what if you currently have the Foo::Bar variant selected and try to apply a Foo::Qux key? I believe this should result in an Error::Absent with the current proposed design.

@MathiasKoch
Copy link

Alright, so to sum up so far, we are down to a few bigger (uncovered) issues here?

  1. How would it work, having a path simultaneously be a leaf and internal node for unit enum variant cases?
  2. How to use mutation operations (set_json) to change between enum variants, if possible?
  3. Given the following key-values in storage [("/foo", "Bar"), ("/foo/Qux/inner", "9"), ("/foo/Qux/other", "10")], how to determine which enum variant gets constructed if just loading from storage? (Kinda same issue as 2)

@ryan-summers
Copy link
Member

Alright, so to sum up so far, we are down to a few bigger (uncovered) issues here?

  1. How would it work, having a path simultaneously be a leaf and internal node for unit enum variant cases?

I believe this gets resolved based on the current value of the enum. If the value of the enum during iteration is the unit type, the node is a leaf. If the enum variant is non-unit, then the node will not be a leaf. I don't think this is actually a problem.

  1. How to use mutation operations (set_json) to change between enum variants, if possible?

I would defer this to a future task for now. Is this something that you actively need? In any case, I would implement it separately from initial support for enums.

  1. Given the following key-values in storage [("/foo", "Bar"), ("/foo/Qux/inner", "9"), ("/foo/Qux/other", "10")], how to determine which enum variant gets constructed if just loading from storage? (Kinda same issue as 2)

For now, I would say that the enum currently in the structure would be the only valid one. I.e. if the structure has the Foo::Bar variant, then the /foo/Qux/* paths would generate an Error::Absent when calling set_json().

@MathiasKoch
Copy link

MathiasKoch commented Jul 26, 2024

Makes sense overall. The only one I am seeing a slight issue with is the last one.
For my application, I need to be able to load the full struct from my storage (at initialization, for instance), and if we design it in such a way that we cannot change the structure, I am not sure how that would be possible if the last stored enum variant is different from the default enum variant?

Eg. device starts from clean state (reboot), loads the following key-values in storage [("/foo", "Bar"), ("/foo/Qux/inner", "9"), ("/foo/Qux/other", "10")]. (last persisted keys are ("/foo/Qux/inner", "9"), ("/foo/Qux/other", "10"), but the device doesn't know that)

Device is now unable to reconstruct the last persisted state?

EDIT:
I could handle this outside of miniconf, by deleting "/foo" key when persisting the ("/foo/Qux/inner", "9"), ("/foo/Qux/other", "10") and opposite.

@ryan-summers
Copy link
Member

You could also handle this outside of miniconf by detecting the Error::Absent being returned and potentially then updating the enum based on the key and reapplying it for now.

@MathiasKoch
Copy link

You could also handle this outside of miniconf by detecting the Error::Absent being returned and potentially then updating the enum based on the key and reapplying it for now.

I think the point is, that given above example, I don't see how it would ever return Absent?

@ryan-summers
Copy link
Member

ryan-summers commented Jul 26, 2024

I would expect the behavior to be as follows. I.e. when attempting to set Foo::Qux, Error::Absent is returned becaused Foo::Bar is present:

#[derive(tree)]
enum Foo {
   Bar,
   #[tree(depth = 1)]
   Qux{value: f32, other: f32},
}

#[derive(Tree, Default)]
struct Settings {
    #[tree(depth = 1)]
    foo: Foo,
}

let mut settings = Settings::default();
settings.foo = Foo::Bar;

assert_eq!(settings.set_json("/foo/Qux/value", b"1"), Err(Error::Absent))

@jordens
Copy link
Member Author

jordens commented Jul 26, 2024

Thanks for the discussion! I think we can draw a couple conclusions. Let me know what you think.

  • A node (i.e. a Path/Indices/Packed) can not simultaneously (for a given Tree<Y>) be (resolve to a) leaf and non-leaf. Unit variants for depth > 0 need to be like "/depth1/Uni" == Ok("") (semantically) when present. They can't end up "/depth1" == Ok("Uni"). Conveniently, they are the only type that is intrinsically flattened. Also Unit fields (end up as syn::Type::Tuple, don't know why) aren't supported yet at all so we can decide what we do.
  • Since each field has exactly one location in the tree (it's either a leaf or part of a leaf but not both), switching variants can only be done
    • depth>1: requiring Default for all variants: "/depth2/tag" = "Tup" then "/depth2/Tup/0" == Ok("3.0") (again semantically) or
    • depth=1: "/depth1/Uni" = "" (TBC) or "/depth1/Var" = "{ 'field': 3.0 }"
    • depth=0: (i.e. atomic, non-Tree) by setting the entire thing with any standard serde tagging: "/at" = "{ 'tag': 'Var', 'content': { 'field': 3.0 }") for adjacent
  • I don't think we want internally/adjacently/untagged enums in the serde sense (at least not now). The resulting namespace collisions sound complicated for miniconf. Does serde_json_core even support them? Instead for depth > 1 I'd like to support the these distinct styles (names up for bikeshedding):
    • tag=None,explicit: you can't switch the variant natively through the miniconf API. There is no tag node. This doesn't preclude you from adding another field on an outer struct that switches variants in its handler.
    • tag="tag",explicit: requiring Default for all variants, you can switch using the tag node. Tag node name configurable.
    • tag=None,implicit: requiring Default for all variants, we implicitly switch variants. No node is ever Absent on deserialize. No tag node.
    • tag="tag",implicit: requiring Default for all variants, we implicitly switch variants. No node is ever Absent on deserialize. Tag node with given name also switches variants.
  • We want the flatten attribute.
  • Any node (e.g. tag) can be made read-only via some attribute or handler.
  • Your case @MathiasKoch , at least as far as I understand, sounds well within scope. Might be similar to what we do in serial-settings. Let's try to support it!

I am not sure how that would be possible if the last stored enum variant is different from the default enum variant?
Device is now unable to reconstruct the last persisted state?

Good point! For depth > 1 we need to make sure the tag comes first for tag="typ",explicit:

/foo/typ = "Bar"
/foo/Bar = ""
/foo/typ = "Qux"
/foo/Qux/value = "3"
/foo/Qux/other = "9"

Or you'd use implicit:

/foo/Bar = ""
/foo/Qux/value = "3"
/foo/Qux/other = "3"

Both would work.

Certainly not all of these features need to land at the same time. Let's boil it down to a MVP that doesn't preclude later completion. Maybe we want to kick implicit down the road, it might be a bit of a footgun (side effects). depth=1, tag="tag", and flatten could also be deferred. depth=0 is already supported. Leaving the depth>1, explicit, tag=None case as a potential MVP (though not covering your use case in #166 (comment) natively. You might still use the handler approach mentioned above though).

@ryan-summers
Copy link
Member

ryan-summers commented Jul 26, 2024

Thanks for the discussion! I think we can draw a couple conclusions. Let me know what you think.

  • A node (i.e. a Path/Indices/Packed) can not simultaneously (for a given Tree<Y>) be (resolve to a) leaf and non-leaf. Unit variants for depth > 0 need to be like "/depth1/Uni" == Ok("") (semantically) when present. They can't end up "/depth1" == Ok("Uni"). Conveniently, they are they only type that is intrinsically flattened. Also Unit fields (end up as syn::Type::Tuple, don't know why) aren't supported yet at all so we can decide what we do.

This runs a bit counter to the thoughts above, where depth > 0 (depth of the whole enum, not necessarily the active tree) would actually resolve to "/depth1" == Ok("Uni"). I'm confused why you're saying they can't end up like that. This would make them behave as normal enums already do in Miniconf (after all, a unit type enum is a normal enum).

@jordens
Copy link
Member Author

jordens commented Jul 26, 2024

"/depth1" == Ok("Uni")

/depth1 must be an internal node.
Because if there is /depth1/Var, then /depth1 by definition is internal.
And node type is statically known (transcode() and nodes() are associated).

@ryan-summers
Copy link
Member

I don't understand why though. Could it not be the case where it is a leaf node when depth1 = En1::Uni and an internal node when using a different enum variant?

@ryan-summers
Copy link
Member

ryan-summers commented Jul 26, 2024

Ah that's right, I forgot that nodes are statically defined because all paths are present during iteration even if they're not run-time available. That kind of makes complex enums a little weird in comparison to normal enums though, but I guess this makes sense given the depth parameter.

I guess that makes a unit enum variant somewhat similar to Option<T> = None, where the path exists, but nothing is there. Although it's slightly different in showing that the path exists with an empty value. There may be applications for that to end users, which is perhaps a meaningful distinction.

@jordens
Copy link
Member Author

jordens commented Jul 26, 2024

I guess that makes a unit enum variant somewhat similar to Option = None, where the path exists, but nothing is there.

#[tree(flatten)] // to kill the "/Some"
enum Option<T> {
    #[tree(skip)] None,
    #[tree(flatten)] Some(T), // to kill the "/0"
}

Option (enums with at most one non-unit/non-skip variant) is the only scenario where such an "untagged" enum (the outer flatten) does not lead to namespace collisions. Therefore I'd be cautious about an outer flatten in other situations.

@jordens
Copy link
Member Author

jordens commented Jul 27, 2024

The entire tag stuff might be implementable just fine using a custom handler. No need for proc macro machinery. And the implicit stuff does sound fragile on second reading.

@jordens
Copy link
Member Author

jordens commented Sep 23, 2024

I've added derive support for a MVP in #233

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants