Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for #[serde(flatten)] #223

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gootorov
Copy link

@gootorov gootorov commented Mar 3, 2021

Hi!

We have a use case for serializing maps using your library.

Consider the following structure:

struct Inner {
    inner_a: i32,
    inner_b: i32,
}

struct Middle {
    #[serde(flatten)]
    inner: Inner,
    middle_a: i32,
    middle_b: i32,
}

struct Outer {
    #[serde(flatten)]
    middle: Middle,
    outer_a: i32,
    outer_b: i32,
}

initialized as

let outer = Outer {
    middle: Middle {
        inner: Inner {
            inner_a: 0,
            inner_b: 1,
        },
        middle_a: 2,
        middle_b: 3,
    },
    outer_a: 4,
    outer_b: 5,
};

The expected serialized data is

inner_a,inner_b,middle_a,middle_b,outer_a,outer_b
0,1,2,3,4,5

I.e. a completely flat structure. There is no limit to the depth/width or anything else using the current solution. This particular example is added as a unit test in the PR.
Additionally, serialization inside the "flattened" structures works as expected - for example, we can achieve the desired behavior mentioned in #197 (comment).
We also avoid the workaround mentioned in #98 (comment), as it is very tedious, especially if the structure is big and/or deep.

Serializing fields which are HashMaps will still lead to a runtime error, however serializing HashMaps which are marked by #[serde(flatten)] will result in a flat structure (as long as both the key and the value in the HashMap are scalars).

If there is any room for improvement and/or changes that are needed to be done -- please let me know, I'll be happy to update the PR.

@gootorov
Copy link
Author

gootorov commented Mar 4, 2021

cc @amrhassan, have a look if this covers your use cases. This works as expected for the use case you mentioned in #197 (comment), but perhaps you have other use cases you could try this on.

@gootorov
Copy link
Author

@BurntSushi, have you had a chance to have a look at this? Is there anything stopping from merging?

@BurntSushi
Copy link
Owner

@gootorov No, I haven't had a chance, sorry. The problem is that I'm short on time and serde changes to this library are incredibly complicated and usually have unintended effects. So in order me to accept this, I'd probably have to spend at least a couple of hours re-contextualizing serde semantics and making sure that the behavior we add here is something that we can live with going forward. If a mistake in the public API gets made, then we either have to live with it or release a csv 2, and I don't really want to release a csv 2 for at least a few more years (if that).

And on top of all of that, serde(flatten) is particularly complex.

@m-lima
Copy link

m-lima commented Oct 5, 2021

I am really looking forward to have #[serde(flatten)] support added to this crate. And I really appreciate you taking a jab at it :)
Though, for a robust solution and with pedantry aside, I think I should mention this:

I might be mistaken, but this doesn't seem to "add support for #[serde(flatten)]" as it would not simply work as expected. What I mean is that a Reader without headers would not be able to deserialize entries.

For example:

#[derive(Debug, serde::Serialize, serde::Deserialize)]
struct Inner {
    inner: u32,
}

#[derive(Debug, serde::Serialize, serde::Deserialize)]
struct Outer {
    #[serde(flatten)]
    inner: Inner,
    outer: u32,
}

let input = "0,2\n4,7";

let mut reader = csv::ReaderBuilder::new()
    .has_headers(false)
    .from_reader(input.as_bytes());

for r in reader.deserialize::<Outer>() {
    println!("{:?}", r);
}

The same would work with has_headers(true), though.
Please, correct me if I'm wrong.

I know that you focused on the serializer part. Maybe, then, this PR should re-scope itself to reflect that? There's more to go for flatten support, IMO. And scoping it to what it is trying to achieve may get other contributors to chip in with the other facets that need to be tackled.

@SuperCuber
Copy link

I encountered this today and would love to see this implemented, and I support re-scoping the PR to serializing only (and return an error on the non-header deserialize example)

@mhvelplund
Copy link

I tried the version in the head of the branch, and it worked swimmingly for serialization (my only requirement).

@qrilka
Copy link

qrilka commented Mar 25, 2022

Tried to use 1.1.6 with nested records but failed with value type error ("1" was in a string field), will try to check HEAD and get a test case later

@lukesneeringer
Copy link

lukesneeringer commented Apr 5, 2022

I too tried the version at the head of this branch, and it worked perfectly for my use case. Furthermore, I couldn't have accomplished what I needed without it, because serialize_struct (which would otherwise work) requires the keys to be &'static str, and I compute my key names at runtime.

Given that this PR is 13 months old, it'd sure be nice to see it in the "true" crate.

@silverjam
Copy link

This works for our use case as well

Copy link
Contributor

@Ten0 Ten0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been using this for a while and having a fork is starting to bother me ^^

Other than that LGTM.

Alternately to the proposed solutions in the comment it would also be somewhat correct (although probably not ideal) to have these functions return errors and saying we don't support serializing maps out of serialize_entry.

Alternately again, we could decide that the if !matches!(self.state, /* the new sate for serialize_value */) { is inexpensive enough that it's not worth the code duplication and remove the serialize_entry call.

}

fn serialize_value<T: ?Sized + Serialize>(
&mut self,
_value: &T,
) -> Result<(), Self::Error> {
unreachable!()
Ok(())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation of these is required, and calling serialize_key then serialize_value should produce the same results as calling serialize_entry, as documented here: https://docs.rs/serde/latest/serde/ser/trait.SerializeMap.html#method.serialize_entry
Shouldn't this do:

Suggested change
Ok(())
value.serialize(&mut **self)

The serialize_entry overload can then be removed, as it's equivalent to this.

fn serialize_key<T: ?Sized + Serialize>(
&mut self,
_key: &T,
) -> Result<(), Self::Error> {
unreachable!()
Ok(())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation of these is required, and calling serialize_key then serialize_value should produce the same results as calling serialize_entry, as documented here: https://docs.rs/serde/latest/serde/ser/trait.SerializeMap.html#method.serialize_entry
Shouldn't this do:

Suggested change
Ok(())
let old_state =
mem::replace(&mut self.state, HeaderState::EncounteredStructField);
if let HeaderState::ErrorIfWrite(err) = old_state {
return Err(err);
}
let mut serializer = SeRecord {
wtr: self.wtr,
};
key.serialize(&mut serializer)?;
self.state = /* some new state representing this */;
Ok(())

}

fn serialize_value<T: ?Sized + Serialize>(
&mut self,
_value: &T,
) -> Result<(), Self::Error> {
unreachable!()
Ok(())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation of these is required, and calling serialize_key then serialize_value should produce the same results as calling serialize_entry, as documented here: https://docs.rs/serde/latest/serde/ser/trait.SerializeMap.html#method.serialize_entry
Shouldn't this do:

Suggested change
Ok(())
if !matches!(self.state, /* the new sate for serialize_value */) {
return Err(/* some error of incorrect impl of Serialize for the container */);
}
// Check that there aren't any containers in the value.
self.state = HeaderState::InStructField;
value.serialize(&mut **self)?;
self.state = HeaderState::EncounteredStructField;
Ok(())

@qrilka
Copy link

qrilka commented May 20, 2022

qrilka@0183724 shows the problem we've seen with the current master, this results in

---- deserializer::tests::flatten_mix_types stdout ----
thread 'deserializer::tests::flatten_mix_types' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Deserialize { pos: None, err: DeserializeError { field: None, kind: Message("invalid type: integer `1`, expected a string") } })', src/deserializer.rs:1254:58
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@arcstur
Copy link

arcstur commented Dec 23, 2022

The problem with this PR

This doesn't just "add support for #[serde(flatten)]", it just makes serializing maps work without addressing the concerns @BurntSushi commented on #98. For example, consider the following test:

#[test]
    fn struct_with_hash_map() {
        #[derive(Clone, Serialize)]
        struct MyMap {
            map: std::collections::HashMap<String, String>,
        }

        let mut map = std::collections::HashMap::new();
        map.insert("First".to_string(), "abc".to_string());
        map.insert("Second".to_string(), "def".to_string());
        map.insert("Third".to_string(), "ghi".to_string());
        let outer = MyMap { map };

        let (wrote, got) = serialize_header(outer.clone());
        assert!(wrote);
        assert_eq!(got, "map,First,Second,Third");

        let got = serialize(outer);
        assert_eq!(got, "abc,def,ghi\n");
    }

First of all, it writes 4 columns and 3 values (using the code from this PR). Second, it sometimes works and sometimes fail!! That's because the ordering of the HashMap is not defined, so it's random.
I think this should be closed.

The problem with serde flatten

I tried understanding how serde flatten works, and as we can see here it works with HashMaps. Under the hood, it seems that it uses maps to flatten out structs. I used cargo-expand to expand the Serialize derive macro on flattened structs, and it uses serialize_map instead of serialize_struct.
However, flattening a struct (not a HashMap) has a well-defined order and number of columns, but since
serde transforms it into a map, I think the csv crate can't know that.

The current working solution

As of now, I'm using the solution that @dtolnay described on issue #98, here, which is to implement Serialize ourselves.

The future

However, supporting maps can be useful for this crate, and if done correctly it will make serde flatten work.
I think that ordering of columns should be deterministic, and maybe for serializing maps
the user needs to explicitly point that order? Maybe a WriterBuilder attribute?

Edit: added Rust syntax highlighting; added cargo-expand info.

@Ten0
Copy link
Contributor

Ten0 commented Dec 23, 2022

First of all, it writes 4 columns and 3 values (using the code from this PR). Second, it sometimes works and sometimes fail!! That's because the ordering of the HashMap is not defined, so it's random.
I think this should be closed.

That doesn't seem like a blocker for implementing this feature: it looks like we could just cross-check the provided map keys with the serialized headers. Probably the same applies for structs since there is the skip_serializing_if attribute.

and as we can see here [#[serde(flatten)]] works with HashMaps

It works with FlatMapDeserializer which does not involve HashMaps. (https://github.com/serde-rs/serde/blob/0daafe423f2a3400e47e430c13e3209d606edc0f/serde_derive/src/de.rs#L2583)

I find that I much more typically flatten structs than HashMaps, like this, so typically there is no HashMap involved.

Afaict the blocker is #223 (comment).

If that helps in any way I could give a go at implementing this feature, checking on any necessary consistency constraint. I have worked quite a lot on serde's internals over the past months (latest in date being this crate first released less than 24 hours ago), so I'm fairly confident I would end up with something reasonable. But that is not time I'm willing to spend if there's not going to be any bandwidth for a review.

I suspect that may also be the case of the original author:

If there is any room for improvement and/or changes that are needed to be done -- please let me know, I'll be happy to update the PR.


I think that ordering of columns should be deterministic, and maybe for serializing maps
the user needs to explicitly point that order?

IMO we can leave that to the user, at least until there's enough additional demand : we can check it inexpensively, but reordering probably implies complexity and overhead that could well be implemented outside, by e.g. having the user use a btreemap, or seed their HashMaps the same way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants