Deserializing enums fails at run-time #211

hniksic · 2020-10-13T09:12:52Z

What version of the `csv` crate are you using?

1.1.3

Briefly describe the question, bug or feature request.

I can't get csv/serde to deserialize an enum.

I am processing a CSV file whose records store several different variants of data. It has a discriminator field and columns specific to each variant (which are empty when the discriminator shows the other variant), as well as some common columns, for example:

EventType,C,A1,B1
A,1,2,
B,3,,4

EventType is the discriminator that discriminates between two variants, A and B. C is the field common to both variants, and A1 and B1 are fields belonging to variants A and B respectively.

I would like to deserialize those rows into Rust enums, something like:

enum Event {
    A(AEvent),
    B(BEvent),
}

struct AEvent {
    C: u32,
    A1: u32,
}

struct BEvent {
    C: u32,
    B1: u32,
}

...but I can't get that to successfully run with serde/csv using the code below (or variants thereof that I tested).

Include a complete program demonstrating a problem.

use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(tag = "EventType")]
enum Event {
    A(AEvent),
    B(BEvent),
}

#[allow(non_snake_case)]
#[derive(Debug, Deserialize)]
struct AEvent {
    C: u32,
    A1: u32,
}

#[allow(non_snake_case)]
#[derive(Debug, Deserialize)]
struct BEvent {
    C: u32,
    B1: u32,
}

fn main() {
    let src = std::io::Cursor::new(
        r#"EventType,C,A1,B1
A,1,2,
B,3,,4
"#,
    );
    let mut reciter = csv::ReaderBuilder::new()
        .from_reader(src)
        .into_deserialize::<Event>();
    dbg!(reciter.next());
    dbg!(reciter.next());
}

What is the observed behavior of the code above?

The program outputs errors for both records, the error message being "invalid type: string \"A\", expected internally tagged enum". Full output:

[src/main.rs:34] reciter.next() = Some(
    Err(
        Error(
            Deserialize {
                pos: Some(
                    Position {
                        byte: 18,
                        line: 2,
                        record: 1,
                    },
                ),
                err: DeserializeError {
                    field: None,
                    kind: Message(
                        "invalid type: string \"A\", expected internally tagged enum",
                    ),
                },
            },
        ),
    ),
)
[src/main.rs:35] reciter.next() = Some(
    Err(
        Error(
            Deserialize {
                pos: Some(
                    Position {
                        byte: 25,
                        line: 3,
                        record: 2,
                    },
                ),
                err: DeserializeError {
                    field: None,
                    kind: Message(
                        "invalid type: string \"B\", expected internally tagged enum",
                    ),
                },
            },
        ),
    ),
)

What is the expected or desired behavior of the code above?

It should output the deserialized records, something like:

reciter.next() = A(
    AEvent {
        C: 1,
        A1: 2,
    },
)
reciter.next() = B(
    BEvent {
        C: 3,
        B1: 4,
    },
)


BurntSushi · 2020-10-13T09:35:56Z

I don't see a way to do this. The only enum support offered is at the level of an individual field. There is no record level enum support like you want here and I don't really see a way to add it. You'll have to do this translation manually.

(In general, Serde and csv don't really fit together well. The existing support is a hodge podge of things with a bunch of arbitrary-seeming limitations. The main problem is that csv has no types and is not a recursive format.)
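
A minimal sketch of what such a manual translation could look like for the example in this issue, assuming each row is first read into a flat record and then converted. The names `RawRecord` and the `String` error type are hypothetical, chosen for illustration. In real code `RawRecord` would derive `serde::Deserialize` (with the fields renamed to match the headers) and be read through the csv reader, where an empty field deserializes to `None` for an `Option`; the derive is left out here so the sketch has no dependencies.

```rust
use std::convert::TryFrom;

/// Flat, row-shaped record: one field per CSV column.
/// Hypothetical helper type; with csv/serde it would derive `Deserialize`.
#[derive(Debug)]
struct RawRecord {
    event_type: String, // the "EventType" discriminator column
    c: u32,             // column common to both variants
    a1: Option<u32>,    // empty on B rows
    b1: Option<u32>,    // empty on A rows
}

#[derive(Debug, PartialEq)]
struct AEvent {
    c: u32,
    a1: u32,
}

#[derive(Debug, PartialEq)]
struct BEvent {
    c: u32,
    b1: u32,
}

#[derive(Debug, PartialEq)]
enum Event {
    A(AEvent),
    B(BEvent),
}

impl TryFrom<RawRecord> for Event {
    type Error = String;

    /// Pick the variant from the discriminator and require that the
    /// variant-specific column is actually present on that row.
    fn try_from(r: RawRecord) -> Result<Self, Self::Error> {
        match r.event_type.as_str() {
            "A" => {
                let a1 = r.a1.ok_or("A row is missing A1")?;
                Ok(Event::A(AEvent { c: r.c, a1 }))
            }
            "B" => {
                let b1 = r.b1.ok_or("B row is missing B1")?;
                Ok(Event::B(BEvent { c: r.c, b1 }))
            }
            other => Err(format!("unknown EventType: {}", other)),
        }
    }
}
```

With this in place, converting the first row of the example (`A,1,2,`) via `Event::try_from` would yield `Event::A(AEvent { c: 1, a1: 2 })`, while an A row with an empty `A1` column is rejected with an error instead of being silently accepted.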

zeenix · 2021-05-02T16:20:54Z

> I don't see a way to do this.

That's very understandable, but if you could add this as a warning in the docs, that would be great. I spent an hour trying to find what I was doing wrong, and given that internally tagged enums are quite commonplace in the serde world, I'm sure I'm not the only one who will hit this.

BurntSushi · 2021-05-03T10:16:27Z

@zeenix Could you perhaps propose a wording/location (best through a PR) where this warning would have saved you time?

zeenix · 2021-05-03T10:53:34Z

> @zeenix Could you perhaps propose a wording/location (best through a PR) where this warning would have saved you time?

Sure, I'll add it to my TODO for the week. :)

zeenix · 2021-05-03T11:08:52Z

@BurntSushi here you go: #231

It is unfortunate that both [1] and [2] conspire to make this code way worse than it could otherwise be with a saner (de)serialization format. We need both to introduce `TransactionRecord`, because tagged enums are not powerful enough in CSV, and to make its `amount` field optional to deal with the varying number of fields for each kind of transaction.

[1]: BurntSushi/rust-csv#211
[2]: BurntSushi/rust-csv#172

zeenix · 2023-03-03T20:39:56Z

> @zeenix Could you perhaps propose a wording/location (best through a PR) where this warning would have saved you time?

It's been 2 years since I provided the PR but you didn't review. 😔

The0x539 · 2023-09-02T21:42:47Z

> You'll have to do this translation manually.

I have no idea if there's a good way to do something along these lines in csv::deserializer, but this Deserialize impl should at least be a usable workaround for downstream users.
This approach has some limitations compared to what csv can do when deserializing a record into a plain struct, but it beats the brick wall presented by the status quo.

wiktor-k · 2024-05-22T12:45:25Z

> I have no idea if there's a good way to do something along these lines in csv::deserializer, but this Deserialize impl should at least be a usable workaround for downstream users.

Thanks for the idea, it looks quite simple! I had previously been looking at deserializing it as a map, which would allow the discriminant field to be in any position.
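
The map idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `num` and `event_from_row` are hypothetical, the enum is simplified to struct-like variants, and the row is passed in as plain header and field slices to keep the sketch dependency-free. With the csv crate, a row can instead be deserialized into a `HashMap<String, String>` when headers are enabled, and the same lookup logic would apply.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Event {
    A { c: u32, a1: u32 },
    B { c: u32, b1: u32 },
}

/// Parse a required numeric column out of a header -> value map.
/// An empty field fails to parse, so a missing value is an error.
fn num(row: &HashMap<&str, &str>, col: &str) -> Result<u32, String> {
    row.get(col)
        .ok_or_else(|| format!("missing column {}", col))?
        .parse::<u32>()
        .map_err(|e| format!("column {}: {}", col, e))
}

/// Build an `Event` from one row, looking every field up by header name,
/// so the discriminator column may sit at any position.
fn event_from_row(headers: &[&str], fields: &[&str]) -> Result<Event, String> {
    let row: HashMap<&str, &str> = headers
        .iter()
        .copied()
        .zip(fields.iter().copied())
        .collect();

    match row.get("EventType").copied() {
        // Only the columns belonging to the selected variant are parsed.
        Some("A") => Ok(Event::A { c: num(&row, "C")?, a1: num(&row, "A1")? }),
        Some("B") => Ok(Event::B { c: num(&row, "C")?, b1: num(&row, "B1")? }),
        Some(other) => Err(format!("unknown EventType: {}", other)),
        None => Err("no EventType column".to_string()),
    }
}
```

Because fields are looked up by name, reordering the columns (for example `B1,C,EventType,A1`) does not change the result; the trade-off is that every value goes through a string map before being parsed.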

BurntSushi closed this as completed Oct 13, 2020
