Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: please add invalid_result deserializer #345

Open
lucatrv opened this issue Dec 19, 2023 · 7 comments · May be fixed by #356
Open

Feature request: please add invalid_result deserializer #345

lucatrv opened this issue Dec 19, 2023 · 7 comments · May be fixed by #356

Comments

@lucatrv
Copy link

lucatrv commented Dec 19, 2023

When dealing with invalid data with serde, sometimes it is useful to get also invalid fields when present. The current invalid_option deserializer discards all invalid fields. It would be convenient to have a similar invalid_result deserializer, which returns Ok(value) for all valid fields, and Err(string_field) for all invalid fields.

@BurntSushi
Copy link
Owner

If this is possible to do in a small patch, like adding a new invalid_result free function, then I'd be happy to take a PR.

@lucatrv
Copy link
Author

lucatrv commented Dec 19, 2023

I could create a PR when I have some time... do you see any other need apart from adding the invalid_result function and document it?

@lucatrv
Copy link
Author

lucatrv commented Dec 22, 2023

@BurntSushi, for now I implemented the following function:

pub fn invalid_result<'de, D, T>(
    de: D,
) -> result::Result<std::result::Result<T, String>, D::Error>
where
    D: Deserializer<'de>,
    T: Deserialize<'de>,
{
    Ok(T::deserialize(de).or_else(|err| Err(err.to_string())))
}

This works but reports the error message, not the original invalid string value as it should be. So for instance for good values, empty strings, or invalid strings, I am getting respectively:

Record { latitude: 60.5544444, longitude: -151.2583333, population: Ok(7610), city: "Kenai", state: "AK" }

Record { latitude: 65.2419444, longitude: -165.2716667, population: Err("field 2: cannot parse integer from empty string"), city: "Davidsons Landing", state: "AK" }

Record { latitude: 37.3433333, longitude: -86.7136111, population: Err("field 2: invalid digit found in string"), city: "Flint Springs", state: "KY" }

Do you know how to get the original invalid string value from within the function?

@lucatrv
Copy link
Author

lucatrv commented Jan 2, 2024

@BurntSushi, sorry to bother you but I've been trying to implement this for a while without success, and I can't really figure out how to solve this. I'm now wondering if it is feasible at all... should the Error type be extended to report the original csv string? If you can give me some hints on how to modify the function that I reported above (if feasible) then I will complete my PR and issue it. Thanks

@lucatrv
Copy link
Author

lucatrv commented Jan 19, 2024

@BurntSushi I finally got two working functions. The first one is simpler and relies on Content.as_str() for String conversion, thus it can fail. In the second function instead I manually managed each Content variant, applying lossy string conversion when appropriate, so it never fails. However I'm not sure if I managed correctly the None / Some / Unit / Newtype / Seq / Map variants.

pub fn invalid_result<'de, D, T>(
    de: D,
) -> result::Result<result::Result<T, String>, D::Error>
where
    D: Deserializer<'de>,
    T: Deserialize<'de>,
{
    // Need to use `Content` and `ContentRefDeserializer`, which currently are in
    // Serde private API but may become public in the future, see:
    // https://github.com/serde-rs/serde/issues/741
    // https://github.com/arcnmx/serde-value/issues/16
    let content = serde::__private::de::Content::deserialize(de)?;
    let de = serde::__private::de::ContentRefDeserializer::<D::Error>::new(
        &content,
    );
    if let Ok(t) = T::deserialize(de) {
        Ok(Ok(t))
    } else if let Some(s) = content.as_str() {
        Ok(Err(s.to_owned()))
    } else {
        Err(serde::de::Error::custom("input data is not a valid UTF-8 string"))
    }
}
pub fn invalid_result<'de, D, T>(
    de: D,
) -> result::Result<result::Result<T, String>, D::Error>
where
    D: Deserializer<'de>,
    T: Deserialize<'de>,
{
    // Need to use `Content` and `ContentRefDeserializer`, which currently are in
    // Serde private API but may become public in the future, see:
    // https://github.com/serde-rs/serde/issues/741
    // https://github.com/arcnmx/serde-value/issues/16
    let content = serde::__private::de::Content::deserialize(de)?;
    let de = serde::__private::de::ContentRefDeserializer::<D::Error>::new(
        &content,
    );
    let result = if let Ok(t) = T::deserialize(de) {
        Ok(t)
    } else {
        Err(match content {
            serde::__private::de::Content::Bool(b) => b.to_string(),
            serde::__private::de::Content::U8(u) => u.to_string(),
            serde::__private::de::Content::U16(u) => u.to_string(),
            serde::__private::de::Content::U32(u) => u.to_string(),
            serde::__private::de::Content::U64(u) => u.to_string(),
            serde::__private::de::Content::I8(i) => i.to_string(),
            serde::__private::de::Content::I16(i) => i.to_string(),
            serde::__private::de::Content::I32(i) => i.to_string(),
            serde::__private::de::Content::I64(i) => i.to_string(),
            serde::__private::de::Content::F32(f) => f.to_string(),
            serde::__private::de::Content::F64(f) => f.to_string(),
            serde::__private::de::Content::Char(c) => c.to_string(),
            serde::__private::de::Content::String(s) => s,
            serde::__private::de::Content::Str(s) => s.to_owned(),
            serde::__private::de::Content::ByteBuf(buf) => {
                String::from_utf8_lossy(&buf).into_owned()
            }
            serde::__private::de::Content::Bytes(bytes) => {
                String::from_utf8_lossy(bytes).into_owned()
            }
            serde::__private::de::Content::None => String::new(),
            serde::__private::de::Content::Some(some) => {
                format!("{:?}", some)
            }
            serde::__private::de::Content::Unit => String::new(),
            serde::__private::de::Content::Newtype(newtype) => {
                format!("{:?}", newtype)
            }
            serde::__private::de::Content::Seq(seq) => format!("{:?}", seq),
            serde::__private::de::Content::Map(m) => format!("{:?}", m),
        })
    };
    Ok(result)
}

I prefer the second function, but please let me know your preference and advice, or otherwise if I should use serde-value as reported in the following docstring. After I get your answer I can create a PR.

/// Used from generated code to buffer the contents of the Deserializer when
/// deserializing untagged enums and internally tagged enums.
///
/// Not public API. Use serde-value instead.
#[derive(Debug, Clone)]
pub enum Content<'de> {

@lucatrv
Copy link
Author

lucatrv commented Feb 13, 2024

@BurntSushi, I think I found a better implementation. I hope you like it, the only drawback is that it requires Serde's derive feature, but I cannot find other valid solutions (either relying on Serde's private API, or on serde-value, or on Serde's derive feature). I am going to create a PR with this implementation.

pub fn invalid_result<'de, D, T, F>(
    de: D,
) -> result::Result<result::Result<T, F>, D::Error>
where
    D: Deserializer<'de>,
    T: Deserialize<'de>,
    F: Deserialize<'de>,
{
    #[derive(Deserialize)]
    #[serde(untagged)]
    enum DeFallback<T, F> {
        Expected(T),
        Fallback(F),
    }

    DeFallback::<T, F>::deserialize(de).map(|d| match d {
        DeFallback::Expected(t) => Ok(t),
        DeFallback::Fallback(f) => Err(f),
    })
}

@lucatrv
Copy link
Author

lucatrv commented Feb 13, 2024

Never mind, the above code using Serde's derive feature does not work as expected. For instance if a column contains integers, strings and floats, and one tries to deserialize it to Result<i64, String> to retain all invalid values (floats and strings) as Strings, the deserialization would error out at the first float.

So at the moment the only two options that I could find are either relying on Serde's private API or on serde-value:

pub fn invalid_result_private<'de, D, T>(
    de: D,
) -> result::Result<result::Result<T, String>, D::Error>
where
    D: Deserializer<'de>,
    T: Deserialize<'de>,
{
    let content = serde::__private::de::Content::deserialize(de)?;
    let de = serde::__private::de::ContentRefDeserializer::<D::Error>::new(
        &content,
    );
    let result = T::deserialize(de).map_err(|_| match content {
        serde::__private::de::Content::Bool(b) => b.to_string(),
        serde::__private::de::Content::U8(u) => u.to_string(),
        serde::__private::de::Content::U16(u) => u.to_string(),
        serde::__private::de::Content::U32(u) => u.to_string(),
        serde::__private::de::Content::U64(u) => u.to_string(),
        serde::__private::de::Content::I8(i) => i.to_string(),
        serde::__private::de::Content::I16(i) => i.to_string(),
        serde::__private::de::Content::I32(i) => i.to_string(),
        serde::__private::de::Content::I64(i) => i.to_string(),
        serde::__private::de::Content::F32(f) => f.to_string(),
        serde::__private::de::Content::F64(f) => f.to_string(),
        serde::__private::de::Content::Char(c) => c.to_string(),
        serde::__private::de::Content::String(s) => s,
        serde::__private::de::Content::Str(s) => s.to_owned(),
        serde::__private::de::Content::ByteBuf(buf) => {
            String::from_utf8_lossy(&buf).into_owned()
        }
        serde::__private::de::Content::Bytes(bytes) => {
            String::from_utf8_lossy(bytes).into_owned()
        }
        serde::__private::de::Content::None => String::new(),
        serde::__private::de::Content::Some(some) => {
            format!("{:?}", some)
        }
        serde::__private::de::Content::Unit => String::new(),
        serde::__private::de::Content::Newtype(newtype) => {
            format!("{:?}", newtype)
        }
        serde::__private::de::Content::Seq(seq) => format!("{:?}", seq),
        serde::__private::de::Content::Map(map) => format!("{:?}", map),
    });
    Ok(result)
}
pub fn invalid_result_serdevalue<'de, D, T>(
    de: D,
) -> result::Result<result::Result<T, String>, D::Error>
where
    D: Deserializer<'de>,
    T: Deserialize<'de>,
{
    let value = serde_value::Value::deserialize(de)?;
    let result = T::deserialize(value.clone()).map_err(|_| match value {
        serde_value::Value::Bool(b) => b.to_string(),
        serde_value::Value::U8(u) => u.to_string(),
        serde_value::Value::U16(u) => u.to_string(),
        serde_value::Value::U32(u) => u.to_string(),
        serde_value::Value::U64(u) => u.to_string(),
        serde_value::Value::I8(i) => i.to_string(),
        serde_value::Value::I16(i) => i.to_string(),
        serde_value::Value::I32(i) => i.to_string(),
        serde_value::Value::I64(i) => i.to_string(),
        serde_value::Value::F32(f) => f.to_string(),
        serde_value::Value::F64(f) => f.to_string(),
        serde_value::Value::Char(c) => c.to_string(),
        serde_value::Value::String(s) => s,
        serde_value::Value::Unit => String::new(),
        serde_value::Value::Option(option) => {
            format!("{:?}", option)
        }
        serde_value::Value::Newtype(newtype) => {
            format!("{:?}", newtype)
        }
        serde_value::Value::Seq(seq) => format!("{:?}", seq),
        serde_value::Value::Map(map) => format!("{:?}", map),
        serde_value::Value::Bytes(bytes) => {
            String::from_utf8_lossy(&bytes).into_owned()
        }
    });
    Ok(result)
}

lucatrv added a commit to lucatrv/rust-csv that referenced this issue Feb 26, 2024
@lucatrv lucatrv linked a pull request Feb 26, 2024 that will close this issue
lucatrv added a commit to lucatrv/rust-csv that referenced this issue Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants