Avro does not respect default values defined in schema #416

Open
basimons opened this issue Dec 4, 2023 · 8 comments

basimons commented Dec 4, 2023

Hello,

I encountered something strange while doing some tests with Avro decoding.

The example below was run with version 2.16.0:

String avroWithDefault = """
    {
    "type": "record",
    "name": "Employee",
    "fields": [
     {"name": "name", "type": ["string", "null"], "default" : "bram"},
     {"name": "age", "type": "int"},
     {"name": "emails", "type": {"type": "array", "items": "string"}},
     {"name": "boss", "type": ["Employee","null"]}
    ]}
    """;

// Notice no name field
String employeeJson = """
{
    "age" : 26,
    "emails" : ["[email protected]"],
    "boss" : {
         "name" : "test",
         "age" : 33,
         "emails" : ["[email protected]"]
    }
}
""";

AvroSchema schema = new AvroMapper().schemaFrom(avroWithDefault);
JsonNode jsonObject = new ObjectMapper().reader().readTree(employeeJson);
byte[] objectAsBytes = new AvroMapper().writer().with(schema).writeValueAsBytes(jsonObject);

// Decode it again
JsonNode decodedObject = new AvroMapper().reader(schema).readTree(objectAsBytes);

System.out.println(decodedObject.toString());

If you look at the decoded object you can see that the default value is not filled in; it is just null, while all the other fields are filled in as expected. I also tried different schemas where the field is not a union with null but only has the default, but that resulted in a JsonMappingException.

Am I doing something wrong here, or is this simply not supported? The documentation doesn't say that default values are unsupported, the way the Protobuf module's documentation does.

Thanks in advance

EDIT: It makes sense that this does not work, since you cannot write an Avro record without a value for a field, even if that field has a default; I think it should have thrown an error on writing. The main question, though, is why it doesn't work when the reader schema has a field with a default that the writer schema does not have. See my follow-up comments below.

cowtowncoder (Member) commented:

I think this is not supported, at least with Jackson's native Avro read implementation. The Apache Avro-lib-backed variant, while slower, might handle default values correctly.

As to how to enable the Apache Avro lib backend, I think there are unit tests that show how to do that.

I agree, it'd be good to document this gap.

basimons (Author) commented Dec 5, 2023

Thanks for your response.

I tried looking for a unit test, but I couldn't find one. I did, however, find ApacheAvroParserImpl. When I used it like this:

try (AvroParser parser = new ApacheAvroFactory(new AvroMapper()).createParser(payload)) {
    parser.setSchema(schema);

    TreeNode treeNode = parser.readValueAsTree();
    System.out.println(treeNode);
}

Unfortunately it does not work (as in, no default values are filled in). Am I doing it correctly, or should I also use a different codec?

basimons (Author) commented Dec 5, 2023

I made some changes, since the code I showed in my first message does not fully make sense: you cannot omit a value when writing, even if the field has a default. So I changed it to this:

String writingSchema = """
    {
    "type": "record",
    "name": "Employee",
    "fields": [
     {"name": "age", "type": "int"},
     {"name": "emails", "type": {"type": "array", "items": "string"}},
     {"name": "boss", "type": ["Employee","null"]}
    ]}
    """;

String readingSchema = """
    {
    "type": "record",
    "name": "Employee",
    "fields": [
     {"name": "name", "type": ["string", "null"], "default" : "bram"},
     {"name": "age", "type": "int"},
     {"name": "emails", "type": {"type": "array", "items": "string"}},
     {"name": "boss", "type": ["Employee","null"]}
    ]}
    """;

String employeeJson = """
    {
        "age" : 26,
        "emails" : ["[email protected]", "[email protected]"],
        "boss" : {
            "age" : 33,
            "emails" : ["[email protected]"]
        }
    }
    """;

When I do this and read the values back, I get the following exception: java.io.IOException: Invalid Union index (26); union only has 2 types. This is the same as reported in #164 (presumably because the data is being decoded with the reader schema alone, so the bytes written for the age field are read as a union index for the name field).

cowtowncoder (Member) commented Dec 5, 2023

The only other note I have is that this:

new ApacheAvroFactory(new AvroMapper())

is the wrong way around: it should be

new AvroMapper(new ApacheAvroFactory())

to get the linking correct; then you should be able to create an ObjectReader / ObjectWriter through which you can assign the schema.

But I suspect that won't change things much: either way you have an ApacheAvroFactory that uses the Apache Avro lib.
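
For reference, a minimal sketch of that wiring (assuming the AvroSchema schema and the encoded payload bytes from the earlier snippets) would be something like:

    // Sketch only: mapper backed by the Apache Avro lib implementation
    AvroMapper mapper = new AvroMapper(new ApacheAvroFactory());

    // "schema" and "payload" are the AvroSchema and Avro-encoded bytes from above
    JsonNode decoded = mapper.reader(schema).readTree(payload);
    System.out.println(decoded);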

basimons (Author) commented Dec 6, 2023

Ah, thanks, I didn't know that. I tried it, but as you said, it did indeed not work.

What's weird is that I even tried decoding it with the Apache Avro library myself. I just used GenericDatumReader (and everything that comes with it), but I got exactly the same error. That does not make sense, right? I'm sure that what I'm doing is allowed by Avro (adding a field with a default to a reader schema when it is not in the writer schema), as I have done it many times in my Kafka cluster.

Do you happen to know what the difference might be? Do my Kafka clients do anything special for this?

basimons (Author) commented Dec 6, 2023

I finally get it: in a Kafka cluster the writer schema is stored along with the data. If you parse it like this:

// "schema" here is the Jackson AvroSchema for the reader side;
// "writerSchema" is the org.apache.avro.Schema the data was written with
Schema avroSchema = ((AvroSchema) schema).getAvroSchema();
GenericDatumReader<GenericRecord> objectGenericDatumReader =
        new GenericDatumReader<>(writerSchema, avroSchema);

BinaryDecoder binaryDecoder = DecoderFactory.get().binaryDecoder(payload, null);
GenericRecord read = objectGenericDatumReader.read(null, binaryDecoder);

So, with the specific writer schema, it does work. Normally Kafka does this for you, but I don't think AvroMapper has a way to do it with two schemas.

cowtowncoder (Member) commented Dec 6, 2023

@basimons the Avro module does indeed allow use of a 2-schema (reader/writer) configuration -- it's been a while, so I'll have to see how it was done. I think AvroMapper has methods to construct a Jackson AvroSchema from 2 separate schemas.

cowtowncoder (Member) commented:

Ah. Close: AvroSchema has a method withReaderSchema(AvroSchema rs): you get both schema instances, then call the method on the "writer schema" (the one used for writing records). From ArrayEvolutionTest:

        final AvroSchema srcSchema = MAPPER.schemaFrom(SCHEMA_XY_ARRAY_JSON);
        final AvroSchema dstSchema = MAPPER.schemaFrom(SCHEMA_XYZ_ARRAY_JSON);
        final AvroSchema xlate = srcSchema.withReaderSchema(dstSchema);

and then you construct ObjectReader as usual.
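
Applied to the schemas from earlier in this thread, a minimal sketch (assuming the writingSchema / readingSchema JSON strings from above, and avroBytes holding a record encoded with the writer schema) would be:

    AvroMapper mapper = new AvroMapper();

    // Writer schema (without "name") and reader schema (with "name" plus a default)
    AvroSchema writerSchema = mapper.schemaFrom(writingSchema);
    AvroSchema readerSchema = mapper.schemaFrom(readingSchema);
    AvroSchema xlate = writerSchema.withReaderSchema(readerSchema);

    // Decode the Avro bytes; schema evolution should fill in the default for "name"
    JsonNode decoded = mapper.reader(xlate).readTree(avroBytes);
    System.out.println(decoded);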
