-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCF with reserved namespace name cannot be decoded due to sanity check #101
Comments
I'm not an expert on Avro, but it looks like the producer wanted to produce that entity with a null namespace, but did so erroneously using
|
The issue is the type's According to the spec
The problem of using the primitive type name for named types is when there is a reference to this type
There is no way to tell if the |
https://github.com/confluentinc/kafka-connect-hdfs - not entirely under my control, and in any case already produced :). @zmstone your analysis, makes sense but what's your conclusion here ? Do you say that the fact that it's refusing to try to parse the OCF file is normal ? If I may offer another interesting piece of information: avro's Java implementation, running in a Flink job, can read that same file without any kind of problem. It seems to me that it would be better to match the de facto reference implementation, wouldn't it ?
Then wouldn't it be better to fail when that happens ("you used null and you have a type named edit: thank you both for your responsivness ! |
Yes, normal, if we allow primitive type name for named types, it may work for you, but will fail others.
Decoder will actually pick the primitive type to decode.
It's maybe 'de facto', but it's also a bug, which should be fixed by confluent. In the mean time (before confluent can fix it), there are workarounds:
|
Thanks for taking the time to reply to me.
It would be the Apache Avro project's responsibility, rather. Though I highly doubt they'll change something that would make an unknown count of existing avro files unreadable :). And if they did, they would provide a way to keep existing files such as mine here readable. Those files exist, and beyond the spec the purpose of a lib is to match how things are in reality. Let's say this was a CSV file reader lib rather than an avro file lib. Would this lib support only RFC4180 because that's the standard ? Probably not, because in practice many CSV files don't follow that spec -- it would make sense to make that lib useable to as many CSV use cases as possible.
I understand that risk (though the second one is improbable), and it would of course make sense to have a "strict" versus "relaxed" mode for anybody worried about this kind of situation. In any case, since the file doesn't follow the spec, is it really a problem how it fails ? It's an invalid file anyway. Might as well do everything you can and let the user take the risk. I'll implement my own OCF reader to circumvent this problem -- please understand my point of view though; it makes sense to have library also work in practice, not only in theory. |
Hi @jbruggem
Is this because confluent uses Apache Avro's Java library? Could you help me to understand from where exactly is your schema generated ?
Sure. |
I'm trying to decode an OCF with invalid values for the namespace:
Despite the namespace, erlavro would be able to decode it correctly except for a sanity check in avro_utils:579:
Which produces:
Should that verification be here when deserializing existing data ? Even if "null" is not a value accepted by the spec, erlavro works fine if we bypass that check; it is able to read the OCF file correctly.
I don't know the code base enough to understand if it makes sense from that point of view. From a user of the lib's point of view, on the other hand, it seems regrettable to prevent erlavro from being able to decode a file correctly when it's capable of doing it, if you see what I mean. Maybe there could be a "strict mode" option ?
If relevant, we are ready to work on a PR to improve the situation.
The text was updated successfully, but these errors were encountered: