Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Circe throws an exception when parsing nested JSONB field that is null. #2123

Open
slavaschmidt opened this issue Oct 28, 2024 · 10 comments
Open

Comments

@slavaschmidt
Copy link

If a query returns a nested structure from a JSON(b) field, the empty field must be absent for the query to work. If the field is present and null, there is a circe exception thrown.

Example query:

sql"SELECT address->'coordinates' FROM profile"

Definition of the Coordinates and Address types:

case class Coordinates(latitude: Double, longitude: Double)
case class Address(coordinates: Option[Coordinates])

Definition of the custom mapping:

given Meta[Coordinates] = new Meta(pgDecoderGet, pgEncoderPut)
given Meta[Address] = new Meta(pgDecoderGet, pgEncoderPut)

The query works for the following contents of the address column:
NULL
{}

The query throws and exception for the following contents of the address column:
{"coordinates":null}
The exception message is: io.circe.DecodingFailure$DecodingFailureImpl: DecodingFailure at : Got value 'null' with wrong type, expecting object

@jatcwang
Copy link
Collaborator

In Postgres a JSON null and SQL null are different, so you probably need to cast it to a SQL null in the case of {"coordinates":null}. I think using ->> instead of -> may be what you need?

https://mbork.pl/2020-02-15_PostgreSQL_and_null_values_in_jsonb

If it still doesn't work for you, can you help by provided a runnable minimal reproduction of the issue?

@slavaschmidt
Copy link
Author

Thank you for your suggestions, @jatcwang.

Yep, I'm aware of the differences in arrows.

In this simple case, the ->> would solve the issue for null, but it will create an issue for not null. For example, in the case of nested objects like the following:
{"address": {"coordinates":{"latitude": 10, "longitude":10}}} query address->>'coordinates' will fail to convert to Coordinates. The reason is that the ->> converts the result to TEXT, so we won't have JSON anymore, leading to different kind of exception.

@slavaschmidt
Copy link
Author

If it still doesn't work for you, can you help by provided a runnable minimal reproduction of the issue?

Here we go

doobie-issue.zip

@jatcwang
Copy link
Collaborator

Thanks @slavaschmidt can you try something like profile->>'address'->>'coordinates' :: jsonb? That way doobie only sees a not-null JSONB or SQL NULL.

Ideally the circe integration will be able to handle top level JSON nulls like your examples, but will need to think about it more because Read[Option[A]] does not / should not know anything about the underlying implementation of Read[A].

@slavaschmidt
Copy link
Author

Thanks @slavaschmidt can you try something like profile->>'address'->>'coordinates' :: jsonb? That way, doobie only sees a not-null JSONB or SQL NULL.

Thanks @jatcwang, unfortunately, this won't work. The first double-arrow returns TEXT, so the next double-arrow can't be applied. I checked it to be sure and the exception is: org.postgresql.util.PSQLException: ERROR: operator does not exist: text ->> unknown

@jatcwang
Copy link
Collaborator

Ah I see sorry. Using #>> operator seems to give the expected answer for all 4 of your examples

e.g. I replaced line 53 in your example with
brokenNull <- sql"SELECT (profile #>> '{address,coordinates}') :: jsonb FROM customer".query[Option[Coordinates]].to[List]

and it yields List(Some(Coordinates(Some(1),Some(2))), None, None, None)

@slavaschmidt
Copy link
Author

The workaround works. Thank you very much for your help! Still, I believe it would be cool if JSON null would be interpreted in the same way as SQL NULL as an empty Option and not throw an exception.

@satorg
Copy link
Contributor

satorg commented Oct 29, 2024

@slavaschmidt ,

Another quick workaround that allows using

SELECT profile->'address'->'coordinates' 

is to split the data processing into two steps: query it as Option[Json] first, then decode it as Option[Coordinates] afterwards:

sql"SELECT profile->'address'->'coordinates' FROM customer WHERE id = <any ID from the example>"
  .query[Option[Json]]
  .unique
  .flatMap(_.flatTraverse {
    _.as[Option[Coordinates]].liftTo[ConnectionIO]
  })

This emits correct results regardless of whether the query returns SQL NULL or JSON NULL.

@slavaschmidt
Copy link
Author

slavaschmidt commented Oct 29, 2024

Another quick workaround

Hi @satorg, thanks for the hint! This is precisely where I'm coming from.

The issue is that with dozens of tables and lots of JSON columns, there are currently hundreds of LOCs dedicated to this manual parsing approach in our code base. Not only that. When a query returns multiple JSON columns, one must either work with tuples, which affects readability or create two different case classes, one with JSON and another with correctly typed fields.

Using custom mappings elegantly solves both of these deficiencies.

@satorg
Copy link
Contributor

satorg commented Oct 29, 2024

@jatcwang , the above workaround can be converted into a more generic solution.

In order to accomplish that we need a constructor for Read instead of Get straight out of Decoder, i.e. something like the following:

def pgDecoderReadOption[A: Decoder]: Read[Option[A]] = {
  val getJson = Get[Json]
  new Read(
    // To be honest, I cannot understand why do we need to store `Get` in `Read`.
    // Doobie does it everywhere but doesn't seem using the stored value.
    List((getJson, Nullability.Nullable)),
    { (rs, n) =>
      getJson
        .unsafeGetNullable(rs, n)
        .flatTraverse(_.as[Option[A]])
        // Unfortunately, cannot avoid `throw` from `Read` ¯\_(ツ)_/¯
        .valueOr(throw _)
    }
  )
}

Having such a constructor, a user can create their our type specific Read implicit:

implicit val coordOptRead: Read[Option[Coordinates]] = pgDecoderReadOption[Coordinates]

And now the original query from the example works for any id as well:

sql"SELECT profile->'address'->'coordinates' FROM customer WHERE id = <any ID from the example>"
  .query[Option[Coordinates]]
  .unique

Unfortunately, we cannot make pgDecoderReadOption implicit itself – it would create a lot of type collisions if we did.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants