Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-8501: Json Conversion UDF Not Respecting System JSON Options #2921

Merged
merged 3 commits into from
Jul 9, 2024

Conversation

cgivre
Copy link
Contributor

@cgivre cgivre commented Jul 4, 2024

DRILL-8501: Json Conversion UDF Not Respecting System JSON Options

Description

The convert_fromJSON() function was ignoring Drill system configuration variables for reading JSON. This PR adds support for allTextMode and readNumbersAsDouble to this function. Once merged, the convert_fromJSON() function will follow the system settings.

I also split one of the unit test files because it had all the UDF tests mixed with NaN tests.

Documentation

This PR adds the possibility of setting allTextMode and readNumbersAsDouble in the function call itself. The new usage is:

SELECT convert_fromJSON(<field>, <allTextMode>, <readNumbersAsDouble>)

Thus you could write a query:

SELECT convert_fromJSON(jsonField, false, true) FROM dfs.somedata

Testing

Added unit tests.

@cgivre cgivre added bug minor-update backport-to-stable This bug fix is applicable to the latest stable release and should be considered for inclusion there labels Jul 4, 2024
@cgivre cgivre self-assigned this Jul 4, 2024
@jnturton
Copy link
Contributor

jnturton commented Jul 6, 2024

Before I approve, did you consider making these JSON parsing settings parameters of the function itself? It feels odd to me that store.json.* settings could influence UDFs too. I'm not sure why they aren't storage plugin config, rather than global config, in the first place...

@cgivre
Copy link
Contributor Author

cgivre commented Jul 7, 2024

Before I approve, did you consider making these JSON parsing settings parameters of the function itself? It feels odd to me that store.json.* settings could influence UDFs too. I'm not sure why they aren't storage plugin config, rather than global config, in the first place...

@jnturton I thought about doing exactly what you're describing. Here's the thing. We started some work a while ago to get rid of all the non-EVF2 readers in Drill. It turns out that there are a few places which still use the old non-EVF JSON reader. Specifically, this UDF, the Druid Storage Plugin and the MongoDB storage plugin. I started work on Drill-8316 which addresses the Druid plugin and Drill-8329 addresses converting the UDF. Neither one of these were a high priority so they're kind of sitting at the moment.

I agree with your premise that the whole idea of having global settings for file formats (including parquet) is not the best idea.

Give me a day or two and I can add additional params to the UDF so that you can just change those settings in the query.

Copy link
Contributor

@jnturton jnturton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if this UDF is going to make use of the JsonReader (and perhaps one day it should stop doing that and just parse JSON itself using some lightweight parser) then it may as well be the JsonReader as configured by store.json.*.

@cgivre cgivre added the doc-impacting PRs that affect the documentation label Jul 7, 2024
@cgivre
Copy link
Contributor Author

cgivre commented Jul 7, 2024

@jnturton I added new versions of the UDF so that the user can specify in the function call whether they want allTextMode and the other option.

@jnturton
Copy link
Contributor

jnturton commented Jul 9, 2024

Oh that's great, thanks for the enhancements +1.

@cgivre cgivre merged commit d44aa98 into apache:master Jul 9, 2024
8 checks passed
@cgivre cgivre deleted the json_conversion_options_fix branch July 9, 2024 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-stable This bug fix is applicable to the latest stable release and should be considered for inclusion there bug code-cleanup doc-impacting PRs that affect the documentation minor-update
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants