-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow seed type casting #459
base: main
Are you sure you want to change the base?
Conversation
Apologizes for delay. I have verified that integration test worked with this change successfully. |
It's ok @moomindani, I just assumed there was other stuff. I just merged from main. |
csv_chunks = [{"test_column": "1.2345"}] | ||
model = {"name": "mock_model", "schema": "mock_schema"} | ||
column_mappings = [ColumnCsvMappingStrategy("test_column", agate.data_types.Text, "double")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we add another case to test multiple mapping strategies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extended the test case to include different strategies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
[ | ||
"df", | ||
*[ | ||
f'withColumn("{mapping.column_name}", df.{mapping.column_name}.cast("{cast_value}"))' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
withColumn works well for simple pattern, but if we have many withColumn, it easily causes StackOverflowException.
Do you think if this happens in real world use case? If yes, it will be safer to replace withColumn with select..as.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it depends, how many withColumns would cause a SO? If it's over 100 then I'd say it's not realistic for any of the use cases we're handling. However, I can look into implementing it with "select..as", but I'm unfamiliar on that syntax.
Do you mean something like this?
df = df.select(col("foo").cast("double"), col("bar").cast("string"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. df.select()
or df.selectExpr()
work here.
The condition depends on multiple factors, but sometimes it can occur with less than 100.
resolves #458
Description
This PR adds the configuration property
enable_spark_seed_casting
to enable casting the seed to the types specified oncolumn_types
on the statement that creates the seed.Checklist
CHANGELOG.md
and added information about my change to the "dbt-glue next" section.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.