modifying execute API to get column nullability state #686

Open
wants to merge 6 commits into base: develop

Contributor

@minurajeeve minurajeeve commented Nov 27, 2023

Wrangler will now have two types of columns: nullable and non-nullable. This PR contains the code changes required to mark a column as nullable or non-nullable (the column's nullability state).

@minurajeeve minurajeeve force-pushed the null-handling-in-wrangler branch 4 times, most recently from f62d151 to 1dc23e5 Compare November 28, 2023 08:17
@minurajeeve minurajeeve added the build Triggers unit test build label Nov 28, 2023
@minurajeeve minurajeeve self-assigned this Nov 28, 2023
@minurajeeve minurajeeve force-pushed the null-handling-in-wrangler branch from 1dc23e5 to f9fcbc1 Compare November 28, 2023 09:14
@minurajeeve minurajeeve force-pushed the null-handling-in-wrangler branch from f9fcbc1 to 080216e Compare November 28, 2023 11:38
@minurajeeve minurajeeve requested review from tivv and sau42shri December 5, 2023 16:11
Contributor

@albertshau albertshau left a comment

Left general style/practice comments, but I would like to understand why we want to introduce null-specific handling when existing directives can already do mostly the same thing with more flexibility.

@@ -0,0 +1,31 @@
/*
* Copyright © 2016-2019 Cask Data, Inc.
Contributor

The copyright year should be 2024.

/**
* A Null Handling specific exception used for communicating issues with Null Handling in a column.
*/
public class NullHandlingException extends Exception {
Contributor

I don't see this being thrown or caught anywhere; how are we expecting to use it? It would be better to leave this out of the PR and include it in whichever PR actually uses it.

Contributor Author

It was for future use; removed.

@@ -56,10 +59,13 @@ public final class RecipePipelineExecutor implements RecipePipeline<Row, Structu
private final RecipeParser recipeParser;
private final ExecutorContext context;
private List<Directive> directives;
private HashMap<String, UserDefinedAction> nullabilityMap;
Contributor

HashMap -> Map. In general, variables should be declared using the interface, not the specific implementation.

Contributor

This should also be final. Basically, any field assigned from a constructor argument should be final.

Contributor Author

Done
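
For illustration, a minimal sketch of the two suggestions above: the field is typed against the `Map` interface and declared `final`. `ExecutorSketch` and `actionFor` are hypothetical stand-ins, not code from this PR; the enum values are the ones shown in the diff.

```java
import java.util.Map;

// Stand-in for the PR's enum, using the value names from the diff.
enum UserDefinedAction { SKIP_ROW, SEND_TO_ERROR_COLLECTOR }

// Hypothetical, trimmed-down stand-in for RecipePipelineExecutor.
class ExecutorSketch {
  // Typed as Map (the interface) and final, since it is assigned once in the constructor.
  private final Map<String, UserDefinedAction> nullabilityMap;

  // Accepting Map lets callers pass any implementation (HashMap, TreeMap, ...).
  ExecutorSketch(Map<String, UserDefinedAction> nullabilityMap) {
    this.nullabilityMap = nullabilityMap;
  }

  // Hypothetical accessor: returns the configured action, or null if the column has none.
  UserDefinedAction actionFor(String column) {
    return nullabilityMap.get(column);
  }
}
```

With this signature a caller is free to pass a `TreeMap` or an immutable map without any change to the executor.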


-  public RecipePipelineExecutor(RecipeParser recipeParser, @Nullable ExecutorContext context) {
+  public RecipePipelineExecutor(RecipeParser recipeParser, @Nullable ExecutorContext context,
+                                HashMap<String, UserDefinedAction> nullabilityMap) {
Contributor

HashMap -> Map

Contributor Author

Done

this.context = context;
this.recipeParser = recipeParser;
this.nullabilityMap = nullabilityMap;
Contributor

This should make a defensive copy so that the map cannot change after construction.

Contributor Author

Done
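
A minimal sketch of what a defensive copy could look like here, assuming an unmodifiable wrapper is acceptable. `DefensiveCopySketch` is a hypothetical stand-in for the executor, not the PR's actual change.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Stand-in for the PR's enum, using the value names from the diff.
enum UserDefinedAction { SKIP_ROW, SEND_TO_ERROR_COLLECTOR }

// Hypothetical stand-in for the executor; only the map handling is shown.
class DefensiveCopySketch {
  private final Map<String, UserDefinedAction> nullabilityMap;

  DefensiveCopySketch(Map<String, UserDefinedAction> nullabilityMap) {
    // Copy first so later changes to the caller's map are not visible here,
    // then wrap so this instance's view cannot be mutated either.
    this.nullabilityMap = Collections.unmodifiableMap(new HashMap<>(nullabilityMap));
  }

  Map<String, UserDefinedAction> getNullabilityMap() {
    return nullabilityMap;
  }
}
```

After construction, adding entries to the caller's map leaves the executor's view unchanged, and calling `put` on the returned map throws `UnsupportedOperationException`.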

*/
public enum UserDefinedAction {
SKIP_ROW,
SEND_TO_ERROR_COLLECTOR,
Contributor

ERROR_COLLECTOR is a pipeline-specific concept; this can just be SEND_TO_ERROR.

Contributor Author

But by SEND_TO_ERROR_COLLECTOR I actually mean the pipeline-specific error collector, hence the name.

* UserDefinedAction enum.
*/
public enum UserDefinedAction {
SKIP_ROW,
Contributor

A better name would be FILTER.

Contributor Author

Done

@@ -472,6 +474,12 @@ private DirectiveExecutionResponse execute(NamespaceSummary ns, HttpServiceReque

WorkspaceDetail detail = wsStore.getWorkspaceDetail(workspaceId);
UserDirectivesCollector userDirectivesCollector = new UserDirectivesCollector();
HashMap<String, UserDefinedAction> nullabilityMap = executionRequest.getNullabilityMap() == null ?
Contributor

This logic should be in the getNullabilityMap() method.

Contributor Author

Done
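
A sketch of moving the null check into the getter, so callers no longer need their own ternary. `ExecutionRequestSketch` is a hypothetical, trimmed-down request class; returning `Collections.emptyMap()` instead of a new `HashMap` is an assumption and only suits callers that do not mutate the result.

```java
import java.util.Collections;
import java.util.Map;

// Stand-in for the PR's enum, using the value names from the diff.
enum UserDefinedAction { SKIP_ROW, SEND_TO_ERROR_COLLECTOR }

// Hypothetical stand-in for the execution request; the real class has more fields.
class ExecutionRequestSketch {
  // May be null when the request body omits the field.
  private final Map<String, UserDefinedAction> nullabilityMap;

  ExecutionRequestSketch(Map<String, UserDefinedAction> nullabilityMap) {
    this.nullabilityMap = nullabilityMap;
  }

  // Never returns null: a missing map is reported as an empty one.
  Map<String, UserDefinedAction> getNullabilityMap() {
    return nullabilityMap == null
      ? Collections.<String, UserDefinedAction>emptyMap()
      : nullabilityMap;
  }
}
```

The call site then reduces to a plain `executionRequest.getNullabilityMap()` followed by the existing `isEmpty()` check.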

} catch (Exception e) {
throw new RuntimeException("Error in setting nullabilityMap of columns ", e);
}
}
Contributor

indentation

Contributor Author

Done

new HashMap<>() : executionRequest.getNullabilityMap();
if (!nullabilityMap.isEmpty()) {
//change nullabilityMap in Workspace Object
changeNullability(nullabilityMap, workspaceId);
Contributor

The workspace should be updated after executing the directives (which already happens), not before. Otherwise, if execution fails, the workspace is left partially updated.

Contributor Author

We need the updated nullabilityMap before directives are executed.
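
One possible way to satisfy both points, sketched under assumptions: hand the requested map to the directive execution directly, and persist it to the workspace only after execution succeeds. `WorkspaceSketch` and its methods are hypothetical and use plain strings for the actions; this is not how the PR or the workspace store is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for the stored workspace state.
class WorkspaceSketch {
  private Map<String, String> nullability = new HashMap<>();

  Map<String, String> getNullability() {
    return nullability;
  }

  // Runs the directives against the requested nullability settings and
  // commits those settings only if the directives finish without throwing.
  void execute(Map<String, String> requested, Consumer<Map<String, String>> directives) {
    // Build the effective map without touching the stored one.
    Map<String, String> effective = new HashMap<>(nullability);
    effective.putAll(requested);

    // Directives see the updated map; if this throws, nothing has been persisted.
    directives.accept(effective);

    // Commit only after successful execution.
    nullability = effective;
  }
}
```

A failed execution leaves `getNullability()` as it was before the call, while a successful one makes the requested settings visible.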


3 participants