

[#5442] Improvement(iceberg-common): Overwrite the equals and hashCode methods to avoid frequently creating HiveClientPool instances #5443

Open: wants to merge 2 commits into main.


caican00 (Collaborator) commented Nov 4, 2024

What changes were proposed in this pull request?

Override the equals and hashCode methods to avoid frequently creating HiveClientPool instances.
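A minimal sketch of the idea behind this change, assuming a cache that maps a key to a pool. CachedPoolKey, FakePool, and PoolCache are hypothetical names, not Gravitino or Iceberg classes, and a plain ConcurrentHashMap stands in for the real client pool cache. Because the key compares its elements by value, rebuilding an equal key on every request still hits the same cache entry.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical value-based cache key: two keys built from equal elements
// are equal and have the same hash code, so they map to one cache entry.
final class CachedPoolKey {
  private final List<?> elements;

  CachedPoolKey(List<?> elements) {
    this.elements = List.copyOf(elements); // immutable defensive copy
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof CachedPoolKey)) return false;
    return elements.equals(((CachedPoolKey) o).elements);
  }

  @Override
  public int hashCode() {
    return elements.hashCode(); // consistent with equals
  }
}

// Hypothetical stand-in for HiveClientPool.
final class FakePool {}

// Hypothetical cache that creates a pool only on a cache miss.
final class PoolCache {
  private final ConcurrentHashMap<CachedPoolKey, FakePool> pools = new ConcurrentHashMap<>();
  final AtomicInteger created = new AtomicInteger();

  FakePool get(CachedPoolKey key) {
    return pools.computeIfAbsent(key, k -> {
      created.incrementAndGet(); // counts how many pools were actually built
      return new FakePool();
    });
  }
}
```

With this key, two calls to cache.get(new CachedPoolKey(List.of("alice"))) return the same FakePool, and the creation counter stays at 1.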

Why are the changes needed?

Fix: #5442

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added new unit tests.

@caican00 caican00 self-assigned this Nov 4, 2024
@jerryshao jerryshao requested a review from FANNG1 November 5, 2024 02:36
jerryshao (Contributor) commented
Hi @caican00, will there be any potential UGI issues when we reuse the client instance? CC @yuqi1129 @jerqi to also take a look.

caican00 (Collaborator, Author) commented Nov 5, 2024

> Hi @caican00 will there be any potential UGI issue when we reuse the client instance? CC @yuqi1129 @jerqi to also take a look.

When impersonation is disabled, we use a superuser to access HMS, so reusing the clients is reasonable.

When impersonation is enabled, we get a HiveClientPool instance from the clientPoolCache using a Key instance. If the UGI or user name of the current thread is the same, the Key instances should be equal, so a connection from the existing HiveClientPool should be reused instead of creating a new HiveClientPool.
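A sketch of how a user-scoped key keeps pools separate under impersonation while still reusing them for the same user. It assumes the user name is passed in explicitly; in the real code it would come from the current thread's UGI, which is left out here so the example runs without Hadoop. UserPoolKey, UserPool, UserScopedPools, and the metastore URI used below are hypothetical. A Java record is used because its generated equals and hashCode are value-based.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache key for the impersonation case. The record's generated
// equals/hashCode compare userName and metastoreUri by value.
record UserPoolKey(String userName, String metastoreUri) {}

// Hypothetical stand-in for HiveClientPool; remembers which user it serves.
final class UserPool {
  final String userName;

  UserPool(String userName) {
    this.userName = userName;
  }
}

final class UserScopedPools {
  private final Map<UserPoolKey, UserPool> cache = new ConcurrentHashMap<>();
  private final String metastoreUri;

  UserScopedPools(String metastoreUri) {
    this.metastoreUri = metastoreUri;
  }

  // Assumption: the caller supplies the current user. Real code would
  // derive it from the thread's UGI before building the key.
  UserPool poolFor(String currentUser) {
    UserPoolKey key = new UserPoolKey(currentUser, metastoreUri);
    return cache.computeIfAbsent(key, k -> new UserPool(k.userName()));
  }

  int size() {
    return cache.size();
  }
}
```

Repeated calls for "alice" return the same pool, while "bob" gets a separate one, so pooled connections are never shared across users.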

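Currently, every request creates a new HiveClientPool instance. Without overridden equals and hashCode methods, two Key instances built from the same values are not considered equal, so every lookup misses the cache and the pool is effectively meaningless.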
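To illustrate the cache misses described above, here is a hypothetical comparison between a key that inherits Object's identity-based equals/hashCode and one that compares its contents. IdentityKey, ValueKey, and CacheMissDemo are illustrative names only; each loop iteration plays the role of one request from the same user that builds a fresh key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical key WITHOUT equals/hashCode overrides: it inherits
// Object's identity-based versions, so equal contents are ignored.
final class IdentityKey {
  final List<?> elements;

  IdentityKey(List<?> elements) {
    this.elements = elements;
  }
}

// Hypothetical key WITH value-based equals/hashCode.
final class ValueKey {
  final List<?> elements;

  ValueKey(List<?> elements) {
    this.elements = elements;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof ValueKey && elements.equals(((ValueKey) o).elements);
  }

  @Override
  public int hashCode() {
    return elements.hashCode();
  }
}

final class CacheMissDemo {
  // Simulates requests from the same user; returns how many cache
  // entries (i.e. pools) were created.
  static int identityEntries(int requests) {
    Map<IdentityKey, Object> cache = new HashMap<>();
    for (int i = 0; i < requests; i++) {
      cache.computeIfAbsent(new IdentityKey(List.of("alice")), k -> new Object());
    }
    return cache.size();
  }

  static int valueEntries(int requests) {
    Map<ValueKey, Object> cache = new HashMap<>();
    for (int i = 0; i < requests; i++) {
      cache.computeIfAbsent(new ValueKey(List.of("alice")), k -> new Object());
    }
    return cache.size();
  }

  public static void main(String[] args) {
    System.out.println(identityEntries(5)); // 5: a new entry per request
    System.out.println(valueEntries(5)); // 1: later requests hit the cache
  }
}
```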

Is my understanding correct? If there are any problems, please point them out. Thank you very much! cc @yuqi1129

```diff
@@ -107,7 +112,13 @@ public IcebergHiveCachedClientPool(Configuration conf, Map<String, String> prope
   @VisibleForTesting
   HiveClientPool clientPool() {
     Key key = extractKey(properties.get(CatalogProperties.CLIENT_POOL_CACHE_KEYS), conf);
```
caican00 (Collaborator, Author) commented Nov 5, 2024


When impersonation is enabled, CatalogProperties.CLIENT_POOL_CACHE_KEYS must not be null and must contain ugi or user_name, so that different users do not reuse the same HiveClientPool instance.
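A hypothetical validation for the constraint above, assuming the cache keys are a comma-separated list such as ugi,conf:hive.metastore.uris (loosely modeled on Iceberg's client-pool-cache-keys format). CacheKeyValidator and its messages are illustrative, not an existing API.

```java
import java.util.Arrays;

final class CacheKeyValidator {
  private CacheKeyValidator() {}

  // Hypothetical check: with impersonation enabled, the comma-separated
  // cache-key spec must name "ugi" or "user_name" so pools are per user.
  static void validate(String cacheKeys, boolean impersonationEnabled) {
    if (!impersonationEnabled) {
      return; // a single superuser pool may be shared by all requests
    }
    if (cacheKeys == null || cacheKeys.isBlank()) {
      throw new IllegalArgumentException(
          "client pool cache keys must be set when impersonation is enabled");
    }
    boolean userScoped =
        Arrays.stream(cacheKeys.split(","))
            .map(String::trim)
            .anyMatch(s -> s.equals("ugi") || s.equals("user_name"));
    if (!userScoped) {
      throw new IllegalArgumentException(
          "client pool cache keys must contain ugi or user_name: " + cacheKeys);
    }
  }
}
```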

A contributor replied:


I see. Thanks for the explanation.

2 participants