Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Housekeeping] Flyteadmin digest comparison should rely on database semantics #4780

Open
2 tasks done
katrogan opened this issue Jan 25, 2024 · 11 comments · May be fixed by #6058
Open
2 tasks done

[Housekeeping] Flyteadmin digest comparison should rely on database semantics #4780

katrogan opened this issue Jan 25, 2024 · 11 comments · May be fixed by #6058
Assignees
Labels
enhancement New feature or request housekeeping Issues that help maintain flyte and keep it tech-debt free

Comments

@katrogan
Copy link
Contributor

Describe the issue

In order to compare an already registered task (and workflow, and launch plan) when an identical version is registered, we get the existing task from the db and manually compare its digest. This is prone to race conditions. It would be better to wrap this in a transaction where we attempt to upsert the task only on a matching digest

What if we do not do this?

Slightly more error prone method for determining already exists task structure

Related component(s)

flyteadmin

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@katrogan katrogan added the housekeeping Issues that help maintain flyte and keep it tech-debt free label Jan 25, 2024
Copy link

dosubot bot commented Jan 25, 2024

🤖

It's great to see your thoughtful analysis of the issue and proposed solution! Your suggestion to wrap the comparison in a transaction for better reliability makes a lot of sense. If you're able to implement this improvement and create a pull request, it would be a valuable contribution to the repository. Thank you for considering this enhancement!


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dosubot dosubot bot added the enhancement New feature or request label Jan 25, 2024
@popojk
Copy link
Contributor

popojk commented Nov 17, 2024

#take

@popojk
Copy link
Contributor

popojk commented Nov 18, 2024

Hi @katrogan , I want to clarify the issue before going forward. While creating task, flyte admin will check whether the id and taskDigest of new task is identical to existing task in db. Is this issue aiming to wrap this section of code in transaction, in order to prevent others precess possibly update the data of existing task while doing the taskDigest check?

@katrogan
Copy link
Contributor Author

yes exactly, we could have two users attempt to register the same task version in the same time frame such that an existing task digest is never found but only one succeeds in writing to the db

@popojk
Copy link
Contributor

popojk commented Nov 21, 2024

Hi @katrogan , please let me clarify this issue further. In this section of code, Flyte will check if identical task existed in the DB. If not, Flyte will create new task in DB, otherwise return error.
Let me make an example(please refer to bellow picture for better understanding). If there are 2 identical tasks, Task A and Task B, assigned to Flyte Admin at the same time frame, it is possible that both tasks will not find each other in the DB, and both tasks will be written into the DB.
If the example I made is the issue you are aiming to handle, I am not sure how transaction can solve the problem as it seems like using transaction will still cause duplicate task creation. In my mind this issue can be solve by something like distributed lock.
Please let me know if I have any misunderstanding, thanks!
截圖 2024-11-21 下午3 26 39

@katrogan
Copy link
Contributor Author

katrogan commented Nov 21, 2024

the way we get around duplicates in the two separate transactions is because we want to guard against two tasks with the same identifier (project, domain, name, version) being written to the db with different structures (which would result in different digests)

the database ensures we can never write duplicate entries with identical primary keys (in this case, TaskKey)

in your example above, A and B cannot execute in parallel due to transaction semantics, so one will fail when there is a conflict with the primary key.

if we move the digest comparison to within a transaction we can ensure we correctly returns already exists (with or without a different task structure)

so the pseudocode could look like

in the transaction:
try:
  create a task with given primary key
except:
  primary key already exists
  get existing entry with identical primary key
    if digest of existing == new entry's digest -> return `NewTaskExistsIdenticalStructureError`
    else -> return `NewTaskExistsDifferentStructureError`

the current code does return the unmarshalled closure of the existing task when we return NewTaskExistsDifferentStructureError here so the db transaction can return the marhsalled closure in a sentinel error and the manager can catch that sentinel error and correctly transform the closure to return a readable error message (as a suggestion)

@popojk
Copy link
Contributor

popojk commented Nov 26, 2024

Hi @katrogan , thanks for the detailed explanation. I think I understand the purpose of wrapping digest check within a transaction right now. Just want to clarify with you once again. If the transaction checks the primary key on create task first, and check digest if task of the same primary key found in DB, we can make sure that the digest will be checked even though two identical tasks registered in the same timeframe.

截圖 2024-11-22 下午5 28 50

Moreover, I am not sure about what you mentioned about NewTaskExistsDifferentStructureError. If the error occurred, the error message will be something like what in bellow picture, where the message states the difference of origin and new task by this section of code. Do you mean we should add additional task information in the closure to the error message and return to client side?

截圖 2024-11-26 上午11 05 24

@katrogan
Copy link
Contributor Author

great, that diagram looks like we're on the same page!

about NewTaskExistsDifferentStructureError that was more of an optimization. what we can do if if the postgres transaction fails due to primary key conflict, return AlreadyExists in the postgres error transformer. The calling code in task_manager can then fetch the existing task definition from the db and determine whether to return NewTaskExistsDifferentStructureError or NewTaskExistsIdenticalStructureError based on how the digests of the two tasks compare

@popojk
Copy link
Contributor

popojk commented Nov 29, 2024

Hi @katrogan , I've raised a PR to address this issue. Please feel free to share any additional thoughts or suggestions, thanks!

@popojk
Copy link
Contributor

popojk commented Dec 13, 2024

Hi @katrogan , i've raise a PR for this issue for a while. Could you please help me to review it and let me know if if any adjustment required? Thanks!

@katrogan
Copy link
Contributor Author

hi @popojk thank you so much for the ping, I'm so sorry I missed this notification before!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request housekeeping Issues that help maintain flyte and keep it tech-debt free
Projects
Status: Ready
Development

Successfully merging a pull request may close this issue.

2 participants