Calculate number of spares #417 #434
base: develop
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage diff against base `handle-property-migration-conflict-#412`:

|          | base   | #434   | +/-    |
|----------|--------|--------|--------|
| Coverage | 97.91% | 98.00% | +0.08% |
| Files    | 48     | 48     |        |
| Lines    | 1723   | 1800   | +77    |
| Hits     | 1687   | 1764   | +77    |
| Misses   | 36     | 36     |        |

View full report in Codecov by Sentry.
@VKTB @joshuadkitenge @asuresh-code Tagging you all just to say feel free to test this PR and see if you can think of any other cases I missed in the description.
I have just tried an alternative method of using an aggregate query on the list endpoint, in the catalogue item repo `list` method instead of the `find`. (This is not using the spares definition usage status array though.) This took too long for Swagger to complete, and was well over 5 minutes for the case described in the description of setting the spares definition (6427 catalogue items, 9684 items). While limited by pagination and querying by catalogue item ID, the 100MB stage limit would be a bigger problem with a `$lookup` stage, as I believe it would be a combined limit for the catalogue item and item documents that would have to be in memory at the same time.
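For reference, a minimal PyMongo sketch of the kind of `$lookup`-based aggregation being discussed (collection and field names here are assumptions, not the actual repo code; like the experiment above, it counts all items rather than filtering by the spares definition's usage statuses):

```python
from pymongo import MongoClient

# Hypothetical database/collection names for illustration only
database = MongoClient("mongodb://localhost:27017")["ims"]

pipeline = [
    # Join each catalogue item to its items; the joined arrays are held in
    # memory, which is where the 100MB per-stage limit becomes a concern
    {
        "$lookup": {
            "from": "items",
            "localField": "_id",
            "foreignField": "catalogue_item_id",
            "as": "items",
        }
    },
    # Derive the count from the size of the joined array, then drop the array
    {"$addFields": {"number_of_spares": {"$size": "$items"}}},
    {"$project": {"items": 0}},
]
catalogue_items = list(database.catalogue_items.aggregate(pipeline))
```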
Updates the `number_of_spares` field using a given catalogue item id filter.

:param catalogue_item_id: The ID of the catalogue item to update or `None` if updating all.
:param number_of_spares: New number of spares to update to.
Could you make a note here to say why it is optional?
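For context, a hedged sketch of the repo method this docstring appears to describe (the method body, attribute, and collection names are assumptions, not the actual diff). Based on the service code quoted later in this thread, `None` appears to be passed when the spares definition itself changes and every catalogue item must be updated, which may be worth stating in the note:

```python
from typing import Optional

from bson import ObjectId
from pymongo.client_session import ClientSession


def update_number_of_spares(
    self,
    catalogue_item_id: Optional[ObjectId],
    number_of_spares: Optional[int],
    session: Optional[ClientSession] = None,
) -> None:
    """Updates the `number_of_spares` field using a given catalogue item id filter."""
    # A `None` ID means every catalogue item is updated, e.g. when the spares
    # definition changes and all of the counts must be recalculated
    query = {} if catalogue_item_id is None else {"_id": catalogue_item_id}
    self._catalogue_items_collection.update_many(
        query, {"$set": {"number_of_spares": number_of_spares}}, session=session
    )
```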
Counts the number of items within a catalogue item with a `usage_status_id` contained within the given list.

:param catalogue_item_id: ID of the catalogue item for which items should be counted.
:param usage_status_id: List of usage status IDs which should be included in the count.
Suggested change:

- Before: `:param usage_status_id: List of usage status IDs which should be included in the count.`
- After: `:param usage_status_ids: List of usage status IDs which should be included in the count.`
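For illustration, a hedged sketch of what the counting method might look like with the plural parameter name (method, attribute, and field names are assumptions rather than the actual diff); an `$in` filter lets a single `count_documents` call cover the whole list:

```python
from typing import List, Optional

from bson import ObjectId
from pymongo.client_session import ClientSession


def count_with_usage_statuses(
    self,
    catalogue_item_id: ObjectId,
    usage_status_ids: List[ObjectId],
    session: Optional[ClientSession] = None,
) -> int:
    """Counts items in a catalogue item whose `usage_status_id` is in the given list."""
    return self._items_collection.count_documents(
        {"catalogue_item_id": catalogue_item_id, "usage_status_id": {"$in": usage_status_ids}},
        session=session,
    )
```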
I tried using Python's `ThreadPoolExecutor` to concurrently update the catalogue items, to see if it would improve performance. I only changed one function in `setting.py` in the services layer. Using Postman, I got the following results:

- With 104 catalogue items, 159 items: 207ms with multithreading, 222ms without
- With 1194 catalogue items, 1946 items: 1.8 seconds with, 2.33 seconds without
- With 4.7k catalogue items, 7.6k items: 17.92 seconds with, 20.08 seconds without
```python
from concurrent.futures import ThreadPoolExecutor


def update_spares_definition(self, spares_definition: SparesDefinitionPutSchema) -> SparesDefinitionOut:
    """
    Updates the spares definition to a new value.

    :param spares_definition: The new spares definition.
    :return: The updated spares definition.
    :raises MissingRecordError: If any of the usage statuses specified by the given IDs don't exist.
    """
    # Ensure all the given usage statuses exist
    for usage_status in spares_definition.usage_statuses:
        if not self._usage_status_repository.get(usage_status.id):
            raise MissingRecordError(f"No usage status found with ID: {usage_status.id}")

    # Begin a session for transactional updates
    with start_session_transaction("updating spares definition") as session:
        # Upsert the new spares definition
        new_spares_definition = self._setting_repository.upsert(
            SparesDefinitionIn(**spares_definition.model_dump()), SparesDefinitionOut, session=session
        )

        # Lock catalogue items for updates
        utils.prepare_for_number_of_spares_recalculation(None, self._catalogue_item_repository, session)

        # Obtain all catalogue item IDs
        catalogue_item_ids = self._catalogue_item_repository.list_ids()

        # Precompute usage status IDs that define a spare
        usage_status_ids = utils.get_usage_status_ids_from_spares_definition(new_spares_definition)

        # Define the worker function for recalculations
        def recalculate_spares(catalogue_item_id):
            utils.perform_number_of_spares_recalculation(
                catalogue_item_id, usage_status_ids, self._catalogue_item_repository, self._item_repository, session
            )

        # Use ThreadPoolExecutor for concurrent recalculations
        logger.info("Updating the number of spares for all catalogue items concurrently")
        with ThreadPoolExecutor(max_workers=10) as executor:  # May need to experiment w/ max workers
            executor.map(recalculate_spares, catalogue_item_ids)

    return new_spares_definition
```
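One caveat with this version: `executor.map` only re-raises a worker's exception when its result is retrieved from the returned iterator, so a failed recalculation could pass silently; consuming the results surfaces errors. It is also worth noting that PyMongo sessions are not thread-safe, so sharing `session` across workers may be a bigger concern than the timings.

```python
# Force evaluation so any exception raised inside recalculate_spares propagates
# here rather than being discarded when the executor shuts down
list(executor.map(recalculate_spares, catalogue_item_ids))
```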
Description
See #417. Leaves the modified time unchanged.
Concurrency notes
There are multiple cases in this PR where concurrency could potentially cause a problem; I have attempted to mitigate these. Here are some particular cases worth mentioning.
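The main mitigation visible in the service code above is running the whole update inside a transaction via `start_session_transaction`. A minimal sketch of such a helper, assuming PyMongo and a replica-set deployment (the real implementation may well differ):

```python
from contextlib import contextmanager

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # Placeholder connection for illustration


@contextmanager
def start_session_transaction(action: str):
    """Yields a session with a transaction started on it.

    The transaction commits when the block exits cleanly and aborts on error, so
    concurrent readers never observe a half-finished `number_of_spares` update.

    :param action: Description of the action being performed (assumed to be used
        in error messages by the real helper).
    """
    with client.start_session() as session:
        with session.start_transaction():
            yield session
```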
Performance tests
Setting the spares definition (using Postman)
This is much worse for high numbers of catalogue items as it iterates through them. I did look at aggregate queries but couldn't find examples close to what would be needed here; it is still potentially worth investigating further. The main limitation would be the stage memory limit for a large number of items, as the count would likely have to come from the `$size` of a `$lookup` stage. (This would also have been the case for using aggregate queries in all catalogue item requests.)

Testing instructions
Agile board tracking
Closes #417