
Calculate number of spares #417 #434

Open · wants to merge 20 commits into develop from calculate-number-of-spares-#417

Conversation


@joelvdavies joelvdavies commented Nov 29, 2024

Description

See #417. Leaves modified time unchanged.

Concurrency notes

There are several cases in this PR where concurrency could cause problems; I have attempted to mitigate them. Some particular cases worth mentioning:

  • Setting the spares definition first updates the spares definition document itself, ensuring two sets of spares updates cannot run simultaneously (the first update takes a write lock that blocks subsequent ones).
  • Multiple items can be created in quick succession on the front end. This should be handled by write-locking the parent catalogue item for each request (with one backend instance it seems to be fine, but that could change later). Should two requests actually conflict and the first take longer than the default transaction timeout (5ms), the other could fail. This would show as an error in the front end, but would lead to missing items when creating multiple at once. We could auto-retry in such cases, or increase the timeout, if required.
  • When setting the spares definition, it is possible for the usage statuses involved in it to be deleted before the transaction completes, since the definition is updated in the same transaction as the spares of all catalogue items. This could be resolved by write-locking the usage statuses, but as editing usage statuses and editing the spares definition are both admin functionality, it should be rare. Currently, if the definition no longer exists, the aggregate query during the final fetch of the definition returns [], causing a schema error that is raised as a 500.
  • When recalculating the number of spares while setting the spares definition, all catalogue items are initially write-locked by setting their number of spares to None, since an item could otherwise be updated between the recalculation starting and completing. This also blocks any item create/delete requests, and any updates that modify the usage status. (These may need issues on the front end to handle.)
  • The spares definition is write-locked (even if currently non-existent, by upserting a document) when doing a spares calculation and when creating a catalogue item, to prevent a case where a brand-new catalogue item and its items are added during a long spares calculation and would subsequently not be updated.

Performance tests

Setting the spares definition (using Postman)

  • With 104 catalogue items, 159 items: 216ms
  • With 6427 catalogue items, 9684 items: 41.4 seconds (with many log statements for each spares update; 35.5 seconds with them commented out)
  • With 104 catalogue items, 2928 items: 355ms
  • With 104 catalogue items, 4710 items: 377ms

This is much worse for high numbers of catalogue items, as it iterates through them one by one. I did look at aggregate queries but couldn't find examples close to what would be needed here; it is still potentially worth investigating further. The main limitation would be the stage memory limit for a large number of items, as the count would likely have to come from the size of a $lookup stage. (This would also have been the case for using aggregate queries in all catalogue item requests.)
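The per-catalogue-item iteration described above presumably reduces each recalculation to a single `count_documents` call against the items collection. A minimal sketch of that filter (function and field names are assumptions based on the description, not the PR's actual code):

```python
def spares_count_filter(catalogue_item_id, usage_status_ids):
    """Build the items-collection filter used to count the spares of one
    catalogue item: items belonging to it whose usage status is one of
    those in the spares definition."""
    return {
        "catalogue_item_id": catalogue_item_id,
        "usage_status_id": {"$in": usage_status_ids},
    }


# Executed per catalogue item as e.g.:
# count = items_collection.count_documents(
#     spares_count_filter(catalogue_item_id, usage_status_ids), session=session
# )
```

Since this is one round trip per catalogue item, the total time scales linearly with the number of catalogue items, which matches the measurements above.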

Testing instructions

  • Review code
  • Check Actions build
  • Review changes to test coverage

Agile board tracking

Closes #417

@joelvdavies joelvdavies added the `enhancement` (New feature or request) label Nov 29, 2024

codecov bot commented Dec 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.00%. Comparing base (2274dcb) to head (1a13d12).

Additional details and impacted files
@@                             Coverage Diff                             @@
##           handle-property-migration-conflict-#412     #434      +/-   ##
===========================================================================
+ Coverage                                    97.91%   98.00%   +0.08%     
===========================================================================
  Files                                           48       48              
  Lines                                         1723     1800      +77     
===========================================================================
+ Hits                                          1687     1764      +77     
  Misses                                          36       36              


@joelvdavies joelvdavies force-pushed the calculate-number-of-spares-#417 branch from ce2ff73 to e08e647 Compare December 5, 2024 13:01
@joelvdavies joelvdavies force-pushed the calculate-number-of-spares-#417 branch from 8655607 to 85a17d1 Compare December 6, 2024 14:49
@joelvdavies
Collaborator Author

@VKTB @joshuadkitenge @asuresh-code Tagging you all just to say feel free to test this PR and see if you can think of any other cases I missed in the description.

@joelvdavies joelvdavies marked this pull request as ready for review December 9, 2024 13:51
Base automatically changed from handle-property-migration-conflict-#412 to develop December 9, 2024 14:19

joelvdavies commented Dec 9, 2024

I have just tested an alternative method of using an aggregate query on the list endpoint, using

    catalogue_items = list(
        self._catalogue_items_collection.aggregate(
            [
                {
                    "$lookup": {
                        "from": "items",
                        "localField": "_id",
                        "foreignField": "catalogue_item_id",
                        "as": "related_items",
                    }
                },
                {
                    "$addFields": {
                        "number_of_spares": {
                            "$size": {
                                "$filter": {
                                    "input": "$related_items",
                                    "as": "item",
                                    "cond": {
                                        "$eq": [
                                            "$$item.usage_status_id",
                                            CustomObjectId("6756fc3b220c8ca1a0b8c7cb"),
                                        ]
                                    },
                                }
                            }
                        }
                    }
                },
                {"$project": {"related_items": 0}},
            ]
        )
    )

in the catalogue item repo list method, instead of the find. (This is not using the spares definition usage status array, though.) It took too long for Swagger to complete, and was well over 5 minutes for the case described in the description of setting the spares definition (6427 catalogue items, 9684 items). Even when limited by pagination and querying by catalogue item ID, the 100MB stage memory limit would be a bigger problem with the lookup stage, as I believe it would be a combined limit for the catalogue item and item documents that would have to be in memory at the same time.
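One variant that might reduce the memory pressure (untested here, and MongoDB 4.4+ for `$first`) is a correlated `$lookup` with a sub-pipeline: the match and `$count` run inside the lookup, so each catalogue item joins a single `{"n": <count>}` document rather than the whole array of matching item documents. A sketch, with the usage status IDs passed in as a list:

```python
def spares_count_pipeline(usage_status_ids):
    """Aggregation pipeline that adds `number_of_spares` to each catalogue
    item without materialising the matched item documents."""
    return [
        {
            "$lookup": {
                "from": "items",
                "let": {"ci_id": "$_id"},
                "pipeline": [
                    {
                        "$match": {
                            "$expr": {
                                "$and": [
                                    {"$eq": ["$catalogue_item_id", "$$ci_id"]},
                                    {"$in": ["$usage_status_id", usage_status_ids]},
                                ]
                            }
                        }
                    },
                    # Collapse the matched items to one {"n": <count>} document
                    {"$count": "n"},
                ],
                "as": "spares",
            }
        },
        {
            "$addFields": {
                # $count yields [] when nothing matches, hence the default 0
                "number_of_spares": {"$ifNull": [{"$first": "$spares.n"}, 0]}
            }
        },
        {"$project": {"spares": 0}},
    ]
```

Whether the server can actually keep the per-item lookup within the stage limit this way (e.g. whether it pushes the count down rather than buffering the matches) would need measuring before relying on it.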

Updates the `number_of_spares` field using a given catalogue item id filter.

:param catalogue_item_id: The ID of the catalogue item to update or `None` if updating all.
:param number_of_spares: New number of spares to update to.
Review comment (Collaborator):

Could you make a note here to say why it is optional?

Counts the number of items within a catalogue item with a `usage_status_id` contained within the given list.

:param catalogue_item_id: ID of the catalogue item for which items should be counted.
:param usage_status_id: List of usage status IDs which should be included in the count.

Suggested change
:param usage_status_id: List of usage status IDs which should be included in the count.
:param usage_status_ids: List of usage status IDs which should be included in the count.

@asuresh-code asuresh-code (Contributor) left a comment

I tried using Python's ThreadPoolExecutor to update the catalogue items concurrently, to see if it would improve performance. I only changed one function in `setting.py` in the services layer.

Using Postman, I got the following results:

  • With 104 catalogue items, 159 items: 207ms w/ multithreading, 222ms w/o
  • With 1194 catalogue items, 1946 items: 1.8 seconds w/, 2.33 seconds w/o
  • With 4.7k catalogue items, 7.6k items: 17.92 seconds w/, 20.08 seconds w/o

from concurrent.futures import ThreadPoolExecutor

def update_spares_definition(self, spares_definition: SparesDefinitionPutSchema) -> SparesDefinitionOut:
    """
    Updates the spares definition to a new value.

    :param spares_definition: The new spares definition.
    :return: The updated spares definition.
    :raises MissingRecordError: If any of the usage statuses specified by the given IDs don't exist.
    """
    # Ensure all the given usage statuses exist
    for usage_status in spares_definition.usage_statuses:
        if not self._usage_status_repository.get(usage_status.id):
            raise MissingRecordError(f"No usage status found with ID: {usage_status.id}")

    # Begin a session for transactional updates
    with start_session_transaction("updating spares definition") as session:
        # Upsert the new spares definition
        new_spares_definition = self._setting_repository.upsert(
            SparesDefinitionIn(**spares_definition.model_dump()), SparesDefinitionOut, session=session
        )

        # Lock catalogue items for updates
        utils.prepare_for_number_of_spares_recalculation(None, self._catalogue_item_repository, session)

        # Obtain all catalogue item IDs
        catalogue_item_ids = self._catalogue_item_repository.list_ids()

        # Precompute usage status IDs that define a spare
        usage_status_ids = utils.get_usage_status_ids_from_spares_definition(new_spares_definition)

        # Define the worker function for recalculations
        def recalculate_spares(catalogue_item_id):
            utils.perform_number_of_spares_recalculation(
                catalogue_item_id, usage_status_ids, self._catalogue_item_repository, self._item_repository, session
            )

        # Use ThreadPoolExecutor for concurrent recalculations
        logger.info("Updating the number of spares for all catalogue items concurrently")
        with ThreadPoolExecutor(max_workers=10) as executor:  # May need to experiment w/ max workers
            executor.map(recalculate_spares, catalogue_item_ids)

    return new_spares_definition
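One caveat with the snippet above: pymongo's documentation states that `ClientSession` instances are not thread-safe, so sharing `session` across ThreadPoolExecutor workers may be risky even if it appears to work with one backend instance. An alternative that stays single-threaded is to compute all the new counts first and push them in one batched write inside the existing transaction. A hypothetical sketch (not the PR's implementation), with the operations represented as plain dicts so the batching is visible; with pymongo these would be `UpdateOne` operations passed to `bulk_write`:

```python
def build_spares_updates(counts_by_catalogue_item_id):
    """Turn a {catalogue_item_id: spares_count} mapping into one batch of
    update operations, each setting `number_of_spares` on one catalogue
    item, to be executed as a single bulk write inside the transaction."""
    return [
        {"filter": {"_id": ci_id}, "update": {"$set": {"number_of_spares": n}}}
        for ci_id, n in counts_by_catalogue_item_id.items()
    ]
```

This trades the per-item round trips for one network call, so it may recover most of the multithreading gain without sharing a session between threads.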
