Add ability to cache Roborock maps instead of always reloading #112047

Lash-L · 2024-03-02T18:20:46Z

Breaking change

Proposed change

After talking with Paulus, he has multiple maps and the time it takes to startup Roborock is costly (about 40s) because of the nature of how we have to load the images. Now we only have to do that once because they are cached.

lightly relates to #97311 and conversations on caching there.

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Deprecation (breaking change to happen in the future)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes #
This PR is related to issue:
Link to documentation pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
I have followed the perfect PR recommendations
The code has been formatted using Ruff (ruff format homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

home-assistant · 2024-03-02T18:20:52Z

Hey there @humbertogontijo, mind taking a look at this pull request as it has been labeled with an integration (roborock) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of roborock can trigger bot actions by commenting:

@home-assistant close Closes the pull request.
@home-assistant rename Awesome new title Renames the pull request.
@home-assistant reopen Reopen the pull request.
@home-assistant unassign roborock Removes the current integration label and assignees on the pull request, add the integration domain after the command.
@home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
@home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

balloob · 2024-03-02T21:12:29Z

homeassistant/components/roborock/models.py

+        """Take the serialized dict and convert to object."""
+        return RoborockImageExtraStoredData(
+            map_flag=serialized_dict["map_flag"],
+            cached_image=base64.b64decode(serialized_dict["cached_image"]),


Hm, this will make the restored entity cache huge as it now contains a multiple whole images in base64 (which takes up more space than original image).

So from what it seems for me (on my map at least)

The response from the api has a length of 269612 bytes.

The cached map aka the image has a length of 7909 bytes

And the base64 encoded serialized image has a length of 10548 bytes.

I'm open to a better way of storing the image, I just wanted to make sure that I was able to consistently serialize and serialize the image.

I think it's better stored on disk outside of the restore entity cache. That cache is written to disk every 15min. In my production the restore state cache is 123KB. That would increase with 25% (I have 3 maps).

If you write it to disk, make sure it's cleaned up when the config entry is removed. Maybe you can make a folder inside .storage: <config>/.storage/roborock_<config entry ID>/maps ?

That would be fine by me. My only question is how often do I write it to the disk? Should it only be written to the disk on config entry unloading? Or should it get done at the same point as the restore entity?

I also remembered that the response from the api before we decompress it is a gzip file. So I could also save that to disk and that would save some space. I'm not 100% sure a clean way to do that as it is kind of in the middle of the process within our package, but if we want it to be even smaller, I can do that.

I think firing off an async write when the map has changed is fine, and could include some rate limiting or debouncing (e.g. you don't necessarily need to write every time the robot moves).

I don't think its worth it to do anything fancy there in the middle of the library. I've gone down that road when writing lots of objects where nest passes in a storage interface into the client library for saving images/videos nest media source and the interface but i don't think its worth it here.

Need to do tests - but I took what you did in Nest and modified it. No injection within the library, but I did think it would be useful to have a central object for handling the storage.

I'll try to get tests done soon, but for now I'd appreciate any feedback on the implementation. Will mark as draft until I get tests done.

homeassistant/components/roborock/image.py

homeassistant/components/roborock/roborock_storage.py

homeassistant/components/roborock/image.py

homeassistant/components/roborock/roborock_storage.py

balloob · 2024-05-14T20:01:35Z

yeah, you can use our storage helper. create a file <domain>.<config entry ID>.

Lash-L · 2024-05-14T20:27:02Z

yeah, you can use our storage helper. create a file <domain>.<config entry ID>.

Marking as draft until i have a chance to do that

Lash-L · 2024-07-06T23:37:25Z

yeah, you can use our storage helper. create a file <domain>.<config entry ID>.

Marking as draft until i have a chance to do that

Finally got around to it. Should be ready for a look over

homeassistant/components/roborock/coordinator.py

homeassistant/components/roborock/roborock_storage.py

allenporter · 2024-07-08T13:11:35Z

homeassistant/components/roborock/roborock_storage.py

+        return self._hass.config.path(f"{MAP_PATH}/{self._entry_id}/{map_name}")
+
+    def exec_load_maps(self, map_names: list[str]) -> list[bytes | None]:
+        """Load map content. Should be called in executor thread."""


i think it would be safer to separate the map loading disk i/o from updating the python object state (_data). That is, it seems like the save and load can happen from different executor threads.

Perhaps provide async interfaces and let those update the object state and defer to the executor threads for I/O only.

When writing, i think its fine to only update _data after the map write succeeds rather than writing and handling a rollback case on error.

allenporter · 2024-07-08T13:16:41Z

homeassistant/components/roborock/roborock_storage.py

+
+import dataclasses
+import logging
+import os


Prefer pathlib for all the filename and file/directory manipulation. It will make some of the path concatenation and read/write operations a little less code.

I haven't used pathlib before, but I think I did it all correctly.

Although one test is failing now, perhaps i'm patching incorrectly. I will have to look later.

homeassistant/components/roborock/roborock_storage.py

allenporter · 2024-07-08T13:25:35Z

homeassistant/components/roborock/roborock_storage.py

+
+    async def async_load_map_info(self, coord_duid: str) -> dict[int, RoborockMapInfo]:
+        """Load a coordinator's map info."""
+        map_info: dict[str, dict[str, dict]] | None = await self._store.async_load()


This reads from disk every time load is called so the use of self._map_info does not seem to be adding value. Is the idea to treat _map_info as a cache or should it be dropped entirely? It seems like the intent was to load once at startup, then always read from it and write through directly to disk every time.

This is only called once. _map_info is a cache so that we can update the coord that changed and then rewrite it.

The coord that calls the save does not have access to all of the data, so roborock storage holds it instead

OK, my thinking since its not used when reading, it could just also be re-read before writing to simplify.

That would mean I'm doubling the IO everytime I am writing to the disk - I can do that but it seems a odd. Happy to do it if you think that is best

I assumed updates were rare, so i was trying to see about removing complexity. If this is really a performance sensitive flow then it could also be improved by not storing all the maps in a single file since updating maps for one coord_duid requires updating all of them.

If we want to make this work, then there are some either potential bugs here or semantics that need to be enforced in how this is used:

If you call save before calling load then the existing map data gets overwritten

Calling remove map info doesn't clear the map info so then calling save will write back old map data

My assumption is that two different coordinators can call load and save interleaved and i'm not sure how safe that is.

These work today because of assumptions in exactly how these are used by the callers today, which i think make this a little brittle, or spread logic across two files.

Removing _map_info sidesteps needing to address these bugs / or add documentation explaining assumptions / or add invariant checking to enforce a specific call pattern.

I think what makes this fancy is that its trying to do two things (1) provide a cache of the entire disk and (2) provide partial update semantics with read/write of a subset of the entire cached data.

Separating the responsibilities and having the cache at a lower level, reading /writing the entire dataset would make the semantics simpler and easier to reason about. (e.g. cache the entire data on load, overwrite the entire data on save.

But more concretely i think these fixes can make what is here work:

only ever read from disk once, always prefer to read from the existing data if it was already read

invalidate map info when removing it from disk

consider using async_delay_save on write to batch writes together from multiple devices

homeassistant/components/roborock/image.py

home-assistant · 2024-07-08T13:37:45Z

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

allenporter · 2024-07-09T02:44:16Z

homeassistant/components/roborock/roborock_storage.py

+
+    async def async_load_map_info(self, coord_duid: str) -> dict[int, RoborockMapInfo]:
+        """Load a coordinator's map info."""
+        map_info: dict[str, dict[str, dict]] | None = await self._store.async_load()


OK, my thinking since its not used when reading, it could just also be re-read before writing to simplify.

homeassistant/components/roborock/roborock_storage.py

allenporter · 2024-07-09T02:59:09Z

homeassistant/components/roborock/coordinator.py

-                )
+                # To prevent weirdness - if there is a map without a name
+                # - set its name to its flag
+                if roborock_map.mapFlag not in self.maps:


My impression is what is happening here is the we set self.maps = to whatever is currently in the local storage, then we ask for the list of maps from the API. The content of self.maps is never updated with a new list of names or rooms, which is not necessarily what I would expect if the API returned newer data. I may be misunderstanding though: Is this ok for another reason?

Writing my thoughts as I reread my code - so stick with me.

We load the saved map info from the storage.

We then get the list of maps from the api.

If we haven't seen this map before, we should add it to self._maps.

The api doesn't have room information, that information can only be gotten when we have that current map selected. so we can't get it here. instead, room updates would be done in get_rooms().

I think the point about names is fair though, so I will update the name here if the name has changed.

Although on second thought - I don't think entities can handle dynamic changing of attribute name, so i'm not sure if that makes sense to handle. I will log a warning for now and you can give input

OK stepping back: What is the intent here of refreshing this regularly? What are the cases in scope and not in scope.

As an alternative: Only query on startup if we don't want to handle all the corner cases. I'm having trouble understanding why some are handled and some are not and so not handling any of them after startup seems appealing to me.

What is purpose is the local disk storage of metadata / map info? The API is always consulted here anyway first and I don't see it read on its own anywhere.

allenporter · 2024-07-09T03:00:43Z

homeassistant/components/roborock/coordinator.py

+                        flag=roborock_map.mapFlag, name=map_name, rooms={}
+                    )
+                    update = True
+        if update:


What is expected behavior if the API no longer returns a map? I might expect it to be removed from the local disk cache?

Maybe? It probably should be - and the entity should be removed as well. Is the proper way to remove the entity just using the entity registry?

homeassistant/components/roborock/image.py

allenporter · 2024-07-09T03:14:31Z

homeassistant/components/roborock/image.py

@@ -64,25 +65,33 @@ def __init__(
        map_flag: int,
        starting_map: bytes,
        map_name: str,
+        roborock_storage: RoborockStorage,
+        create_map: bool,


Now starting map can either be map bytes or image bytes, and it seems like it would there should be a way to pick one to keep the abstraction here simpler. Can RoborockMap just deal with one or the other? Setting a more cirps boundary between the storage class, the coordinator, and the entity will avoid all parts of the system needing to know everything about how the caching works. (There are also multiple levels of caching happening here now so you can consider how to simplify that in the future as well, putting all the cache in a single layer)

Co-authored-by: Allen Porter <[email protected]>

github-actions · 2024-09-25T20:06:28Z

There hasn't been any activity on this pull request recently. This pull request has been automatically marked as stale because of that and will be closed if no further activity occurs within 7 days.
If you are the author of this PR, please leave a comment if you want to keep it open. Also, please rebase your PR onto the latest dev branch to ensure that it's up to date with the latest changes.
Thank you for your contribution!

Add ability to cache images instead of always reloading

44f581a

home-assistant bot added cla-signed has-tests integration: roborock new-feature by-code-owner Quality Scale: No score labels Mar 2, 2024

Lash-L changed the title ~~Add ability to cache images instead of always reloading~~ Add ability to cache Roborock maps instead of always reloading Mar 2, 2024

balloob reviewed Mar 2, 2024

View reviewed changes

Lash-L added 2 commits March 14, 2024 00:13

image caching temp work

59d3951

add storage for roborock images

dcd1c6c

Lash-L marked this pull request as draft March 14, 2024 02:34

remove annotations

67fd20a

Lash-L commented Mar 14, 2024

View reviewed changes

homeassistant/components/roborock/image.py Outdated Show resolved Hide resolved

Lash-L added 4 commits March 13, 2024 22:38

Merge branch 'dev' into image_caching

07c7bf4

dont save maps for tests

284fa41

add some testing

9f2ddec

add rest of tests

ced5334

Lash-L marked this pull request as ready for review March 31, 2024 02:26

Lash-L added 2 commits March 30, 2024 22:30

Merge branch 'dev' into image_caching

646c237

fix lint

051e123

balloob reviewed Apr 1, 2024

View reviewed changes

homeassistant/components/roborock/image.py Outdated Show resolved Hide resolved

balloob reviewed Apr 1, 2024

View reviewed changes

homeassistant/components/roborock/roborock_storage.py Outdated Show resolved Hide resolved

balloob reviewed Apr 1, 2024

View reviewed changes

homeassistant/components/roborock/image.py Outdated Show resolved Hide resolved

balloob reviewed Apr 1, 2024

View reviewed changes

homeassistant/components/roborock/image.py Outdated Show resolved Hide resolved

balloob reviewed Apr 1, 2024

View reviewed changes

homeassistant/components/roborock/roborock_storage.py Outdated Show resolved Hide resolved

balloob reviewed Apr 1, 2024

View reviewed changes

homeassistant/components/roborock/roborock_storage.py Outdated Show resolved Hide resolved

balloob reviewed Apr 1, 2024

View reviewed changes

homeassistant/components/roborock/roborock_storage.py Outdated Show resolved Hide resolved

balloob reviewed Apr 1, 2024

View reviewed changes

homeassistant/components/roborock/roborock_storage.py Outdated Show resolved Hide resolved

Lash-L marked this pull request as draft May 14, 2024 20:27

Lash-L added 6 commits June 5, 2024 17:18

Merge branch 'dev' into image_caching

992ceb5

Merge branch 'dev' into image_caching

59e69f1

fix merge

8a64324

Merge branch 'dev' into image_caching

a27b498

Add caching of map info

050334f

remove unneeded comment

3c0e208

Lash-L marked this pull request as ready for review July 6, 2024 23:36

Lash-L requested a review from balloob July 6, 2024 23:37

frenck added the smash Indicator this PR is close to finish for merging or closing label Jul 7, 2024

allenporter requested changes Jul 8, 2024

View reviewed changes

home-assistant bot marked this pull request as draft July 8, 2024 13:37

some mr comments

c9f0e55

allenporter requested changes Jul 9, 2024

View reviewed changes

frenck removed the smash Indicator this PR is close to finish for merging or closing label Jul 9, 2024

Lash-L and others added 3 commits July 25, 2024 19:47

Update homeassistant/components/roborock/roborock_storage.py

d4aef81

Co-authored-by: Allen Porter <[email protected]>

Some changes

97856b2

Merge branch 'dev' into image_caching

cebec30

github-actions bot added the stale label Sep 25, 2024

github-actions bot closed this Oct 3, 2024

github-actions bot locked and limited conversation to collaborators Oct 4, 2024

balloob reopened this Oct 28, 2024

balloob removed the stale label Oct 28, 2024

Merge branch 'dev' into image_caching

c8a13cf

home-assistant unlocked this conversation Nov 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ability to cache Roborock maps instead of always reloading #112047

Add ability to cache Roborock maps instead of always reloading #112047

Lash-L commented Mar 2, 2024 •

edited

Loading

home-assistant bot commented Mar 2, 2024

balloob Mar 2, 2024

Lash-L Mar 2, 2024

balloob Mar 3, 2024

Lash-L Mar 3, 2024

Lash-L Mar 11, 2024

allenporter Mar 12, 2024

Lash-L Mar 14, 2024

balloob commented May 14, 2024

Lash-L commented May 14, 2024

Lash-L commented Jul 6, 2024

allenporter Jul 8, 2024

allenporter Jul 8, 2024

Lash-L Jul 8, 2024

allenporter Jul 8, 2024

Lash-L Jul 8, 2024

allenporter Jul 9, 2024

Lash-L Jul 26, 2024

allenporter Jul 27, 2024 •

edited

Loading

allenporter Jul 27, 2024

home-assistant bot commented Jul 8, 2024

allenporter Jul 9, 2024

allenporter Jul 9, 2024

Lash-L Jul 26, 2024

Lash-L Jul 26, 2024

allenporter Jul 27, 2024

allenporter Jul 27, 2024

allenporter Jul 9, 2024

Lash-L Jul 26, 2024

allenporter Jul 9, 2024

github-actions bot commented Sep 25, 2024

Add ability to cache Roborock maps instead of always reloading #112047

Are you sure you want to change the base?

Add ability to cache Roborock maps instead of always reloading #112047

Conversation

Lash-L commented Mar 2, 2024 • edited Loading

Breaking change

Proposed change

Type of change

Additional information

Checklist

home-assistant bot commented Mar 2, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

balloob commented May 14, 2024

Lash-L commented May 14, 2024

Lash-L commented Jul 6, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

allenporter Jul 27, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

home-assistant bot commented Jul 8, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Sep 25, 2024

Lash-L commented Mar 2, 2024 •

edited

Loading

allenporter Jul 27, 2024 •

edited

Loading