Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to cache Roborock maps instead of always reloading #112047

Draft
wants to merge 36 commits into
base: dev
Choose a base branch
from

Conversation

Lash-L
Copy link
Contributor

@Lash-L Lash-L commented Mar 2, 2024

Breaking change

Proposed change

After talking with Paulus, he has multiple maps and the time it takes to startup Roborock is costly (about 40s) because of the nature of how we have to load the images. Now we only have to do that once because they are cached.

lightly relates to #97311 and conversations on caching there.

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:

Checklist

  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
  • Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

@home-assistant
Copy link

home-assistant bot commented Mar 2, 2024

Hey there @humbertogontijo, mind taking a look at this pull request as it has been labeled with an integration (roborock) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of roborock can trigger bot actions by commenting:

  • @home-assistant close Closes the pull request.
  • @home-assistant rename Awesome new title Renames the pull request.
  • @home-assistant reopen Reopen the pull request.
  • @home-assistant unassign roborock Removes the current integration label and assignees on the pull request, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the pull request.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the pull request.

@Lash-L Lash-L changed the title Add ability to cache images instead of always reloading Add ability to cache Roborock maps instead of always reloading Mar 2, 2024
"""Take the serialized dict and convert to object."""
return RoborockImageExtraStoredData(
map_flag=serialized_dict["map_flag"],
cached_image=base64.b64decode(serialized_dict["cached_image"]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this will make the restored entity cache huge as it now contains a multiple whole images in base64 (which takes up more space than original image).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So from what it seems for me (on my map at least)

The response from the api has a length of 269612 bytes.

The cached map aka the image has a length of 7909 bytes

And the base64 encoded serialized image has a length of 10548 bytes.

I'm open to a better way of storing the image, I just wanted to make sure that I was able to consistently serialize and serialize the image.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better stored on disk outside of the restore entity cache. That cache is written to disk every 15min. In my production the restore state cache is 123KB. That would increase with 25% (I have 3 maps).

If you write it to disk, make sure it's cleaned up when the config entry is removed. Maybe you can make a folder inside .storage: <config>/.storage/roborock_<config entry ID>/maps ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be fine by me. My only question is how often do I write it to the disk? Should it only be written to the disk on config entry unloading? Or should it get done at the same point as the restore entity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also remembered that the response from the api before we decompress it is a gzip file. So I could also save that to disk and that would save some space. I'm not 100% sure a clean way to do that as it is kind of in the middle of the process within our package, but if we want it to be even smaller, I can do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think firing off an async write when the map has changed is fine, and could include some rate limiting or debouncing (e.g. you don't necessarily need to write every time the robot moves).

I don't think its worth it to do anything fancy there in the middle of the library. I've gone down that road when writing lots of objects where nest passes in a storage interface into the client library for saving images/videos nest media source and the interface but i don't think its worth it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to do tests - but I took what you did in Nest and modified it. No injection within the library, but I did think it would be useful to have a central object for handling the storage.

I'll try to get tests done soon, but for now I'd appreciate any feedback on the implementation. Will mark as draft until I get tests done.

@Lash-L Lash-L marked this pull request as draft March 14, 2024 02:34
@Lash-L Lash-L marked this pull request as ready for review March 31, 2024 02:26
@balloob
Copy link
Member

balloob commented May 14, 2024

yeah, you can use our storage helper. create a file <domain>.<config entry ID>.

@Lash-L
Copy link
Contributor Author

Lash-L commented May 14, 2024

yeah, you can use our storage helper. create a file <domain>.<config entry ID>.

Marking as draft until i have a chance to do that

@Lash-L Lash-L marked this pull request as draft May 14, 2024 20:27
@Lash-L Lash-L marked this pull request as ready for review July 6, 2024 23:36
@Lash-L Lash-L requested a review from balloob July 6, 2024 23:37
@Lash-L
Copy link
Contributor Author

Lash-L commented Jul 6, 2024

yeah, you can use our storage helper. create a file <domain>.<config entry ID>.

Marking as draft until i have a chance to do that

Finally got around to it. Should be ready for a look over

@frenck frenck added the smash Indicator this PR is close to finish for merging or closing label Jul 7, 2024
homeassistant/components/roborock/coordinator.py Outdated Show resolved Hide resolved
homeassistant/components/roborock/roborock_storage.py Outdated Show resolved Hide resolved
return self._hass.config.path(f"{MAP_PATH}/{self._entry_id}/{map_name}")

def exec_load_maps(self, map_names: list[str]) -> list[bytes | None]:
"""Load map content. Should be called in executor thread."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it would be safer to separate the map loading disk i/o from updating the python object state (_data). That is, it seems like the save and load can happen from different executor threads.

Perhaps provide async interfaces and let those update the object state and defer to the executor threads for I/O only.

When writing, i think its fine to only update _data after the map write succeeds rather than writing and handling a rollback case on error.


import dataclasses
import logging
import os
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer pathlib for all the filename and file/directory manipulation. It will make some of the path concatenation and read/write operations a little less code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't used pathlib before, but I think I did it all correctly.

Although one test is failing now, perhaps i'm patching incorrectly. I will have to look later.

homeassistant/components/roborock/roborock_storage.py Outdated Show resolved Hide resolved

async def async_load_map_info(self, coord_duid: str) -> dict[int, RoborockMapInfo]:
"""Load a coordinator's map info."""
map_info: dict[str, dict[str, dict]] | None = await self._store.async_load()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads from disk every time load is called so the use of self._map_info does not seem to be adding value. Is the idea to treat _map_info as a cache or should it be dropped entirely? It seems like the intent was to load once at startup, then always read from it and write through directly to disk every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only called once. _map_info is a cache so that we can update the coord that changed and then rewrite it.

The coord that calls the save does not have access to all of the data, so roborock storage holds it instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, my thinking since its not used when reading, it could just also be re-read before writing to simplify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would mean I'm doubling the IO everytime I am writing to the disk - I can do that but it seems a odd. Happy to do it if you think that is best

Copy link
Contributor

@allenporter allenporter Jul 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed updates were rare, so i was trying to see about removing complexity. If this is really a performance sensitive flow then it could also be improved by not storing all the maps in a single file since updating maps for one coord_duid requires updating all of them.

If we want to make this work, then there are some either potential bugs here or semantics that need to be enforced in how this is used:

  • If you call save before calling load then the existing map data gets overwritten
  • Calling remove map info doesn't clear the map info so then calling save will write back old map data
  • My assumption is that two different coordinators can call load and save interleaved and i'm not sure how safe that is.

These work today because of assumptions in exactly how these are used by the callers today, which i think make this a little brittle, or spread logic across two files.

Removing _map_info sidesteps needing to address these bugs / or add documentation explaining assumptions / or add invariant checking to enforce a specific call pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what makes this fancy is that its trying to do two things (1) provide a cache of the entire disk and (2) provide partial update semantics with read/write of a subset of the entire cached data.

Separating the responsibilities and having the cache at a lower level, reading /writing the entire dataset would make the semantics simpler and easier to reason about. (e.g. cache the entire data on load, overwrite the entire data on save.

But more concretely i think these fixes can make what is here work:

  • only ever read from disk once, always prefer to read from the existing data if it was already read
  • invalidate map info when removing it from disk
  • consider using async_delay_save on write to batch writes together from multiple devices

homeassistant/components/roborock/image.py Show resolved Hide resolved
@home-assistant home-assistant bot marked this pull request as draft July 8, 2024 13:37
@home-assistant
Copy link

home-assistant bot commented Jul 8, 2024

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.


async def async_load_map_info(self, coord_duid: str) -> dict[int, RoborockMapInfo]:
"""Load a coordinator's map info."""
map_info: dict[str, dict[str, dict]] | None = await self._store.async_load()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, my thinking since its not used when reading, it could just also be re-read before writing to simplify.

homeassistant/components/roborock/roborock_storage.py Outdated Show resolved Hide resolved
homeassistant/components/roborock/roborock_storage.py Outdated Show resolved Hide resolved
homeassistant/components/roborock/roborock_storage.py Outdated Show resolved Hide resolved
)
# To prevent weirdness - if there is a map without a name
# - set its name to its flag
if roborock_map.mapFlag not in self.maps:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression is what is happening here is the we set self.maps = to whatever is currently in the local storage, then we ask for the list of maps from the API. The content of self.maps is never updated with a new list of names or rooms, which is not necessarily what I would expect if the API returned newer data. I may be misunderstanding though: Is this ok for another reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing my thoughts as I reread my code - so stick with me.

We load the saved map info from the storage.

We then get the list of maps from the api.

If we haven't seen this map before, we should add it to self._maps.

The api doesn't have room information, that information can only be gotten when we have that current map selected. so we can't get it here. instead, room updates would be done in get_rooms().

I think the point about names is fair though, so I will update the name here if the name has changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although on second thought - I don't think entities can handle dynamic changing of attribute name, so i'm not sure if that makes sense to handle. I will log a warning for now and you can give input

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK stepping back: What is the intent here of refreshing this regularly? What are the cases in scope and not in scope.

As an alternative: Only query on startup if we don't want to handle all the corner cases. I'm having trouble understanding why some are handled and some are not and so not handling any of them after startup seems appealing to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is purpose is the local disk storage of metadata / map info? The API is always consulted here anyway first and I don't see it read on its own anywhere.

flag=roborock_map.mapFlag, name=map_name, rooms={}
)
update = True
if update:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is expected behavior if the API no longer returns a map? I might expect it to be removed from the local disk cache?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? It probably should be - and the entity should be removed as well. Is the proper way to remove the entity just using the entity registry?

homeassistant/components/roborock/image.py Outdated Show resolved Hide resolved
@@ -64,25 +65,33 @@ def __init__(
map_flag: int,
starting_map: bytes,
map_name: str,
roborock_storage: RoborockStorage,
create_map: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now starting map can either be map bytes or image bytes, and it seems like it would there should be a way to pick one to keep the abstraction here simpler. Can RoborockMap just deal with one or the other? Setting a more cirps boundary between the storage class, the coordinator, and the entity will avoid all parts of the system needing to know everything about how the caching works. (There are also multiple levels of caching happening here now so you can consider how to simplify that in the future as well, putting all the cache in a single layer)

@frenck frenck removed the smash Indicator this PR is close to finish for merging or closing label Jul 9, 2024
Copy link

There hasn't been any activity on this pull request recently. This pull request has been automatically marked as stale because of that and will be closed if no further activity occurs within 7 days.
If you are the author of this PR, please leave a comment if you want to keep it open. Also, please rebase your PR onto the latest dev branch to ensure that it's up to date with the latest changes.
Thank you for your contribution!

@github-actions github-actions bot added the stale label Sep 25, 2024
@github-actions github-actions bot closed this Oct 3, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Oct 4, 2024
@balloob balloob reopened this Oct 28, 2024
@balloob balloob removed the stale label Oct 28, 2024
@home-assistant home-assistant unlocked this conversation Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants