
Optimize SQLite database periodically #18612

Draft
wants to merge 1 commit into master

Conversation

glassez
Member

@glassez glassez commented Feb 24, 2023

It optimizes the database every time a session is loaded, and after every 10,000 writes/removes (which can be useful for long-running instances).
Note that 10,000 is an arbitrary number chosen for the initial implementation, and I don't want to have long discussions about it (unless you can agree on some other value quickly enough). Some might also say that it would be nice to have an appropriate configuration option for it. I agree that it probably makes sense, but I would strongly prefer not to deal with it right now.
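A minimal sketch of the counter-based trigger described above, in Python with the stdlib `sqlite3` module for illustration (the actual change is C++ in `dbresumedatastorage.cpp`; the class name, schema, and method names here are hypothetical):

```python
import sqlite3

OPTIMIZE_INTERVAL = 10_000  # arbitrary threshold, as noted in the PR description

class ResumeDataStore:
    """Illustrative store; names and schema are hypothetical."""

    def __init__(self, path):
        # autocommit mode: VACUUM cannot run inside an open transaction
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS resumedata "
            "(torrent_id TEXT PRIMARY KEY, data BLOB)")
        self.write_count = 0
        self.optimize()  # optimize once every time a session is loaded

    def optimize(self):
        self.conn.execute("VACUUM")  # rebuilds the file, defragmenting pages
        self.write_count = 0

    def _count_change(self):
        self.write_count += 1
        if self.write_count >= OPTIMIZE_INTERVAL:
            self.optimize()

    def store(self, torrent_id, data):
        self.conn.execute(
            "INSERT OR REPLACE INTO resumedata (torrent_id, data) VALUES (?, ?)",
            (torrent_id, data))
        self._count_change()

    def remove(self, torrent_id):
        self.conn.execute(
            "DELETE FROM resumedata WHERE torrent_id = ?", (torrent_id,))
        self._count_change()
```

Note the weakness discussed later in the thread: `write_count` lives only in memory, so it resets on every restart.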

@glassez glassez added the Core label Feb 24, 2023
@glassez glassez added this to the 4.6.0 milestone Feb 24, 2023
@glassez glassez requested a review from a team February 24, 2023 16:30
thalieht
thalieht previously approved these changes Feb 24, 2023
xavier2k6
xavier2k6 previously approved these changes Feb 26, 2023
Member

@sledgehammer999 sledgehammer999 left a comment

Personally, I am not very fond of this heuristic.
I would suggest periodically querying page_count and freelist_count, and issuing a VACUUM if the free space is >10%.
NOTE 1: 10% was randomly chosen.
NOTE 2: There are some other heuristics discussed here. The free-space one seems the most straightforward.
NOTE 3: That is my personal opinion and I don't insist on it, if you want to go through with this PR.
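The free-space heuristic suggested above, sketched in Python for illustration (the 10% threshold is the arbitrary figure from the comment):

```python
import sqlite3

FREE_SPACE_THRESHOLD = 0.10  # arbitrary, per NOTE 1 above

def vacuum_if_needed(conn):
    """Issue a VACUUM when the freelist exceeds 10% of all pages.

    Returns True if a VACUUM was issued. The connection must not have an
    open transaction, since VACUUM cannot run inside one.
    """
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    freelist_count = conn.execute("PRAGMA freelist_count").fetchone()[0]
    if page_count and freelist_count / page_count > FREE_SPACE_THRESHOLD:
        conn.execute("VACUUM")
        return True
    return False
```

Both pragmas are cheap header/accounting reads, so this check can run on a timer without noticeable cost.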

src/base/bittorrent/dbresumedatastorage.cpp (outdated review comment, resolved)
@glassez
Member Author

glassez commented Feb 28, 2023

Personally, I am not very fond of this heuristic.

In fact, I'm not "thrilled" with it either.
But this PR is not as spontaneous as it might seem. I've been thinking about this problem for a long time (though not very actively). And a lot of what you referred to above coincides more or less with my own conclusions.

I would suggest issuing a VACUUM by periodically querying the page_count and freelist_count. And if free space is >10% then issue a VACUUM.

Increasing the size of the database due to unused/free pages is not the main problem in our case. Since qBittorrent actively uses the database, these pages will be re-used later anyway.

@glassez
Member Author

glassez commented Feb 28, 2023

Since qBittorrent usually produces records of different lengths, it is more susceptible to the problem of data/page fragmentation. Therefore, the most suitable heuristic for qBittorrent, IMO, is one based on the number of writes/deletes performed.
That's what I tried to implement. However, to do it completely, you need to store the number of changes in the database itself (or somewhere else), otherwise those who do not use qBittorrent very actively will never reach the optimization point, since the counter will be reset too early.
For now, I decided to "cheat" a little and not store this data, instead just performing another "forced" optimization at each startup. So maybe I won't merge it now, but will finish the work first.
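One way to persist the change counter "in the database itself," sketched in Python for illustration: the `user_version` header field. This is an assumption on my part, not what the PR does; a dedicated meta table works equally well (and is preferable if `user_version` is already used for schema versioning).

```python
import sqlite3

def load_change_count(conn):
    # user_version is a 32-bit integer kept in the database file header,
    # so it survives restarts without needing an extra table
    return conn.execute("PRAGMA user_version").fetchone()[0]

def record_change(conn, threshold=10_000):
    """Bump the persisted change counter; VACUUM and reset it at the threshold."""
    count = load_change_count(conn) + 1
    if count >= threshold:
        conn.execute("VACUUM")  # must not be inside an open transaction
        count = 0
    # PRAGMA values cannot be bound as query parameters, hence the f-string
    conn.execute(f"PRAGMA user_version = {count}")
    return count
```

With the counter in the file header, a lightly-used instance accumulates changes across sessions and eventually reaches the optimization point.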

@glassez glassez marked this pull request as draft February 28, 2023 12:44
@sledgehammer999
Member

You have summed up my thoughts very well.

For the fragmentation part, I wonder if a good enough heuristic can be obtained by using the info in DBSTAT.

@glassez
Member Author

glassez commented Feb 28, 2023

For the fragmentation part, I wonder if a good enough heuristic can be obtained by using the info in DBSTAT.

Hmm, it looks like it might be useful...

@glassez
Member Author

glassez commented Mar 1, 2023

For the fragmentation part, I wonder if a good enough heuristic can be obtained by using the info in DBSTAT.

Hmm, it looks like it might be useful...

The problem is that DBSTAT is an optional feature enabled at compile time, so we may find ourselves in a situation where it is not available on the target system. At least it is disabled in the SQLite plugin shipped with the official Qt binaries for Windows. So we'll need some kind of fallback to do without it.
What do you think?
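A runtime probe for DBSTAT availability, with `None` as the fallback signal, might look like this (Python for illustration; the unused-bytes/total-bytes ratio is just one possible fragmentation measure, not something the thread settles on):

```python
import sqlite3

def dbstat_available(conn):
    # DBSTAT is a compile-time option (SQLITE_ENABLE_DBSTAT_VTAB),
    # so probe for it at runtime instead of assuming it exists
    try:
        conn.execute("SELECT 1 FROM dbstat LIMIT 1")
        return True
    except sqlite3.OperationalError:
        return False

def fragmentation_ratio(conn):
    """Fraction of allocated page bytes that are unused, or None as a
    signal to fall back to another heuristic when DBSTAT is missing."""
    if not dbstat_available(conn):
        return None
    unused, total = conn.execute(
        "SELECT sum(unused), sum(pgsize) FROM dbstat").fetchone()
    return (unused or 0) / total if total else 0.0
```

When the probe fails, the caller can fall back to the write-counter or freelist heuristics discussed earlier in the thread.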

@glassez glassez dismissed stale reviews from xavier2k6 and thalieht via c41c202 March 1, 2023 12:52
@glassez glassez force-pushed the optimize-db branch 2 times, most recently from 8e9c830 to 615e7b3 Compare March 5, 2023 12:00
If the SQLite DBSTAT module is available, database optimization will be
performed automatically based on the page fragmentation value.
@xavier2k6

This comment was marked as off-topic.

@glassez glassez modified the milestones: 4.6.0, 5.0 Jul 21, 2023
@HanabishiRecca
Contributor

Something to take into consideration: https://sqlite.org/pragma.html#pragma_auto_vacuum

@glassez
Member Author

glassez commented Jun 16, 2024

Something to take into consideration: https://sqlite.org/pragma.html#pragma_auto_vacuum

I initially rejected this idea because of the following:

Note, however, that auto-vacuum only truncates the freelist pages from the file. Auto-vacuum does not defragment the database nor repack individual database pages the way that the VACUUM command does. In fact, because it moves pages around within the file, auto-vacuum can actually make fragmentation worse.

In fact, right now I intend to just add an optimization command to the menu so that the user can run it manually. It would be better than nothing until automatic optimization is added someday.
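The manual menu action could reduce to something like the following (Python for illustration; in qBittorrent it would be a C++ slot issuing the same statements). Including `PRAGMA optimize` alongside `VACUUM` is my assumption about what "optimization" should cover, not something stated in the thread:

```python
import sqlite3

def optimize_database(conn):
    """Manual, user-triggered database optimization."""
    # cheap: runs ANALYZE only where the query planner would benefit
    conn.execute("PRAGMA optimize")
    # expensive: rewrites the whole file, defragmenting pages and
    # reclaiming freelist space; must not run inside a transaction
    conn.execute("VACUUM")
```

Since `VACUUM` rewrites the entire file, the UI action would presumably run it off the main thread or behind a progress indicator.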

@HanabishiRecca
Contributor

In fact, right now I intend to just add an optimization command to the menu so that the user can run it manually. It would be better than nothing until automatic optimization is added someday.

Good idea. I think the manual option would be useful even alongside automatic optimization.

@glassez
Member Author

glassez commented Jun 16, 2024

I think the manual option would be useful even alongside with automatic optimization.

Sure.
