Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MDB-29707] Metatdata cleaner with system drop replica #241

Merged
merged 5 commits into from
Oct 8, 2024

Conversation

MikhailBurdukov
Copy link
Contributor

@MikhailBurdukov MikhailBurdukov commented Oct 4, 2024

A little performance test:

500 replicated tables + 0 ReplicatedDB
    old 5 runs: 
        median:  0m44.959s
        list : [2m56.825s, 2m56.635s, 0m44.959s, 0m41.463s, 0m44.481s]
    new 5 runs:  
        median:  0m9.965s
        list: [0m6.865s, 0m9.965s, 0m10.014s, 0m6.701s, 0m10.085s]


500 replicated tables + 200 Replicated databases
    old 5 runs: 
        median:  3m39.932s
        list : [0m50.593s, 0m52.412s, 3m39.697s, 3m40.830s, 3m39.932s]
    new 5 runs:  
        median:  0m8.550s
        list: [0m13.685s, 0m13.335s, 0m8.550s, 0m8.320s, 0m8.627s]

gen script:

REPLICA="non_exists"
DB='tests_db'

clickhouse client --query "ATTACH DATABASE $DB"
clickhouse client --query "DROP DATABASE $DB"

clickhouse client --query "CREATE DATABASE $DB"


for i in {1..500}
do
   QUERY="CREATE TABLE $DB.t$i (a int) ENGINE=ReplicatedMergeTree('/clickhouse/tables/$DB/t$i','$REPLICA') ORDER BY a"

   echo "$QUERY"
   clickhouse client --query "$QUERY"
done

clickhouse client --query "DETACH DATABASE $DB"


for i in {1..200}
do
        DB_REPL="repl$i"
        clickhouse client --query "ATTACH DATABASE $DB_REPL"
        clickhouse client --query "DROP DATABASE IF EXISTS $DB_REPL"
        QUERY="CREATE DATABASE $DB_REPL ENGINE = Replicated('/clickhouse/databases/$DB_REPL', 'shard1', '$REPLICA')"
        echo $QUERY
        clickhouse client --query "$QUERY"
        clickhouse client --query "DETACH DATABASE $DB_REPL"
done

@@ -37,6 +37,7 @@
"unfreeze_timeout": 10 * 60,
"restart_replica_timeout": 10 * 60,
"restore_replica_timeout": 10 * 60,
"drop_replica_timeout": 10 * 60,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 min is enough? or set it to 1 hour?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its ok. We will adjust it if necessary

@MikhailBurdukov
Copy link
Contributor Author

MikhailBurdukov commented Oct 4, 2024

System drop database replica was announced in 23.1. So we can't use it for 22.8.
That version already deprecated, but we have small number of clusters with that version.

  • Should we disable replicated database cleaning for 22.8? It is pretty new feature, maybe we don't even have users who uses them.
  • Or keep backoff to the old behaviour?

WDYT @Alex-Burmak @aalexfvk ?

@MikhailBurdukov MikhailBurdukov changed the title Metatdata cleaner with system drop replica [MDB-29707] Metatdata cleaner with system drop replica Oct 4, 2024
"""

if block_until_finised_tasks:
self._queue_active[0][1].wait()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._queue_active can be empty

self._max_active_tasks = max_parrallel_tasks

self._queue_pending: Deque[str] = deque()
self._queue_active: Deque[Tuple[str, IAsyncResult]] = deque(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make dataclass for the tuple to get rid of naked indices like [0][1]


if not self._exists_tasks_to_do():
return None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we can add waiting for the first ready result here and remove with_block and block_until_finished_tasks parameters from everywhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can. But this will lead to unnecessary waiting every time we call the queue update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we remove waiting from _update_active_queue at all and insert it here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved implementation with wait to the separate function

replica_path = os.path.join(path, "replicas")
if not zk.exists(replica_path):
for replicated_object in replicated_objects:
# Actually rn in the ch (10.24) there are no secure way to determine that node is the root of replicated table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Actually rn in the ch (10.24) there are no secure way to determine that node is the root of replicated table.
# Actually there is no a reliable way to determine that node is the root of replicated table in the CH (24.10)

if not zk.exists(replica_path):
for replicated_object in replicated_objects:
# Actually rn in the ch (10.24) there are no secure way to determine that node is the root of replicated table.
# So we are accuming that if object not database then it is table.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# So we are accuming that if object not database then it is table.
# So we are assuming that if object is not a database then it is a table.

Comment on lines 477 to 479
is_database = bool(
zk.get(replicated_object.path)[0] == REPLICATED_DATABASE_MARKER
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is casting to bool needed here ?

Comment on lines 500 to 501
if replicated_object.path not in databases_to_cleanup:
databases_to_cleanup[replicated_object.path] = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using defaultdict for the databases_to_cleanup ?

@@ -37,6 +37,7 @@
"unfreeze_timeout": 10 * 60,
"restart_replica_timeout": 10 * 60,
"restore_replica_timeout": 10 * 60,
"drop_replica_timeout": 10 * 60,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its ok. We will adjust it if necessary

@aalexfvk aalexfvk merged commit 6502ec3 into main Oct 8, 2024
26 checks passed
@aalexfvk aalexfvk deleted the metadata_cleaner branch October 8, 2024 07:27
MikhailBurdukov added a commit that referenced this pull request Oct 14, 2024
aalexfvk pushed a commit that referenced this pull request Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants