opt: optimize cluster identification #3032

tediou5 · 2024-09-18T10:31:01Z

For the cache, the modification is quite simple: generate a random cache ID on each start and derive sub-IDs from it.
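
To illustrate the idea of deriving sub-IDs from a single random parent ID, here is a minimal sketch. The `Id` type and its 64-bit layout are assumptions made for illustration, and the standard library hasher is used only to keep the example dependency-free; the actual ID types in subspace-farmer and the hash used by this PR may differ. Only the `derive_sub_ids(&self, n: usize) -> Vec<Self>` shape comes from the diff.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical 64-bit identifier standing in for a cache or farm ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Id(u64);

impl Id {
    /// Derive `n` child IDs by hashing the parent ID together with each index.
    /// The same parent always yields the same children, so only the parent ID
    /// and the count need to be shared.
    pub fn derive_sub_ids(&self, n: usize) -> Vec<Self> {
        (0..n)
            .map(|index| {
                // A fresh hasher per child keeps each derivation independent.
                let mut hasher = DefaultHasher::new();
                self.0.hash(&mut hasher);
                (index as u64).hash(&mut hasher);
                Id(hasher.finish())
            })
            .collect()
    }
}
```

With a scheme like this, a receiver that knows the parent ID and the number of children can recompute every child ID locally instead of receiving them one by one.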

However, it's a bit different for the farmer. Currently, the logic for deriving the farmer's fingerprint is almost the same as that for the farm's fingerprint, but the information included is insufficient to fully describe whether the farm's state has changed between two restarts.

As a result, the current implementation does not support fast restarts. However, we have still achieved our initial goal: reducing identification messages to one per farmer and retrieving farm details in a steady stream.

Therefore, I will submit this PR first and implement the fast restart feature in a subsequent PR.

Code contributor checklist:

I have read, understood and followed contributing guide

teor2345

I’m not familiar enough with this code to review in detail, but I did find this typo:

crates/subspace-farmer/src/cluster/cache.rs

teor2345 · 2024-09-20T00:44:31Z

This looks like a known issue in test_bad_receipt_chain, so I'm going to re-run these tests:
https://github.com/autonomys/subspace/actions/runs/10933026831/job/30402683752?pr=3032#step:12:1387

nazar-pc

I tried reviewing, but I don't understand why the refactoring was necessary; in most cases it was at most a stylistic choice.

nazar-pc · 2024-09-20T01:03:43Z

crates/subspace-farmer/src/bin/subspace-farmer/commands/cluster/controller/farms.rs

@@ -32,8 +34,6 @@ use tracing::{error, info, trace, warn};
 type AddRemoveFuture<'a> =
    Pin<Box<dyn Future<Output = Option<(FarmIndex, oneshot::Receiver<()>, ClusterFarm)>> + 'a>>;

-pub(super) type FarmIndex = u16;


This was meant to be generic: the cluster doesn't need to know what the type is. We could have used u8, but decided to allow for more than 256 farms in cluster mode. Why was this necessary?
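
For context on the u8 versus u16 point, the following sketch shows how an index alias bounds the number of farms a single farmer can address. The `assign_indices` helper is hypothetical and not part of the codebase; only the `FarmIndex = u16` alias comes from the diff above.

```rust
/// Index of a farm within a single farmer, as in the alias shown in the diff.
pub type FarmIndex = u16;

/// Hypothetical helper: pair each farm with a consecutive index, returning
/// `None` if there are more farms than `FarmIndex` can address.
pub fn assign_indices<T>(farms: Vec<T>) -> Option<Vec<(FarmIndex, T)>> {
    farms
        .into_iter()
        .enumerate()
        .map(|(i, farm)| FarmIndex::try_from(i).ok().map(|index| (index, farm)))
        .collect()
}
```

Because callers only go through the alias, switching it to a different integer type would change the limit without touching code like this.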

I just moved it to subspace_farmer::cluster::farmer because I want to use it in ClusterFarmerIdentifyBroadcast.

There should be no need to send it in there; it seems to be an artifact of the ID derivation mechanism.

nazar-pc · 2024-09-20T01:09:01Z

crates/subspace-farmer/src/farm.rs

+
+    /// Derive sub IDs
+    #[inline]
+    pub fn derive_sub_ids(&self, n: usize) -> Vec<Self> {


This makes no sense to me, can you explain what this is supposed to mean?

Using child IDs derived from a farmer or cache ID avoids the need to transfer farm IDs. Since the farmer or cache already generates a random ID, randomizing a new ID for each farm would be redundant.

Still makes no sense to me, sorry.

This is a farm ID, just one of the many farms the farmer is managing. Other farms have different IDs. I don't understand what this derivation has to do with any of that.

I can remove it if you don't think it's necessary; it doesn't affect the overall logic here.

I'm trying to understand why it was added in the first place, and so far it doesn't really make sense to me.

This does make it confusing; maybe I should change the type to FarmerId.

It makes no sense to me either way. You need to transfer the same information that was transferred before these changes anyway.

Either way, we need to know which farms belong to the farmer, whether recorded or derived. So you would prefer to record the farm IDs for each farmer in the controller?

There are already farm_ids of individual farms in the controller. What needs to be done is to group them by farmer ID somehow and only transfer details once during initialization.
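
One possible reading of this suggestion, sketched under assumptions: the controller keeps a map from farmer ID to that farmer's farm IDs and fetches farm details only the first time a farmer identifies itself. `KnownFarmers`, `on_identify`, `farms_of`, and the `u64` ID types are hypothetical names for illustration, not the controller's actual API.

```rust
use std::collections::HashMap;

type FarmerId = u64; // hypothetical stand-in for the real farmer ID type
type FarmId = u64; // hypothetical stand-in for the real farm ID type

/// Farm IDs grouped by the farmer that owns them.
#[derive(Default)]
pub struct KnownFarmers {
    farms_by_farmer: HashMap<FarmerId, Vec<FarmId>>,
}

impl KnownFarmers {
    /// Handle one identification message from a farmer.
    /// `fetch_farms` runs only when the farmer is seen for the first time;
    /// returns `true` if farm details were fetched.
    pub fn on_identify<F>(&mut self, farmer_id: FarmerId, fetch_farms: F) -> bool
    where
        F: FnOnce() -> Vec<FarmId>,
    {
        if self.farms_by_farmer.contains_key(&farmer_id) {
            // Already initialized: repeated identifications carry no new details.
            return false;
        }
        self.farms_by_farmer.insert(farmer_id, fetch_farms());
        true
    }

    /// Farm IDs recorded for a farmer, or an empty slice if it is unknown.
    pub fn farms_of(&self, farmer_id: FarmerId) -> &[FarmId] {
        self.farms_by_farmer
            .get(&farmer_id)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

In this sketch, periodic identification messages after the first one are cheap lookups, and no ID derivation is needed because the farm IDs are recorded as received.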

I know what you mean. I'll try it later.

teor2345 · 2024-09-20T01:32:45Z

Hmm, the re-run of test_bad_receipt_chain failed again:
https://github.com/autonomys/subspace/actions/runs/10933026831/job/30407714765?pr=3032#step:12:1357

Not sure if this PR, other recent changes, or infrastructure changes are triggering it.

tediou5 · 2024-09-20T02:29:10Z

I will check it later.

nazar-pc · 2024-12-07T10:29:25Z

Closing due to lack of progress and numerous merge conflicts. Feel free to reopen later. We also have improved infrastructure for stream requests now, resulting in less boilerplate.

teor2345 · 2024-12-09T00:50:59Z

This didn’t seem to actually get closed, I’ll do it now.

tediou5 added 8 commits September 18, 2024 18:21

chore: moving FarmIndex into cluster::farmer

f4f409b

chore: tiny refactoring for cluster farm

c1d4a22

feat: derive sub farm_ids from a farm_id

2017568

opt: optimize farm identification

1c70a89

feat: derive sub cache_ids from a cache_id

3f285f4

chore: tiny refactoring for cluster cache

8087b87

opt: optimize cache identification

db67397

chore: add debug log when farm or cache updating last identification

1fa5cdd

tediou5 requested review from nazar-pc, shamil-gadelshin and rg3l3dr as code owners September 18, 2024 10:31

teor2345 reviewed Sep 18, 2024

crates/subspace-farmer/src/cluster/cache.rs

fix typo

fdad7e5

nazar-pc requested changes Sep 20, 2024

teor2345 closed this Dec 9, 2024
