Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: optimize cluster identification #3032

Closed

Conversation

tediou5
Copy link
Contributor

@tediou5 tediou5 commented Sep 18, 2024

close #2900

For the cache, the modification is quite simple — just generate a random Cache ID each time and derive sub-IDs from it.

However, it's a bit different for the farmer. Currently, the logic for deriving the farmer's fingerprint is almost the same as that for the farm's fingerprint, but the information included is insufficient to fully describe whether the farm's state has changed between two restarts.

As a result, the current implementation does not support fast restarts. However, we have still achieved our initial goal: reducing identification messages to one per farmer and retrieving farm details in a steady stream.

Therefore, I will submit this PR first and implement the fast restart feature in a subsequent PR.

Code contributor checklist:

Copy link
Contributor

@teor2345 teor2345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not familiar enough with this code to review in detail, but I did find this typo:

crates/subspace-farmer/src/cluster/cache.rs Outdated Show resolved Hide resolved
@teor2345
Copy link
Contributor

This looks like a known issue in test_bad_receipt_chain, so I'm going to re-run these tests:
https://github.com/autonomys/subspace/actions/runs/10933026831/job/30402683752?pr=3032#step:12:1387

Copy link
Member

@nazar-pc nazar-pc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried reviewing, but I don't understand why the refactoring was necessary, in most cases it was at most a stylistic choice

@@ -32,8 +34,6 @@ use tracing::{error, info, trace, warn};
type AddRemoveFuture<'a> =
Pin<Box<dyn Future<Output = Option<(FarmIndex, oneshot::Receiver<()>, ClusterFarm)>> + 'a>>;

pub(super) type FarmIndex = u16;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was meant to be a generic, cluster doesn't need to know what the type is, we could have used u8, but decided to allow for more than 256 farms in cluster mode. Why was this necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just move to subspace_farmer::cluster::farmer, because I want to use it in ClusterFarmerIdentifyBroadcast

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no need to send it in there, it seems to be an artifact of ID derivation mechanism


/// Derive sub IDs
#[inline]
pub fn derive_sub_ids(&self, n: usize) -> Vec<Self> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes no sense to me, can you explain what this is supposed to mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a child id derived from a farmer or cache avoids the need to transfer farm ids, which would make it redundant to randomize new ids for each farm when the farmer or cache is already randomly generating ids.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still makes no sense to me, sorry.

This is farm ID, just one of the many farms the farmer is managing. Other farms have different IDs. I don't understand what this derivation has to do with any of that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove him if you don't think it's necessary, it doesn't affect the overall logic here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand why it was added in the first place and so far it doesn't really make sense to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does make it confusing, maybe I should change the type to FarmerId.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes no sense to me either way. You need to transfer the same information that was transferred before these changes anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way we need to know what farms are inside the farmer, recorded or derived, so you prefer to record the farms_id inside the farmer in the controller.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already farm_ids of individual farms in the controller. What needs to be is to group them by farmer ID somehow and only transfer details once during initialization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what you mean. I'll try it later.

@teor2345
Copy link
Contributor

This looks like a known issue in test_bad_receipt_chain, so I'm going to re-run these tests: autonomys/subspace/actions/runs/10933026831/job/30402683752?pr=3032#step:12:1387

Hmm, it failed again:
https://github.com/autonomys/subspace/actions/runs/10933026831/job/30407714765?pr=3032#step:12:1357

Not sure if this PR, other recent changes, or infrastructure changes are triggering it.

@tediou5
Copy link
Contributor Author

tediou5 commented Sep 20, 2024

This looks like a known issue in test_bad_receipt_chain, so I'm going to re-run these tests: autonomys/subspace/actions/runs/10933026831/job/30402683752?pr=3032#step:12:1387

Hmm, it failed again: https://github.com/autonomys/subspace/actions/runs/10933026831/job/30407714765?pr=3032#step:12:1357

Not sure if this PR, other recent changes, or infrastructure changes are triggering it.

I will check it later

@nazar-pc
Copy link
Member

nazar-pc commented Dec 7, 2024

Closing due to lack of progress and numerous merge conflicts. Feel free to reopen later. We also have improved infrastructure for stream requests now resulting in less boilerplate.

@teor2345
Copy link
Contributor

teor2345 commented Dec 9, 2024

This didn’t seem to actually get closed, I’ll do it now.

@teor2345 teor2345 closed this Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize farm and cache identification
3 participants