feat(ds): add init_error for worker init error matrix #114
This PR contains the major changes implementing the feature requested in #112. If you have any suggestions about the implementation, I'd be glad to make further improvements. I will then add the required docs and tests, either in this PR or in a new one.
I believe it would be useful to have this in Crowd-Kit, but given the significance of the proposed changes, I have two comments.
Thanks for your timely reply!
It would be great to have docs and tests. Thank you for the clarifications. This new parameter is applied only at the very first iteration. I am wondering why this code performs an addition instead of an assignment. We already estimate these matrices from data, and right after that, we adjust them with
Great observation! The addition addresses incomplete initialization error matrices, which is especially relevant when running truth inference in a production environment. In such environments, we may not have initialization error matrices for all workers. To clarify, a worker's initialization error matrix is a "count matrix" rather than a "probability matrix" (I will document this later): it corresponds to the error counts in the M-step before the averaging (normalization) step. The addition therefore acts as an accumulation: we combine a worker's historical error counts with the error counts from the current tasks to obtain an accumulated error matrix. For workers without an initialization error matrix, the estimated error matrix remains unchanged; for the others, the accumulated matrix is later normalized to obtain the final "probability matrix".
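To make the accumulation idea concrete, here is a minimal sketch in plain Python, assuming a layout where row `t` of a worker's matrix is the true class and column `o` is the label the worker gave. The helpers `soft_counts`, `accumulate`, and `to_probabilities` are hypothetical and are not Crowd-Kit's API; they only show "M-step soft counts + historical counts, then normalize". The uniform fallback for an empty row is also an assumption.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def soft_counts(
    labels: Dict[Tuple[str, str], int],
    posteriors: Dict[str, List[float]],
    worker: str,
    n_classes: int,
) -> Matrix:
    """Expected counts for one worker: counts[t][o] is the posterior-weighted
    number of tasks whose true class is t and that the worker labeled o."""
    counts = [[0.0] * n_classes for _ in range(n_classes)]
    for (task, w), observed in labels.items():
        if w != worker:
            continue
        for true_class in range(n_classes):
            counts[true_class][observed] += posteriors[task][true_class]
    return counts


def accumulate(counts: Matrix, init_counts: Optional[Matrix]) -> Matrix:
    """Add historical counts element-wise; without them, counts pass through unchanged."""
    if init_counts is None:
        return [row[:] for row in counts]
    return [[c + h for c, h in zip(crow, hrow)] for crow, hrow in zip(counts, init_counts)]


def to_probabilities(counts: Matrix) -> Matrix:
    """Normalize each true-class row to sum to 1 (assumption: uniform row if it has no mass)."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total for c in row] if total > 0 else [1.0 / len(row)] * len(row))
    return probs


labels = {("t1", "w1"): 0, ("t2", "w1"): 1}
posteriors = {"t1": [0.9, 0.1], "t2": [0.2, 0.8]}
history = [[8.0, 2.0], [1.0, 9.0]]  # hypothetical historical counts for w1

current = soft_counts(labels, posteriors, "w1", n_classes=2)
with_history = to_probabilities(accumulate(current, history))
without_history = to_probabilities(accumulate(current, None))
```

Read this way, the historical counts behave like pseudo-counts: a worker with a long history is dominated by it, while a worker with no supplied matrix is estimated from the current tasks alone.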
I'd still prefer to have the option to assign the passed For example, for reproducibility it might be useful to run a new aggregation with exactly the matrices estimated before, without re-initializing them. Could you please describe your scenario so that I understand it better?
Sorry for the unclear explanation of the motivation for this implementation. Our scenario is actually pretty simple: all the In this case, if we simply assign the passed So, what we can do to use the
I think the idea of an extra method parameter is a good one. When using the assign method, in addition to making sure
I see, thank you. Let's add it and I'll be happy to merge it.
These issues have been fixed in
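A minimal sketch of the extra method parameter discussed here, in plain Python. The function name `apply_init_error`, the `method` values `"addition"` and `"assign"`, the per-worker dict layout, the shape check, and skipping workers absent from the current data are all assumptions for illustration, not the merged Crowd-Kit implementation.

```python
from typing import Dict, List

Matrix = List[List[float]]


def apply_init_error(
    estimated: Dict[str, Matrix],
    init_error: Dict[str, Matrix],
    method: str = "addition",
) -> Dict[str, Matrix]:
    """Combine data-estimated count matrices with supplied ones (hypothetical helper).

    "addition": accumulate the supplied counts on top of the estimated counts.
    "assign":   replace the estimated counts with the supplied ones.
    Workers without a supplied matrix keep their estimated counts in both modes.
    """
    if method not in ("addition", "assign"):
        raise ValueError(f"unknown method: {method!r}")
    # Copy so the caller's matrices are never mutated.
    result = {w: [row[:] for row in m] for w, m in estimated.items()}
    for worker, init in init_error.items():
        if worker not in result:
            # Assumption: matrices for workers absent from the current data are ignored.
            continue
        current = result[worker]
        # Assumption: the supplied matrix must match the estimated one's shape.
        if len(init) != len(current) or any(len(a) != len(b) for a, b in zip(init, current)):
            raise ValueError(f"shape mismatch for worker {worker!r}")
        if method == "assign":
            result[worker] = [row[:] for row in init]
        else:
            result[worker] = [[c + h for c, h in zip(crow, hrow)] for crow, hrow in zip(current, init)]
    return result


estimated = {"w1": [[1.0, 0.0], [0.0, 1.0]], "w2": [[0.5, 0.5], [0.5, 0.5]]}
history = {"w1": [[8.0, 2.0], [1.0, 9.0]]}

added = apply_init_error(estimated, history, method="addition")
assigned = apply_init_error(estimated, history, method="assign")
```

With `"assign"`, passing matrices saved from an earlier run reproduces that starting point exactly, while `"addition"` keeps the accumulate-historical-counts behaviour; in both modes, workers with no supplied matrix fall back to the estimate from the current data.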
Thanks for your quick fix! CI passes now. Please review again.
Close #112