-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] kmeans.daph #876
base: main
Are you sure you want to change the base?
[WIP] kmeans.daph #876
Conversation
Hi @Garic152 , thanks for working on this. Our code generation pipeline is currently not run by default and only executed by adding When you run Looking at the stack trace you provided, if you pipe if through
Seems like the problem is triggered by the I'd suggest compiling with Otherwise, it would be good if you can create a minimal reproducible example that triggers this problem and create an issue :) |
My bad, I missed that you already included a reproducible example. Here's the backtrace:
|
After diving into some more debugging, I now came across another very weird error, this time in line 123. This error occurs during the execution of an element-wise multiplication operation within the
As the matrices themselves should work totally fine when printed and the EWMin operation on itself also works great, I did another analysis with gdb like @philipportner recommended. This lead to the following backtrace:
There where several things i noticed when analyzing the backtrace: In frame 13 inside of In frame 12 (
This could indicate that the error seems to originate from frame 11 (getValuesInternal()) or even before that. This is the
This is unfortunately the point where I am not sure anymore whether the Structure behaves normally or not because so far I didn't come into contact with the data structure generation part of Daphne at all. 4 Side notes that may help to better identify the source of the error:
I will push another reproducible error file |
I tried running it locally on the pr #758 as this PR refactors code concerning the The ternary at When adding this diff, the type is still min_distances = i == 1 ? as.matrix<f64>(is_row_in_samples * distances) : as.f64(min(min_distances, distances));
+ print(typeOf(min_distances)); |
Translating the script turned out to be more of a challenge than expected. After going through the code and comparing the auto-translated code variables line by line with the kmeans.dml output, there were many extremely sneaky errors, mostly due to the 0 vs. 1 based indexing translation errors, especially in the ctable function and also the reshape function, which does not reshape column by column (like in systemds), but row by row (maybe a functionality to choose between these 2 methods would be useful in daphne?) After fixing these errors and making sure every operations does what it is supposed to when being compared to systemds, I still once again arrived at the segfault error I initially mentioned regarding the k-means-iteration step in line 153. I tried reproducing this error in another file:
I replaced all variables and calculations with much simpler terms and also removed unnecessary parts of the original code that didn't have anything to do with the error to improve the codes debuggability. When running the code, the while loop runs once, and after assigning C_new to C like in the kmeans algorithm, the contents of C are somehow corrupted. I took a look at the IR, but didn't notice anything particularly wrong, the code also runs fine in numpy so there shouldn't be any logic errors. I also checked all the types, but there weren't any visible mistakes here either. I would really appreciate some advice here, as this bug is unfortunately something I just cannot get behind. |
I have no come to a state of the code where it's functionality should be ready very soon, but had to include many type conversions and small workarounds to make it work.
Some things I noticed where:
x = y == 1 ? 1 : 2.0
), having outputs of different types leads to errorsAfter my last adjustment of the file in line 188-190 which changed the values to be inserted from si64 to f64 for
insert_col
to work, i now have another error message of this type:When using the
--explain
function, all passes up tollvm
work fine, but themlir_codegen
pass doesn't seem to work. I have included a small test file in this PR that reproduces the error message.After fixing the remaining errors I will first test the functionality of the translated algorithm and then work on proper formatting and commenting.