Refactoring MetaDataObject out of DenseMatrix #758
* This commit introduces the meta data object to the CSR data type
* Memory pinning
  To prevent excessive allocation ID lookups in the hot path when using --vec, this change "pins" memory by allocation type of previous accesses.
Force-pushed from df4702e to cfd8053
… Pinning
* This commit introduces the meta data object to the CSRMatrix data type
  To implement this change, handling of the AllocationDescriptors has been refactored out of DenseMatrix.
* Separate handling of ranges
  Since tracking of ranges of data is only used in the distributed setting for now, ranges are handled separately and a full allocation is always assumed for local computation. This should result in fewer unnecessary "if range not null do this, else do that" branches.
* Memory pinning
  To prevent excessive allocation ID lookups in the hot path, especially when using --vec, this change "pins" memory by the allocation type of previous accesses. Simply put, as long as there is no different access type (e.g., calling getValues() for host vs. device memory), it is assumed that the data has not changed and the meta data object does not need to be queried.
Closes daphne-eu#758
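The pinning idea in the commit message above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `AllocType`, `PinnedBuffer`, and `lookupCount()` are hypothetical names, not the DAPHNE classes or their API. The sketch only shows the technique of caching the pointer of the last access and skipping the allocation lookup while the access type stays the same.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration; not the DAPHNE types.
enum class AllocType { HOST, DEVICE };

class PinnedBuffer {
    // Plays the role of the meta data object: one allocation per type.
    std::map<AllocType, std::vector<double>> allocations;
    AllocType pinnedType = AllocType::HOST;
    double *pinnedPtr = nullptr;
    std::size_t lookups = 0; // counts slow-path queries, for illustration only

  public:
    explicit PinnedBuffer(std::size_t n) {
        allocations[AllocType::HOST].assign(n, 0.0);
        allocations[AllocType::DEVICE].assign(n, 0.0);
    }

    double *getValues(AllocType type) {
        // Fast path: same access type as the previous call, reuse the pinned pointer.
        if (pinnedPtr != nullptr && type == pinnedType)
            return pinnedPtr;
        // Slow path: query the allocation table and pin the result.
        ++lookups;
        auto it = allocations.find(type);
        assert(it != allocations.end());
        pinnedType = type;
        pinnedPtr = it->second.data();
        return pinnedPtr;
    }

    std::size_t lookupCount() const { return lookups; }
};
```

In this sketch, repeated `getValues(AllocType::HOST)` calls in a loop cost a single lookup; only a switch to `AllocType::DEVICE` triggers another one.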
Force-pushed from cfd8053 to 17d3baa
Force-pushed from 17d3baa to d9d1b59
Force-pushed from d9d1b59 to 9016ae9
Force-pushed from 9016ae9 to 6f6da3b
Force-pushed from 6f6da3b to ce36921
The numerous force pushes are a result of my local clang-format disagreeing with the CI's clang-format:
--- src/runtime/local/datastructures/AllocationDescriptorGRPC.h (original)
+++ src/runtime/local/datastructures/AllocationDescriptorGRPC.h (reformatted)
@@ -35,7 +35,7 @@
   public:
     AllocationDescriptorGRPC() = default;
     AllocationDescriptorGRPC(DaphneContext *ctx, const std::string &address, const DistributedData &data)
-        : ctx(ctx), workerAddress(address), distributedData(data) {};
+        : ctx(ctx), workerAddress(address), distributedData(data){};
     ~AllocationDescriptorGRPC() override = default;
     [[nodiscard]] ALLOCATION_TYPE getType() const override { return type; };
Explaining the labels:
…not throw
Changing the behavior of fileExists() to a boolean operation, as suggested by the method's name. Throwing an exception is up to the caller of this method.
Closes daphne-eu#867
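A minimal sketch of the pattern this commit message describes: a boolean existence check that never throws, with the decision to throw left to the caller. The signature of `fileExists()` and the helper `requireFile()` are assumptions for illustration; the actual DAPHNE implementation may differ.

```cpp
#include <cassert>
#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical signature; reports existence only and never throws.
bool fileExists(const std::string &path) {
    std::error_code ec;
    // The error_code overload returns false on failure instead of throwing.
    return std::filesystem::exists(path, ec);
}

// Hypothetical caller that decides a missing file is an error.
void requireFile(const std::string &path) {
    assert(!path.empty()); // an empty path is treated as a caller bug in this sketch
    if (!fileExists(path))
        throw std::runtime_error("file not found: " + path);
}
```

Callers that can tolerate a missing file branch on the boolean; callers that cannot wrap the check as `requireFile()` does.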
Due to the use of a pointer to a local variable, the distributed (GRPC_SYNC) mode crashed in test cases. This patch fixes this by using std::unique_ptr appropriately. Furthermore, a nullptr check is performed before getting the distributed data, so that a message indicates that execution failed at this point.
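The kind of fix described in this commit message can be sketched as follows. The class `Descriptor`, the field `identifier`, and the function `makeDescriptor()` are hypothetical and only illustrate the technique: the descriptor owns its data through `std::unique_ptr` instead of storing the address of a caller's local variable, and the accessor checks for nullptr so that a failure produces a message rather than a crash.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical payload type for illustration.
struct DistributedData {
    std::string identifier;
};

class Descriptor {
    // Owned on the heap, so it outlives the stack frame that created it.
    std::unique_ptr<DistributedData> data;

  public:
    Descriptor() = default;
    explicit Descriptor(DistributedData d) : data(std::make_unique<DistributedData>(std::move(d))) {}

    const DistributedData &getDistributedData() const {
        // Report a missing payload explicitly instead of dereferencing nullptr.
        if (!data)
            throw std::runtime_error("Descriptor: no distributed data available");
        return *data;
    }
};

Descriptor makeDescriptor(const std::string &id) {
    assert(!id.empty()); // sketch-level precondition
    DistributedData local{id};
    // The contents are moved into heap storage; no pointer to 'local' escapes.
    return Descriptor(std::move(local));
}
```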
Force-pushed from ce36921 to d434bf5
Hi @corepointer, thanks for putting so much effort into improving this code. This PR is pretty big and includes quite a few important changes at the core of our data structures. As there are a lot of changes to this core code, but no changes to the documentation, could you please update the documentation describing the overall idea behind the whole Especially the knowledge that is hard to get from just reading the code, like what it can and what it cannot do (and what it should do?). From what I remember of previous discussions, and since you mention "ranged and full allocations", but maybe I'm wrong, it should handle parts of a
Additionally, it would be great if this could be tested explicitly as part of our test suite. Proper testing of this core functionality would be incredibly beneficial for working on this code in the future. This should be doable by providing mock implementations of the
Lastly, could you please provide some performance results that measure the improvements you mention concerning excessive allocation ID lookups?
I'd also be happy to do a review of the PR if you like.
I was also just finishing reading the code: Thanks for this contribution, @corepointer. Refactoring the use of the meta data objects to also support Major comments:
A few detailed comments:
Thanks for the feedback, @philipportner and @pdamme. And thanks for a taste of my own medicine (I frequently note the lack of documentation and test cases in code reviews) 😆 I will, of course, fix this.
This PR moves the MetaDataObject (MDO) functionality out of DenseMatrix and generalizes it to be used by other classes derived from Structure as well.
Furthermore, this contains a performance improvement to prevent excessive allocation ID lookups and a separation of ranged and full allocations.
All tests are running except the distributed ones.
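The generalization described above, moving the meta data object from one matrix class into the common base so that every derived data type carries it, might look roughly like the sketch below. The class bodies and the methods `add()`, `has()`, and `getMetaData()` are assumptions for illustration, reduced to the bare minimum; they do not reflect the actual DAPHNE class definitions.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, heavily simplified types for illustration.
enum class AllocType { HOST, DEVICE };

struct AllocationDescriptor {
    AllocType type;
};

class MetaDataObject {
    std::vector<AllocationDescriptor> descriptors;

  public:
    void add(AllocType t) { descriptors.push_back({t}); }

    bool has(AllocType t) const {
        for (const auto &d : descriptors)
            if (d.type == t)
                return true;
        return false;
    }
};

// The meta data object lives in the base class, so every derived
// data type gets it without duplicating the handling code.
class Structure {
  protected:
    std::shared_ptr<MetaDataObject> mdo = std::make_shared<MetaDataObject>();

  public:
    virtual ~Structure() = default;
    MetaDataObject &getMetaData() { return *mdo; }
};

class DenseMatrix : public Structure {};
class CSRMatrix : public Structure {};
```

With this layout, code that only needs allocation information can work on a `Structure &` and does not have to know whether it holds a dense or a CSR matrix.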