Io_uring integration #736
base: main
Conversation
compilation issue caused by fully generic free == operator
- Fixed minor conflicts that could not be resolved automatically:
  - src/runtime/local/datastructures/Matrix.h: operator<< was added upstream
  - src/runtime/local/kernels/CheckEq.h: kernel specialization for superclass Matrix was added upstream
  - I moved the implementation to a new operator== in Matrix and replaced the checkEq implementation by a `*lhs == *rhs`, just like it was done for the other data structures in this PR
- Removed CSRMatrix::getNumItems() since it's inherited from Matrix.
- Made the constructors and destructors of ContiguousTensor/ChunkedTensor private.
  - For consistency with DenseMatrix, CSRMatrix, and Frame.
  - DataObjectFactory is meant to be the only point where the constructors and destructors of DAPHNE data objects are called.
- Throw an exception instead of returning 0 in ContiguousTensor/ChunkedTensor::serialize().
  - Otherwise, the error could easily be ignored.
- Throw an exception instead of calling std::abort() in Tensor::slice()/sliceRow()/sliceCol().
  - Otherwise, it's not clear what consequences an uncontrolled program termination could have (e.g., no clean-up of resources).
- Some more minor changes.
  - E.g., related to comments, year in license headers of new files, etc.
basic read/write operations + Some tests
Maybe a couple small clarifications
Here I meant that you cannot just use e.g. CMake's FetchContent_* functions, as liburing does not have a CMake-based build. As for handling it via a submodule, or simply wget-ing it and building it via build.sh (as is the case for most of the other deps): the default install step uses sudo, since the default install location is /usr. That location can be changed, but installing still fails without sudo even then. I haven't looked into what exactly is failing in the script yet, since it is a bunch of sed hieroglyphics, but it definitely does not work out of the box without sudo, even if the paths point to somewhere where permissions are not an issue. So the options here would be to either skip the default install step and manually move the header and static lib to where they belong in the daphne repo (i.e. thirdparty/installed/bin and lib) in build.sh, or rely on the default install but then require sudo for build.sh. Since this is about the build system and imo none of the options are great, I did not go ahead and simply pick one. @pdamme, what would be the preference here?
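For illustration, one further option not listed above would be CMake's ExternalProject module, which can drive liburing's configure/make build and install it into the build tree so no sudo is needed. This is only a hypothetical sketch; the repository URL pinning, tag, and target names are assumptions, not anything from this PR:

```cmake
# Hypothetical sketch (untested): build liburing via ExternalProject, since it
# ships a configure/Makefile build rather than a CMake one.
include(ExternalProject)

ExternalProject_Add(liburing_ep
    GIT_REPOSITORY https://github.com/axboe/liburing.git
    GIT_TAG        liburing-2.5          # assumed version pin
    BUILD_IN_SOURCE TRUE
    # configure honors --prefix, so installing into the build tree
    # avoids the sudo problem described above.
    CONFIGURE_COMMAND <SOURCE_DIR>/configure --prefix=<INSTALL_DIR>
    BUILD_COMMAND     make -j
    INSTALL_COMMAND   make install
)

ExternalProject_Get_Property(liburing_ep INSTALL_DIR)

# Expose the static lib and headers to the rest of the build.
add_library(liburing STATIC IMPORTED GLOBAL)
add_dependencies(liburing liburing_ep)
set_target_properties(liburing PROPERTIES
    IMPORTED_LOCATION ${INSTALL_DIR}/lib/liburing.a)
target_include_directories(liburing INTERFACE ${INSTALL_DIR}/include)
```

Whether this fits better than a submodule or a few lines in build.sh is exactly the open question above.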
In general this looks good.
But before we merge, we need to clarify a few things:
a) Is C++ standard 20 ok? It seems risky to just change it without understanding the consequences.
b) io_uring must be automatically fetched and linked. This will also become relevant in #656.
c) There are many magic constants in your code. I would like you to at least comment on why they are what they are, to increase maintainability.
d) Can you provide a sample .daphne script that utilises your io_uring implementation?
I would like @pdamme's review as well.
CMakeLists.txt
Outdated
```diff
@@ -29,7 +29,7 @@ project(daphne-prototype LANGUAGES CXX C)
 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
 set(CMAKE_BUILD_WITH_INSTALL_NAME_DIR ON)
-set(CMAKE_CXX_STANDARD 17 CACHE STRING "C++ standard to conform to")
+set(CMAKE_CXX_STANDARD 20 CACHE STRING "C++ standard to conform to")
```
Do we understand the consequences of changing the C++ standard for the DAPHNE use-cases?
There was already some initial discussion with @pdamme in #694 regarding this. I think moving to a C++20-compliant compiler is fine as long as the documentation is clear about this. It might be a good idea to separate a potential switch to C++20 out of this PR and push it in isolation, so that we can (a) bring it in quicker and (b) revert it if we realize that things break. Accordingly, the CI would need to be updated as well.
Re: Use Cases: Do we know anything about the environments where Daphne is run? Looking at our current dependency management I doubt that we do.
Thanks for this PR, @CPestka and @MarcusParadies, and thanks for your helpful review, @niclashedam. I will look at the code by today EOD and share my comments. In general I'd say if we need C++20 for a certain component, we should upgrade; if that's problematic in certain relevant cases/environments, introducing a switch like @MarcusParadies suggests sounds like a good idea to me.
While I don't consider any of them, except perhaps the loop bound in IO_Threadpool.cpp, to be magic numbers, as they are (mostly, imo) sensibly named constants, I moved some of them to (defaulted) function/constructor parameters and added comments to the rest.
No. As stated in the beginning.
As was discussed in the meetings, the entire work for the tensor/zarr/io_uring integration was supposed to be divided into smaller PRs. This PR only provides the async I/O capabilities plus unit tests. #694, #742, #741, this PR, and some more follow-up PRs will be the base for the final PR, which will introduce the DSL integration.
The linking part is already done via CMake here. As for fetching and building liburing: again, I first want to know which way of addressing the dep would be ok with everybody. I could just go ahead and simply add the 4 lines needed for version 2. to the build script, but that comes with the caveats mentioned earlier.
C++20: @pdamme suggested in #694 that I should go ahead and use it in the follow-up PRs, which I did here. I agree though that it may be better to do the switch in a separate PR, which should also update the minimum compiler versions in the docs as well as the CI setup.

And speaking of the CI: maybe, while it is being updated anyways, we could add some sanitizer runs. I added switches to toggle the different sanitizers here, but did not wire them up to the CI. However, running daphne with any of the -fsanitize options currently simply crashes for me. Running the daphne tests with valgrind does not crash, but unfortunately takes a very long time, as it is much slower than the built-in sanitizers in my experience. As running valgrind over all tests just took too long, I ran it only over a small subset of the tests (upstream/main; ./test.sh [kernels]), which resulted in 19.9k lines of error output (see attached log file), which some may argue may be a bit of an issue.
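The sanitizer switches mentioned above could look roughly like the following CMake fragment. This is a generic sketch, not the PR's actual option names or wiring:

```cmake
# Hypothetical sketch of per-sanitizer toggles; option names are assumptions.
option(USE_ASAN  "Build with AddressSanitizer"           OFF)
option(USE_UBSAN "Build with UndefinedBehaviorSanitizer" OFF)
option(USE_TSAN  "Build with ThreadSanitizer"            OFF)

if(USE_ASAN)
    add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
    add_link_options(-fsanitize=address)
endif()
if(USE_UBSAN)
    add_compile_options(-fsanitize=undefined)
    add_link_options(-fsanitize=undefined)
endif()
if(USE_TSAN) # note: TSan cannot be combined with ASan
    add_compile_options(-fsanitize=thread)
    add_link_options(-fsanitize=thread)
endif()
```

A CI job per sanitizer would then just configure with the corresponding option set to ON.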
When I wrote this, I assumed other op types would likely never be needed here (e.g. all the networking or utility stuff), so what is currently there only supports simple read and write ops. Adding support for another op would not be very hard though: essentially, it would mean adding another member to the ops enum, adding a metadata struct, and adding the SubmitYourOpTypeHere()/Handle...() functions. I haven't looked at the actual code in the CSD integration, but I would assume you would need IORING_OP_URING_CMD for NVMe passthrough (right?). Again, I assumed other ops would not be needed here, so adding new ops is a bit verbose in the current setup; switching to e.g. unions tagged with the op type would probably be a good idea then, as that would reduce code duplication.

Generally though, as I don't really know what your requirements are, I'm not sure if this implementation gives you any advantages at all. If you "only" submit one request, block till you receive the CQE, and then carry on, and don't want to share your io_uring instance(s), then having more than one instance is pointless, and having a simpler completion interface that does not care about out-of-order CQE arrival does not give you any benefit, as you don't share your instance and thus only ever have one request in flight at once. So adding the capabilities you need to this may not give you any real benefit and may be more code than simply writing what you need without integrating it with this. On the other hand, it may still be nice to have all io_uring-related code in one place. I don't really have strong feelings about either approach here.
Not sure what you mean here.
I assume the current build failure is due to the CI machine pulling the Docker image rather than using the now-updated build.sh.
That took some time to catch up on this conversation 😰
Yes, we would submit one or a few SQEs to a private queue in CS. We would not block but instead check for CQEs at a later point in time. As for (a), integration with other areas of the system is not necessarily straightforward, as it can cause dependency issues or other unwanted side-effects. However, your implementation seems to be quite isolated, so I doubt this is a problem.
Adding dependency to the build system that is required by some upcoming PRs. Co-authored-by: Constantin Pestka <[email protected]>
I fixed the conflicts and rebased on main. Will force push after running tests again.
Moving the linking to libAllKernels fixes linker errors when building binaries that depend on libAllKernels.
force-pushed from 6e2b1d8 to 6c4fc1a
To be honest, I overlooked that request to make it optional. But as sudo is not required anymore, I regard this optionality as optional 😆 scnr. Even more so as this dependency is very lightweight and compiles just fine.
Two more things:
Basic io_uring integration for async i/o
This PR introduces a simple multi-threaded wrapper around io_uring that aims to hide some of the intricacies of io_uring in general, and more specifically of using it as a shared resource from multiple threads (distribution of requests, keeping track of request metadata, out-of-order arrival, request re-submission, thread-safe access to the actual rings, doing the entire thing with at least ok performance, etc.).
Goal
This PR is intended to be the base of/alongside a couple of follow-up PRs that will introduce Zarr support and "async support" for a few kernels (like the Aggregate kernel), and that will wire this up through the compiler with the DSL, to finally show a sensible async I/O + processing example accessible from the DSL level.
This PR thus provides the async I/O part of that goal, and the scope of this implementation is mostly limited to the io_uring features needed for it (i.e. mostly just simple read/write ops); you won't find any of the fancier features of io_uring here (although most could be added relatively easily if somebody needs them). The same goes for the multi-threading part: it is rather basic and could definitely be improved, but I think it is ok for now.
One thing that will need to be addressed before this could be merged
How to address the liburing dependency. For now I just added the linking of the lib to the CMake setup, but did not add any way to automatically fetch and build the dependency. The last time I needed liburing as a dependency for a C++ project, I did not find a "clean", "CMake-native" way to add it (but I did not look too closely, and that was some time ago, so that could have changed in the meantime). What would be the preferred way to add this dependency (submodule, simply adding it to the dependency list, looking for a "native" CMake solution, etc.)? I vaguely remember that liburing already came up in the CSD context. I had a quick look at the fork, but did not find the place where liburing got handled there. How has it been resolved there?
Other possible things to be discussed: