If you have a DAG of binary operations, you can traverse it in some topological order and generate the proper bitcode for your GPU kernel. OmniSci does this for parsed SQL queries, more specifically for the different filter combinations and so on. TensorFlow does it based on the equation that needs to be minimized for gradient descent. Can we do it for GraphBLAS?
If you want to keep your kernel code as simple as possible, with minimal branching, then there's no way to do it at compile time: just as with SQL queries and TensorFlow's optimizations, you don't know the exact details of the queries/equations until runtime.
Both TensorFlow and OmniSci do real-time code generation using LLVM.
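To make the idea concrete, here is a minimal sketch (all names here are hypothetical and not tied to any existing GraphBLAS code) of walking a DAG of binary element-wise ops in topological order and emitting the body of a single fused CUDA kernel as a source string. That string is what a runtime compiler such as NVRTC or LLVM would then JIT:

```cpp
#include <string>
#include <vector>
#include <iostream>

struct Node {
    std::string op;          // "+", "*", ... (binary element-wise op)
    int lhs, rhs;            // indices of operand nodes; -1 marks a kernel input
    std::string input_name;  // used only when the node is an input
};

// Nodes are assumed to be stored already in topological order
// (every operand index is smaller than the node's own index).
std::string emit_fused_kernel(const std::vector<Node>& dag) {
    std::string body;
    for (size_t i = 0; i < dag.size(); ++i) {
        const Node& n = dag[i];
        if (n.lhs < 0) {   // input node: load once from global memory
            body += "  float t" + std::to_string(i) +
                    " = " + n.input_name + "[idx];\n";
        } else {           // interior node: combine earlier temporaries in registers
            body += "  float t" + std::to_string(i) +
                    " = t" + std::to_string(n.lhs) +
                    " " + n.op + " t" + std::to_string(n.rhs) + ";\n";
        }
    }
    body += "  out[idx] = t" + std::to_string(dag.size() - 1) + ";\n";
    return
        "__global__ void fused(const float* a, const float* b,\n"
        "                      float* out, int n) {\n"
        "  int idx = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "  if (idx >= n) return;\n" + body + "}\n";
}

int main() {
    // (a + b) * a, expressed as a 4-node DAG in topological order.
    std::vector<Node> dag = {
        {"",  -1, -1, "a"},
        {"",  -1, -1, "b"},
        {"+",  0,  1, ""},
        {"*",  2,  0, ""},
    };
    std::cout << emit_fused_kernel(dag);  // print the generated CUDA source
}
```

For the example DAG the whole expression stays in registers inside one kernel, which is exactly the kind of fusion that cannot be fixed when the library itself is compiled, since the DAG only exists at runtime.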
If the user wants to run a few iterations of a for-loop, the amount of sparsity in the data after iteration X (which determines which kernel is optimal and should be chosen) is unknown at compile time. So the kernel fusion could vary on each iteration of the loop in a data-dependent way.
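A minimal sketch of what that looks like in practice, assuming a BFS-style traversal where the frontier grows data-dependently; the kernel launches are stand-ins rather than real GraphBLAS calls:

```cpp
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> frontier = {0};   // start from a single vertex
    const int num_vertices = 1000;

    for (int iter = 0; iter < 5; ++iter) {
        double sparsity = 1.0 - double(frontier.size()) / num_vertices;
        if (sparsity > 0.9) {
            std::printf("iter %d: sparse frontier -> push-style kernel\n", iter);
            // here one would launch (or JIT) a kernel specialized for sparse input
        } else {
            std::printf("iter %d: dense frontier -> pull-style kernel\n", iter);
            // here one would launch (or JIT) a kernel specialized for dense input
        }
        // stand-in for the real expansion step: the frontier grows data-dependently,
        // so the branch above can flip from one iteration to the next
        frontier.resize(frontier.size() * 20);
    }
}
```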
I'm inclined towards a shared-library approach rather than header-only, in order to support a Python frontend that can be used from a Jupyter notebook. The consequence is that the computation graph, which depends on which ops the user wants, is unknown at compile time (i.e. when the shared library is compiled).
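For illustration, a rough sketch of what that shared-library boundary could look like, with every name here hypothetical: a ctypes/cffi-style Python frontend would record ops into a graph at runtime, and the fused kernel would only be generated when a result is materialized.

```cpp
#include <cstdint>
#include <vector>

struct OpNode { int op_code; int64_t lhs, rhs; };
static std::vector<OpNode> g_graph;   // computation graph built at runtime

extern "C" {
// Called from Python each time the user applies an operation; returns a node id.
int64_t grb_record_op(int op_code, int64_t lhs, int64_t rhs) {
    g_graph.push_back({op_code, lhs, rhs});
    return static_cast<int64_t>(g_graph.size()) - 1;
}

// Called when the user materializes a result; this is where the graph would be
// traversed topologically and the fused kernel generated, JIT-compiled, and
// launched (omitted here).
void grb_evaluate(int64_t /*node*/) {
}
}
```

The point of the split is that the shared library ships only the recorder and the code generator; the actual kernels come into existence per notebook session, after the user has chosen their ops.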