-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support (I)FFT calculations #42
Comments
Thank you for your interest! I will look into it. I didn't find any FFT methods in boost.SIMD, unfortunately. |
I'm coming at this fairly naively, but would wrapping fftw with JNI calls may be an effective way to implement this? The fallback for unsupported architectures could be JTransforms. |
It might be effective, but it would also require either re-licensing |
Powers of 2 are fine. I understand they are more efficient that way (and Apache Commons also had this as a hard limitation). Other than that, here is what I specifically am using right now: A forward Fast Fourier Transform using a real double array as input which outputs a complex array (in the form of [real0, complex0, real1, complex1...]). I perform a multiplication of two of these "complex" arrays. I perform an inverse Fast Fourier Transform with scaling on this complex array and from that result I take just the reals. It might be possible to still wrap another library that uses SIMD. I'm currently trying to get Intel MKL to work for my project, but I can't get the linking to work in Gradle, however there are many libraries: https://github.com/project-gemmi/benchmarking-fft |
A couple insights:
We could also probably implement a SIMD-friendly decimation-in-frequency reduction ourselves, delegating the actual FFT for small arrays to Apache Commons Math (we have it as a dependency anyway). |
I'd skip Apache Commons actually. It converts its results to Complex class instances and when I rewrote it to use F64FlatArrays, compared to JTransforms, it was just very slow. I did find another list of libraries: https://community.vcvrack.com/t/complete-list-of-c-c-fft-libraries/9153 |
OK, thanks for the info! It's surprisingly hard to find a permissive-licensed library doing 1-D SIMD on doubles on this list, so the option of combining the existing SIMD capabilities with a Java FFT implementation is still viable. |
After some research, it looks doubtful that we'll be able to perform the decimation using existing SIMD capabilities, The problem is that we require complex multiplication for this, and all the libraries I've seen expect the complex data in interleaved format ( |
However many of the libraries, like Pffft already do use SIMD, so that should be plug-and-play right? It's only "pretty" fast though, but it does use SIMD, so I guess that would make it at least "quite" fast, haha. |
Yes, the previous comment was related to the option where we would decimate in Kotlin using |
|
I see, you're right. Maybe we can convert pffft to doubles? Not sure how this would affect the program (but I don't consider myself an active low-level programmer anyways, so I might be wrong). |
Oh here we go, with double support: https://github.com/marton78/pffft |
Hmm, might work. I'll have to look into it in more details later. |
Played a bit with the library. It seems that it doesn't support unaligned arrays, which is a problem, since JVM's double arrays are not aligned to 32 bytes (the array object normally is, but its header takes 16 bytes, so the data are only aligned to 16 bytes, and there's no ironclad guarantee even for that). And by "doesn't support" I mean it crashes hard with a segfault. |
marton78/pffft#22 created an issue. |
The maintainers closed the issue as "too much work to do". |
OK, I think I've actually managed to fix the unaligned access problems. Preliminary (really, really preliminary) benchmarks show very promising results. Like, our wrapper outperforms @Jolanrensen by the way, how do you multiply the complex arrays? And what is the typical size of the array? |
That's very promising! To multiply the results from JTransform (which are ordered [real, complex, real complex]), I wrote the following to multiply a in place with b: for (k in 0 until a.size / 2) {
val aRe = a[2 * k]
val aIm = a[2 * k + 1]
val bRe = b[2 * k]
val bIm = b[2 * k + 1]
a[2 * k] = aRe * bRe - aIm * bIm
a[2 * k + 1] = aRe * bIm + aIm * bRe
} (this might be able to be done more efficiently, not sure) |
Hmm, 250 is a really small array, I'd say. Not sure we'll see much benefit from SIMD here. Doing a more fair comparison, I start to see the difference at about 1024 complex elements (2048 doubles). |
I have a use case that includes complex mutilation of up to 4M (2^22) complex values (so 8M doubles), so a speedup from SIMD would be welcomed. With that said, our FFT code is the critical path, so any speedup there would trump optimizing the multiplication. |
Mostly a side note for future me: |
So, my solution seems to be working on Linux, both for me locally and on TeamCity. The other OSes, well... |
I must confess I'm a bit stuck with Windows build. I can't even link against the original version ( |
For my specific data engineering application, having a fast Fast Fourier Transform algorithm would be great to have. However, there are many applications that could benefit from FFT, like audio analysis.
Most math/vector libraries have some form of (inverse) FFT built in, like Numpy for Python, Apache Commons for Java (which is slow) or JTransforms (which is quite fast and multi threaded but doesn't use SIMD).
I think FFT could benefit a lot from the SIMD capabilities of Viktor.
This is confirmed by some papers of which this is just the first example I could find.
Intel MKL also has a form of FFT built in which uses SIMD (and maybe other Intel stuff) as well.
I don't know if boost SIMD already has FFT built in, but I wouldn't be surprised if it does.
The text was updated successfully, but these errors were encountered: