Implement the forward transform #10

Open
4 tasks
lu-zero opened this issue Oct 18, 2016 · 24 comments

@lu-zero
Owner

lu-zero commented Oct 18, 2016

Functions to implement for this task:

  • vpx_fdct4x4_vsx
  • vpx_fdct8x8_vsx
  • vpx_fdct16x16_vsx
  • vpx_fdct32x32_vsx
@sasshka
Collaborator

sasshka commented Oct 6, 2017

Is this task in progress or could I take it?

@rafaeldelucena
Collaborator

Hello @sasshka, I had some problems at the beginning, but I managed to get the first transform working and will publish it soon.

I also noticed that most of the PPC code for the other transforms does not compile with high bit-depth enabled, so I'll probably narrow the scope of this issue.

@lu-zero
Owner Author

lu-zero commented Oct 17, 2017 via email

@rafaeldelucena
Collaborator

Okay! List updated x)

@rafaeldelucena
Collaborator

So far I've got about a 35% improvement with vpx_fdct4x4_vsx compared to the C version. What would be considered the minimum acceptable speedup?

Note: Google Test filter = *Trans4x4DC*.DISABLED*
[==========] Running 2 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from C/Trans4x4DCT
[ RUN      ] C/Trans4x4DCT.DISABLED_Speed/0
Fdct4x4[          10 runs]: 6 us
Fdct4x4[       10000 runs]: 600 us
Fdct4x4[    10000000 runs]: 602526 us
[       OK ] C/Trans4x4DCT.DISABLED_Speed/0 (604 ms)
[----------] 1 test from C/Trans4x4DCT (604 ms total)

[----------] 1 test from VSX/Trans4x4DCT
[ RUN      ] VSX/Trans4x4DCT.DISABLED_Speed/0
Fdct4x4[          10 runs]: 2 us
Fdct4x4[       10000 runs]: 384 us
Fdct4x4[    10000000 runs]: 383780 us
[       OK ] VSX/Trans4x4DCT.DISABLED_Speed/0 (384 ms)
[----------] 1 test from VSX/Trans4x4DCT (384 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 2 test cases ran. (988 ms total)
[  PASSED  ] 2 tests.
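
For readers comparing the two runs, here is a minimal sketch of how the figure can be derived from the log above. It only uses the two 10,000,000-run timings printed by the test; the variable names are illustrative and not part of libvpx or its test suite.

```python
# Timings copied from the 10,000,000-run lines of the log above (microseconds).
c_us = 602526      # C/Trans4x4DCT
vsx_us = 383780    # VSX/Trans4x4DCT

# Two common ways to express the same gain.
speedup = c_us / vsx_us           # how many times faster the VSX version runs
time_saved = 1 - vsx_us / c_us    # fraction of the C run time that was removed

print(f"speedup: {speedup:.2f}x")
print(f"time saved: {time_saved:.1%}")
```

With these numbers the script prints a speedup of 1.57x and a time saving of 36.3%, so the "about 35%" quoted above appears to refer to the reduction in run time rather than to the speedup ratio.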

@lu-zero
Owner Author

lu-zero commented Nov 19, 2017

4x4 has a tiny kernel, so I'd already be happy with this initial speedup. You can compare with the x86_64 and arm64 variants to see if it is in line with those.

8x8 and 16x16 should have a more substantial speedup though.

@lu-zero
Owner Author

lu-zero commented Feb 1, 2018

Any progress on this?

@rafaeldelucena
Collaborator

I've rebased onto upstream, but some adjustments are needed. I'll create the PR against the WebM repository as soon as possible.

@lu-zero
Owner Author

lu-zero commented May 10, 2018

Any news about this? :) (please CC me and David on Gerrit)

@rafaeldelucena
Collaborator

Hello Luca,

After rebasing, some things broke (especially the store instructions). Looking at the implementations for other architectures, I see that they implement the forward transform with column operations that can be reused for 8x8, 16x16, ...

My implementation doesn't do this, so I may struggle to move to bigger matrices (it will require some refactoring).

For now I don't have time to complete this, maybe at the end of the month, but if you want to complete it, please go ahead.

Sorry for the delay!

[]s
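
The column-based structure mentioned in the comment above can be sketched as follows. This is a floating-point, unnormalised textbook DCT-II written for illustration only: it is not the fixed-point arithmetic libvpx uses, and the function names `dct_1d`, `column_pass` and `fdct_2d` are hypothetical. The point it tries to show is that a single 1-D column pass, applied twice with a transpose in between, gives the 2-D transform for any block size.

```python
import math


def dct_1d(v):
    """Unnormalised floating-point DCT-II of one column (illustrative only)."""
    n = len(v)
    return [
        sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Swap rows and columns of a square block stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def column_pass(block):
    """Apply the 1-D transform to every column of the block."""
    return transpose([dct_1d(col) for col in transpose(block)])


def fdct_2d(block):
    """2-D transform as: column pass, transpose, column pass, transpose."""
    return transpose(column_pass(transpose(column_pass(block))))


# The same code handles 4x4 and 8x8 blocks without modification.
flat4 = [[1.0] * 4 for _ in range(4)]
flat8 = [[1.0] * 8 for _ in range(8)]
print(round(fdct_2d(flat4)[0][0], 6))
print(round(fdct_2d(flat8)[0][0], 6))
```

For a flat block of ones, all the energy lands in the DC coefficient (approximately 16 for 4x4 and 64 for 8x8 with this unnormalised definition), which makes a quick sanity check. In a SIMD version the column pass would presumably operate on whole vector registers at once, which is why structuring the code this way helps when moving to larger sizes.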

@lu-zero
Owner Author

lu-zero commented Jun 4, 2018

Any news on that?

@rafaeldelucena
Collaborator

rafaeldelucena commented Jun 5, 2018

I went back to work on this task, but I have not finished yet.

I'm redoing the algorithm in a simpler way. After writing down the steps in Octave, I think I'll have something to deliver by the end of the week.

@lu-zero
Owner Author

lu-zero commented Jun 5, 2018

That's great :) Please CC me and David when you push to Gerrit, looking forward to it :)

@lu-zero
Owner Author

lu-zero commented Sep 2, 2018

@rafaeldelucena usual ping.

@rafaeldelucena
Collaborator

I have implemented the fdct4x4, but I'm still not satisfied with the performance gain.

I'll do some adjustments and create a pull request to upstream.

@lu-zero
Owner Author

lu-zero commented Dec 10, 2018

Hey, any news?

@rafaeldelucena
Collaborator

Hi!

I created an upstream PR for fdct4x4: https://chromium-review.googlesource.com/c/webm/libvpx/+/1360172

@sasshka
Collaborator

sasshka commented Feb 14, 2019

Hey rafaeldelucena,
Are you still working on the issue? I'd take it if you don't have time to finish it.

@rafaeldelucena
Collaborator

Hello @sasshka, you can take it!

My latest PR is at https://chromium-review.googlesource.com/c/webm/libvpx/+/1404181

@shawnl

shawnl commented May 15, 2019

vpx_fdct32x32_vsx was implemented by Luc Trudeau (@luctrudeau) in dc93b62.
What is the status of this? Is there still a bounty for it, and what about high bit-depth?

@lu-zero
Owner Author

lu-zero commented May 15, 2019

@sasshka is working on that, and I want to ask for additional bounties for high bit-depth.

@shawnl

shawnl commented May 16, 2019

I want to ask for additional bounties for high bit-depth.

Please do. There are not many bounties available right now. Most have been completed by internal people and maintainers who never claimed the bounties, which leaves the OpenBLAS ones that the man from Azerbaijan, @quickwritereader, is working on, the NumPy one, and the last libmvec one (pow, powf), which I have made progress on.

I also accelerated WireGuard with VSX on my own (using code from OpenSSL), including allowing SIMD during interrupts.

https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=107892
https://lists.zx2c4.com/pipermail/wireguard/2019-May/004149.html

@sasshka
Collaborator

sasshka commented May 16, 2019 via email

@shawnl

shawnl commented May 16, 2019

Sorry for the misunderstanding. I will stick to BountySource's interface and keep this thread to technical discussion.
