First batch of shortcuts to HLO ops #267

Merged (99 commits) on Nov 29, 2024

@mofeing (Collaborator) commented Nov 12, 2024

• implement ops
• test ops
• rewrite TracedR* methods

@mofeing changed the title from "implement unary ops" to "implement shortcuts to HLO ops" on Nov 12, 2024
@mofeing changed the title from "implement shortcuts to HLO ops" to "shortcuts to HLO ops" on Nov 12, 2024

@Pangoraw (Collaborator) left a comment

nice!

src/Ops.jl (outdated, resolved)
src/Ops.jl (resolved)

@wsmoses (Member) commented Nov 20, 2024

so CI apparently fails

@mofeing (Collaborator, Author) commented Nov 20, 2024

> so CI apparently fails

should be fixed now

I've left some ops untested, either because of bugs (mainly #196, but also a weird problem generating the IRCode for Ops.polygamma) or because they need a complex test configuration (like send, recv, ...). Should I use @test_broken for the tests with bugs?

@wsmoses macOS-x64 is failing because it can't find the symbol sincos in the stablehlo.cosine test 🙃
I think it's still safe to merge, because that test passes locally on my M1 Mac, but it will break CI.

@wsmoses (Member) commented Nov 20, 2024

@test_broken is fine if it doesn't crash; a commented-out or disabled test is also fine.

@mofeing (Collaborator, Author) commented Nov 20, 2024

Then I will leave them commented out, because some of them crash.
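For reference, a minimal sketch of the options discussed above, using Julia's Test standard library. The functions `throwing_op`, `wrong_op` and `crashing_op` are hypothetical stand-ins, not Reactant ops. `@test_broken` still evaluates its expression, so it only helps when the failure is an exception or a wrong result. `@test_skip` records the test as Broken without evaluating it, which may be a middle ground between `@test_broken` and commenting a test out when the code would crash the whole process.

```julia
using Test

# Hypothetical stand-ins for ops with known bugs; these are not Reactant functions.
throwing_op(x) = error("known bug")                    # raises an exception
wrong_op(x) = x + 1                                    # runs, but returns the wrong value
crashing_op(x) = error("imagine this aborts the process")

@testset "ops with known issues" begin
    # @test_broken records Broken when the expression throws or returns false,
    # and reports an Error once it unexpectedly passes (so the marker gets removed).
    @test_broken throwing_op(1) == 1
    @test_broken wrong_op(1) == 1

    # @test_skip never evaluates its expression, so it is safe even for code that
    # would take down the test process (a hard crash rather than a Julia exception).
    @test_skip crashing_op(1) == 1
end
```

All three tests show up as "Broken" in the test summary, so they stay visible instead of silently disappearing as commented-out code would.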

@mofeing (Collaborator, Author) commented Nov 21, 2024

@wsmoses any idea how to fix the macOS-x86 failure?

@github-actions (bot, Contributor) left a comment

Reactant.jl Benchmarks

| Benchmark suite | Current: c964ca3 | Previous: 4981557 | Ratio |
|---|---|---|---|
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1228899804 ns | 1257741783 ns | 0.98 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1228857056 ns | 1443950901 ns | 0.85 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1187142008 ns | 1385605290 ns | 0.86 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2744157498 ns | 2794781252 ns | 0.98 |
| ViT base (256 x 256 x 3 x 32)/forward/CUDA/Lux | 201183306 ns | 203499640 ns | 0.99 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 5148362884 ns | 5710482855 ns | 0.90 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant | 5169143859 ns | 5341487680 ns | 0.97 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 5836423713 ns | 6062348395 ns | 0.96 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 7725273905 ns | 7286146462 ns | 1.06 |
| ViT base (256 x 256 x 3 x 32)/forward/CPU/Lux | 31675920029 ns | 34742615164 ns | 0.91 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1183799363 ns | 1198386252 ns | 0.99 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1171359615 ns | 1183740364 ns | 0.99 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1190391617 ns | 1226242113 ns | 0.97 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3022041013 ns | 2824009338 ns | 1.07 |
| ViT small (256 x 256 x 3 x 4)/forward/CUDA/Lux | 8589423 ns | 8581758.5 ns | 1.00 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1607601251 ns | 1659437123 ns | 0.97 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1609918717 ns | 1663747727 ns | 0.97 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1612382221 ns | 1575216589 ns | 1.02 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3613423758 ns | 3330508539 ns | 1.08 |
| ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux | 3155788085 ns | 2697236030 ns | 1.17 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1222408843 ns | 1177894801 ns | 1.04 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1213439326 ns | 1231369417 ns | 0.99 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1393287668 ns | 1281022614 ns | 1.09 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 2928373979 ns | 2992466689 ns | 0.98 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CUDA/Lux | 22794435.5 ns | 22743533 ns | 1.00 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 2246173769 ns | 2143636724 ns | 1.05 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant | 2279844491 ns | 2146473226 ns | 1.06 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 2244477597 ns | 2157303648 ns | 1.04 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 4159307861 ns | 3959878673 ns | 1.05 |
| ViT tiny (256 x 256 x 3 x 32)/forward/CPU/Lux | 5673307318 ns | 5865019185 ns | 0.97 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1219499596 ns | 1387842900 ns | 0.88 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1339647904 ns | 1256093379 ns | 1.07 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1394076872 ns | 1257804932 ns | 1.11 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3007274926 ns | 3119512525 ns | 0.96 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CUDA/Lux | 7170956.5 ns | 6970194 ns | 1.03 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1473810038 ns | 1427711032 ns | 1.03 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1468103895 ns | 1416276590 ns | 1.04 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1467963829 ns | 1497474638 ns | 0.98 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3352201866 ns | 3162263527 ns | 1.06 |
| ViT tiny (256 x 256 x 3 x 4)/forward/CPU/Lux | 1267514332 ns | 1224458504 ns | 1.04 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1218209141 ns | 1258743352 ns | 0.97 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1219848341 ns | 1272174301 ns | 0.96 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1250532234 ns | 1286613721 ns | 0.97 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3309073423 ns | 2941431002 ns | 1.12 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CUDA/Lux | 12341004 ns | 12308639.5 ns | 1.00 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 1776840440 ns | 1728521529 ns | 1.03 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant | 1769682086 ns | 1726871692 ns | 1.02 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 1746813993 ns | 1724333072 ns | 1.01 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 3650381123 ns | 3485347197 ns | 1.05 |
| ViT tiny (256 x 256 x 3 x 16)/forward/CPU/Lux | 2876260488 ns | 3314985446 ns | 0.87 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1268709194 ns | 1296002633 ns | 0.98 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1239986983 ns | 1304207360 ns | 0.95 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1435508421 ns | 1255941581 ns | 1.14 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3146289052 ns | 3046234878 ns | 1.03 |
| ViT small (256 x 256 x 3 x 16)/forward/CUDA/Lux | 27442593.5 ns | 27396872 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 2242135877 ns | 2236205101 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant | 2199696092 ns | 2198709936 ns | 1.00 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 2204608993 ns | 2296730805 ns | 0.96 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 4054284127 ns | 3958176139 ns | 1.02 |
| ViT small (256 x 256 x 3 x 16)/forward/CPU/Lux | 5354219989 ns | 5608465822 ns | 0.95 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1254489328 ns | 1251122132 ns | 1.00 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant | 1288445019 ns | 1342649088 ns | 0.96 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1311838413 ns | 1266853946 ns | 1.04 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3037480498 ns | 3134977417 ns | 0.97 |
| ViT small (256 x 256 x 3 x 32)/forward/CUDA/Lux | 53140835 ns | 53144411 ns | 1.00 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :after_enzyme) | 3069243254 ns | 3046201375 ns | 1.01 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant | 3100569155 ns | 3063894685 ns | 1.01 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :before_enzyme) | 3050601072 ns | 3087462133 ns | 0.99 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Reactant (optimize = :only_enzyme) | 4901907923 ns | 4895902746 ns | 1.00 |
| ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux | 13323425227 ns | 9797674342 ns | 1.36 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1411930117 ns | 1295665825 ns | 1.09 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant | 1409847962 ns | 1436872653 ns | 0.98 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1355468017 ns | 1277008406 ns | 1.06 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3046720551 ns | 3239595052 ns | 0.94 |
| ViT base (256 x 256 x 3 x 16)/forward/CUDA/Lux | 71191954 ns | 71339137 ns | 1.00 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :after_enzyme) | 3211022497 ns | 3189935441 ns | 1.01 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant | 3283577977 ns | 3207736040 ns | 1.02 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :before_enzyme) | 3205245566 ns | 3222417327 ns | 0.99 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Reactant (optimize = :only_enzyme) | 5118932238 ns | 5319487172 ns | 0.96 |
| ViT base (256 x 256 x 3 x 16)/forward/CPU/Lux | 13742003124 ns | 12850280060 ns | 1.07 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :after_enzyme) | 1271142454 ns | 1274372415 ns | 1.00 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant | 1279685346 ns | 1287517573 ns | 0.99 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :before_enzyme) | 1299608393 ns | 1311075350 ns | 0.99 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Reactant (optimize = :only_enzyme) | 3102484563 ns | 3155538600 ns | 0.98 |
| ViT base (256 x 256 x 3 x 4)/forward/CUDA/Lux | 20876844 ns | 20891523 ns | 1.00 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :after_enzyme) | 1874409049 ns | 1909585662 ns | 0.98 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant | 1872785183 ns | 1907821565 ns | 0.98 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :before_enzyme) | 1856532831 ns | 1918208063 ns | 0.97 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Reactant (optimize = :only_enzyme) | 3589196661 ns | 3741288084 ns | 0.96 |
| ViT base (256 x 256 x 3 x 4)/forward/CPU/Lux | 3120367142 ns | 3617112173 ns | 0.86 |

This comment was automatically generated by a workflow using github-action-benchmark.
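The Ratio column in the benchmark table appears to be Current / Previous: for the first row, 1228899804 / 1257741783 ≈ 0.977, shown as 0.98. A small sketch that recomputes it for a few rows copied from the table; rounding to two digits is an assumption about how the bot formats the value, and `ratio` is a hypothetical helper, not part of github-action-benchmark.

```julia
# A few rows copied from the benchmark table: (benchmark, current_ns, previous_ns).
rows = [
    ("ViT base (256 x 256 x 3 x 32)/forward/CUDA/Reactant", 1228857056, 1443950901),
    ("ViT small (256 x 256 x 3 x 4)/forward/CPU/Lux", 3155788085, 2697236030),
    ("ViT small (256 x 256 x 3 x 32)/forward/CPU/Lux", 13323425227, 9797674342),
]

# Assumed definition of the Ratio column: current time divided by previous time.
ratio(current, previous) = current / previous

for (name, cur, prev) in rows
    r = round(ratio(cur, prev); digits=2)
    # Below 1: the current commit ran faster; above 1: it ran slower.
    verdict = r < 1 ? "faster" : r > 1 ? "slower" : "unchanged"
    println(name, " => ", r, " (", verdict, ")")
end
```

Under this reading, 0.85 for the CUDA/Reactant row means roughly 15% less time on commit c964ca3 than on 4981557.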

function constant(
    x::DenseArray{T,N};
    location=MLIR.IR.Location(
        "stablehlo.constant", MLIR.IR.Location(@__FILE__, @__LINE__, 0)

@wsmoses (Member) commented on this code:

We should still refactor this into a location macro:

macro location()
    return MLIR.IR.Location(__FILE__, __LINE__)
end

which is then used everywhere (and can be overridden to produce the fuller backtrace mentioned before).
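A minimal, self-contained sketch of the suggested macro, assuming a hypothetical `SourceLoc` struct in place of `MLIR.IR.Location` (so it runs without Reactant) and a hypothetical `constant_op` in place of the real op builder. It reads the caller's position from the macro's implicit `__source__` argument, because `@__FILE__`/`@__LINE__` written inside the macro body would report the macro definition itself. One caveat worth checking: when the macro is used as a keyword default, it expands at the method definition, so the default records that line rather than the caller's.

```julia
# Hypothetical stand-in for MLIR.IR.Location, so this sketch runs without Reactant.
struct SourceLoc
    file::String
    line::Int
    col::Int
end

# Inside a macro, the implicit `__source__` argument is the LineNumberNode of the
# call site, so every `@location()` records where it was written.
macro location()
    file = string(__source__.file)
    line = __source__.line
    return :(SourceLoc($file, $line, 0))
end

# Hypothetical op builder mirroring the `location=` keyword pattern in the diff.
# A macro used as a keyword default expands once, at this method definition,
# so the default records this line, not the caller's line.
function constant_op(x; location=@location())
    return (value=x, location=location)
end

# Passing the macro explicitly at the call site records the call site instead.
op = constant_op(1.0; location=@location())
println(op.location)
```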

@mofeing (Collaborator, Author) replied:

Yes, I have something in mind, but I will open another PR for it.

@mofeing merged commit 0f99d0f into main on Nov 29, 2024
27 of 39 checks passed
@mofeing deleted the ss/hlo-traced-ops branch on November 29, 2024 at 13:47