A Bifrost GPU contains 1-32 shader cores (G71: 32). Each shader core contains 1-3 execution engines (G71: 3) and a quad manager. Each execution engine runs one quad (a.k.a. wave/warp) of 4 lanes/threads at a time, but it can switch quads on clause boundaries, e.g. if a clause stalls waiting for external resources. The lanes of one quad execute in lock-step, i.e. they share one program counter.

Each lane runs on two functional units: an FMA unit and an ADD unit (both can do more than just FMAs and ADDs). Every instruction tuple contains one instruction for each of the two units, and up to 8 (?) of these tuples are grouped into a clause. Whether the result of the FMA instruction can be used immediately by the ADD instruction of the same tuple seems to depend on the instruction and its modifiers.

Clauses cannot contain control flow other than in their last instruction. A clause's effects on the register file only become visible after the clause is complete. Within a clause, the result of the previous instruction can be forwarded into the next instruction using 8 temporary registers (not sure why 8; are there instructions with 8 outputs?).
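To make the structure above concrete, here is a rough Python model of tuples and clauses. It is only a sketch of my current understanding, not a description of the real encoding: the class names, the `is_control_flow` flag and the mnemonics are made up for illustration, the limit of 8 tuples is the unconfirmed guess from above, and "last instruction" is read here as "last tuple".

```python
from dataclasses import dataclass, field
from typing import List

# "Up to 8 (?)" tuples per clause; unconfirmed guess from the notes.
MAX_TUPLES_PER_CLAUSE = 8


@dataclass
class Instruction:
    mnemonic: str                  # hypothetical name, for illustration only
    is_control_flow: bool = False  # hypothetical flag, not a real encoding bit


@dataclass
class InstructionTuple:
    fma: Instruction  # instruction for the FMA unit
    add: Instruction  # instruction for the ADD unit


@dataclass
class Clause:
    tuples: List[InstructionTuple] = field(default_factory=list)

    def validate(self) -> None:
        """Check the structural rules described in the notes."""
        if not 1 <= len(self.tuples) <= MAX_TUPLES_PER_CLAUSE:
            raise ValueError("a clause holds 1 to 8 tuples (assumed limit)")
        # Assumption: "last instruction" means the last tuple of the clause.
        for t in self.tuples[:-1]:
            if t.fma.is_control_flow or t.add.is_control_flow:
                raise ValueError("control flow is only allowed at the end")
```

A clause whose only branch sits in the final tuple passes `validate()`; moving the branch into an earlier tuple raises `ValueError`.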
Open question: How are lanes masked from execution?
There seem to be 12 different kinds of clauses (?).
There are 32-bit and 64-bit clauses. The number of possible constants (per tuple? per clause?) depends on the kind of clause.
Found a giant jump table with valid opcodes: 22-33,38-48,176-1090,1092-1144,1146-1157,1159-1231,1233-1295,1297-1483,1485-1734,1736-1797,1799-7061,7077-7896,7912-8105,8121-9347,9363-10052,10068-12758,12774-16467,16483-17337,17353-17687
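For poking at the opcode list, a small Python helper that parses the range string, tests membership and lists the holes between ranges may be handy. The function names are my own; only the range string comes from the jump table.

```python
# Valid opcode ranges as found in the jump table (inclusive bounds).
VALID_OPCODES = (
    "22-33,38-48,176-1090,1092-1144,1146-1157,1159-1231,1233-1295,"
    "1297-1483,1485-1734,1736-1797,1799-7061,7077-7896,7912-8105,"
    "8121-9347,9363-10052,10068-12758,12774-16467,16483-17337,17353-17687"
)


def parse_ranges(spec):
    """Turn 'a-b,c-d' into a list of inclusive (lo, hi) tuples."""
    ranges = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        ranges.append((int(lo), int(hi or lo)))
    return ranges


def is_valid_opcode(opcode, ranges):
    """True if the opcode falls inside any of the ranges."""
    return any(lo <= opcode <= hi for lo, hi in ranges)


def gaps(ranges):
    """Inclusive (lo, hi) holes between consecutive ranges."""
    return [(a_hi + 1, b_lo - 1)
            for (_, a_hi), (b_lo, _) in zip(ranges, ranges[1:])]


RANGES = parse_ranges(VALID_OPCODES)
```

Printing `gaps(RANGES)` shows which opcode numbers the jump table skips, which might hint at how the opcode space is partitioned.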
There can be up to 128 uniforms.
FAU = Fast Access Uniform
(not sure which abstraction level)
D registers: 32 + 1 (1, 5..36)
R registers: 64 + 2 (4, 265, 201..264)
T registers: 8 (383..390)
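The numbering above can be written down as a lookup table. This is just a sketch for cross-checking the counts; what the numbers actually index (and at which abstraction level) is still unknown, and the names `REGISTER_CLASSES`, `classify_register` and `class_size` are made up.

```python
# Inclusive number ranges per register class, copied from the list above.
REGISTER_CLASSES = {
    "D": [(1, 1), (5, 36)],
    "R": [(4, 4), (201, 264), (265, 265)],
    "T": [(383, 390)],
}


def classify_register(number):
    """Return 'D', 'R' or 'T' for a register number, or None if unknown."""
    for name, spans in REGISTER_CLASSES.items():
        if any(lo <= number <= hi for lo, hi in spans):
            return name
    return None


def class_size(name):
    """Number of register numbers that belong to the given class."""
    return sum(hi - lo + 1 for lo, hi in REGISTER_CLASSES[name])
```

The sizes come out as 33, 66 and 8, matching the "32 + 1", "64 + 2" and "8" counts above.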
In the assembly syntax many (all?) instructions have an immediate operand near the end with various bitfields like T register number and operand selection for less-than-32-bit operands:
The lowest couple of bits (how many? 4?) are the destination T register number + 1. In that case the result operand is written like this: %RN<def> = ...
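If the field stores the T register number plus one, eight T registers give field values 1..8, which need at least 4 bits, so the guess of 4 is at least plausible. A minimal decode sketch in Python, assuming a 4-bit field and assuming that a field value of 0 means "no T destination" (neither is confirmed); `decode_t_destination` is a made-up name:

```python
NUM_T_REGISTERS = 8  # from the T register list in these notes


def decode_t_destination(immediate, width=4):
    """Extract the destination T register from the low bits of the immediate.

    Assumptions (unconfirmed): the field is `width` bits wide, and a field
    value of 0 means the instruction has no T register destination.
    """
    field = immediate & ((1 << width) - 1)
    if field == 0:
        return None
    if field > NUM_T_REGISTERS:
        raise ValueError("field value %d exceeds the 8 T registers" % field)
    return field - 1  # stored as register number + 1
```

Changing `width` makes it easy to test other guesses against disassembler output.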
Just random info for now.