Instruction Encoding #5

Open
Tilka opened this issue Apr 29, 2017 · 0 comments

Tilka commented Apr 29, 2017

Just random info for now.

A Bifrost GPU contains 1-32 shader cores (G71: 32). Each shader core contains 1-3 execution engines (G71: 3) and a quad manager. Each execution engine runs one quad (a.k.a. wave/warp) of 4 lanes (threads) at a time, but it can switch quads on clause boundaries, e.g. if a clause stalls waiting for external resources. The lanes of one quad execute in lock-step, i.e. they share one program counter.

Each lane runs on two functional units: an FMA unit and an ADD unit (both can do more than just FMAs and ADDs). Every instruction tuple contains one instruction for each of the two units. Up to 8 (?) of these tuples are grouped into a clause. Whether the result of the FMA instruction can be used immediately by the ADD instruction of the same tuple seems to depend on the instruction and its modifiers.

Clauses cannot contain control flow other than in their last instruction. A clause's writes to the register file only take effect after the clause is complete. Within a clause, the result of the previous instruction can be forwarded into the next instruction using 8 temporary registers (not sure why 8; are there instructions with 8 outputs?).
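
To make the tuple/clause structure above concrete, here is a rough Python model of it. This is only a sketch of the constraints as described in these notes, not of the real encoding: the names `Instr`, `InstrTuple`, `Clause` and the `is_control_flow` flag are made up for illustration, the limit of 8 tuples is the unconfirmed guess from above, and "last instruction" is interpreted as "any instruction in the last tuple".

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: the notes only say "up to 8 (?)" tuples per clause.
MAX_TUPLES_PER_CLAUSE = 8


@dataclass
class Instr:
    mnemonic: str                  # hypothetical name, illustration only
    is_control_flow: bool = False  # hypothetical flag, not a real bitfield


@dataclass
class InstrTuple:
    fma: Instr  # instruction for the FMA unit
    add: Instr  # instruction for the ADD unit


@dataclass
class Clause:
    tuples: List[InstrTuple] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return a list of violated constraints (empty if none)."""
        issues = []
        if len(self.tuples) > MAX_TUPLES_PER_CLAUSE:
            issues.append(f"{len(self.tuples)} tuples exceeds {MAX_TUPLES_PER_CLAUSE}")
        # Control flow is only allowed at the end of the clause, so every
        # tuple except the last one must be free of it.
        for i, t in enumerate(self.tuples[:-1]):
            if t.fma.is_control_flow or t.add.is_control_flow:
                issues.append(f"control flow in tuple {i}, which is not the last tuple")
        return issues
```

A clause whose only branch sits in the final tuple yields an empty `problems()` list; moving that branch into an earlier tuple yields one entry.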

Open question: How are lanes masked from execution?

There seem to be 12 different kinds of clauses (?).

There are 32-bit and 64-bit clauses. The number of possible constants (per tuple? per clause?) depends on what kind of clause it is.

Found a giant jump table with valid opcodes: 22-33,38-48,176-1090,1092-1144,1146-1157,1159-1231,1233-1295,1297-1483,1485-1734,1736-1797,1799-7061,7077-7896,7912-8105,8121-9347,9363-10052,10068-12758,12774-16467,16483-17337,17353-17687
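
For scripting against that list, the ranges can be turned into a quick validity check. This is a small Python sketch; the range string is copied verbatim from the line above, and the helper names (`parse_ranges`, `is_valid_opcode`) are made up here. It only says whether a number falls inside one of the listed ranges, nothing about what the opcode means.

```python
import bisect

# Copied verbatim from the jump table ranges listed above.
VALID_OPCODE_RANGES = (
    "22-33,38-48,176-1090,1092-1144,1146-1157,1159-1231,1233-1295,"
    "1297-1483,1485-1734,1736-1797,1799-7061,7077-7896,7912-8105,"
    "8121-9347,9363-10052,10068-12758,12774-16467,16483-17337,17353-17687"
)


def parse_ranges(spec):
    """Parse "a-b,c-d,..." into a list of inclusive (lo, hi) pairs."""
    ranges = []
    for part in spec.split(","):
        lo, hi = part.split("-")
        ranges.append((int(lo), int(hi)))
    return ranges


RANGES = parse_ranges(VALID_OPCODE_RANGES)  # already in ascending order
STARTS = [lo for lo, _ in RANGES]


def is_valid_opcode(op):
    """True if op lies inside one of the inclusive ranges."""
    i = bisect.bisect_right(STARTS, op) - 1  # last range starting at or before op
    return i >= 0 and RANGES[i][0] <= op <= RANGES[i][1]
```

The gaps between ranges (e.g. 34-37, 1091, 7062-7076) come out as invalid, which makes it easy to spot them when scanning candidate encodings.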

There can be up to 128 uniforms.

FAU = Fast Access Uniform

(not sure which abstraction level)
D registers: 32 + 1 (1, 5..36)
R registers: 64 + 2 (4, 265, 201..264)
T registers: 8 (383..390)
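
The numbering above can be written down as a lookup, which is handy when annotating dumps. This Python sketch just encodes the three lists as given; the function name `register_class` is made up, and it does not try to guess which index within a class a number corresponds to, since the notes do not say.

```python
# Register numbers exactly as listed above (abstraction level unknown).
REGISTER_CLASSES = {
    "D": [1, *range(5, 37)],           # 32 + 1 entries: 1, 5..36
    "R": [4, 265, *range(201, 265)],   # 64 + 2 entries: 4, 265, 201..264
    "T": list(range(383, 391)),        # 8 entries: 383..390
}


def register_class(number):
    """Return "D", "R" or "T" for a listed register number, else None."""
    for name, numbers in REGISTER_CLASSES.items():
        if number in numbers:
            return name
    return None
```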

In the assembly syntax many (all?) instructions have an immediate operand near the end with various bitfields like T register number and operand selection for less-than-32-bit operands:

  • The lowest couple of bits (how many? 4?) are the destination T register number + 1. In that case the result operand is written like this: %RN<def> = ...
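
As a sketch of how that field could be pulled out of the immediate: the width is unknown (4 bits is only the guess from the bullet above, so it is a parameter here), and treating a field value of 0 as "no T register destination" is an assumption implied by the "+ 1" but not confirmed. The function name `dest_t_register` is made up.

```python
def dest_t_register(imm, field_bits=4):
    """Extract the destination T register number from the low bits of imm.

    Assumptions: the field is `field_bits` wide (4 is a guess) and stores
    the T register number + 1, with 0 meaning "no T register destination".
    """
    value = imm & ((1 << field_bits) - 1)  # keep only the low field_bits bits
    if value == 0:
        return None
    return value - 1
```

For example, with the default width an immediate of `0x18` has low bits `8`, which would mean T register 7; with `field_bits=3` the same immediate has low bits `0` and would mean no T destination, so the guessed width matters.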