[WIP] Add OoO options to ROB #126

Open
wants to merge 79 commits into
base: dev
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
79 commits
Select commit Hold shift + click to select a range
- ba79b8d Initial attempt (hngenc, May 28, 2021)
- 26cac05 Add weightA (hngenc, May 28, 2021)
- 0ee0ffe Add OoO options to ROB (hngenc, May 29, 2021)
- 7d4f0c8 Add new preload filter and turn off ooo config options (hngenc, Jun 3, 2021)
- 83ee602 Fix assert (hngenc, Jun 3, 2021)
- ba4197b Add WS check to preload filter (hngenc, Jun 3, 2021)
- 86860be Experiment with making stores OoO (hngenc, Jun 3, 2021)
- 3121bb8 Fix assert (hngenc, Jun 3, 2021)
- 6d2ded9 Don't overwrite the preloaded address with garbage addresses (hngenc, Jun 3, 2021)
- 64bd464 add new state to preload filter and turn of st ooo (hngenc, Jun 3, 2021)
- 6bc6c41 Fix equality check (hngenc, Jun 3, 2021)
- 7d3066f Fix b transpose error and io fire error (hngenc, Jun 4, 2021)
- 0df4b11 Add OoO ROB work correctly even when we preload garbage addresses (hngenc, Jun 4, 2021)
- 374a7da Make sts ooo (hngenc, Jun 4, 2021)
- 85b5b29 Make exs ooo (hngenc, Jun 4, 2021)
- d435b6e Turn of ooo again (hngenc, Jun 5, 2021)
- f506177 Fix ld_id (hngenc, Jun 6, 2021)
- 9995263 Try making sts ooo (hngenc, Jun 6, 2021)
- 7aeeae0 Make ex ooo (hngenc, Jun 6, 2021)
- c562cde Fix ld id again by using ld rs1 (hngenc, Jun 6, 2021)
- 59e2cdc make st ooo (hngenc, Jun 6, 2021)
- 8240b6c make ex ooo (hngenc, Jun 6, 2021)
- a6d6617 Make only sts ooo (hngenc, Jun 6, 2021)
- 483d957 Fix last_allocated_garbage_preload typo (hngenc, Jun 6, 2021)
- 83aaf79 Make ex ooo (hngenc, Jun 6, 2021)
- 1da83f6 Added fused preload+comp instructions (hngenc, Jun 7, 2021)
- 4467508 Add LoopMatmul ooo option (hngenc, Jun 7, 2021)
- dd8c92c bump gemmini-rocc-tests (hngenc, Jun 7, 2021)
- d83a3a0 Bump gemmini-rocc-tests (hngenc, Jun 7, 2021)
- 34442b9 Implement C-address accumulation checks for WAW (hngenc, Jun 7, 2021)
- 05a83c9 Add stall counter (hngenc, Jun 10, 2021)
- 7063b5a Add I unrolling (hngenc, Jun 16, 2021)
- 50981b0 Fix ROB (hngenc, Jun 16, 2021)
- 7b188b3 Fix rob-id in ExIUnroller (hngenc, Jun 16, 2021)
- 95efa6a Fix max_i_blocks addition (hngenc, Jun 16, 2021)
- 01db476 Fix ExIUnroller (hngenc, Jun 16, 2021)
- 9faf839 Fix flooradd (hngenc, Jun 16, 2021)
- c8cc2b1 Allow preload to have invalid rob id (hngenc, Jun 16, 2021)
- efa4fcc fix preload rob id valid again (hngenc, Jun 16, 2021)
- e14014a Fix block addr calculation (hngenc, Jun 16, 2021)
- e288e76 Fix j block calc in ROB (hngenc, Jun 16, 2021)
- cba5afa fix preload rob id invalid again (hngenc, Jun 16, 2021)
- 6dd6307 Fix ROB bitwidths (hngenc, Jun 16, 2021)
- 42bd701 Fix rob-id in exiunroller (hngenc, Jun 16, 2021)
- 3ff5bbc Add max-k rather than just max-i (hngenc, Jun 16, 2021)
- d05a637 Comment out ExIUnroller garbage preload (hngenc, Jun 17, 2021)
- 9a25565 Add COMPUTE_AND_STAY to ExIUnroller (hngenc, Jun 17, 2021)
- eab4f24 Add new_entry_is_ld_and_other_is_ex to ROB (hngenc, Jun 23, 2021)
- e8cccc3 Make it so that compute A only marks the last submatrix as being in-use (hngenc, Jun 24, 2021)
- 185aa09 Revert "Make it so that compute A only marks the last submatrix as be… (hngenc, Jun 24, 2021)
- c996a38 Add 2 dimensions to ROB overlap checks (hngenc, Jun 24, 2021)
- cb7d5c6 Tighten requirements for iterator-checking in ROB (hngenc, Jun 25, 2021)
- 0713f83 Add dynamic interleaving for K (hngenc, Jun 26, 2021)
- 07e72f9 Make k_util limit dynamic rather than static (hngenc, Jun 27, 2021)
- c22a9f3 Make the k_util limit look at whether or not a command could be sent,… (hngenc, Jun 27, 2021)
- 8a57b2e Add k_util maximum (hngenc, Jun 27, 2021)
- a4a8453 Fix combinational loop (hngenc, Jun 27, 2021)
- ad4cdb2 Add more comments for Alon (hngenc, Jun 27, 2021)
- f7a2d78 Print the k-portion of stalled commands (hngenc, Jul 8, 2021)
- a1df841 Add new stall config options (hngenc, Jul 8, 2021)
- 033f5ae Remove unnecessarry val modifiers (hngenc, Jul 8, 2021)
- 4770ff9 Add fine-grained-interleaving (hngenc, Jul 10, 2021)
- 963bf65 Reduce size of matmul fsm (hngenc, Jul 13, 2021)
- 6b2b9b5 Add cloneTypes to Bundle elements (hngenc, Jul 13, 2021)
- 598d8ed Fix pre_addr garbage calculation (hngenc, Jul 14, 2021)
- 5ee1c7e Turn off k portions by default (hngenc, Jul 14, 2021)
- 294d4e2 Add lean ROB option (hngenc, Jul 17, 2021)
- f8fc12b Add lean weightA arbiter (hngenc, Jul 17, 2021)
- 41aa2d4 Fix weightA arbiter (hngenc, Jul 17, 2021)
- 60ad23d Consolidate ex-k-portion multipliers (hngenc, Jul 17, 2021)
- 1e18223 Add configs for synthesis (hngenc, Jul 17, 2021)
- a4e133e Update configs (hngenc, Jul 17, 2021)
- 225edd6 Add comments (hngenc, Jul 17, 2021)
- 4486bb4 Add 16-k-portion configs (hngenc, Jul 19, 2021)
- 2c747b1 Simplify ExController a little bit (hngenc, Jul 22, 2021)
- f2d35ed Fix bitwidths issue (hngenc, Jul 22, 2021)
- 1c020b1 Remove transposes from LoopMatmul FSM to save area (hngenc, Jul 24, 2021)
- 29edd8e Reduced bitwidth of ExArbiter (hngenc, Jul 24, 2021)
- 9ebbfec Fix ooo fine-grained-interleaving st.io.ex_ijk connections (hngenc, Sep 11, 2021)
24 changes: 24 additions & 0 deletions src/main/scala/gemmini/Configs.scala
@@ -146,6 +146,12 @@ object GemminiConfigs {
hardcode_d_to_garbage_addr = false,

mesh_output_delay = 1,
+
+ld_ooo = false,
+ex_ooo = true,
+st_ooo = true,
+
+use_preload_filter = true,
)

val chipConfig = defaultConfig.copy(sp_capacity=CapacityInKilobytes(64), acc_capacity=CapacityInKilobytes(32), dataflow=Dataflow.WS,
@@ -160,6 +166,24 @@ object GemminiConfigs {
)

val leanConfig = defaultConfig.copy(dataflow=Dataflow.WS, max_in_flight_reqs = 64, acc_read_full_width = false, ex_read_from_acc = false, ex_write_to_spad = false, hardcode_d_to_garbage_addr = true)
+
+val synthesize_for_rob_ooo = leanConfig.copy(ld_ooo = false, ex_ooo = true, st_ooo = true, lean_ooo_rob = true) // Module ROB
+val synthesize_for_rob_in_order = leanConfig.copy(ld_ooo = false, ex_ooo = false, st_ooo = false, lean_ooo_rob = false) // Module ROB
+
+val synthesize_for_microthreads_coarse_16_ooo = leanConfig.copy(ld_ooo = false, ex_ooo = true, st_ooo = true, lean_ooo_rob = true, ex_total_k_portions = 16, ex_fine_grained_interleaving = false) // Module LoopMatmul
+val synthesize_for_microthreads_coarse_8_ooo = leanConfig.copy(ld_ooo = false, ex_ooo = true, st_ooo = true, lean_ooo_rob = true, ex_total_k_portions = 8, ex_fine_grained_interleaving = false) // Module LoopMatmul
+val synthesize_for_microthreads_coarse_4_ooo = leanConfig.copy(ld_ooo = false, ex_ooo = true, st_ooo = true, lean_ooo_rob = true, ex_total_k_portions = 4, ex_fine_grained_interleaving = false) // Module LoopMatmul
+val synthesize_for_microthreads_coarse_2_ooo = leanConfig.copy(ld_ooo = false, ex_ooo = true, st_ooo = true, lean_ooo_rob = true, ex_total_k_portions = 2, ex_fine_grained_interleaving = false) // Module LoopMatmul
+
+val synthesize_for_microthreads_fine_16_ooo = leanConfig.copy(ld_ooo = false, ex_ooo = true, st_ooo = true, lean_ooo_rob = true, ex_total_k_portions = 16, ex_fine_grained_interleaving = true) // Module LoopMatmul
+val synthesize_for_microthreads_fine_8_ooo = leanConfig.copy(ld_ooo = false, ex_ooo = true, st_ooo = true, lean_ooo_rob = true, ex_total_k_portions = 8, ex_fine_grained_interleaving = true) // Module LoopMatmul
+val synthesize_for_microthreads_fine_4_ooo = leanConfig.copy(ld_ooo = false, ex_ooo = true, st_ooo = true, lean_ooo_rob = true, ex_total_k_portions = 4, ex_fine_grained_interleaving = true) // Module LoopMatmul
+val synthesize_for_microthreads_fine_2_ooo = leanConfig.copy(ld_ooo = false, ex_ooo = true, st_ooo = true, lean_ooo_rob = true, ex_total_k_portions = 2, ex_fine_grained_interleaving = true) // Module LoopMatmul
+
+val synthesize_for_microthreads_1_in_order = leanConfig.copy(ld_ooo = false, ex_ooo = false, st_ooo = false, lean_ooo_rob = true, ex_total_k_portions = 1, ex_fine_grained_interleaving = false) // Module LoopMatmul
+
+val synthesize_for_weightA_ooo = leanConfig.copy(ld_ooo = false, ex_ooo = true, st_ooo = true, staticWeightAEnabled = true, lean_weightA = true) // Module WeightedArbiter
+val synthesize_for_weightA_in_order = leanConfig.copy(ld_ooo = false, ex_ooo = false, st_ooo = false, staticWeightAEnabled = false, lean_weightA = false) // Module WeightedArbiter
}

/**
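The eight `synthesize_for_microthreads_*_ooo` variants above differ only in `ex_total_k_portions` and `ex_fine_grained_interleaving`. As a plain-Scala sketch (not part of the PR), the same sweep can be generated with case-class `copy` inside a for-comprehension. `MicrothreadCfg` and `MicrothreadSweep` are hypothetical stand-ins that keep only the fields these variants set; they are not `GemminiArrayConfig`.

```scala
// Hypothetical stand-in for GemminiArrayConfig, keeping only the fields
// that the synthesize_for_microthreads_* variants set explicitly.
final case class MicrothreadCfg(
  ldOoo: Boolean,
  exOoo: Boolean,
  stOoo: Boolean,
  leanOooRob: Boolean,
  exTotalKPortions: Int,
  exFineGrainedInterleaving: Boolean
)

object MicrothreadSweep {
  // Placeholder base; every field is overwritten by copy(), as in the vals above.
  val base: MicrothreadCfg = MicrothreadCfg(
    ldOoo = false, exOoo = false, stOoo = false,
    leanOooRob = false, exTotalKPortions = 1, exFineGrainedInterleaving = true)

  // One out-of-order variant per (interleaving style, k-portion count),
  // keyed by the same names as the synthesize_for_microthreads_*_ooo vals.
  val oooVariants: Map[String, MicrothreadCfg] = (for {
    fine <- Seq(false, true)
    k    <- Seq(16, 8, 4, 2)
  } yield {
    val style = if (fine) "fine" else "coarse"
    s"synthesize_for_microthreads_${style}_${k}_ooo" ->
      base.copy(ldOoo = false, exOoo = true, stOoo = true, leanOooRob = true,
        exTotalKPortions = k, exFineGrainedInterleaving = fine)
  }).toMap

  // The in-order baseline, mirroring synthesize_for_microthreads_1_in_order.
  val inOrder: MicrothreadCfg = base.copy(ldOoo = false, exOoo = false, stOoo = false,
    leanOooRob = true, exTotalKPortions = 1, exFineGrainedInterleaving = false)
}
```

Writing the variants out by hand, as the PR does, keeps each synthesis target greppable by name; generating them trades that for a single place to change the sweep.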
6 changes: 6 additions & 0 deletions src/main/scala/gemmini/ConfigsFP.scala
@@ -72,6 +72,12 @@ object GemminiFPConfigs {
hardcode_d_to_garbage_addr = false,

mesh_output_delay = 0,
+
+ld_ooo = false,
+ex_ooo = false,
+st_ooo = false,
+
+use_preload_filter = true,
)

//FP32 Single Precision Configuration
50 changes: 32 additions & 18 deletions src/main/scala/gemmini/Controller.scala
@@ -17,6 +17,16 @@ class GemminiCmd(rob_entries: Int)(implicit p: Parameters) extends Bundle {
val cmd = new RoCCCommand
val rob_id = UDValid(UInt(log2Up(rob_entries).W))

+val i = UInt(16.W) // TODO magic numbers
+val j = UInt(16.W) // TODO magic numbers
+val k = UInt(16.W) // TODO magic numbers
+val max_i = UInt(16.W) // TODO magic numbers
+val max_j = UInt(16.W) // TODO magic numbers
+val max_k = UInt(16.W) // TODO magic numbers
+val use_iterators = Bool()
+
+val ex_k_portion = UInt(8.W) // TODO magic numbers
+
override def cloneType: this.type = new GemminiCmd(rob_entries).asInstanceOf[this.type]
}

@@ -105,7 +115,7 @@ class GemminiModule[T <: Data: Arithmetic, U <: Data, V <: Data]
*/

// Incoming commands and ROB
-val rob = Module(new ROB(outer.config, new RoCCCommand))
+val rob = Module(new ROB(outer.config, new RoCCCommand, new GemminiCmd(rob_entries)))

val raw_cmd = Queue(io.cmd)

@@ -123,9 +133,10 @@ class GemminiModule[T <: Data: Arithmetic, U <: Data, V <: Data]

// val (unrolled_cmd, loop_matmul_unroller_busy) = LoopMatmul(unrolled_cmd_after_conv, rob.io.ld_utilization, rob.io.st_utilization, rob.io.ex_utilization,

-val (loop_cmd, loop_matmul_unroller_busy) = LoopMatmul(conv_cmd, rob.io.ld_utilization, rob.io.st_utilization, rob.io.ex_utilization,
-meshRows*tileRows, coreMaxAddrBits, rob_entries, max_lds, max_exs, max_sts, sp_banks * sp_bank_entries, acc_banks * acc_bank_entries,
-inputType.getWidth, accType.getWidth, dma_maxbytes)
+val (loop_cmd, loop_matmul_unroller_busy) = LoopMatmul(conv_cmd, rob.io.ld_utilization, rob.io.st_utilization, rob.io.ex_utilization, rob.io.ex_k_portion_utilizations,
+meshRows*tileRows, coreMaxAddrBits, rob_entries, rob_full_entries, max_lds, max_exs, max_sts, sp_banks * sp_bank_entries, acc_banks * acc_bank_entries,
+inputType.getWidth, accType.getWidth, dma_maxbytes, new GemminiCmd(rob_entries), ex_total_k_portions, ex_fine_grained_interleaving, local_addr_t, lean_weightA, lean_ooo_rob,
+staticWeightAEnabled)

val unrolled_cmd = Queue(loop_cmd)
unrolled_cmd.ready := false.B
@@ -170,13 +181,11 @@ class GemminiModule[T <: Data: Arithmetic, U <: Data, V <: Data]
tiler.io.issue.load.ready := false.B
tiler.io.issue.store.ready := false.B
tiler.io.issue.exec.ready := false.B
*/

rob.io.issue.ld.ready := false.B
rob.io.issue.st.ready := false.B
rob.io.issue.ex.ready := false.B

/*
when (is_cisc_mode) {
load_controller.io.cmd <> tiler.io.issue.load
store_controller.io.cmd <> tiler.io.issue.store
@@ -203,23 +212,28 @@ class GemminiModule[T <: Data: Arithmetic, U <: Data, V <: Data]
}
*/

-load_controller.io.cmd.valid := rob.io.issue.ld.valid
-rob.io.issue.ld.ready := load_controller.io.cmd.ready
-load_controller.io.cmd.bits.cmd := rob.io.issue.ld.cmd
-load_controller.io.cmd.bits.cmd.inst.funct := rob.io.issue.ld.cmd.inst.funct
-load_controller.io.cmd.bits.rob_id.push(rob.io.issue.ld.rob_id)
+val (rob_issue_ld, rob_issue_ex) = PreloadFilter(outer.config, new RoCCCommand, rob.io.issue.ld, rob.io.issue.ex)
+
+load_controller.io.cmd.valid := rob_issue_ld.valid
+rob_issue_ld.ready := load_controller.io.cmd.ready
+load_controller.io.cmd.bits := DontCare
+load_controller.io.cmd.bits.cmd := rob_issue_ld.cmd
+load_controller.io.cmd.bits.cmd.inst.funct := rob_issue_ld.cmd.inst.funct
+load_controller.io.cmd.bits.rob_id.push(rob_issue_ld.rob_id)

store_controller.io.cmd.valid := rob.io.issue.st.valid
rob.io.issue.st.ready := store_controller.io.cmd.ready
+store_controller.io.cmd.bits := DontCare
store_controller.io.cmd.bits.cmd := rob.io.issue.st.cmd
store_controller.io.cmd.bits.cmd.inst.funct := rob.io.issue.st.cmd.inst.funct
store_controller.io.cmd.bits.rob_id.push(rob.io.issue.st.rob_id)

-ex_controller.io.cmd.valid := rob.io.issue.ex.valid
-rob.io.issue.ex.ready := ex_controller.io.cmd.ready
-ex_controller.io.cmd.bits.cmd := rob.io.issue.ex.cmd
-ex_controller.io.cmd.bits.cmd.inst.funct := rob.io.issue.ex.cmd.inst.funct
-ex_controller.io.cmd.bits.rob_id.push(rob.io.issue.ex.rob_id)
+ex_controller.io.cmd.valid := rob_issue_ex.valid
+rob_issue_ex.ready := ex_controller.io.cmd.ready
+ex_controller.io.cmd.bits := DontCare
+ex_controller.io.cmd.bits.cmd := rob_issue_ex.cmd
+ex_controller.io.cmd.bits.cmd.inst.funct := rob_issue_ex.cmd.inst.funct
+ex_controller.io.cmd.bits.rob_id.push(rob_issue_ex.rob_id)

// Wire up scratchpad to controllers
spad.module.io.dma.read <> load_controller.io.dma
@@ -353,7 +367,7 @@ class GemminiModule[T <: Data: Arithmetic, U <: Data, V <: Data]
// val config_cmd_type = cmd.bits.rs1(1,0) // TODO magic numbers

//val funct = unrolled_cmd.bits.inst.funct
-val risc_funct = unrolled_cmd.bits.inst.funct
+val risc_funct = unrolled_cmd.bits.cmd.inst.funct

val is_flush = risc_funct === FLUSH_CMD
/*
@@ -365,7 +379,7 @@ class GemminiModule[T <: Data: Arithmetic, U <: Data, V <: Data]

when (is_flush) {
// val skip = compressed_cmd.bits.rs1(0)
-val skip = unrolled_cmd.bits.rs1(0)
+val skip = unrolled_cmd.bits.cmd.rs1(0)
tlb.io.exp.flush_skip := skip
tlb.io.exp.flush_retry := !skip

19 changes: 16 additions & 3 deletions src/main/scala/gemmini/DMACommandTracker.scala
@@ -6,7 +6,7 @@ import chisel3.util._

// This module is meant to go inside the Load controller, where it can track which commands are currently
// in flight and which are completed
-class DMACommandTracker[T <: Data](val nCmds: Int, val maxBytes: Int, tag_t: => T) extends Module {
+class DMACommandTracker[T <: Data](val nCmds: Int, val maxBytes: Int, tag_t: => T, prng_seed: Int, proportion_of_slow_accesses_out_of_128: Int, stall_delay: Int) extends Module {
def cmd_id_t = UInt((log2Ceil(nCmds) max 1).W)

val io = IO(new Bundle {
@@ -56,6 +56,8 @@ class DMACommandTracker[T <: Data](val nCmds: Int, val maxBytes: Int, tag_t: =>
val tag = tag_t.cloneType
val bytes_left = UInt(log2Up(maxBytes+1).W)

+val stall_cycles = UInt(32.W) // TODO magic number
+
def init(dummy: Int = 0): Unit = {
valid := false.B
}
@@ -73,16 +75,21 @@ class DMACommandTracker[T <: Data](val nCmds: Int, val maxBytes: Int, tag_t: =>
io.busy := cmd_valids.reduce(_ || _)

val cmd_completed_id = MuxCase(0.U, cmds.zipWithIndex.map { case (cmd, i) =>
-(cmd.valid && cmd.bytes_left === 0.U) -> i.U
+(cmd.valid && cmd.bytes_left === 0.U && cmd.stall_cycles === 0.U) -> i.U
})
-io.cmd_completed.valid := cmds.map(cmd => cmd.valid && cmd.bytes_left === 0.U).reduce(_ || _)
+io.cmd_completed.valid := cmds.map(cmd => cmd.valid && cmd.bytes_left === 0.U && cmd.stall_cycles === 0.U).reduce(_ || _)
io.cmd_completed.bits.cmd_id := cmd_completed_id
io.cmd_completed.bits.tag := cmds(cmd_completed_id).tag

when (io.alloc.fire()) {
cmds(next_empty_alloc).valid := true.B
cmds(next_empty_alloc).tag := io.alloc.bits.tag
cmds(next_empty_alloc).bytes_left := io.alloc.bits.bytes_to_read
+
+val random_number = random.GaloisLFSR.maxPeriod(width=8, seed=Some(prng_seed))
+
+cmds(next_empty_alloc).stall_cycles := Mux(random_number < proportion_of_slow_accesses_out_of_128.U,
+stall_delay.U, 0.U)
}

when (io.request_returned.fire()) {
@@ -97,6 +104,12 @@ class DMACommandTracker[T <: Data](val nCmds: Int, val maxBytes: Int, tag_t: =>
cmds(io.cmd_completed.bits.cmd_id).valid := false.B
}

+cmds.foreach { cmd =>
+when (cmd.valid && cmd.bytes_left === 0.U && cmd.stall_cycles > 0.U) {
+cmd.stall_cycles := cmd.stall_cycles - 1.U
+}
+}
+
when (reset.asBool()) {
cmds.foreach(_.init())
}
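The stall-injection logic above can be modelled in plain Scala to check what the threshold actually selects. This sketch assumes `random.GaloisLFSR.maxPeriod(width=8, ...)` exposes its 8-bit state with the default XOR reduction, so it cycles through every nonzero value from 1 to 255 once per period. The taps below (mask 0xB8, polynomial x^8 + x^6 + x^5 + x^4 + 1) are an assumption and may differ from Chisel's choice, but any maximal-length 8-bit LFSR visits the same set of values. Under that assumption, `proportion_of_slow_accesses_out_of_128 = 10` marks the values 1 to 9 as slow, i.e. 9 of 255 states (about 3.5%) rather than 10/128 (about 7.8%). `DmaStallModel` and its methods are hypothetical names.

```scala
object DmaStallModel {
  // Assumed 8-bit maximal-length Galois LFSR, right-shift form with toggle mask 0xB8
  // (x^8 + x^6 + x^5 + x^4 + 1). Chisel's GaloisLFSR.maxPeriod(8) may use other taps,
  // but any maximal-length 8-bit LFSR visits each value 1..255 exactly once per period.
  def lfsrNext(state: Int): Int = {
    val shifted = state >>> 1
    if ((state & 1) == 1) shifted ^ 0xB8 else shifted
  }

  // One full period of LFSR states starting from a nonzero seed.
  def period(seed: Int): Seq[Int] = {
    require(seed > 0 && seed < 256, "seed must be a nonzero 8-bit value")
    Iterator.iterate(seed)(lfsrNext).take(255).toSeq
  }

  // Same comparison as the diff: the access is slow if the sample is below the threshold.
  def isSlow(sample: Int, threshold: Int): Boolean = sample < threshold

  // Value written into stall_cycles when the command is allocated.
  def stallFor(sample: Int, threshold: Int, stallDelay: Int): Int =
    if (isSlow(sample, threshold)) stallDelay else 0

  // Cycle-level model of one tracker entry. bytesLeftZero and stall behave like registers:
  // next-state values are computed from current values and committed together each cycle.
  // Returns the first cycle on which cmd_completed.valid would be high for this entry.
  def completionCycle(lastByteCycle: Int, stallCycles: Int): Int = {
    var bytesLeftZero = false
    var stall = stallCycles
    var cycle = 0
    while (!(bytesLeftZero && stall == 0)) {
      val nextBytesLeftZero = bytesLeftZero || cycle == lastByteCycle
      val nextStall = if (bytesLeftZero && stall > 0) stall - 1 else stall
      bytesLeftZero = nextBytesLeftZero
      stall = nextStall
      cycle += 1
    }
    cycle
  }
}
```

In this model a slow command completes exactly `stall_delay` cycles after it otherwise would, because the counter only starts decrementing once `bytes_left` reaches zero. If the sampled values are spread evenly over the LFSR period, `stall_delay = 1000` adds roughly 9 * 1000 / 255, or about 35 cycles, per access on average.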
6 changes: 6 additions & 0 deletions src/main/scala/gemmini/DSEConfigs.scala
@@ -74,6 +74,12 @@ object DSEBaseConfig {
max_in_flight_reqs = 16,

mesh_output_delay = 1,
+
+ld_ooo = false,
+ex_ooo = false,
+st_ooo = false,
+
+use_preload_filter = true,
)
}

93 changes: 93 additions & 0 deletions src/main/scala/gemmini/ExIUnroller.scala
@@ -0,0 +1,93 @@
package gemmini

import chisel3._
import chisel3.util._
import chisel3.experimental._
import freechips.rocketchip.tile.RoCCCommand
import chipsalliance.rocketchip.config.Parameters
import GemminiISA._
import Util._

class ExIUnroller[T <: Data : Arithmetic, U <: Data, V <: Data](config: GemminiArrayConfig[T, U, V])(implicit p: Parameters) extends Module {
import config._

val block_rows = meshRows * tileRows
val block_cols = meshColumns * tileColumns

val io = IO(new Bundle {
val in = Flipped(Decoupled(new GemminiCmd(rob_entries)))
val out = Decoupled(new GemminiCmd(rob_entries))
})

object State extends ChiselEnum {
val preload, compute = Value
}
import State._
val state = RegInit(preload)

val (q, len) = MultiHeadedQueue(io.in, entries=3, heads=2, maxpop=2)

val first_cmd_is_preload = q.bits(0).cmd.inst.funct === PRELOAD_CMD

val total_I = q.bits(0).cmd.rs2(63, 48).asUInt() // This is only valid if first_cmd_is_preload === true.B // TODO magic numbers
val I_sent = RegInit(0.U(16.W)) // TODO magic number
val last_send = total_I -& I_sent <= block_rows.U

val must_unroll = first_cmd_is_preload && total_I > block_rows.U

val J_blocks = Cat(q.bits(0).cmd.inst.opcode, q.bits(0).cmd.inst.rs1, q.bits(0).cmd.inst.rs2, q.bits(0).cmd.inst.rd)
val K_blocks = Cat(q.bits(1).cmd.inst.opcode, q.bits(1).cmd.inst.rs1, q.bits(1).cmd.inst.rs2, q.bits(1).cmd.inst.rd)
val I_block = I_sent / block_rows.U

val preload_cmd_with_bounded_i = WireInit(q.bits(0))
preload_cmd_with_bounded_i.cmd.rs2 := (minOf(total_I -& I_sent, block_rows.U) << 48) |
(q.bits(0).cmd.rs2(47, 32) << 32) |
(q.bits(0).cmd.rs2(31, 0).asTypeOf(local_addr_t) + I_block * J_blocks * block_rows.U).asUInt()
preload_cmd_with_bounded_i.rob_id.valid := last_send && q.bits(0).rob_id.valid

val compute_cmd_with_bounded_i = WireInit(q.bits(1))
compute_cmd_with_bounded_i.cmd.rs1 := (minOf(total_I -& I_sent, block_rows.U) << 48) |
(q.bits(1).cmd.rs1(47, 32) << 32) |
(q.bits(1).cmd.rs1(31, 0).asTypeOf(local_addr_t) + I_block * K_blocks * block_rows.U).asUInt()
compute_cmd_with_bounded_i.cmd.rs2 := (minOf(total_I -& I_sent, block_rows.U) << 48) |
(q.bits(1).cmd.rs2(47, 32) << 32) |
(q.bits(1).cmd.rs2(31, 0).asTypeOf(local_addr_t) + I_block * J_blocks * block_rows.U).asUInt()
compute_cmd_with_bounded_i.rob_id.valid := last_send && q.bits(1).rob_id.valid

when (I_sent > 0.U) {
preload_cmd_with_bounded_i.cmd.rs1 := (block_rows.U << 48) | (block_cols.U << 32) | GARBAGE_ADDR
compute_cmd_with_bounded_i.cmd.inst.funct := COMPUTE_AND_STAY_CMD
}
when (q.bits(0).cmd.rs2(31, 0).asTypeOf(local_addr_t).is_garbage()) {
preload_cmd_with_bounded_i.cmd.rs2 := (block_rows.U << 48) | (block_cols.U << 32) | GARBAGE_ADDR
}
when (q.bits(1).cmd.rs1(31, 0).asTypeOf(local_addr_t).is_garbage()) {
compute_cmd_with_bounded_i.cmd.rs1 := (block_rows.U << 48) | (block_cols.U << 32) | GARBAGE_ADDR
}
when (q.bits(1).cmd.rs2(31, 0).asTypeOf(local_addr_t).is_garbage() || (dataflow == Dataflow.WS && hardcode_d_to_garbage_addr).B) {
compute_cmd_with_bounded_i.cmd.rs2 := (block_rows.U << 48) | (block_cols.U << 32) | GARBAGE_ADDR
}

io.out.valid := Mux(must_unroll, (q.valid(0) && state === preload) || (q.valid(1) && state === compute), q.valid(0))
io.out.bits := Mux(must_unroll, Mux(state === preload, preload_cmd_with_bounded_i, compute_cmd_with_bounded_i), q.bits(0))

q.pop := Mux(io.out.fire(), Mux(must_unroll, Mux(state === compute && last_send, 2.U, 0.U), 1.U), 0.U)

// Control the state
when (io.out.fire() && must_unroll) {
state := state.next
}

// Control I_sent
when (io.out.fire() && must_unroll && state === compute) {
I_sent := floorAdd(I_sent, block_rows.U, total_I)
}
}

object ExIUnroller {
def apply[T <: Data : Arithmetic, U <: Data, V <: Data](in: ReadyValidIO[GemminiCmd], config: GemminiArrayConfig[T, U, V])(implicit p: Parameters) = {
val mod = Module(new ExIUnroller(config))
mod.io.in <> in
mod.io.out
}
}
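To make the I-unrolling arithmetic concrete, here is a plain-Scala model of the `must_unroll` path of `ExIUnroller` for one PRELOAD plus compute pair. It follows the code above: each chunk covers at most `block_rows` rows; C and D addresses advance by `I_block * J_blocks * block_rows` and A addresses by `I_block * K_blocks * block_rows`; chunks after the first replace the preload's rs1 with `GARBAGE_ADDR` and switch the compute to `COMPUTE_AND_STAY`; and only the last chunk keeps the real `rob_id` and pops both queue entries. In the hardware, `J_blocks` and `K_blocks` are recovered from the otherwise-unused `inst` fields via `Cat(opcode, rs1, rs2, rd)`; here they are plain parameters. The model assumes the incoming `rob_id` is valid, ignores the garbage-address overrides, and treats `floorAdd` as wrapping `I_sent` back to 0 once it reaches `total_I`. `ExIUnrollerModel` and `Emitted` are hypothetical names.

```scala
object ExIUnrollerModel {
  // One command emitted by the must_unroll path.
  final case class Emitted(
    funct: String,       // "PRELOAD", the original compute funct, or "COMPUTE_AND_STAY"
    rows: Int,           // I count placed in bits 63..48
    offsets: Seq[Int],   // added to base local addresses; PRELOAD: Seq(C), compute: Seq(A, D)
    garbageB: Boolean,   // PRELOAD only: rs1 replaced by GARBAGE_ADDR after the first chunk
    robIdValid: Boolean, // true only on the last chunk (incoming rob_id assumed valid)
    pop: Int             // entries popped from the input queue when this command fires
  )

  def unroll(totalI: Int, blockRows: Int, jBlocks: Int, kBlocks: Int,
             computeFunct: String = "COMPUTE_AND_FLIP"): Seq[Emitted] = {
    require(totalI > blockRows, "only the must_unroll case is modelled")
    val out = Seq.newBuilder[Emitted]
    var iSent = 0
    while (iSent < totalI) {
      val rows = math.min(totalI - iSent, blockRows)
      val iBlock = iSent / blockRows
      val lastSend = totalI - iSent <= blockRows
      // state === preload
      out += Emitted("PRELOAD", rows, Seq(iBlock * jBlocks * blockRows),
        garbageB = iSent > 0, robIdValid = lastSend, pop = 0)
      // state === compute: both entries are popped only on the final chunk
      out += Emitted(if (iSent > 0) "COMPUTE_AND_STAY" else computeFunct, rows,
        Seq(iBlock * kBlocks * blockRows, iBlock * jBlocks * blockRows),
        garbageB = false, robIdValid = lastSend, pop = if (lastSend) 2 else 0)
      iSent += blockRows // floorAdd is assumed to wrap I_sent to 0 after the last chunk
    }
    out.result()
  }
}
```

For example, `unroll(40, 16, 2, 3)` produces three chunks of 16, 16 and 8 rows; the last PRELOAD targets C at offset 64 and the last compute reads A at offset 96 and D at offset 64.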
16 changes: 9 additions & 7 deletions src/main/scala/gemmini/ExecuteController.scala
@@ -59,12 +59,13 @@ class ExecuteController[T <: Data, U <: Data, V <: Data](xLen: Int, tagWidth: In
}
}

-val unrolled_cmd = TransposePreloadUnroller(io.cmd, config)
+val unrolled_cmd = TransposePreloadUnroller(ExIUnroller(io.cmd, config), config)

val cmd_q_heads = 3
assert(ex_queue_length >= cmd_q_heads)
// val (cmd, _) = MultiHeadedQueue(io.cmd, ex_queue_length, cmd_q_heads)
-val (cmd, _) = MultiHeadedQueue(unrolled_cmd, ex_queue_length, cmd_q_heads)
+// val (cmd, _) = MultiHeadedQueue(unrolled_cmd, ex_queue_length, cmd_q_heads)
+val (cmd, _) = MultiHeadedQueue(unrolled_cmd, rob_full_entries, cmd_q_heads) // TODO this should be ex_queue_length
cmd.pop := 0.U

io.solitary_preload := cmd.valid(0) && cmd.bits(0).cmd.inst.funct === PRELOAD_CMD && !cmd.valid(1)
@@ -784,7 +785,7 @@ class ExecuteController[T <: Data, U <: Data, V <: Data](xLen: Int, tagWidth: In
mesh_cntl_signals_q.io.enq.bits.a_transpose := a_transpose
mesh_cntl_signals_q.io.enq.bits.bd_transpose := bd_transpose

-mesh_cntl_signals_q.io.enq.bits.rob_id.valid := !performing_single_mul && !c_address_rs2.is_garbage()
+mesh_cntl_signals_q.io.enq.bits.rob_id.valid := cmd.bits(preload_cmd_place).rob_id.valid && !performing_single_mul && !c_address_rs2.is_garbage()
mesh_cntl_signals_q.io.enq.bits.rob_id.bits := cmd.bits(preload_cmd_place).rob_id.bits

mesh_cntl_signals_q.io.enq.bits.dataflow := current_dataflow
@@ -963,15 +964,16 @@ class ExecuteController[T <: Data, U <: Data, V <: Data](xLen: Int, tagWidth: In
//val complete_lock = RegInit(false.B)

//Seah: added for WS accumulator
-when(mesh.io.resp.fire() && mesh.io.resp.bits.tag.rob_id.valid) {
+when(mesh.io.resp.fire()) {
output_counter := wrappingAdd(output_counter, 1.U, w_total_output_rows)
val last = mesh.io.resp.bits.last

-when(last) {
-mesh_completed_rob_id_fire := true.B
-io.completed.valid := true.B
+when(last && mesh.io.resp.bits.tag.rob_id.valid) {
+mesh_completed_rob_id_fire := mesh.io.resp.bits.tag.rob_id.valid
+io.completed.valid := mesh.io.resp.bits.tag.rob_id.valid
io.completed.bits := mesh.io.resp.bits.tag.rob_id.bits
}

start_array_outputting := !is_garbage_addr
}
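The effect of moving the `rob_id.valid` check from the outer `when` into the `last` condition can be illustrated with a small plain-Scala model. Because `ExIUnroller` hands a valid `rob_id` only to the last chunk of an unrolled command, the earlier chunks reach the mesh with an invalid `rob_id`. If the old gating were kept, their rows would never reach `output_counter` or `start_array_outputting`; with the new gating every response is processed and `io.completed` still fires once per command. `MeshCompletionModel` and the field names in `Resp` are hypothetical, and the model ignores timing.

```scala
object MeshCompletionModel {
  // Only the fields of a mesh response that matter here (names are illustrative).
  final case class Resp(last: Boolean, robIdValid: Boolean, robId: Int)

  // Responses that reach the output_counter / start_array_outputting logic.
  // oldGating = true models `when(resp.fire() && rob_id.valid)`; false models `when(resp.fire())`.
  def rowsProcessed(resps: Seq[Resp], oldGating: Boolean): Int =
    resps.count(r => !oldGating || r.robIdValid)

  // rob_ids reported on io.completed: only `last && rob_id.valid` responses, under either gating.
  def completions(resps: Seq[Resp]): Seq[Int] =
    resps.collect { case Resp(true, true, id) => id }

  // Helper: `blocks` blocks of `rows` rows each, where only the final block carries a
  // valid rob_id, as when ExIUnroller splits one command.
  def unrolledResponses(blocks: Int, rows: Int, robId: Int): Seq[Resp] =
    for {
      b <- 0 until blocks
      r <- 0 until rows
    } yield Resp(last = r == rows - 1, robIdValid = b == blocks - 1, robId = robId)
}
```

With three 4-row blocks, the old gating would process only the 4 rows of the final block, while the new gating processes all 12; in both cases a single completion is reported for the command.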

20 changes: 20 additions & 0 deletions src/main/scala/gemmini/GemminiConfigs.scala
@@ -66,6 +66,26 @@ case class GemminiArrayConfig[T <: Data : Arithmetic, U <: Data, V <: Data](

mesh_output_delay: Int,

+ld_ooo: Boolean,
+ex_ooo: Boolean,
+st_ooo: Boolean,
+
+use_preload_filter: Boolean,
+
+prng_seed: Int = 1, // ALON: You can change the PRNG seed here
+proportion_of_slow_accesses_out_of_128: Int = 10, // ALON: The number of memory accesses (out of 128) that are slow. You can also make this 0
+stall_delay: Int = 1000, // ALON: How many cycles should we wait for a slow memory access? You can also make this 0
+delay_lds: Boolean = false, // ALON: Should loads be stalled?
+delay_sts: Boolean = false, // ALON: Should stores be stalled?
+
+ex_total_k_portions: Int = 1, // ALON: You can change this to any number of k-portions that you would like
+ex_fine_grained_interleaving: Boolean = true, // ALON: If this is true, then we use the newer ("finer") interleaving strategy
+
+lean_ooo_rob: Boolean = false, // No garbage preloads
+lean_weightA: Boolean = false, // Only static weightA supported
+
+staticWeightAEnabled: Boolean = true,
+
headerFileName: String = "gemmini_params.h"
) {
val sp_width = meshColumns * tileColumns * inputType.getWidth
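Judging by the PR title and commit messages, the new `ld_ooo`, `ex_ooo` and `st_ooo` flags select per-queue out-of-order issue in the ROB, whose implementation is not among the hunks shown here. As a generic illustration only, assuming that an in-order queue may issue just its oldest not-yet-issued entry while an out-of-order queue may issue the oldest entry whose dependencies have cleared, the difference looks like this. `IssueSelectModel`, `Entry` and `depsCleared` are hypothetical.

```scala
object IssueSelectModel {
  // Illustrative view of a ROB entry belonging to one issue queue.
  final case class Entry(id: Int, issued: Boolean, depsCleared: Boolean)

  // `entries` is ordered oldest first. Returns the id chosen for issue this cycle, if any.
  def select(entries: Seq[Entry], ooo: Boolean): Option[Int] = {
    val pending = entries.filterNot(_.issued)
    if (ooo) pending.find(_.depsCleared).map(_.id)          // oldest ready entry
    else pending.headOption.filter(_.depsCleared).map(_.id) // oldest pending entry, only if ready
  }
}
```

With entry 1 blocked and entry 2 ready, the in-order policy issues nothing while the out-of-order policy issues entry 2, which is the kind of head-of-line blocking such flags are meant to relax.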