MAM: Acknowledge writes #30
Comments
At least for the lowRISC implementation, all transactions are in order. |
Yes, the packets are ordered, but currently it is possible to send the "start" packet while the MAM is still writing, potentially leading to corruption. |
@imphil I "solved" that before by reading back the memory ( |
Enforcing a reply for every write adds latency and control overhead in the Chisel MAM. If a reply is really needed just for strong ordering, I think it should be a special type of write rather than every write transaction. |
You are right, Wei. For now I was only thinking about an acknowledgement on the internal MAM interface when the write is issued, but that is not necessarily when the write has finished. |
First, how can lowRISC not be affected by this? The problem is independent of ordering or buffering. Assume the following scenario: you issue a MAM write request and, directly after the last MAM packet, you send the start-CPU register write to the STM (that is what happens currently). This start-CPU packet reaches the STM (the second module in the NoC) before the MAM worm has even arrived at its destination (in our case, for example, ~20 more hops), so the system starts before the MAM has had a chance to write the data to memory, independent of any buffering. Second, there is no significant additional latency overhead. The software does not need to wait for the ACK before it sends the next write; it only needs to wait for the ACK of the last packet. So essentially we introduce one round-trip latency between the last memory write through the MAM and starting the system. That should be acceptable. The additional NoC traffic caused by ACK packets is also small enough to be acceptable. |
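The scenario above can be sketched as a toy timing model. This is only an illustration: the hop counts, the one-cycle-per-hop cost, and all names (`Timing`, `start_without_ack`, `start_after_last_ack`) are assumptions made for the example, not part of the OSD software or hardware. "Write arrived" here means the last write packet reached the MAM, not that the data reached memory.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    hops_to_stm: int = 2     # assumption: STM is the second module in the NoC
    hops_to_mam: int = 22    # assumption: MAM is ~20 hops further away
    cycles_per_hop: int = 1  # assumption: one cycle per hop, no congestion


def start_without_ack(t: Timing, n_packets: int):
    """Host sends n write packets back to back, then the start packet."""
    # Packet i leaves the host at cycle i; the start packet leaves at cycle n.
    last_write_arrived = (n_packets - 1) + t.hops_to_mam * t.cycles_per_hop
    cpu_started = n_packets + t.hops_to_stm * t.cycles_per_hop
    return last_write_arrived, cpu_started


def start_after_last_ack(t: Timing, n_packets: int):
    """Host waits only for the ACK of the last write before sending start."""
    last_write_arrived = (n_packets - 1) + t.hops_to_mam * t.cycles_per_hop
    ack_at_host = last_write_arrived + t.hops_to_mam * t.cycles_per_hop
    cpu_started = ack_at_host + t.hops_to_stm * t.cycles_per_hop
    return last_write_arrived, cpu_started


if __name__ == "__main__":
    t = Timing()
    print("no ACK:  ", start_without_ack(t, 4))
    print("last ACK:", start_after_last_ack(t, 4))
```

In this model the CPU starts before the last write arrives when no ACK is used, and the ACK variant costs one round trip to the MAM regardless of how many write packets were sent.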
lowRISC is affected by this, and it is indeed a problem that should be fixed. The connection can be viewed as: Currently, all components, from the HIM to the L2 cache coherence tracker, treat write operations as fire-and-forget. To safely provide an ack (that is, to ensure that the write operation has finished), the Chisel MAM must issue a special write operation to the TileLink bus; the L2 cache coherence tracker will then send a finish packet back to the Chisel MAM once the data are written to the L2 cache, and the Chisel MAM can generate an ack to the HIM. To use the available bandwidth efficiently, the Chisel MAM will not wait for the finish packet from the L2 coherence tracker before sending the following write requests to TileLink. This means multiple write operations are in flight simultaneously, and a buffer is needed to record all these concurrent operations. The reasons for this buffer: (1) L2 is banked and every L2 bank has multiple parallel trackers, so L2 does not guarantee the write order. This means the finish packets may come back out of order. (2) If there are multiple masters on the debug network (for example, if we support both HIM and CTM storing traces in memory/cache), the Chisel MAM needs to record which packet an ack is responding to. Once this buffer is required, the control logic of the Chisel MAM grows, possibly significantly. If the Chisel MAM does not keep such a record, it has to force sequential write operations: block further write operations until a finish packet is received from the L2 coherence trackers. This would incur latency overhead. My proposals are: (2) A light-weight hardware fix. I hope I have explained it better this time. |
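The bookkeeping described above can be illustrated with a small software model of the buffer. This is a sketch under assumptions, not the Chisel MAM: the tag scheme, the `OutstandingWrites` class, and the master names used as strings are all hypothetical.

```python
class OutstandingWrites:
    """Tracks in-flight writes so out-of-order finishes can be matched."""

    def __init__(self, capacity: int):
        self.free_tags = list(range(capacity))  # tags not currently in use
        self.pending = {}                       # tag -> (master, packet_id)

    def can_issue(self) -> bool:
        # With no free tag the MAM has to stall further writes.
        return bool(self.free_tags)

    def issue(self, master: str, packet_id: int) -> int:
        """Record a write sent to the memory system; return its tag."""
        if not self.free_tags:
            raise RuntimeError("no free tag: write must be stalled")
        tag = self.free_tags.pop(0)
        self.pending[tag] = (master, packet_id)
        return tag

    def finish(self, tag: int):
        """Handle a finish packet; return who should receive the ack."""
        master, packet_id = self.pending.pop(tag)
        self.free_tags.append(tag)
        return master, packet_id


if __name__ == "__main__":
    buf = OutstandingWrites(capacity=2)
    t0 = buf.issue("HIM", 0)
    t1 = buf.issue("CTM", 7)
    # Finish packets may return in a different order than the writes.
    print(buf.finish(t1), buf.finish(t0))
```

A capacity of 1 degenerates into the sequential variant mentioned above: each write blocks until its finish packet has returned.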
Hi, I think we should fix it for now with an acknowledge on the last packet of a write on write setup (
That would not require another software change. Does that sound reasonable? |
Thanks Wei for the explanation, now I understand what you meant. The most significant problem (and actually the only one in OpTiMSoC) is the path from the host to the MAM. For this problem, I think the easiest solution is just to issue a memory read after the last memory write, as you proposed, Wei, in your first solution. The much harder problem of ensuring that the data actually made it from the MAM to the memory (or the caches) can for now either be solved by a simple sleep() (since the time window here should be really small) or by actually going the way of Stefan's/Wei's Solution 2. I'll leave that up to you guys, since you know better what's going on inside lowRISC. |
Maybe to add to that: from how I understood Wei, we need to have two writes, one fire-and-forget and one synced, as always issuing sync writes is too costly. If I get your idea right, Stefan, you are thinking about a version with acks which we can later extend to provide fully synced writes to the memory system without software changes. But if we need two write types, this won't work without additional changes to the software to use the sync write type anyway? (The other option would be to add a config register to the MAM which can switch between two "meanings" of the ack, and set that one to "full barrier mode" before issuing the last write request.) |
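The read-after-last-write idea relies on the host-to-MAM path being in order: a read response can only come back after every earlier write has been processed by the MAM. A minimal model of that argument follows, with all names (`InOrderMam`, `write_then_read_back`) invented for the example; it says nothing about whether data has left the MAM towards the caches.

```python
from collections import deque


class InOrderMam:
    """Toy MAM that processes requests strictly in arrival order."""

    def __init__(self):
        self.queue = deque()  # requests still travelling or waiting
        self.memory = {}      # address -> value

    def send(self, request):
        self.queue.append(request)

    def step(self):
        """Process one request; return ("read", value) for reads, else None."""
        op, addr, value = self.queue.popleft()
        if op == "write":
            self.memory[addr] = value
            return None
        return ("read", self.memory.get(addr))


def write_then_read_back(mam: InOrderMam, data, base: int):
    """Send all writes, then one read that acts as a barrier."""
    for offset, value in enumerate(data):
        mam.send(("write", base + offset, value))
    mam.send(("read", base + len(data) - 1, None))
    while mam.queue:
        response = mam.step()
        if response is not None:
            # The read response implies all earlier writes were processed.
            return response[1]
    return None


if __name__ == "__main__":
    mam = InOrderMam()
    print(write_then_read_back(mam, [10, 20, 30], 0x100))
```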
Actually, I suppose it will be a flag in the last packet indicating that this needs to sync. Alternatively, I think we can just have a memory sync packet. |
@imphil Yes, my idea was to have two types of write. I think @wallento 's idea is to let the debug MAM handle it internally, because the long debug packet from the HIM is broken into smaller packets to the Chisel MAM. I think @wallento 's idea could work just fine, and it would be hidden from software. I just found out that the Chisel MAM currently does not run TileLink at full throttle: it waits for an ack from TileLink before sending the next write. I think the idea was that the internal buffer in the Chisel MAM can hide this latency, considering that TileLink is much faster than the debug network. However, a synced write still has delay overhead. I think a single write_sync would be enough. If write_sync is high for a write request, the Chisel MAM can hold req.ready low until the whole write request is finished. I think I can have this done with a couple of lines. It should be easy to do in the debug MAM as well, I think? It could get complicated when the MAM runs at full throttle, but let's leave that for the future. |
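The write_sync proposal amounts to a small piece of back-pressure logic: once a request with write_sync set is accepted, ready stays low until the memory system reports that the write has finished. The following behavioural sketch only illustrates that handshake; the class and method names are hypothetical and it is not the Chisel implementation.

```python
class MamRequestPort:
    """Behavioural model of req.ready with an optional synced write."""

    def __init__(self):
        self.sync_pending = False  # True while a synced write is unfinished

    @property
    def ready(self) -> bool:
        # ready is held low for the whole duration of a synced write.
        return not self.sync_pending

    def request(self, write_sync: bool) -> bool:
        """Offer a write request; return True if it was accepted."""
        if not self.ready:
            return False
        if write_sync:
            self.sync_pending = True
        return True

    def write_finished(self):
        """Called when the memory system signals the write has completed."""
        self.sync_pending = False


if __name__ == "__main__":
    port = MamRequestPort()
    port.request(write_sync=False)   # fire-and-forget, port stays ready
    port.request(write_sync=True)    # synced write, port blocks
    print(port.ready)                # False until write_finished()
    port.write_finished()
    print(port.ready)
```

Only the final write of a memory load would set write_sync, so ordinary writes keep their fire-and-forget throughput.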
Currently, osd_memory_write() is fire-and-forget; the software does not know when the MAM has finished writing data to the memory. This causes problems when memory loading must be finished before CPUs can be started. We therefore should