Proposal: add DMA support to SPI #342
base: dev
Conversation
I am creating a DMA package for the SAMD51: https://github.com/sago35/tinygo-dma. I can adapt it to your proposed interface.
dmatx.Start()
dbg6.Toggle()
dmatx.Wait() |
You can, but my intention is to integrate this into the TinyGo machine package. So there won't be a need for an external package. |
Yes, I too would like to include DMA, and SPI communication using DMA, in the machine package. |
It seems like this is probably a proposal for async I/O (DMA is one implementation) for multiple types of peripherals (I2C, UART, ...). In general, I like the pattern: StartTx, Wait.
I'm not sure about StartTx as an alias of Tx - I suspect the app will be designed to depend on the timing behavior of StartTx. Unless there's some plan to have a background goroutine model for all 'async I/O on synchronous peripherals', I'd sooner catch it at compile time - if async is not available, the StartTx and Wait symbols don't exist.
Given this seems to be starting a pattern for multiple peripheral types, in my opinion it's worth thinking about when two different peripherals are involved. For example, a fully async device reading a sensor over I2C and updating an SPI display. I think you rapidly end up needing something like 'select()' - some way for the app to wait on multiple devices to complete, with a return value indicating which device 'completed'. This probably needs some integration with the scheduler?
In fact, Wait on individual devices might be dangerous - simple for the easy case, but I can see it causing additional issues if there are multiple devices in use (e.g. a library using 'Wait'). Would be better to have a |
Thinking some more about general-purpose async I/O, you'd want something like this for the app?
display.StartDisplay(...)
for {
select {
case <-time.After(100 * time.Millisecond - (now() - last)):
// start a reading
accelSensor.Start()
case readings := <-accelSensor.Wait():
// process sensor data
case <-display.Wait():
// ... generate bitmap ...
// update screen as soon as possible
display.StartDisplay(...)
}
}
Assuming this kind of API, how would the sensor driver be able to process an async I2C / SPI response? Some kind of callback from the bus maybe? |
Yes, although I haven't really thought of it that way. I was just focused on SPI for now. (Side note: initially I thought of putting DMA in the name but then I realized there could be other kinds of async I/O although I can't really think of any right now).
With "timing behavior", do you mean the fact that it runs in the background? For context, my main use for DMA right now is to support fast SPI display updates. The API for that would look something like this: func (d *Device) StartDrawBitmap(x, y, w, h int16, data []pixel.RGB565) error {
d.spi.Wait()
d.setWindow(x, y, w, h)
return d.spi.StartTx(data)
}
func (d *Device) Wait() error {
return d.spi.Wait()
}
And it can be used like this (in pseudo-Go):
func paint(display Displayer, buffers [2][]pixel.RGB565) {
for cycle := 0; cycle < chunks; cycle++ {
// pick a buffer that's not currently in use
buf := buffers[cycle % 2]
// calculate the next frame
calculateNextChunk(buf)
// wait for the previous update to finish (if it hasn't already)
// this way we can reuse the buffer in the next cycle
display.Wait()
// send the new buffer
display.StartDrawBitmap(x, y, w, h, buf)
}
// make sure all active updates are finished
display.Wait()
}
In this example, the display driver would need to be more complicated to support both asynchronous and synchronous APIs. By making
I'm not planning to, and I doubt a background goroutine will help in any way. The time between sending two bytes over SPI is likely to be shorter than a context switch.
Hmm, that's an interesting thought. For more complicated cases like you describe (I2C and SPI), it might be worth relying on goroutines (the synchronous API could still use DMA and yield to the scheduler). However, the API you describe with channels is quite interesting. I'm not sure about the overhead of channels, but I do like how it is much easier to compose with other async operations. I think I will prototype it and see what happens. |
...snip...
Thinking again, at the bus level, I agree - drivers should be written once, using StartTx. Different drivers for sync vs async capable busses would be unwieldy.
Another example of 'async I/O' might be GPIO interrupts. Currently, app logic has to implement an interrupt-safe handler for GPIO events - the rules around which are 'subtle'. Async I/O would likely be 'good enough' for many use cases, I suspect, and would allow memory allocation, etc. in response to a GPIO event:
select {
case upEvent := <- machine.GPIO10.Wait():
// 'up' button
case downEvent := <- machine.GPIO11.Wait():
// 'down' button
//...
} |
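As an illustration of how such a per-pin Wait channel could be fed (a sketch only, not part of the proposal; it assumes the runtime tolerates a non-blocking channel send from an interrupt handler, and GPIO10/PinFalling are just example choices):

package main

import "machine"

// A reused, buffered channel so the interrupt handler never has to block.
var upEvents = make(chan machine.Pin, 1)

func main() {
    machine.GPIO10.Configure(machine.PinConfig{Mode: machine.PinInputPullup})
    machine.GPIO10.SetInterrupt(machine.PinFalling, func(p machine.Pin) {
        // Non-blocking send from the handler; drops the event if the previous
        // one hasn't been consumed yet. Whether channel operations are safe
        // from interrupt context is exactly the runtime support this idea needs.
        select {
        case upEvents <- p:
        default:
        }
    })

    for range upEvents {
        // 'up' button: ordinary Go code here, allocation included.
    }
}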
Here is one possible way to implement a DMA API using channels:
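A rough sketch of the shape such an API could take, inferred from the usage further down (the exact names and signatures here are assumptions):

// Sketch only; signatures inferred from the display driver example below.
type asyncSPI interface {
    // StartTx begins an asynchronous (DMA-backed) transfer and returns right
    // away; the buffer must not be reused until the transfer has completed.
    StartTx(tx []byte) error
    // Wait returns a channel that receives the result of the transfer started
    // by StartTx. If nothing is in flight, it returns an already-closed
    // channel, so receiving from it never blocks.
    Wait() chan error
}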
It can be implemented like this:
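One way it could be implemented, again as a sketch (the pre-closed channel trick, the initDMA helper, and the interrupt hook name are illustrative assumptions):

type SPI struct {
    done chan error // allocated once and reused for every transfer
    busy bool       // a real port needs interrupt-safe access to this flag
}

// closedChan is handed out by Wait when nothing is in flight, so that
// <-spi.Wait() returns immediately in that case.
var closedChan = func() chan error {
    ch := make(chan error)
    close(ch)
    return ch
}()

func (spi *SPI) initDMA() {
    spi.done = make(chan error, 1) // buffered: the interrupt's send never blocks
}

func (spi *SPI) StartTx(tx []byte) error {
    spi.busy = true
    // ...program the DMA channel with tx and kick off the transfer...
    return nil
}

func (spi *SPI) Wait() chan error {
    if !spi.busy {
        return closedChan
    }
    return spi.done
}

// txComplete would be called from the DMA "transfer complete" interrupt.
// (A real implementation also has to drain stale results and report DMA
// errors; both are omitted here.)
func (spi *SPI) txComplete() {
    spi.busy = false
    spi.done <- nil // buffered send: does not block inside the interrupt
}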
The reason why I'm using the same channels (or a shared closed channel) is that allocating a new channel on each transaction would require a heap allocation, and is therefore too slow for an API that's supposed to be performant.
This is how the API could be used in a display driver. Note that it exposes the underlying SPI Wait channel directly:
func (d *Device) StartDrawBitmap(x, y, w, h int16, data []pixel.RGB565) error {
<-d.spi.Wait()
d.setWindow(x, y, w, h)
return d.spi.StartTx(data)
}
func (d *Device) Wait() chan error {
return d.spi.Wait()
}
I am leaning slightly in favor of this API, even though it's quite a bit more complex. We'd need some form of synchronization between interrupts and the waiting goroutine anyway, and those might as well be channels. Channels might have a bit of overhead (more than a dedicated synchronization primitive), but I believe those can be optimized if we hit any issues. @kenbell what do you think? |
@aykevl I quite like it. I think I'd have
Overlapping calls - error or delay?
The other option would be for
Mixing Sync and Async
If sync and async operations occur on the same bus - e.g. I2C, where one driver/device uses async and another uses sync - should sync calls fail, or delay until the async call completes?
Channel re-use
Agree with re-using the channel, not allocating for each transaction.
Sensors, etc...
For transactions involving a read, the driver would do something like this? (bulk read)
func (d *Device) StartRead(buf []byte) error {
<-d.spi.Wait() // unless we put this in d.spi.StartTx
return d.spi.StartTx([]byte{READ, REG_DATA, byte(len(buf))}, buf)
}
func (d *Device) Wait() chan error {
return d.spi.Wait()
}
func (d *Device) EndRead() error {
<- d.spi.Wait() // if app doesn't want to wait, it can just call EndRead directly
return nil
} Or if more structured data... func (d *Device) StartHeatAndMeasure() error {
<-d.spi.Wait() // unless we put this in d.spi.StartTx
return d.spi.StartTx([]byte{READ,REG_START_HEATER_AND_MEASURE}, d.buf)
}
func (d *Device) Wait() chan error {
return d.spi.Wait()
}
func (d *Device) EndHeatAndMeasure() (uint8, uint8, error) {
<- d.spi.Wait() // if app doesn't want to wait, it can just call EndXXX directly
return d.buf[0], d.buf[1], nil
} |
I guess, but that means that it will be impossible for more than one transaction to be queued (meaning, calling
Not sure, it could work either way. We could say "call Wait() before doing a sync transaction" which is of course kind of sensitive to bugs. The least surprising thing to do would probably be to wait until the last async transaction has finished. I don't fully understand your |
I think I agree with your argument. I like the model that the contract is that one or more asynchronous (+ 1 synchronous) transaction can be outstanding and the bus driver is responsible for ensuring all get done safely, whether the bus is truly asynchronous or not. A synchronous bus implementation would convert the async calls into sync ones, and an asynchronous bus implementation would convert the synchronous call into async followed by wait. That way device drivers can be written for the best performance case (async busses) or the most simple code structure (sync busses) without worrying what the bus actually does underneath. Of course the app developer may care, and want all drivers and busses to be async to avoid delays in their app, but that's just an argument to gradually move all busses and device drivers to an async model over time.
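To make that contract concrete, here is a sketch (illustrative types, not code from this PR) of how a synchronous-only port and a DMA-capable port could both provide the same Tx/StartTx/Wait set:

// A synchronous-only port: the async methods fall back to blocking behaviour.
type syncSPI struct{ /* ... */ }

func (s *syncSPI) Tx(w, r []byte) error      { return nil }       // shift the bytes out right now
func (s *syncSPI) StartTx(w, r []byte) error { return s.Tx(w, r) } // "async" call just runs synchronously
func (s *syncSPI) Wait() error               { return nil }       // nothing can still be in flight

// A DMA-capable port: the blocking call is built from the async pair.
type dmaSPI struct{ /* ... */ }

func (s *dmaSPI) StartTx(w, r []byte) error { /* program DMA and return */ return nil }
func (s *dmaSPI) Wait() error               { /* block until the DMA IRQ fires */ return nil }
func (s *dmaSPI) Tx(w, r []byte) error {
    if err := s.StartTx(w, r); err != nil {
        return err
    }
    return s.Wait()
}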
I'm thinking
sensor.StartHeatAndMeasure()
// do stuff
val1, val2, val3 := sensor.EndHeatAndMeasure()
or the app could do:
sensor.StartHeatAndMeasure()
select {
case <-sensor.Wait():
val1, val2, val3 := sensor.EndHeatAndMeasure()
case ...
}
These would be equivalent to the synchronous version:
val1, val2, val3 := sensor.HeatAndMeasure()
The async versions would have the benefit that if the sensor needed to hold the bus (for, say, 100ms to heat a hotplate), the app can do other useful stuff (like update the display) without needing goroutines. |
Tested on Fedora 39.
This makes test outputs more consistent, even though it is a lot slower on my arm64 laptop (running the tests in QEMU).
Rebased the PR (no actual update).
@kenbell looking back on this I'm not so sure about the API with channels. It seems rather brittle to me and very easy to get wrong (especially in the implementation). So what I'd much rather do is add a simple blocking Wait() call.
What do you think of this? |
@aykevl - let me drop a few comments on the PR. I think I'm missing something about the semantics of StartTx / Wait as-implemented, and it's probably best handled inline with the code. |
A few comments:
Wouldn't we be able to tell if a SPI has DMA by looking at whether it implements an interface with StartTx and Wait?
This seems odd to me. A scheduled Tx seems like it could be implemented by putting the contents of StartTx and Wait inside the same function and adding
Which leads me to the next point: I like my DMA abstraction a bit more, since it is readily reusable for PIO, SPI, I2C, etc. Maybe we need a better DMA abstraction for scheduled SPI to be as good as async? (I say "my abstraction", but it's actually heavily inspired by embassy-rs's implementation, which relies heavily on async to use DMA.)
So far I've been using SPI with DMA on the PIO with excellent results, using the above API to communicate with the Pico W's CYW43439 at a 25MHz clock. The core API methods used are push and pull. |
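The push/pull shape being described could look roughly like this (a sketch only; dmaChannel and the signatures below are hypothetical illustrations, not the abstraction referenced above):

import "runtime/volatile"

// dmaChannel is a hypothetical type illustrating the push/pull idea.
type dmaChannel struct {
    // chip-specific channel registers/state would live here
}

// Push streams a memory buffer into a fixed peripheral data register
// (memory -> peripheral), e.g. an SPI or PIO TX register.
func (ch *dmaChannel) Push(dst *volatile.Register8, src []byte) error {
    // a real implementation would configure: read address incrementing over
    // src, write address fixed at dst, transfer count len(src), then trigger
    // the channel and wait for (or report) completion.
    return nil
}

// Pull streams a fixed peripheral data register into a memory buffer
// (peripheral -> memory); the mirror image of Push.
func (ch *dmaChannel) Pull(dst []byte, src *volatile.Register8) error {
    // read address fixed at src, write address incrementing over dst.
    return nil
}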
Sorry I missed this comment.
(I think I answered this elsewhere but I don't remember where, so repeating here.) Yes, and it's certainly a viable implementation strategy. However, having a single API for all SPI objects makes it slightly easier to write drivers: they can use one or the other API and be correct in both cases. Also, this only works at the
Already explained this in my other comment: tinygo-org/tinygo#3985 (comment) |
The initial proposed API covers the use cases for a display quite nicely. The way a
Will the same API be made available for the other peripherals, and how will they handle sharing the available DMA channels? |
Here is a possible API I've designed to add DMA support in the machine SPI API.
Here is a copy of the API description:
Alternatives I've considered:
- Making Tx yield to the scheduler and relying on goroutines. I tried using this, but found it hard to use correctly - at least when I simply added a gosched() call in Tx. I struggled to make it actually do things in parallel. Once I switched to an API that exposes asynchronous transfers directly, it worked the first time for me.
- Making StartTx return a promise/future that can be waited on. I liked the idea, but realized that it would be hard to put in an interface (what's the promise type going to be?). On the other hand, Wait is just a simple call that is easy to put in an interface.
- Dropping IsAsync but making this purely an implementation detail. Unfortunately, for displays it's important to know whether the transfer can be async: if the SPI peripheral isn't async, it's not worth using two separate buffers (one of which would have been used for DMA).
- Allowing multiple transfers to be queued with StartTx. I don't plan on implementing queuing now, but it seems like a possibly useful feature. Many chips support chained SPI transfers, and if the chip doesn't support it, it can be emulated by starting the next transfer from the DMA end interrupt.
I also have a proposal for the Displayer interface coming up which will use this API. Together, they can deliver very fast SPI display updates, like in my Simplex noise demo here (where I used DMA to update the display in parallel with calculating the next frame, reducing SPI time from 20ms to almost nothing):
https://twitter.com/aykevl/status/1644411686189776897
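Taken together with the discussion above, the overall interface shape would look roughly like this (a sketch only; the authoritative signatures are the ones in the PR's code changes):

// Sketch of the shape implied by the names used in this thread (Tx exists in
// machine.SPI today; StartTx, Wait and IsAsync are the proposed additions).
type asyncSPIBus interface {
    // Tx is the existing blocking transfer.
    Tx(w, r []byte) error
    // StartTx begins an asynchronous (typically DMA-backed) transfer and
    // returns immediately; the buffers must stay untouched until Wait returns.
    StartTx(w, r []byte) error
    // Wait blocks until the transfer started by StartTx has finished.
    Wait() error
    // IsAsync reports whether StartTx really runs in the background on this
    // chip; if not, double buffering in a display driver buys nothing.
    IsAsync() bool
}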