-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve race free register access ergonomics #37
Comments
The reason the whole "require If it's so easy to write that I completely agree with the ergonomics bit though, I'd rather not have all this noise in my code, but currently it seems like a necessary evil. |
In general, unsafe is only required for memory safety bugs, which the whole Whilst changing pin state (or similar things) unexpectedly isn't great, it's not really unsound (in all the cases that i can think of), since it doesn't lead to memory corruption, the code just doesn't work anymore. This may be an issue when peripherals are doing memory access, as with dma, but for most peripherals this isn't an issue. |
For gd32vf103 I used an |
Along with the comments from the other day, there was some good discussion on this in the Matrix WG channel starting here (discussion ends at the change of subject). Why
|
It's worth pointing out that there was an alternative pointed out by @thalesfragoso, that addresses the soundness gap. My recommendation is to still use the previous proposal, but it seemed unreasonable to omit this from discussion. The AlternativeSince the issue is that "proxy" or "global" registers are unsafe to access, if all their methods were marked unsafe, then errors by HAL maintainers would at least be restrained to unsafe blocks. This does indeed plug the soundness concern. If we apply a pattern like this (marked
Then the AdvantagesThe principle advantage is the avoidance of Critical Sections. Be it perception or reality, there is resistance to their usage based on cost to WCET. The other advantage is that this can be added without user impact. It'd be a bit tedious to mark every access to a DrawbacksThe drawback of ergonomics remains. Passing around &mut registers is still costly for users. Certain traits cannot be implemented. The loss of generality and sneaking in of device specific information means that generic interfaces may become impossible. For example, the RecommendationI still believe that we can get a better interface, that allows for easier HAL Trait implementation, while still keeping WCET both low (<25 cycles) and document-able. It seems reasonable to calculate and put right in the README. I think the vast majority of users will see that those numbers are well within their required tolerance. |
#127 restarted the discussion on the same topic, so I'll like to give my thoughts on this issue:
I recently stumbled upon bitbanding myself and I see this neatly fits our purpose to improve the ergonomics at least on some levels, at least for methods, where only one register needs to be accessed.
For f3's we do not have to worry about multi-core issues, at least. This is a viable option to be used in this case. But as @IamfromSpace said, WCET is an issue we have to be aware of. To summarize, we have these three issues, which we have to prioritize, while further improving the API:
This is my initial prioritization order, but this is up to discussion to change this order. One goal we can all agree on, is preventing unsound issues, I guess. The current ergonomic situation at least show the problems, that a thread-safe and sound controlling of the peripheries is difficult to realize with the current register map on the stm32 family. I don't know if any other microcontroller family are better off. |
From @ra-kete in #127:
So action points could be the following:
|
On critical sections: If bitbanding is an option, we would only need critical sections for methods, which need access to multiple global registers. AFAIK, these are mostly methods used for general configuration of the pins or peripherals, .e.g |
First I would like to clarify that I don't think the possibility of a
It's usually used to enable peripherals' clocks on the RCC register, you can find it in use in the F1 hal and here is also an example on the F4: https://github.com/stm32-rs/stm32f4xx-hal/blob/master/src/pwm.rs#L165-L172. However, AFAIK the F3 doesn't have support for bit banding.
You can also use that to say that the un-ergonomics isn't that bad since they are only used in the initialization where the user has all the |
I think that is correct. At least for the DISCOVERY, there are no bit-banding regions marked in the memory map. So bit-banding is not an option we need to discuss here further (though definitely relevant for more general discussions).
Personally, I'd also like my initialization code to look clean, but I can understand if not everybody cares about that. It's certainly not a big issue to use the However, as @adamgreig pointed out in Matrix, there are also scenarios in which one wants to re-configure a pin at runtime. I've not needed that myself, but the fact that some HALs have introduced methods for temporary re-configuration (e.g. stm32-rs/stm32l0xx-hal#74) makes it clear that this is still a common use-case. In these scenarios, passing around shared registers becomes unwieldy, especially when you have multiple pins owned by several components that you want to re-configure. Then you'll have to store the shared registers globally somehow. If you use something like RTIC that allows you to share resources easily that's actually not as bad, but still more of a hassle than simply being able to call |
And let's not forget the drawback of passing |
So the only way to go is to use critical sections instead of
II would rather be careful using the |
The runtime case cuts both ways, now you can be using the methods in a hot and/or sensitive part of your code, which can cause the critical sections to be a problem. My biggest problem with changing the methods is that it takes away the option from the users that don't even need or want a critical section because they have the However, I wouldn't be against having both methods, one zero cost with the
I wouldn't consider that to be completely true. |
Hm, I overstated it a bit.
I like the idea of implementing both. Keep the These could be added as extra |
I understand, I'm just not convinced anyone really needs that option. What would help to convince me is if you (or anyone else) could point me to a real world use-case that wouldn't be possible with critical sections. I also wouldn't object to implementing both approaches in principle. However, I'd make the
For the critical section approach, is there any reason not to just implement the |
Use cases aside, could the critical section based method simply use a CS to create the relevant |
Yes, that sounds quite feasible. With this peripheral methods that reconfigure pins would be duplicated but otherwise the code wouldn't change much. Hm, sounds like it wouldn't be as bad as I initially thought. Might even be worthwhile to implement them both just for the sake of experimentation, to get data on what people like to use. Unrelatedly, I think I found another advantage of the CS approach. It makes ergonomic use of fully erased pins possible. On fully erased pins, you cannot change the configuration using the |
This would configure the device correctly, but would be unsound, leaving race conditions. The trick with &mut depends on only one reference existing ever, at which point the borrow checker does the rest of the work for us. If we create one, the borrow checker can’t enforce soundness for us (due to the lack of R-W atomicity). I made this mistake before the community pointed it out too. The performance with CSes is certainly interesting. The &mut is zero cost in performance, whereas the CS strategy must lock and unlock. I’m not so certain it’s costly enough to merit concern, as the entirety of RTIC (formerly RTFM) is based on the concept. On the topic of RTIC, there was one more approach that came up late in these discussions, but needs a lot more exploration. The idea was to expose a type parameter with a Mutex Trait constraint on each type that depends on global registers. With RTIC, it would use their Exclusive, and then we could also provide a simple CS impl, or even a F***ItGoFast impl that unsafely executes. This still might be a bit confusing to a newcomer, but a tutorials would be sufficient and “learn once.“ We then avoid the “can’t impl common traits” problem. We’d really need a proof of concept for this. If it works though it seems like the most promising way to meet the majority of the numerous constraints we face here. If that’s not possible, I’m also still in favor of the CS approach. We want our abstractions to be as free as possible, but if we don’t fully abstract, then I think we’ve missed our primary goal. The CS strategy isn’t free, but it’s not so costly, and the things we need them for are generally very simple; careful reviews and I think we’d be fine. |
Sorry, I don't think I was clear: we provide two configuration methods, one of which internally creates a critical section, and inside the critical section it uses an unsafe block to conjure a new
RTIC does not lock; it changes the BASEPRI (and uses interrupt priorities), so higher priority interrupts can always execute. That's a critical (sorry) difference from using critical sections, because activity in lower-priority interrupts does not affect WCET of higher-priority interrupts. |
Ah, sorry, I saw that discussion and didn’t put two and two together—that is sound, my mistake. In this regard, I think that the CS approach precludes the vast majority of cases. The instruction cost is just not that high, it doesn’t seem like there are that many applications that are truly 5-6 cycles away from meeting requirements (but I could be wrong). I do like the pragmatism though, it’s certainly appealing.
Right, sorry, I was being too loose with terms. You’ve said it much better! And you’ve well highlighted the downside of the CS, we’d have to use the highest priority, so every RTIC task could in theory be affected in WCET. I wonder if we could use the &mut strategy and the Mutex trait strategy too. In theory, any &mut REG can satisfy Mutex trivially, right? Ideally the compiler would optimize away the rest. |
@adamgreig Shoot, I was making some progress with a PoC and realized that
Is unsound, because the other reference may get interrupted by this CS, eliminating the atomicity that should exist due to the single As an example, consider the following. Let's say we have some Our main execution path looks like this:
And we have interrupt code simply calls
In this case we have an ordering that fails us:
We can only conjure I think there's still some opportunities here (notably |
@IamfromSpace If I understand correctly what you describe is that we run into problems if any other parts of the code also hold (non-conjured) references to a register at the same time we conjure one in a CS. That makes sense to me. However, I don't see how it should be an issue in the GPIO case. If you construct an instance of the HAL's |
@ra-kete yep, that’s all correct! It’s not an issue as long as we can prove (one way or another) that there is no other reference floating around. If we can take it, then we can do the CS-conjure-combo anywhere we’d like safely. That’s still a challenge for registers like RCC which are widely used and therefore sort of hard to take without preventing other usage. One thing that makes sense here to to make something like a ShareableRCC, which takes the RCC and uses a CS on modification (through the Mutex trait is a natural fit) and then can safely be Copy and Clone. This helps for things that need global “fan-out” like PWM Channels. I‘m hoping to put a branch together that does something along these lines to demonstrate its usage. |
@IamfromSpace Good catch, you're quite right, we can't allow simultaneous use of "pass in the &mut" and "we'll create one in an unsafe block". Your ShareableRCC idea sounds a lot like the REC struct in the H7 HAL which might provide some inspiration: https://github.com/stm32-rs/stm32h7xx-hal/blob/master/src/rcc/rec.rs https://docs.rs/stm32h7xx-hal/0.6.0/stm32h7xx_hal/rcc/rec/index.html |
Alrighty, I feel like I've got something fairly interesting that does a lot of things that have been talked about. I've done this on the PWM module, because I'm very familiar with it and it's actually a very interesting example of some of the challenges here. PWM is neat because it has a bit of everything: it needs the This utilizes the Mutex trait for anything that might be shared. So the RCC block can't just be passed in directly, but if you have a The Mutex impl is selected through type parameter declarations. So if you want to use
The working branch is here where the most useful place to look would be the module docs and the examples. I'm pretty excited about this, because it plugs a few race conditions in a way that gives the user a lot of control. The downside is pretty clear--it makes the interface a bit more advanced. I think there's been a major commitment to things like type states in the embedded ecosystem though, so it doesn't seem like a departure in that regard. My hope was that you'd never need to pass in global registers, but I think we can only "push it up." This example avoids passing in anything when working with a |
I've been toying around with gpio module to remove those However I'm not sure this is legal or not🥴 unsafe { &*(&self.afrl as *const _ as *const AtomicU32) } |
@Piroro-hs Does STM32F3 actually support atomic operations in peripheral memory space? I couldn't immediately find any mention one way or the other in documentation. |
@mvirkkunen ST's "STM32 Cortex®-M4 MCUs and MPUs programming manual" (PM0214) said nothing about such restrictions, so I think peripheral memory space (device memory) is treated like normal memory in LDREX/STREX operations. |
In the ARMv7-M architecture reference manual, A3.4.5 "Load-Exclusive and Store-Exclusive usage restrictions", it says:
So I don't think it could work in device memory. Though it might be worth using a normal AtomicBool that's shared between all the pins as a semaphore lock on the AFRH and AFRL, which would still prevent needing to pass them in, at the cost of some memory (maybe a pointer per pin, which sort of sucks, or maybe one static atomic per port, which might not be so bad...). Edit: having said that, I sort of bet it would work in practice, since I think for Cortex-M4 single core devices there's only a single exclusive access monitor which tags the entire memory space for exclusive access, which would probably cover the device memory... but it's strictly against the architecture reference manual. |
Oh, so sad... |
I'm sad too. It would be so nice to have a good solution for this, but nope. Everything has some issue... |
I'm sad as well, it's such an annoying problem to solve! I found this interesting earlier discussion, https://internals.rust-lang.org/t/atomic-cmpxchg-with-volatile-semantics/4101, which seems to confirm my suspicion that it does work on cortex-m4, and suggests it might even be an oversight in the architecture reference manual. But, as far as I'm aware the situation around volatile operations hasn't changed since then: using AtomicU32 would not emit volatile operations, so wouldn't be suitable here either (the compiler could, now or in the future, optimise around your atomic operations so they no longer work). I wonder if cortex-m could gain some new asm routines for volatile atomic operations... |
Updated to use precompiled assembly lib for volatile atomic register modification. |
By the way, I heard secondhand from an anonymous ARM engineer that ldrex/strex will work on device-type memory on Cortex-M4, even though the ARMv7-M reference says it wouldn't be allowed for the architecture in general. |
Is this a reasonable approach to "abuse" this hack? When it requires |
@Sh3Rm4n fwiw |
I'm not entirely sure if this approach has been suggested here before, if yes, please disregard... Following up on the 'two methods' approach, instead of creating the impl Pin {
pub fn into_push_pull_output(self) -> Pin<...> {
cortex_m::interrupt::free(|cs| {
self.into_push_pull_output_cs(cs)
})
}
pub fn into_push_pull_output_cs(self, cs: CriticalSection) -> Pin<...> {
// SAFETY: We are certainly in a critical section here
unsafe {
let regs = &mut (*self.gpio.ptr());
// ...
}
}
} This allows:
Of course this only works on single-core devices but are there any stm32f3's which are multi-core? Also it does not depend on an undocumented "feature" of the CPU... |
Summary
The current way we go about this is both un-ergonomic and unsound.
Current Usage
Today, the strategy for handling registers like these is to allow only the HAL to mutate them (via
pub(crate)
), and then assure that they are available for mutation from the user.For example if we want to put PA4 into an alternate function mode 2:
The
AHB
,MODER
, andAFRL
have no public API, so they cannot be mutated directly, but they must be (mutably) passed into thesplit
orinto_af2
so the HAL can modify them and correctly configure the pin.The Problem
Originally, my thought was a recommendation to improve ergonomics and hide more internal details from users. However, I realized that this strategy is not actually sound. The former leads to the latter, so we'll explore them in that order.
Ergonomics
If we do not return or require these registers it greatly simplifies the API for users. New users who are less familiar to MCU will be able to work in more concrete topics like peripherals, rather than registers.
Also, composition becomes much simpler. Now, peripherals that accept pins must either a) require the pin in an already correctly configured state b) require registers that are not related its own function so it can configure the pin.
The former requires the user know extra details, and the latter is a break of the Law of Demeter (reaching into a dependency).
If instead moving a pin into alternate function 2 was as simple as:
Then a peripheral could simply accept an unconfigured pin, and the user would never even know that an alternate function was required, further hiding details and simplifying user's code. HAL's across multiple chips and manufactures would look more similar and wouldn't require re-learning the nuances of each chip.
The two reasons that this hasn't been done before is likely not wanting to mask these details and the number of resulting
unsafe
blocks.It is not the job of a HAL to teach a user how their chip works. There should absolutely be resources to enable the curious! But an abstraction layer is about hiding. And the resulting
unsafe
blocks are not any less safe than today, due to soundness issues.Unsoundness
This strategy has been built on the premise that: If HAL users do not write unsafe code, and HAL contributors do not write unsafe code, then there will not be "unsafe" results. This is not true.
As a contributor I can write the following "safe" function:
Any user who setups up a peripheral, and then calls
evil
in their code (remember, they'll still be able to pass a mutable reference at this point) all pins that require an alternate function are now set incorrectly and no longer connected to their previous peripheral.The fact that maintainers would never allow this into the HAL is irrelevant. The important piece is that the borrow checker didn't not prevent this function from mutating pins it didn't own or have mutable access to.
Addressing this Issue
EDIT: The following has soundness issues, and is not a viable solution! Please see my next comment further down for a summary of the issue and safe ways to address it.
Going forward these registers should not be exposed or accepted. Instead
unsafe
blocks should be used, where ownership/mutable reference of the pin/peripheral is proof of safe access to particular bits within a register with "global" bits. Other mechanisms ensure that there is never more than one pin/peripheral/etc:The presence of the
unsafe
keyword is not a "bad thing." Instead it is a signal that this is a delicate area of the code that should be used with caution.The way that the stm32f3xx chips are designed simply doesn't allow for
svd2rust
to ensure safety at the peripheral layer. This is not anyone's "failing" it's simply a that we need to manually validate safety in certain places--it cannot be deferred to the borrow checker.The text was updated successfully, but these errors were encountered: