- 1. Basic idea
- 2. Structuring a lilos application
- 3. How to think about async and await
- 4. lilos executor and API contracts
- 5. Interrupts and concurrency
- 6. How to do the thing you’re trying to do
- 6.1. Using an interrupt to wake a task
- 6.2. Giving other tasks an opportunity to run if ready
- 6.3. Doing something periodically
- 6.4. Doing something periodically without SysTick
- 6.5. Sending something to another task
- 6.6. Sending something to another task, but synchronously
- 6.7. Sharing a read-write resource between two or more tasks
- 6.8. Doing something only when all tasks are waiting
- 6.9. Getting lilos working on a different microcontroller
lilos is a small operating system for embedded Rust applications. It’s intended for applications that have real-time needs.
What makes lilos unique:
- It relies on Rust futures and async fn to implement cheap and flexible concurrency, without making you write explicit state machines. This means you can have more tasks in less RAM, and do complex things like have tasks split into multiple parts and rejoin, all with compiler checks.
- It provides a small but extensible set of OS constructs, like queues and mutexes, and makes it relatively easy for you to add custom ones.
- lilos concurrency happens almost entirely on the stack, which means concurrent tasks can freely borrow things from one another without requiring 'static or Send. This also means lilos doesn’t need any sort of heap or arena allocator.
- lilos APIs try to be as clear and simple to understand as possible. There are no magic macros or required code generation.
- You can write a useful lilos-based application that uses no unsafe code.
lilos was, as far as I’m aware, the first async embedded Rust OS, derived
from a system I built in 2019. It’s been running in deployed systems since then,
and I’ve been gradually improving and fixing it as I use it for more and more
projects.
Note
|
lilos currently supports ARM Cortex processors (M0, M0+, M3, M4, M7, and
probably M33). I would be delighted to port it to a RISC-V processor and stop
being ARM-centric. Perhaps you could recommend a cheap dev board for me to buy?
|
lilos is intended to be built into an application, which is a program you write that does some sort of embedded microcontroller thing. lilos itself is a library (Rust crate) that you link into your application using Cargo. Once your application’s main function hands control to lilos (using lilos::exec::run_tasks), lilos takes over the CPU and manages concurrent execution of your code until reset.
Applications are built out of tasks, which are the basic unit of concurrent execution in lilos (sort of like a thread). At the outermost layer, an application has a fixed set of one or more tasks, which are Rust futures (typically async fns) handed to lilos in run_tasks. Some of those tasks are started right away, while others can be configured to start later in response to events.
You can also have concurrency within a task. For instance, you can write code like this to cause a task to do some work, split into two independent pieces that run to completion, and then merge back together:
do_some_initial_work();
// Things A and B will run concurrently until both finish.
let (a_result, b_result) = join! {
    work_on_thing_a(),
    work_on_thing_b(),
};
do_something_with(a_result, b_result);
In the join! block, the async functions work_on_thing_a and work_on_thing_b will be interleaved, sharing CPU time until they both complete.
This is the reason why lilos is so useful: from a set of statically-defined top-level tasks, you can create complex patterns of concurrency that change dynamically.
lilos tasks are managed by the executor, in the lilos::exec module. It’s the chunk of code that ensures tasks get CPU time when they need it, and mostly don’t get CPU time when they don’t.
The executor schedules application tasks cooperatively, which means that a task has to explicitly give up the CPU (by using, for example, await) for other tasks to run. This has some advantages:
- You don’t have to think about preemption, and
- Most race conditions become much harder to create by accident, since each span of code between await points is effectively a free critical section.
Note
|
Of course, this has the drawback that code entering an infinite loop (which in this case includes panicking) will stop the whole executor. More on this later. |
To make this "free critical section" idea consistent, the executor also manages the CPU’s interrupt controller to carefully control when interrupt service handlers can run. By default, the executor postpones any ISR until the current task completely yields the CPU. This ensures that ISRs run between tasks rather than preempting them, and makes ISR-task interaction a lot easier to reason about. In the simplest configuration (using run_tasks), application code using lilos won’t be preempted by anything.
Tip
|
While lilos always schedules tasks cooperatively, it is possible to
configure the executor to allow certain interrupts (or all interrupts) to
preempt your task code, for situations where you need tightly bounded latency.
This is an advanced technique, outside the scope of this guide, but if you’re
curious, see lilos::exec::run_tasks_with_preemption .
|
Here is an example that alternates between blinking two LEDs together, and
blinking them at totally different frequencies. This is sort of pseudocode
because I haven’t provided all the build system files and such to make it work,
but the code in the box below is actual lilos
code that could work if plugged
into the right scaffolding. (See the examples
folder in the repo for complete
worked examples.)
// We have two LEDs, named Led::A and Led::B.
// Make them both outputs.
make_pin_output(Led::A);
make_pin_output(Led::B);

// With that done, enter into our blinky-pattern loop.
loop {
    // First we're going to blink the two LEDs together 10 times
    // (for a total of 20 toggles). We'll make them blink at 5Hz,
    // which means we need to sleep for 100 ms each time.
    for _ in 0..20 {
        sleep_for(Millis(100)).await;
        toggle_pin(Led::A);
        toggle_pin(Led::B);
    }

    // Now let's break into two concurrent state machines, one
    // managing each LED, and blink them at different unrelated
    // frequencies. For the next three seconds, A will toggle
    // at delays divisible by 30, while B will toggle at delays
    // divisible by 50; at any delay divisible by both 30 and 50,
    // they will toggle near-simultaneously. (Note that this is
    // very similar to the "fizzbuzz" cliche tech interview
    // question.)
    join! {
        // A will go faster:
        async {
            for _ in 0..100 { // 100 * 30 = 3000
                sleep_for(Millis(30)).await;
                toggle_pin(Led::A);
            }
        },
        // B will go slower but finish at the same time:
        async {
            for _ in 0..60 { // 60 * 50 = 3000
                sleep_for(Millis(50)).await;
                toggle_pin(Led::B);
            }
        },
    }

    // We rejoin here with both async blocks complete,
    // and continue our loop at the top.
}
(The join! macro is from the futures crate, if you’re curious.)
A lilos application consists of the following parts:
- A main function, or entry point, which is responsible for setting up any resources needed by tasks, and then starting lilos.
- State shared between any two or more tasks.
- One or more tasks, which are written as async fns that take the state they need as arguments — either by value, for state they will own, or by reference, for state they will share with other tasks.
For very simple applications that consist of totally independent concurrent tasks, you can skip number 2. But for most applications, some kind of communication between tasks is important.
One of the things that makes lilos unusual is that you can declare shared state as local variables on main’s stack — safely. This has a lot of advantages, but the main one is that it lets the compiler’s borrow-checking work across tasks. To use the main alternative — putting state in a static — you have to be somewhat careful to retain Rust’s guarantees.
Note
|
There are a lot of times when the advantages of having state in a static
outweigh the drawbacks, and I’ll touch on that in a later section.
|
The main function of a lilos application typically looks something like this:
#[cortex_m_rt::entry] (1)
fn main() -> ! {
    let mut cp = cortex_m::Peripherals::take().unwrap(); (2)
    let p = set_up_some_hardware(); (3)

    let shared_between_a_and_b = Cell::new(true); (4)

    let alice = pin!(task_alice( (5)
        &shared_between_a_and_b,
        p.TURBOENCABULATOR,
    ));
    let bob = pin!(task_bob( (6)
        &shared_between_a_and_b,
        p.LASER_SHARK,
    ));

    lilos::time::initialize_sys_tick(
        &mut cp.SYST,
        16_000_000, (7)
    );
    lilos::exec::run_tasks( (8)
        &mut [alice, bob],
        lilos::exec::ALL_TASKS, (9)
    );
}
- The entry proc-macro from cortex_m_rt binds the main function to the processor’s Reset vector, and ensures that everything’s set up the way Rust expects before starting main.
- Hardware setup usually wants access to the shared Cortex-M peripherals defined by the architecture reference manual. Here we use the cortex_m crate to get a handle to them that we can use below.
- Generally, some amount of hardware setup needs to happen before starting tasks. The most common example is adjusting the processor’s clock frequency or starting an external crystal oscillator, but this is also a handy place to configure pins or turn on peripherals that tasks will use. This step often produces a Peripherals object from the processor-specific PAC crate, which is shown here as p.
- State shared between tasks can be created as local variables here. The types shared between tasks do not need to be Send or Sync, so we can use simple types with interior mutability like Cell. (This is a core advantage of not letting tasks preempt one another except at await points.)
- task_alice is initialized with a combination of state shared with bob, and a peripheral that she will exclusively control (the TURBOENCABULATOR). (We’ll come back to the pin! macro below.)
- task_bob gets the same shared state and a different exclusive peripheral.
- This configures the lilos::time module assuming that the Cortex-M SysTick timer is ticking at 16 MHz. This must be done before using other API from lilos::time.
- This starts the executor and runs alice and bob concurrently, until reset.
- The "start mask" defines the subset of tasks to start immediately. It’s usually ALL_TASKS which, as its name suggests, starts them all.
Tasks in lilos are async fns that will never complete. They return the Infallible type (from core::convert).
Most tasks also want arguments, which provide them with resources and shared state.
A prototypical task looks like this:
async fn task_alice( (1)
    shared: &MySharedState, (2)
    owned: &mut SomeBuffer, (3)
    turboencabulator: TURBOENCABULATOR, (4)
) -> Infallible { (5)
    loop { (6)
        frob(turboencabulator);
        shared.wait_for_bob().await; (7)
    }
}
- Each task is usually written as an async fn. This async fn is actually a task constructor: you could call it twice to make two Alice tasks, unless it prevents that somehow. (This one does not.)
- Shared state is passed into the task constructor by shared reference (&).
- Owned-but-external state, such as large buffers, is passed by exclusive reference (&mut).
- You can also pass in resources by value, like this TURBOENCABULATOR type, which is presumably from a Peripheral Access Crate since it disregards Rust style norms. This can help prevent a task constructor from being called more times than you intended, since there’s no way for the code that called task_alice to get that turboencabulator back to do it again. (Unless you build one, of course.)
- The async fn for a task must never return. The Infallible type is the best way to describe this using only the standard library: it’s an enum with no variants, so it’s impossible to construct one, and so it’s impossible to return from this function. (You can still panic!, of course.) This ensures that the Future produced from the async fn will never complete.
- The easiest way to ensure that a task never completes is to use a loop.
- The loop should contain at least one await point or equivalent macro (such as join!, select_biased!, or pending!). Otherwise, it will never yield control to other tasks!
Tip
|
You can also write your task as an explicit Future if you’d prefer. It’ll
work fine. Just make sure type Output = Infallible .
|
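For illustration, here is a minimal sketch (not taken from the lilos docs) of the shape such a hand-written task future takes. The BlinkTask type and its contents are invented for the example; the only requirement lilos cares about is that Output = Infallible, so poll can never return Ready:
use core::convert::Infallible;
use core::future::Future;
use core::pin::Pin;
use core::task::{Context, Poll};

// Hypothetical hand-rolled task.
struct BlinkTask {
    // ...whatever state the task needs...
}

impl Future for BlinkTask {
    type Output = Infallible;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Do a bounded amount of work here, arrange to be woken later
        // (for example via a Notify), and report Pending. Because
        // Output is Infallible, returning Ready is impossible.
        Poll::Pending
    }
}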
You can get quite far while keeping all your state on the stack. However, you may run into cases where it breaks down. For me, this is almost always one of the following situations:
- I’m using a lot of RAM, and I want to know if I’ve run out of RAM at compile time. (Stack usage isn’t measured at compile time, so if you run out, you find out with a panic at runtime.)
- I have a variable that I want to inspect from a debugger, so I’d like it to be at a predictable place in memory with a predictable name.
- I have a large buffer that I’d like to place somewhere specific. For instance, a lot of microcontrollers have several different RAMs that aren’t right next to each other; you might put the stack in one, and a large communication buffer in another, to get the most out of the chip. The other common reason I want to do this is to use DMA.
In all three of these cases, the state you’re stuffing into a static may or may not be shared between tasks. It’s often useful to put a single task’s own state into a static for visibility.
Rust has rules on the use of static
that help to avoid the most common race
conditions and other mistakes. These rules mean we have to do some extra
paperwork to put state in a static, in most cases.
The simplest case is putting an Atomic type in a static. These types are thread-safe and use interior mutability, so Rust is totally chill with them being static (rather than the more restricted static mut). Putting an AtomicUsize in a static is trivial, and so is sharing it across tasks:
static EVENT_COUNTER: AtomicUsize = AtomicUsize::new(0);

async fn task_alice() -> Infallible {
    loop {
        some_event().await;
        EVENT_COUNTER.fetch_add(1, Ordering::Relaxed);
    }
}

async fn task_bob() -> Infallible {
    loop {
        sleep_for(Millis(1000)).await;
        print(EVENT_COUNTER.load(Ordering::Relaxed));
    }
}
(You could also pass each task a &AtomicUsize rather than having them hardcode the static, of course.)
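As a sketch of that variant, reusing the shapes from the example above, the task constructor takes the counter by reference so the sharing is visible in its signature:
static EVENT_COUNTER: AtomicUsize = AtomicUsize::new(0);

// The counter is now an explicit argument instead of a hardcoded static.
async fn task_alice(events: &AtomicUsize) -> Infallible {
    loop {
        some_event().await;
        events.fetch_add(1, Ordering::Relaxed);
    }
}

// ...and in main:
// let alice = pin!(task_alice(&EVENT_COUNTER));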
To static more complex things safely — things that need to be static mut — there’s a pattern that builds on this foundation. The core issue with static mut is that any code that can see the variable (in terms of scope) can try and poke it to generate a &mut. If you do this in two places, you’ve now got two &mut references pointing at the same thing, which is Bad And Wrong — &mut needs to remain exclusive. You can defend against this by using a pair of static variables and a pinch of unsafe. Here’s a case where we want a 1 kiB buffer to be static:
fn get_the_buffer() -> &'static mut [u8; 1024] { (1)
    static TAKEN: AtomicBool = AtomicBool::new(false); (2)
    if TAKEN.swap(true, Ordering::SeqCst) { (3)
        // This function has been called more than once,
        // which would produce an aliasing &mut.
        // Just Say No!
        panic!();
    }
    // If we get to this point, the check above passed.
    // That means we're the first to execute this code since
    // reset! That in turn means we can safely produce a
    // &mut to our buffer and know it will be unique.
    {
        static mut BUFFER: [u8; 1024] = [0; 1024]; (4)
        unsafe { &mut BUFFER } (5)
    }
}
- Because the buffer is static, we can return a reference with the 'static lifetime. Doing anything else is complex and I don’t recommend it.
- Define an AtomicBool that records whether our buffer has been "taken" by a call to this function. Because it’s defined inside the function, we only have to read this one function to see all possible uses of the variable and convince ourselves that we’ve done the right thing.
- This will return true the second time we call this function, causing us to panic. We’ve exchanged compile-time borrow checks (which we get for free for state on the stack) for runtime borrow checks. (There’s not really a great alternative to this, since the compiler is very conservative about static.)
- By declaring BUFFER inside this function, we again ensure that only code written right here can potentially access it. By opening an anonymous scope on the line just above, we also guarantee that no code earlier in the function can access it — so if you tried to touch BUFFER before checking TAKEN, you’d get a compile error. Overkill? Arguably. But I’m allergic to bugs.
- Using unsafe, we assert to the compiler that we have checked all the preconditions for producing a &mut referring to BUFFER. Which, in this case, we have.
This pattern covers the vast majority of uses of static. The main exception is if you want to build an array out of a type that is not Copy, or if the initializer expression you want to use to initialize your static is not const.
There’s a sneaky trick for getting around the Copy limitation when initializing arrays: array repeat expressions actually accept either a Copy value or a const item. So this works:
struct MyTypeThatIsNotCopy;
static STATE: [MyTypeThatIsNotCopy; 256] = {
    const X: MyTypeThatIsNotCopy = MyTypeThatIsNotCopy;
    [X; 256]
};
…where [MyTypeThatIsNotCopy; 256]
would fail. Weird, huh? But useful.
Initializing a static
from a non-const
expression is more involved, and for
now I’m treating it as out of scope for the intro guide.
Some documentation of Rust async and await has presented it as a seamless alternative to threads. Just sprinkle these keywords through your code and get concurrency that scales better! I think this is very misleading. An async fn is a different thing from a normal Rust fn, and you need to think about different things to write correct code in each case.
Here is how I think about fn vs async fn:
- A Rust fn is a function that will execute until it decides to stop executing (ignoring things like threads being preempted), or until it’s interrupted by a panic. In particular, its caller gives up control by calling it, and cannot decide to "un-call" it halfway through. (And likewise, if your fn calls another fn, you give up control to that fn, which can decide to enter an infinite loop or panic!.)
- A Rust async fn is an explicit state machine that you can manipulate and pass around, that happens to be phrased using normal Rust syntax instead of tables and match statements. It generates a hidden type implementing the Future trait. The code that calls an async fn (or uses any Future, for that matter) has ultimate control over that Future, and can decide when it runs or doesn’t run, and can even discard it before it completes.
This distinction is subtle but very important: an async fn represents an inversion of control compared to a normal fn.
If you wrote an explicit state machine by hand, this distinction would be clear in the code. For instance, here’s a simple one:
#[derive(Default)]
enum State {
    #[default]
    Begin,
    PinHigh,
    PinLow,
    Done,
}

impl State {
    /// Returns `true` if it completes, `false` otherwise.
    fn step(&mut self) -> bool {
        match self {
            Self::Begin => {
                set_pin_high();
                *self = Self::PinHigh;
                false
            }
            Self::PinHigh => {
                set_pin_low();
                *self = Self::PinLow;
                false
            }
            Self::PinLow => {
                tristate_pin();
                *self = Self::Done;
                false
            }
            // Our terminal state:
            Self::Done => true,
        }
    }
}
State machines like this are almost universal in embedded systems, whether they’re phrased explicitly or left implicit. Drivers that have a combination of API entry points and interrupt service routines, for instance, form this kind of state machine. This toy version is written to be small enough to pick apart.
Each time the code that owns your State calls step, your code gets the opportunity to do stuff. At the end of that stuff, it returns, and the calling code regains control. It can then keep calling step until it gets true, indicating completion; or it could do something else and never call step again; or it could drop your state. (Note that it can also choose to keep calling step even after getting the true result! It’s very much in control here.)
How long will the high and low periods on the pin last? Well, how often will the caller call step? Sometimes this is defined by a contract (e.g. "this state machine advances every 100 ms"), but in this code example, we haven’t done anything to control timing. The caller could call step in a loop and make the high/low periods as short as possible, or it could sleep for months in between calls…or never call step again.
What will the final state of the pin we’re controlling be? Currently, we can’t say. The caller could leave us paused forever without calling step, or could drop us before we finish. So the final state of the pin could be high, low, or tristate, depending on what the caller chooses. We could make this better-defined by adding a Drop impl, so if the caller were to drop the State before it finishes, the pin would do something predictable:
impl Drop for State {
    fn drop(&mut self) {
        if !matches!(self, Self::Done) {
            tristate_pin();
            *self = Self::Done;
        }
    }
}
But if your caller decides to hang on to State and never call step, there’s not really anything State itself can do about this.
And you want it this way. Really. Keep reading.
That might sound bad, but it’s really powerful. For instance, imagine that your caller looks like this:
let mut state = State::default();
loop {
    wait_for_a_key_press();
    let done = state.step();
    if done { break; }
}
If we want to step every time the user presses a key, then we have to accept the possibility of never step-ping — because we can’t force the user to press a key! Being able to create a state machine and have it sit around waiting forever, at very low cost, is part of the power of writing explicit state machines.
Writing explicit state machines in "long-hand" like this is error-prone and complex. Let’s rewrite the running example as an async fn. (The pending! macro is from the futures crate, and yields to the caller without waiting for any particular event. It contains an await.)
async fn my_state_machine() {
    set_pin_high();
    pending!();
    set_pin_low();
    pending!();
    tristate_pin();
}
That doesn’t reproduce the Drop behavior if we’re cancelled. To do this in an async fn, you need to have something in the body of the function that will perform an action when destroyed. You can roll this by hand, but I recommend the scopeguard crate and its defer! macro:
async fn my_state_machine() {
    set_pin_high();
    // Now that we've set the pin, make sure
    // it goes tristate again whether we exit
    // normally or by cancellation.
    defer! { tristate_pin(); }
    pending!();
    set_pin_low();
    pending!();
    // Pin gets tristated here
}
That’s dramatically less code. It’s also much easier to check for correctness:
- You can tell at a glance that there’s no way to return to an earlier state from a later one, since doing so would require a for, loop, or while, and there isn’t one here.
- You can see (once you’ve read the docs for the defer! macro) that, as soon as the pin gets set high and before we yield control back, the state machine will ensure that the pin gets tristated at the end, no matter what. You don’t have to go hunting for a separate Drop impl.
Often, an application winds up requiring a hierarchy of state machines.
Imagine that you wanted to take the pin-toggling state machine from the previous
section, and ensure that it waits a certain minimum interval between changes. If
the OS provides a "sleep for a certain time period" state machine (as lilos does), then the easiest way is to plug that into your state machine. Its states
effectively become sub-states within one of your states. This is
composition.
In a hand-rolled state machine, this is hard enough to get right that I’m not going to present a worked example. (Try it if you’re curious!)
But with a state machine expressed using async fn, it’s trivial, because we have an operator for it: await. await is the most common state machine composition operator (though not the only one!). It says, "take this other state machine, and run it to completion as part of my state machine."
And so, we can add sleeps to our pin-toggler by changing our pending!() to instead await a reusable sleep-for-a-duration state machine:
async fn my_state_machine() {
    set_pin_high();
    defer! { tristate_pin(); }
    sleep_for(Millis(100)).await;
    set_pin_low();
    sleep_for(Millis(100)).await;
    // Pin gets tristated here
}
This will ensure that a minimum of 100 ms elapses between our changes to the pin. We can’t impose a maximum using this approach, because — as we saw above — our caller could wait months between stepping our state machine, and that’s part of what we’re signing up for by writing this state machine.
Composition and cancellation interact in wonderful ways. Let’s say you’re using some_state_machine and you’re suspicious that it might take more than 200 ms. You’d like to impose a timeout on it: it will have 200 ms to make progress, but if it doesn’t complete by the end of that window, it will be cancelled (drop-ped).
lilos provides a "future decorator" for this purpose: with_timeout. It’s a function that takes any future as input, and returns an altered future that won’t be polled past a certain time.
match with_timeout(Millis(200), some_state_machine()).await {
    Some(result) => {
        // The state machine completed successfully!
        print(result);
    }
    None => {
        // The timeout triggered first! Do any additional
        // cleanup you require here.
    }
}
Tip
|
There are many other ways of doing this, such as using the
select_biased! macro from the futures crate; with_timeout is cheaper.
|
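For comparison, here is a sketch of the same timeout expressed with select_biased!, assuming the same some_state_machine, sleep_for, and print placeholders used above. Note the .fuse() calls, which select_biased! requires:
use futures::{select_biased, FutureExt};

select_biased! {
    // Arms are polled in the order written, so the "real" work goes first.
    result = some_state_machine().fuse() => {
        print(result);
    }
    _ = sleep_for(Millis(200)).fuse() => {
        // Timeout: some_state_machine is dropped (cancelled) here.
    }
}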
This is the sort of power we get from the async fn ecosystem. Doing this with hand-rolled state machines is probably possible, but would be complex — and we haven’t even talked about borrowing and lifetimes. That’s a bigger topic than will fit in this doc, but the short version is: borrowing across await points in an async fn pretty much Just Does What You’d Expect, but getting it right in a hand-rolled state machine requires unsafe and gymnastics.
From my perspective, this is the fundamental promise of async fn: easier, composable, explicit state machines.
If a chunk of code absolutely needs to run to completion without letting anything else run, use a normal fn. If a chunk of code doesn’t need to call any async fns, use a normal fn. Basically, any function that can be written as a normal fn without breaking something, should be. It’s easier.
But if you need to write a state machine, use async fn. It’s harder to understand than a normal fn because of the inversion of control and potential for cancellation, but far easier to understand than the code you might write by hand to do the same thing!
Caution
|
There’s a proposal to make code generic on whether or not it’s being
used async , so that the same code could produce both a simple function and a
Future . In this case you’d have to make sure to think about correctness in all
possible ways your code could be used. I am suspicious, and I hope after reading
this section, you are too.
|
To be able to reason about the behavior of a program written using async fn, it’s important to understand the fundamental promises made by the async runtime that underlies it. These promises will apply to the outermost futures (in lilos, the top-level tasks), and will by default apply to the futures composed within those futures unless the code does something to alter the behavior.
I like to be able to make statements like "my program can’t do X" and not turn out to be wrong later, so I’ve tried to specify lilos’s behavior pretty rigorously. The API docs are, as always, the authoritative definition, but this section will summarize the important bits.
If you give a future to the lilos executor in the top-level tasks array, the executor will:
- Poll it promptly when it receives an event.
- Generally not poll it when it has not received an event, but no guarantees.
"Receives an event" here means that the top-level future, or any future
contained within it, blocked waiting for an event like a Notify
or a queue,
and that event got signaled.
This means, if you plug a future into the top-level tasks array, you can assume it will be polled at approximately the right times, and not dropped unexpectedly, or ignored for months for no reason.
Each time it processes the task array, the executor polls the futures in the order they appear. This means the event response latency for the first task in the array will be slightly better than the latency for the 400th task in the array. This may be relevant if your application is latency-sensitive.
Tip
|
The executor reserves the right to poll your task future sometimes even
when a relevant event has not occurred. These are called spurious wakes. The
ability to generate spurious wakes is actually critical to the implementation of
the executor, for reasons that are described in the executor code if you’re
curious. This is why the lowest-level event APIs like Notify always take a
condition predicate, to tell if the event they’re waiting for has really
happened.
|
All futures produced by the lilos public API — which includes every pub async fn in the lilos crate — should have well-defined behavior on cancellation. Dropping a lilos API future without polling it, or without polling it to completion, should never lose data or corrupt state. The intent is that the APIs adhere to the following definition of "cancel-correct:"
Calling an async fn and dropping the returned future before it completes should have no relevant side effects beyond dropping any values passed into the async fn as arguments.
I snuck the word "relevant" in there because it will obviously have some side effects. At the very least, it will burn CPU time and mess with memory. It might increment some event counters behind the scenes. But from the perspective of a caller, it should be fine to drop the future and then retry the operation without having to think about it.
The exception made for arguments passed into the async fn exists because there’s no good way to get the arguments back out on drop. So if you pass ownership of, say, a peripheral into an async fn, and then you throw that async fn away… well, you’ve thrown away access to the peripheral too. In general, if there’s any chance you’ll want to cancel and retry an operation, it should take its resources by reference.
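At the signature level, the difference looks something like this sketch (Uart here is the same stand-in type used by the UART example later in this guide, and send_byte_owning is an invented counter-example):
// Cancel-and-retry friendly: the caller keeps ownership of the UART,
// so dropping the future and calling again later is no big deal.
async fn send_byte(uart: &Uart, byte: u8) { /* ... */ }

// Cancel-hostile: dropping the returned future before it completes
// drops the Uart along with it, and there's no way to get it back.
async fn send_byte_owning(uart: Uart, byte: u8) { /* ... */ }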
Using lilos APIs from interrupt handlers is nuanced.
In the default configuration (an application started using run_tasks without any fancy preemption options), interrupt handlers don’t preempt task code. In this situation, you can squint and treat interrupt handlers as an additional task, albeit one that isn’t async.
On Cortex-M processors, the default interrupt controller configuration also stops interrupt handlers from preempting each other.
In this situation, it’s safe to use a surprisingly broad set of lilos’s APIs from interrupt handlers. However, it’s kind of hard to actually access the APIs.
A small subset of core lilos types are Sync and can be stored directly in a static, for sharing with interrupt handlers. Notify is the main one, and is the example discussed in the section Using an interrupt to wake a task. This is the easy case.
Fancier things like mutexes are, perhaps surprisingly, not Sync. This is because Sync indicates whether a type can support simultaneous shared access from multiple threads with potentially arbitrary preemption and interleaving of operations; we don’t have to support that in lilos because our tasks aren’t threads, and this simplifies the implementation dramatically.
It’s possible to share these types with interrupt handlers in a limited fashion safely, but I don’t currently have a worked example of this because it’s a very niche requirement, in my experience.
By configuring the interrupt controller, you can arrange for interrupt handlers to be able to preempt one another even if they can’t preempt lilos tasks. With the cortex_m crate this requires some unsafe code, so you won’t do it by accident.
Once you’ve done this, assume that lilos APIs are only safe to use from the lowest-priority interrupt handlers — that is, the ones that aren’t going to be preempting another handler. There are exceptions, in particular Notify, which is always safe.
By configuring the interrupt controller appropriately and starting your application with run_tasks_with_preemption, it’s possible to allow a subset of interrupt handlers to fire even while your tasks are running. Any interrupt handlers that you allow to do this must be careful about which lilos APIs they call. Unless stated otherwise, assume that they only have access to Notify.
The most common example of this is allowing the SysTick interrupt handler to preempt application code. lilos uses SysTick to maintain the OS timer, and its SysTick interrupt handler is carefully written to be safe when preempting task code. Otherwise, if tasks do more than about a millisecond of computation between yielding with await points, the SysTick handler may be delayed, and the OS may lose time.
For instance, setting SysTick to the highest priority and allowing it to preempt tasks would look like this:
// ... in the application main fn ...
let mut cp = cortex_m::Peripherals::take().unwrap();
// ... other stuff ...
unsafe {
    // Set to the highest priority.
    cp.SCB.set_priority(SystemHandler::SysTick, 0); (1)
}
// set up tasks...
// run the executor
unsafe {
    lilos::exec::run_tasks_with_preemption( (2)
        tasks_array,
        lilos::exec::ALL_TASKS,
        lilos::exec::Interrupts::Filtered(0x80), (3)
    )
}
At <1> we override the default priority (which is all-1s) to zero, the highest.
When starting the executor at <2>, we use run_tasks_with_preemption, which requires unsafe because it requires you to have thought through your application architecture in terms of preemption. (In this specific case, it’s probably fine for any application, but once other interrupt handlers are involved, you’ll want to be careful.)
Passing Filtered(0x80) at <3> masks interrupts of priority 0x80 and lower (numerically 0x80 and greater) while tasks are running. This leaves the priorities between 0 and 0x7F available for preempting interrupt handlers. Note that the number of bits implemented in the priority field on Cortex-M is vendor dependent, so you can’t just pass 1 here and expect it to work for "any priority lower than 0."
lilos has extensive API documentation, which is always the most up-to-date and complete source for information about the APIs. To view it from a local clone of the lilos repository, enter the os subdirectory and run:
cargo doc --open
This section will give a higher-level tour of the APIs you might use while building an application, organized by the problem they solve.
Note that lilos uses Cargo features to control which parts of its API are built. By default, lilos will build with all the toppings. You can opt out of this and request individual features a la carte if you like.
lilos::exec::Notify is what you want for this.
Notify is a very small (8 bytes), very cheap object that is designed to hang out in a static and synchronize task code with events. Those events usually come from interrupts, though Notify is also used under the hood to implement most other inter-task communication APIs in lilos.
Note
|
Notify doesn’t have to be in a static; it’s just often convenient for it to be in a static.
|
Here’s an example of using Notify to synchronize with an interrupt when
sending a byte out a UART. This is a simplified and platform-generic version of
the code in the UART-related examples in the repo; see those examples if you
want more.
static TX_EMPTY: Notify = Notify::new(); (1)

/// Sends a byte, waiting if the UART is busy.
async fn send_byte(uart: &Uart, byte: u8) {
    if uart.status.read().tx_empty().bit_is_clear() { (2)
        // Uh-oh. There's still something in the UART's TX
        // register, which means it's still working on the
        // _last_ byte we gave it. With a fast CPU and a
        // slow serial port, this could take a long time!
        // Let's block until the hardware says it's done.
        uart.control.modify(|_, w| { (3)
            w.tx_empty_irq_enable().set_bit()
        });
        TX_EMPTY.until(|| {
            uart.status.read().tx_empty().bit_is_set() (4)
        }).await;
    }
    // tx_empty is set, so, we can stuff the next byte in!
    uart.transmit.write(|w| w.bits(byte));
}

#[interrupt] (5)
fn UART() {
    // Get access to the UART from the ISR. Because it's a shared
    // reference, this is almost always okay.
    let uart = unsafe { &*my_device_pac::UART::PTR };

    let control = uart.control.read();
    let status = uart.status.read();
    if control.tx_empty_irq_enable().bit_is_set() { (6)
        if status.tx_empty().bit_is_set() {
            // The send_byte routine is blocked waiting to hear from us.
            // Keep the interrupt from reoccurring:
            uart.control.modify(|_, w| {
                w.tx_empty_irq_enable().clear_bit() (7)
            });
            // And signal the task:
            TX_EMPTY.notify(); (8)
        }
    }
}
- We declare a Notify at static scope where both our async fn and the interrupt handler can see it. I generally name the Notify after the hardware event it represents.
- Check UART status before attempting to send, to find out if it’s still working. This is an optimization; you could also do the enable-interrupt-and-wait sequence unconditionally. That code would be correct, but slower in cases where there’s no need to wait.
- Alter the UART configuration to generate an interrupt when tx_empty gets set.
- Use Notify::until to wait for the event. until takes a predicate function to tell when to wake up; here, we check the same status bit we read before to see when it gets set. It’s important to do this check, because it’s entirely possible (and sometimes useful) for tasks to wake spuriously. This makes sure the condition we think we’re waiting for has actually happened.
- Peripheral access crates for microcontrollers in the cortex-m-rt ecosystem define interrupt proc-macros for marking functions as ISRs. Since this example is generic, it pretends we’re targeting a micro with an interrupt named "UART."
- Interrupts can happen for a variety of reasons, and can be spurious. More complex interrupt handlers than this one usually wind up handling a variety of different conditions in the same routine. Here we check for the interrupt-enable bit that we set above to decide whether to act on the tx_empty status bit. This is technically overkill for the example, but becomes really important as soon as you also want to (say) receive data!
- If the event has occurred, we clear its interrupt-enable bit at the UART to keep this ISR from triggering again (at least, due to that particular event).
- This signals any tasks waiting on the Notify that they should check the condition they’re monitoring. In our case, because tx_empty is set (we checked!), this will cause the suspended send_byte routine to wake and finish processing.
Note
|
The send_byte sketch above is cancel-safe because the type of byte
(u8 ) is Copy . It’s written so that transmitting the byte happens after all
await points. This means that it either transmits the byte and completes, or
does not transmit the byte and the caller can retry (using a copy of byte ).
|
If you want to temporarily pause an async fn to give any other pending tasks a chance to run, but without yielding the CPU for more time than necessary, use either lilos::exec::yield_cpu or the futures::pending! macro.
Here’s how to use yield_cpu to periodically give other tasks a chance to run during a large mem-copy, which would otherwise burn the whole CPU until it finishes (because it’s all synchronous code):
async fn polite_copy(source: &[u8], dest: &mut [u8]) {
    assert_eq!(source.len(), dest.len());

    for (schunk, dchunk) in source.chunks(256).zip(dest.chunks_mut(256)) {
        dchunk.copy_from_slice(schunk);

        // Every 256 bytes, pause briefly and see if anyone else
        // is ready to run.
        lilos::exec::yield_cpu().await;
    }
}
futures::pending!() is more or less equivalent to lilos::exec::yield_cpu().await. I prefer yield_cpu because it makes the await visible to the reader, but do whatever feels best to you!
Note
|
If you need to do a large RAM-to-RAM bulk copy, and are concerned about
impacting event response times, it’s often convenient to do it with DMA — freeing the CPU and avoiding the need to yield_cpu .
|
The easiest way to do something periodically is with the lilos::time module, which uses the SysTick timer common to all ARM Cortex-M CPUs.
Tip
|
lilos::time is available if lilos was built with the systick feature,
which is on by default.
|
To use this module, make sure you’re calling lilos::time::initialize_sys_tick in your main function!
For precisely timing a periodic task in a loop, use lilos::time::PeriodicGate.
let mut gate = PeriodicGate::from(Millis(100));
loop {
    gate.next_time().await;
    toggle_a_pin();
}
PeriodicGate will try to minimize drift by always computing the "next time" in terms of the previous time, no matter how long you spend doing other actions in this iteration of the loop. So, this example will call toggle_a_pin every 100 ms, even if it takes 50 ms to run.
If what you actually want is to make sure that a minimum amount of time passes between two operations, you’re looking for lilos::exec::sleep_for instead:
loop {
    sleep_for(Millis(100)).await;
    toggle_a_pin();
}
If toggle_a_pin() takes 50 ms to run, this loop will call it every 150 ms instead of every 100 ms.
If you want to do something periodically, but you don’t want to use the SysTick timer to do it, you will want to set up some hardware timer (provided by your microcontroller) and use interrupts as described in the section Using an interrupt to wake a task.
Why would you want to do this? In my case it’s usually one of two reasons:
- I’m on a device where idling the CPU in its lowest power state stops the SysTick timer from counting, so it loses time. The Nordic nRF52 series of microcontrollers behaves this way.
- I need timing more precise than milliseconds. The lilos default time unit is a compromise choice: the ARM SysTick timer has the advantage of being very portable, but it essentially requires an interrupt per tick to do accurate timekeeping, so we configure it to tick at 1 kHz to reduce interrupt load.
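Concretely, the shape of this is the same as the UART example earlier, just driven by a timer interrupt. Here is a hedged sketch: Notify and until are the same lilos APIs used in that example, while the TIMER interrupt name, start_hardware_timer, do_the_periodic_thing, and the FIRED flag are invented placeholders for whatever your microcontroller and application actually provide:
static TIMER_FIRED: Notify = Notify::new();
static FIRED: AtomicBool = AtomicBool::new(false);

async fn every_timer_tick() -> Infallible {
    // Hypothetical setup: configure a hardware timer to fire a
    // periodic interrupt at the rate you need.
    start_hardware_timer();
    loop {
        // Wait for the ISR to record a tick, then consume the flag.
        TIMER_FIRED.until(|| FIRED.swap(false, Ordering::Relaxed)).await;
        do_the_periodic_thing();
    }
}

#[interrupt]
fn TIMER() {
    // Acknowledge/clear the timer interrupt in the hardware here, then
    // record the tick and wake any waiting task.
    FIRED.store(true, Ordering::Relaxed);
    TIMER_FIRED.notify();
}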
Tip
|
If you’re cool with requiring the tasks to synchronize — that is, the sender will wait until the receiver is ready to receive, and vice versa — then see the next section for a cheaper and easier option. |
If you need to send things from task A to task B, the most general option is the single-producer single-consumer queue in lilos::spsc. This covers cases like:
- Task A will generate bursts of events intermittently, and task B wants to process them gradually at its own pace.
- Task A will generate events at regular but variable paces, and task B wants to consume them in large periodic batches.
…in addition to the simple case of "A wants to send a thing to B."
Tip
|
lilos::spsc is available if lilos is built with the spsc feature,
which is on by default.
|
If you need to send things from task A to task B, and it’s okay to make the two tasks synchronize each time they want to exchange data, then the lilos-handoff crate is your new best friend. Creating a Handoff doesn’t require any storage, and exchanging data using a Handoff is guaranteed to copy your data in memory only once — unlike spsc, which copies data at least twice: once on the way in, once on the way out.
If you just want the sender to wait while the receiver goes on doing its work, have a look at the try_pop operation on lilos_handoff::Pop.
Tip
|
lilos_handoff is not part of the core API. Use cargo add lilos-handoff
to add it to your project.
|
If two or more tasks need access to a resource, and they all want to have &mut-style access (but not at the same time, because &mut), you probably want lilos::mutex.
Tip
|
lilos::mutex is available if lilos is built with the mutex feature,
which is on by default.
|
Note
|
lilos 's mutex API is somewhat unusual, and attempts to make it harder for
applications to accidentally build cancel-unsafe code on top of it. See the
module docs for details.
|
If you want to run some code only when there’s nothing else to do, you can provide a custom idle hook to lilos by starting the executor using lilos::exec::run_tasks_with_idle. The default idle hook just contains the WFI instruction that sleeps the processor until the next interrupt. If your processor needs other care when going to sleep (setting some bits in a register, turning off something expensive, reading a bedtime story), the idle hook is the right place to do it.
Two things to note:
- Like task code, the idle hook will be run with interrupts off. This is okay because the WFI instruction will resume if a pending interrupt arrives, even if interrupt handler execution is currently disabled.
- You can’t use async fn in the idle hook because, by definition, it runs only when no async fn has anything to do.
Tip
|
I like to install an idle hook that sets a pin low, calls
cortex_m::asm::wfi() , and then sets that same pin high. By monitoring the pin
with a logic analyzer, I can see how often the CPU is idle — the pin will be
high when any task is running, and low when nothing is running. Having the logic
analyzer compute "average duty cycle" of the signal gives me CPU utilization
percentage — for nearly free!
|
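Here is a sketch of that pin-toggling idle hook. It assumes run_tasks_with_idle accepts the idle hook as a closure after the task array and start mask (check the lilos::exec docs for the exact signature), reuses alice and bob from the earlier main example, and uses set_debug_pin_low/set_debug_pin_high as placeholders for your own GPIO code:
lilos::exec::run_tasks_with_idle(
    &mut [alice, bob],
    lilos::exec::ALL_TASKS,
    || {
        // Pin low = CPU idle. A logic analyzer watching this pin
        // shows CPU utilization as the signal's duty cycle.
        set_debug_pin_low();
        cortex_m::asm::wfi();
        set_debug_pin_high();
    },
);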
There are worked examples in the repo for a bunch of different microcontroller platforms — mostly RP2040 and various STM32s — but maybe you’ve got something different!
If the microcontroller in question is an ARM Cortex-M based system, and you can successfully compile a basic embedded Rust program for it (say, a main that just panics), then lilos should work out of the box. lilos has no dependencies on any features of the microcontroller except those specified by ARM.
If the microcontroller is particularly oriented toward low-power applications, you may want to consider disabling the systick feature so that lilos doesn’t expect the SysTick timer to be configured. Nordic nRF52 micros in particular benefit from this. (There’s not a worked example for the nRF52 in the repo, but I am using them in several projects with lilos.)
On the other hand, if the microcontroller is not an ARM Cortex-M … that’s going to be significantly harder.
- If it’s a 32-bit RISC-V with the standard interrupt controller, I’m actually pretty interested in porting lilos — email me.
- I haven’t really thought about other 32-bit microcontrollers. As long as it’s supported by rustc, I’m open to it. I love learning about unusual microcontrollers. Email me.
- If it’s 64-bit, that’s…probably feasible? But less obviously useful? I’d be curious to hear about your application.
- I am uninterested in ports to 16- and 8-bit CPUs, and there are parts of the executor’s implementation that will be difficult to get working on such CPUs because of assumptions about atomic types. But, good luck to you!