# Parallella (part 6): FreeRTOS

Intro

FreeRTOS is a real-time operating system kernel for embedded devices. Usually, we would use it for tiny-time-constraint applications where reliability is at utmost importance, so why bother about RTOS at all?

It’s true… "Accelerator" is probably the most common use case for Epiphany chip. We aim to write high-performance and parallel applications, where slow parts and coordination is left for the host (ARM) CPU. But as we go higher in the stack, we start to see that many operations are also I/O bounded. In such a case, we would like to share the CPU with some other task rather than waiting and wasting it. How?

This is where a kernel comes into play. In a nutshell, it provides resource management capabilities, such as multi-tasking or multi-threading, where CPU-time is split fairly between running tasks so that overall utilization increases. Of course, there is an additional cost to multi-tasking, and often writing applications that leverage native API (ie. e-libs) is more performant, but watch out for complexity! You need to decide what tradeoff is most suitable for your application.

In this article we’ll do a few things:

  • port FreeRTOS on the Epiphany architecture
  • write a simple multi-tasking application (two tasks running on one eCore)
  • dive into basic kernel theory (i.e., context switching)

One may say that running FreeRTOS on Epiphany doesn’t make much sense (performance, use case, low memory limits, etc.)… If you believe so, you can treat this post as a tutorial on how to port a kernel on a new platform. I couldn’t find anything that specific, so here we go!

Getting started

ralisi first ported the FreeRTOS on Epiphany. To simplify a few things, I have introduced a few changes in this fork:

git clone https://github.com/mkaczanowski/FreeRTOS

Throughout this article, I reference the fork, so you should clone the repo, as shown above.

Requirements

To compile the FreeRTOS kernel you need to have epiphany SDK setup, this is:

  • e-gcc compiler/linker
  • e-libs
  • hdf / linker files

We used all of that before, so checkout previous articles if you don’t know how to proceed.

MISRA Compliance

It’s worth to note that FreeRTOS follows MISRA C standard guidelines. While there are many rules, I think it’s good to know that functions are being prefixed with a few characters denoting its return type. Once you know that, it’s easier to read the code. For example:

  • vTaskDelayUntilv stands for void return type
  • prvSetupTimerInterruptprv stands for private, thus static

Structure

There are two locations we’ll look at:

  • FreeRTOS/Source/portable/GCC/Epiphany – this is where we’ll write architecture-specific code. It is the lowest layer that interacts with hardware (i.e., registers, timers, interrupts, etc.).
  • FreeRTOS/Demo/Epiphany – this is where we place our demo application

You can test out demo application right away, by running:

cd FreeRTOS/Demo/Epiphany
make

While working with the code, I’d recommend resetting cores before/after each FreeRTOS app run. Depending on the kernel version you might encounter "unusual behavior," so a quick reset removes false positives:

e-reset

Debugging

Again, we are going to fragment memory, mangle stacks, and deal with interrupts. As for a single application, the "printf debugging" approach is all right. But here it’s going to be a pain… Instead, I would rather use gdb that allows us to control the execution on Epiphany Core, peek into registers, and perform ad-hoc operations.

You can find instructions on how to use it here:

https://github.com/parallella/parallella-examples/tree/master/gdb-tutorial

To be honest, running gdb on remote chip ain’t pleasant… Compared to traditional usage where one or two commands are sufficient, here we have process extended by at least a few operations. While it all could be better, I encourage you to use gdb anyway. Simply it’s worth it and saves you a ton of time!

Goals

Before we dive into the RTOS kernel, let’s recall what we trying to acheive:

Our Demo application is about to execute two separate tasks running on a single eCore, where each task periodically increments a counter placed in local memory. On the host side, we fetch and print counters.

The important (and most problematic) thing here is how to write task preemption logic, but we’ll talk about that later. For now, this is how the demo code:

void addTask (void* pvParameters) {
    TickType_t xLastWakeTime = xTaskGetTickCount();
    volatile unsigned long long *count = (void *)0x7020;

    while(1) {
        *count = *count + 10;
        vTaskDelayUntil( &xLastWakeTime, 15 );
    }
}

void addTask2 (void* pvParameters) {
        ...
        *count = *count + 3;
        ....
}

int main(void) {
    ... 
    xTaskCreate( addTask, ( signed char * ) "add", configMINIMAL_STACK_SIZE, NULL, tskIDLE_PRIORITY+1, ( xTaskHandle * ) NULL );
    xTaskCreate( addTask2, ( signed char * ) "add", configMINIMAL_STACK_SIZE, NULL, tskIDLE_PRIORITY+1, ( xTaskHandle * ) NULL );

    vTaskStartScheduler();

    return EXIT_SUCCESS;
} 

I’d say the code is self-explanatory, but here comes help:

  • xTaskCreate – creates a new runnable task. Each task requires RAM that is used to hold the task state, and used by the task as its stack. In our case, the stack is allocated from the FreeRTOS heap
  • vTaskStartScheduler – starts the RTOS scheduler. After calling, the RTOS kernel has control over which tasks are executed and when.
  • vTaskDelayUntil – delays a task until a specified time. It’s the easy way to "yield" the task and let the scheduler pick the next candidate in the queue.
  • configMINIMAL_STACK_SIZE – defines how much space to allocate for the stack (we’ll define this constant later)
  • tskIDLE_PRIORITY – otherwise 0, denotes the task priority (used by the scheduler)

FreeRTOS building blocks

Working with modern kernels is often a complex task, and we saw that while working on the Epiphany Linux Kernel module. It takes much time to get around the documentation, concepts, and API because the kernel is very rich in features.
The FreeRTOS is different. It’s by order of magnitudes smaller (as most of RTOS kernels) and provides only bare minimum kernel functionality. That makes it a great candidate to study as it’s "components" are present in other kernels, but here are basic enough to get a grasp on them in a reasonable time.

Task

A real-time application that uses an RTOS can be structured as a set of independent tasks. Each task executes within its context with no coincidental dependency on other tasks within the system or the RTOS scheduler itself. Only one task within the application can be executing at any point in time, and the real-time RTOS scheduler is responsible for deciding which task this should be. The RTOS scheduler may, therefore, repeatedly start and stop each task (swap each task in and out) as the application executes. (source)

A task in FreeRTOS is represented as task control block (TCB) structure that holds its context. By looking at the soruce we can weed out properties of interest, such as:

  • pxStack, pxTopOfStack, pxEndOfStack – determines shape of the stack
  • xGenericListItem, xEventListItem – references the task in queus (such as blocked / run queue)
  • uxPriority – task priority used for scheduling

Okaaaay, stack, queues, scheduler… We don’t know what that is yet, so lets follow the trail of xTaskCreate with the hope we find some other clues:

xTaskCreate -> prvAllocateTCBAndStack -> prvIntialiseTaskLists -> prvInitaliseTCBVariables -> prvAddTaskToReadyQueue

It seems the xTaskCreate does three things: initialize a task control block (TCB), allocate stack, and make the task ready for execution (add to ready queue). Once it’s all done, our memory may look like this:

RTOS Stack

I said "may" because things depend on a linker script and FreeRTOS configuration (ie. static vs dynamic allocation). To be precise in our example TCB and stack is dynamically allocated on the heap.

Context switching

An eCore sequentially executes instruction by instruction of one task at the time. To run a second task (aka. share the CPU), we need to store "program execution context" somewhere in memory and "restore" the second task. As far as I can tell, the "context" in literature is not precisely specified, but usually, we talk about: registers, memory maps, various tables etc.

Huh, registers?! That sounds architecture-specific, right?

Yes, you’re correct. Our context must include:

  • program counter (PC) – where did we stop executing?
  • 64 eCore registers – the actual context
  • status register – register reflects the state of the processor and must be saved on entering an interrupt service routine (ISR).

Scheduling

The scheduler is the part of the kernel responsible for deciding which task should be executed at any particular time. The kernel can suspend and later resume a task many times during the task lifetime (source). From the end-user perspective, it appears that each task is running concurrently, but in fact, the multitasking OS rapidly switches the tasks making that impression, see the illustration below:

The scheduling policy is an algorithm used by the scheduler to decide which task to execute at any point in time. The criteria we should consider for selecting the right algorithm:

  • fairness – each task should get a fair share of CPU time
  • efficiency – maximum CPU utilization is desirable
  • response time – the lower, the better
  • throughput

RTOS system constraints on time, this is the processing must be done within a fixed time slot, or the system fails.

Implementation-wise scheduler decision is influenced by a few factors, such as task state. For example, if a task has the highest priority, but it’s waiting for a semaphore, then there is no reason to schedule it. This being said, each task has one of the following states:

  • running – aka currently utilizing the processor
  • ready – a task that is ready for execution (it’s not blocked or suspended) but not currently running because a different task of equal or higher priority occupies the CPU
  • blocked – waiting for an event (i.e., queue, semaphore)
  • suspended – it’s like blocked state but doesn’t time out. The special API must be called to put a task in this state vTaskSuspend()

System tick

When sleeping, an RTOS task will specify a time after which it requires ‘waking’. When blocking, an RTOS task can specify a maximum time it wishes to wait.
The FreeRTOS real-time kernel measures time using a tick count variable. A timer interrupt (the RTOS tick interrupt) increments the tick count with strict temporal accuracy – allowing the real-time kernel to measure time to a resolution of the chosen timer interrupt frequency. (source)

Simply, each time the tick is incremented the kernel (scheduler) must check if there is another task to unblock or wake. The "tick" is triggered by a hardware timer, so we have to set it up in the portable code.

Other features

Depending on the architecture, other features could be used by the RTOS kernel. For example, many CPUs come with a memory protection unit (MPU) that prevents us from overstepping the memory (i.e. overflowing stack).

Epiphany chip gives us such option and we can set protection over a local memory pages. The below example sets protection on first 4 pages and sets them as read-only (0x0000 – 0x4FFF)

unsigned memprotectregister = e_reg_read(E_REG_MEMPROTECT);
memprotectregister &= 0xffffff00; // clear last 8 bits
memprotectregister |= 0x0f;

e_reg_write(E_REG_MEMPROTECT, memprotectregister);

Porting FreeRTOS to Epiphany architecture

It’s time to focus on the implementation! Everything here tightly relates to the theory we covered above but also touches the knowledge from previous articles (ie. ISA). Feel free to go back if things are unclear.

Step 1: Stack frame

FreeRTOS doesn’t have a "special" logic to handle newly created task, but rather it treats a new task as it’s has been context-switched already. What it does is call restore context routine.
The side effect is that we need to initialize a stack frame separately and keep it consistent with save/restore context routines.

pxPortInitialiseStack defines the initial stack frame. The function is long (source) so lets focus on important bits:

  • the stack has to be double-word aligned (see Epiphany Architecture Reference)
  • the space for each register is initialized (64 registers, PC, status). Note that on Epiphany R13 register is used as a stack pointer, so we need to "skip it".
  • pvParameters needs to be placed on the R0

Step 2: Save / Restore routines

On the effective context switch, a task is going to be "saved" and "restored". Because SAVE routine is the exact opposite to RESTORE, we can discuss only one:

portRESTORE_CONTEXT:
  mov  fp,%low(_pxCurrentTCB)
  movt fp,%high(_pxCurrentTCB)

At first we point the frame pointer to the TCB structure.

  ldr  fp, [fp]
  ldr  sp, [fp]

Then we load data from memory via displacement mode LDR instruction. At this point, you might ask what’s the purpose of frame pointer if it’s equal to stack pointer?
Well, the stack pointer does change as we push and pop on the stack, so it’s hard to know where the stack starts as we go. The frame pointer doesn’t change and is equal to the stack pointer; this way, we know where the stack begins and can calculate the current distance.

  ldr r0, [sp], #1
  ldr r1, [sp], #1

  movts iret, r1
  movts status, r0

Next, we pop and load PC and STATUS values from the stack.

  ldr  r0, [sp], #1
  ...
  ldr  r63, [sp], #1

  rti

What’s left is to restore values of remaining 64 (or rather 63) registers.

Step 3: Setup system tick

Epiphany III comes with two event timers. We use one to set up periodic system tick:

#define CyclesForTick (configCPU_CLOCK_HZ/configTICK_RATE_HZ - 50)

static void prvSetupTimerInterrupt( void ) {
    unsigned int cyclesForTick = CyclesForTick;
    int delta = E_CTIMER_MAX - e_ctimer_get(E_CTIMER_1) - cyclesForTick; 
    e_ctimer_set(E_CTIMER_1, E_CTIMER_MAX);

    if(delta < 0)
        delta=0;

    e_ctimer_set(E_CTIMER_0, cyclesForTick-delta);
    e_ctimer_start(E_CTIMER_0, E_CTIMER_CLK);
}

Because the kernel is written in C we use elibs API to avoid clutter code.

Step 4: Interrupts

There are two ways context switch can happen:

  1. systick timer fires and scheduler finds another task to run
  2. task voluntarily yields (vPortYield)

For the first case:

prvSetupTimerInterrupt();
if( xTaskIncrementTick() != pdFALSE ) {
    vTaskSwitchContext();
}

For the second case we just call vTaskSwitchContext:

void vPortYield( void ) {
    portSAVE_CONTEXT();
    vTaskSwitchContext();
    portRESTORE_CONTEXT();
}

Step 5: Interrupt handler

In step 3 the E_CTIMER_0 was configured to trigger systick. In step 4 we defined the handler logic, but where is the code that calls the handler when timer triggers?
This is the flow:

(timer triggers) -> IVT is checked for registered handler -> portSAVECONTEXT -> call to C handler function

Well it's a bit unusual flow because portSAVECONTEXT plays also a role of interrupt handler, but works. The missing IVT <-> handler is defined as a macro:

.macro ISR name, number
_\name:
  .global _\name
  str r0, [sp, #-1]
  str r1, [sp, #-2]
  mov  r0,%low(_interrupt_mode)
  movt r0,%high(_interrupt_mode)
  mov r1, \number
  str r1, [r0]
  ldr r1, [sp, -#2]
  ldr r0, [sp, -#1]
  b _common_interrupt_handler
.endm

ISR timer0_handler,   3
...

Step 6: Start the scheduler

It's all about setting up the environment and starting the first task!

BaseType_t xPortStartScheduler( void ) {
    prvSetupTimerInterrupt();

    /* Start the first task. */
    portRESTORE_CONTEXT();

    /* Should not get here. */
    return pdTRUE;
}

Step7: portmacro.h

portmacro.h defines numerous platform specific macros or variables. For example:

  • data types
  • enable / disable interrupts
  • stack growth direction
  • byte alignment (double word for Epiphany)

Application configuration

FreeRTOS is full of pre-processor conditionals that control what function gets compiled into the application. While it doesn't look very appealing, this method ensures a small binary size, which is critical for embedded devices.

In the FreeRTOSConfig.h we define a few variables that configure the kernel behavior, for instance:

  • configUSE_PREEMPTION - whether to use task preemption or not
  • configCPU_CLOCK_HZ - in our case set to 700000000 (700 Mhz taken from the spec).
  • configTICK_RATE_HZ - system tick rate
  • configTOTAL_HEAP_SIZE - heap size, important for dynamic allocation!

Summary

Wasn't it that too difficult, right? It's not when you read about it! I spent many hours getting around the stack frame initialization with gdb... But I admit FreeRTOS is not that scary as I thought it would be.

// Bonus: There is a small modification in linker script "internal.ldf". If you feel like hacking today try to understand why this is needed 🙂

See other posts!