Intro
I have been thinking about porting one of the modern programming languages to the Epiphany architecture. At first, I thought Rust is the new cool kid in town, so maybe I’ll give it a whirl. But that would require me to implement an LLVM backend… While writing a new backend (or modernizing an existing one) is a noble goal, it’s a relatively complex and time-consuming task.
What if I could avoid writing a compiler backend and leverage the already existing GCC compiler?
The Nim language compiles to C and other languages. Given its modern syntax and low cost of integration, I decided to give it a go.
In this article, we’ll write a Nim program and execute it on the Epiphany chip.
About Nim
You have likely never heard of Nim, or have you?
If not for one of my co-workers at the time, I wouldn’t know about Nim at all. Unlike mainstream languages, e.g. Go, Rust, C/C++, Java… Nim doesn’t seem to be strongly supported by industry. In other words, no big player is sponsoring it.
Nevertheless, I am positively surprised by Nim’s pragmatic approach, such as:
- syntax – the language is just lovely. It resembles Python but draws features from other languages and seamlessly packs them into one powerful and readable syntax.
- philosophy – mainstream languages such as C++, Rust, Python, or Go come with rigid philosophies on how to write the code. Here I don’t see anything of the sort, but rather an attempt to take what’s right and embed it into the language
- portability – the Nim compiler emits C, C++, Objective-C or JavaScript. It leverages other compilers to build the code for different platforms.
- interoperability (i.e., FFI) – one of the nicest I have seen so far
- performance – the authors claim high performance… Given that it compiles to C, I bet it’s fast, but I would take the benchmarks with a pinch of salt. The method of "shim-meta-programming" brings certain limitations.
- features – async/await, defer, templates, generics, or macros? Yup, it’s all here
- low footprint – you can reach a minimal binary size by disabling certain features, e.g., the garbage collector
You might want to read more here:
Nim compiler
Since the Epiphany chip comes with a fully functional GCC, integrating it with the Nim compiler boils down to updating only a few files.
compiler/platform.nim
@@ -189,7 +189,7 @@ type
- cpuSparc64, cpuMips64, cpuMips64el, cpuRiscV64, cpuWasm32
+ cpuSparc64, cpuMips64, cpuMips64el, cpuRiscV64, cpuWasm32, cpuEpiphany
@@ -223,7 +223,8 @@ const
- (name: "wasm32", intSize: 32, endian: littleEndian, floatSize: 64, bit: 32)]
+ (name: "wasm32", intSize: 32, endian: littleEndian, floatSize: 64, bit: 32),
+ (name: "epiphany", intSize: 32, endian: littleEndian, floatSize: 64, bit: 32)]
lib/system/platforms.nim
@@ -35,6 +35,7 @@ type
+ epiphany, ## Adapteva epiphany coprocessor
@@ -93,5 +94,6 @@ const
+ elif defined(epiphany): CpuPlatform.epiphany
config/nim.cfg
+epiphany.standalone.gcc.exe = "epiphany-elf-gcc"
+epiphany.standalone.gcc.linkerexe = "epiphany-elf-gcc"
Yup, that’s pretty much it. We just had to let the Nim compiler know about the new architecture.
Application
I chose eprime as the example application that we are going to rewrite in Nim. In particular, we’ll focus on the device-side program and keep using the C host app. Also, we’ll reference a few C functions provided by e-libs (instead of rewriting the whole thing).
You can find the C sources here:
https://github.com/adapteva/epiphany-examples/tree/master/apps/eprime
And the Nim version:
https://gist.github.com/mkaczanowski/c03ed98313ce118586d17208e883a9ee
The program is quite simple. It finds the prime numbers in parallel on all available cores. Nothing too fancy, but better than a simple hello world.
Compiling the program
The host side stays the same, but we need to compile the device program with the Nim compiler:
$ nim c \
--cpu:epiphany \
--gc:none \
--opt:size \
--deadCodeElim:on \
-d:release \
--os:standalone \
--out:e_prime.elf \
--passL:"-T internal.ldf -le-lib -lm" \
e_prime.nim
Okay, let’s get through all those options:
- cpu – self-explanatory, use the Epiphany architecture
- opt, deadCodeElim, release – optimize the output program size (remember we only have 32 KB of memory)
- os – don’t include any OS-specific code (i.e., libc)
- passL – pass some flags to the C linker.
With gc:none, we disabled the garbage collector. Nim comes with a few selectable garbage collectors we could use, but any of them adds a lot to the binary size, and the program would no longer fit into the memory.
One way to get around that is to tweak the linker script and hold the whole program in DRAM (see legacy.ldf), but that doesn’t come for free either, and we’d see an overall slowdown.
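With the collector disabled, heap-allocated types such as seq and string are never reclaimed, so device code tends to stick to fixed-size storage. A minimal sketch of that style, assuming a hypothetical MaxStored capacity and a hypothetical remember helper (neither is part of the original program):

```nim
const MaxStored = 8                   # hypothetical capacity, chosen for illustration

var
  stored: array[MaxStored, uint32]    # fixed-size buffer, no heap allocation
  used = 0                            # number of occupied slots

proc remember(value: uint32): bool =
  ## Stores a value if there is room; returns false when the buffer is full.
  if used >= MaxStored:
    return false
  stored[used] = value
  inc used
  return true

for candidate in [2'u32, 3, 5, 7, 11]:
  discard remember(candidate)
```

The array lives in static memory, so its size is known at link time and counts against the 32 KB budget directly.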
e_prime.nim
This is the Nim version of the C program, see sources:
https://github.com/adapteva/epiphany-examples/blob/master/apps/eprime/src/e_prime.c
host <-> device communication
There are a few ways to communicate between the Zynq CPU and the Epiphany chip, and we covered them in previous posts. The most common one is to access the eCore’s local memory bank, as shown here:
var
  count = cast[ptr uint32](0x7000)
  num = cast[ptr uint32](0x7008)
  primes = cast[ptr uint32](0x7010)
  max_tests = cast[ptr uint32](0x7020)
The above snippet is equivalent to a C pointer declaration with a given address. To write to the location a pointer refers to, we need to dereference the pointer:
primes[] = 0
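The fixed addresses above are only meaningful on the device, but the same cast-and-dereference pattern can be tried on a regular host. A minimal sketch, assuming a local array (the hypothetical mailbox) as a stand-in for the eCore memory bank:

```nim
# A local array stands in for the eCore memory bank, since an address
# like 0x7000 is only valid on the device itself.
var mailbox: array[4, uint32]

let
  base  = cast[uint](addr mailbox[0])   # numeric address of the first slot
  count = cast[ptr uint32](base)        # roughly: uint32_t *count = (uint32_t *)base;
  num   = cast[ptr uint32](base + 4)    # next 32-bit slot, 4 bytes further

count[] = 0        # write through the pointer
num[] = 42
count[] += 1       # read-modify-write through the pointer

echo mailbox[0], " ", mailbox[1]   # prints: 1 42
```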
interfacing with e-libs
To find the core_id, we could read directly from the memory-mapped register or use the e-libs shorthand function. The latter sounds better, so let’s see how to call a C function from Nim:
proc e_get_coreid(): cint {.importc, header: "e_lib.h".} # import C function
let
  core_id = int(e_get_coreid()) # call C function
  row = (core_id shr 6) and 0x3f # upper 6 bits: row
  col = core_id and 0x3f # lower 6 bits: column
Neat, isn’t it? I am impressed with how seamless Nim’s FFI is.
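The same import pattern can be tried without e-libs. A minimal host-side sketch, assuming libc’s abs() as a stand-in for e_get_coreid() and an assumed sample core ID of 0x808; the c_abs and decode_core_id names are hypothetical:

```nim
# libc's abs() stands in for e_get_coreid(), which is only available
# when building against e-lib.
proc c_abs(x: cint): cint {.importc: "abs", header: "<stdlib.h>".}

proc decode_core_id(core_id: int): tuple[row, col: int] =
  ## Splits a core ID into its 6-bit row and 6-bit column fields.
  result = (row: (core_id shr 6) and 0x3f, col: core_id and 0x3f)

let magnitude = int(c_abs(-5))           # call the imported C function
let (row, col) = decode_core_id(0x808)   # 0x808 is an assumed sample value

echo magnitude, " ", row, " ", col       # prints: 5 32 8
```

Giving the C name explicitly (importc: "abs") lets the Nim-side name differ from the C symbol, which helps when the C name would clash with something already defined in Nim.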
is_prime function
With Python-like syntax, the C function translates to this:
proc is_prime(number: uint32): bool =
  for i in 2..number div 2:
    if number mod i == 0:
      return false
  return true
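The loop above mirrors the C original and tests every divisor up to number div 2. For comparison, a common variation stops at the square root. This is only a sketch of that alternative, not what eprime does; the is_prime_sqrt name is hypothetical, and unlike the loop above it also rejects 0 and 1:

```nim
proc is_prime_sqrt(number: uint32): bool =
  ## Trial division that stops once i*i exceeds number.
  if number < 2:
    return false
  var i = 2'u32
  # Widen to 64 bits so i*i cannot overflow for large inputs.
  while uint64(i) * uint64(i) <= uint64(number):
    if number mod i == 0:
      return false
    inc i
  return true

var found = 0
for n in 0 ..< 100:
  if is_prime_sqrt(uint32(n)):
    inc found

echo found   # prints: 25
```

Swapping this in would change the amount of work per test, so the iteration counts reported by the benchmark would no longer be comparable with the C version.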
panicoverride.nim
Since we don’t use any particular OS (see os:standalone), we need to let Nim know how to handle critical errors:
proc exit(code: int) {.importc, header: "<stdlib.h>", cdecl.}
{.push stack_trace: off, profiler:off.}
proc panic(s: string) =
  exit(1)
{.pop.}
The compiler will ask you for that file, so place it next to e_prime.nim.
Summary
Once we have compiled and run our application, this is what we should see:
Core (00,00) Tests: 22854 Primes: 3662 Current: 731603
...
Core (03,03) Tests: 22863 Primes: 3643 Current: 731921
Total tests: 364017 Found primes: 58600
Iterations/sec: 2660.000000
I am quite amazed to see the Nim program working on Epiphany with so little work. If I were to port Go, Rust, or Python to the chip, I would have to invest much more time than I did today.
And you, how do you feel about Nim?