Overview
The Parallella board comes in three flavors:
- Microserver (P1600) with Zynq 7010
- Desktop (P1601) with Zynq 7010
- Embedded (P1602) with Zynq 7020
The boards differ mostly in the peripherals available onboard (such as HDMI, GPIO, or eLink). In this miniseries, we’ll use the "Microserver" (P1600) version with the Zynq 7010 and the Epiphany E16G301 chip.
While the board looks similar to other SBCs such as the Raspberry Pi, two parts deserve a highlight:
- Zynq 7010 (datasheet) – Dual-core ARM CPU with FPGA programmable logic
- Epiphany III (datasheet) – 16-core, power-efficient RISC coprocessor we want to play with
High level architecture
Zynq 7010 is split into two parts:
- Processing System (PS) – Dual-core ARM Cortex-A9 processor
- Programmable Logic (PL) – FPGA fabric
Parallella uses the FPGA fabric (PL) as a shim interconnection layer between the ARM CPU and the Epiphany chip. While there are a few other ways to do that (like mini-PCIe on the Coralboard), Adapteva decided to design a high-speed, low-latency chip-to-chip interface synthesized on the PL.
So, is having an FPGA an advantage or not? Well, I certainly believe it is, because it allows you to do so much more than you’d be able to do on a board with fixed, hard-wired logic. For instance, you could add a Crypto Engine IP to offload hash function calculation onto the fabric (just an idea).
Slightly more about FPGA
Okay, so we know the basics, but what exactly is an FPGA? (source)
Field Programmable Gate Arrays (FPGAs) are semiconductor devices that are based around a matrix of configurable logic blocks (CLBs) connected via programmable interconnects. FPGAs can be reprogrammed to desired application or functionality requirements after manufacturing.
As you can see, an FPGA allows us to change the hardware without ever touching a soldering iron, and that’s great, but how exactly does it work? In particular, how is the Epiphany-to-ARM interconnection implemented? To answer that question, we should look into the sources.
Vivado & Verilog & FPGA
Verilog is a hardware description language used to model electronic systems, and it is what the FPGA on the Parallella board is programmed in. Not only is the language very different from the software languages you may know, but the design process is also more involved and consists of the following steps:
- Design entry
- Synthesis
- Implementation
- Device programming
Vivado is a comprehensive IDE by Xilinx that makes the flow much easier to grasp. Thankfully, the Parallella hardware sources are Vivado-compatible, so let’s take a look at them.
# Step 1: Download and install Vivado (~25 GB)
https://www.xilinx.com/support/download.html
# Step 2: Fetch the Verilog sources
git clone https://github.com/aolofsson/oh.git
cd oh/src/parallella/fpga
# Step 3: Update the Vivado version (this is an example patch that worked for me)
wget "https://gist.githubusercontent.com/mkaczanowski/908fed6fc00818d2715d2b11d71a8f83/raw/a811279614a39a462263afee61ba543e61bee58f/vivado.diff" -O /tmp/vivado.diff
patch -p1 < /tmp/vivado.diff
# Step 4: Build the vivado project (assuming the vivado bin is in your $PATH)
make all
# Step 5: Open the project file with GUI
vivado headless_e16_z7010/system.xpr
Once you get the Vivado project up and running, you should take a look at the block diagram (under IP Integrator menu). This is what you should see:
If you look at the overview graph and the above block design, you'll notice a few similarities such as the AXI bus, the GPIO pins, and the Zynq ARM (PS), but some parts like eLink are missing. In fact, those are hidden under the parallella_base_0 block, so let's list all the crucial components.
AXI BUS
AXI provides standardized communication between IPs (PS <--> PL). Quoting the manual:
The AXI specification provides a framework that defines protocols for moving data
between IP using a defined signaling standard. This standard ensures that IP can exchange
data with each other and that data can be moved across a system.
It's a multi-master / multi-slave communication interface where the master initiates a transaction, and the slave responds to it. The transactions involve the concept of a target address within a system memory space and data to be transferred. Memory-mapped systems often provide a more homogeneous way to view the system because the IPs operate around a defined memory map.
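In the address editor, the offset for Parallella is defined as 0x80000000. Note that this is the 2 GiB mark; if the editor shows 1G, that is the range of the mapped window rather than the offset. We'll need that address later on.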
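To make the memory-mapped idea a bit more concrete, here is a minimal Python sketch of how an interconnect might decide whether an address belongs to the Parallella window. Only the 0x80000000 offset comes from the address editor; the 1 GiB window size and the names ELINK_BASE, ELINK_RANGE, and decode are assumptions made for illustration.

```python
# Toy model of memory-mapped address decoding on an AXI interconnect.
ELINK_BASE = 0x80000000   # offset from the address editor
ELINK_RANGE = 1 << 30     # assumed window size (1 GiB), for illustration only


def decode(addr):
    """Return the offset inside the window, or None if addr is outside it."""
    if ELINK_BASE <= addr < ELINK_BASE + ELINK_RANGE:
        return addr - ELINK_BASE
    return None


print(hex(ELINK_BASE), "=", ELINK_BASE // (1 << 30), "GiB")
print(decode(0x80800000))
print(decode(0x1000))
```

A master that issues a transaction to an address inside the window reaches the slave behind it; anything else is routed elsewhere (modeled here as None).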
AXI INTERCONNECT
The AXI Interconnect core IP connects one or more AXI memory-mapped master devices to one or more memory-mapped slave devices. The eLink IP acts as both a master and a slave, so we can see two passthrough AXI interconnects: one for its slave port and one for its master port.
PARALLELLA BASE: EDMA
Shamelessly copied from Wikipedia:
Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory (random-access memory), independent of the central processing unit (CPU). Without DMA, when the CPU is using programmed input/output, it is typically fully occupied for the entire duration of the read or write operation, and is thus unavailable to perform other work.
The EDMA block is there to accommodate the DMA controller capabilities of the Epiphany chip.
PARALLELLA BASE: EMESH
The Epiphany architecture defines a multicore, scalable, shared-memory computing fabric. It consists of a 2D array of mesh compute nodes connected by a low-latency mesh network-on-chip (NOC).
The EMESH interface "translates" AXI transactions into the Epiphany NOC world (see packet2emesh.v or emesh2packet.v).
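One way to picture that translation is as packing the fields of a transaction (write flag, addresses, data) into one wide packet and unpacking it on the other side. The Python sketch below shows only the idea: the field names and bit positions are an assumed layout for illustration, and the authoritative one is whatever packet2emesh.v and emesh2packet.v define.

```python
# Toy model of packing a mesh transaction into one wide packet and back.
# The bit positions are an ASSUMED layout for illustration only.
FIELDS = [            # (name, lowest bit, width)
    ("write", 0, 1),
    ("datamode", 1, 2),
    ("ctrlmode", 3, 4),
    ("dstaddr", 8, 32),
    ("data", 40, 32),
    ("srcaddr", 72, 32),
]


def pack(**values):
    """Place each named field at its bit position; missing fields are 0."""
    packet = 0
    for name, low, width in FIELDS:
        value = values.get(name, 0)
        if value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        packet |= value << low
    return packet


def unpack(packet):
    """Extract every field from the packet into a dict."""
    return {name: (packet >> low) & ((1 << width) - 1)
            for name, low, width in FIELDS}


pkt = pack(write=1, datamode=2, dstaddr=0x80800000, data=0xDEADBEEF)
print(unpack(pkt))
```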
PARALLELLA BASE: ELINK
The eMesh network and memory architecture extends off-chip using source synchronous dual data rate LVDS links ("elinks"). Each Epiphany chip has 4 independent off-chip elinks, one in each physical direction (north, east, west, and south). The off-chip links allow for the glueless connection of multiple chips on a board and interfacing with an FPGA.
ELINK: MMU
A memory management unit (MMU) is a computer hardware unit having all memory references passed through itself, primarily performing the translation of virtual memory addresses to physical addresses.
The Epiphany chip only has a concept of physical memory; there is no MMU involved there. However, the ARM CPU runs Linux, which operates on virtual memory and therefore requires an MMU.
A small portion of DRAM (32 MB) is shared between the Epiphany chip and the Linux system. The eMMU IP allows the chip to transparently access the shared memory when a request "leaves" the eMesh via the East, West, North, or South eLink interface.
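A table-based translation of this kind can be sketched in a few lines of Python. Only the 32 MB size comes from the text; the 1 MiB page size, the two base addresses, and the names table and translate are assumptions for illustration and may not match the actual eMMU configuration.

```python
# Toy model of a table-based address translation, in the spirit of the eMMU.
PAGE_BITS = 20                      # assumed 1 MiB pages
PAGE_MASK = (1 << PAGE_BITS) - 1
SHARED_SIZE = 32 * 1024 * 1024      # 32 MB shared DRAM (from the text)

EPIPHANY_BASE = 0x8E000000          # assumed address as seen by the chip
PHYSICAL_BASE = 0x3E000000          # assumed physical DRAM address

# One table entry per page of the shared region.
table = {
    (EPIPHANY_BASE >> PAGE_BITS) + i: (PHYSICAL_BASE >> PAGE_BITS) + i
    for i in range(SHARED_SIZE >> PAGE_BITS)
}


def translate(addr):
    """Map an address leaving the eMesh to a physical DRAM address."""
    page = addr >> PAGE_BITS
    if page not in table:
        raise LookupError(f"no mapping for {addr:#010x}")
    return (table[page] << PAGE_BITS) | (addr & PAGE_MASK)


print(hex(translate(0x8E000010)))
```

The upper bits of the address select a table entry, and the lower bits pass through untouched, which is why the chip never has to know where the shared region really lives.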
ELINK: MAILBOX
A mailbox is a mechanism to exchange messages between processes. Data can be sent to a mailbox by one process and retrieved by another. A mailbox often is used as a FIFO.
On the Parallella board, it's a FIFO queue where the ARM CPU (supported by the epiphany driver) is the reader and the chip is the writer. Here is the gist of how to read from the mailbox on the kernel side:
msg.from = reg_read(elink->regs, ELINK_MAILBOXLO);
msg.data = reg_read(elink->regs, ELINK_MAILBOXHI);
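The reader/writer split can be modeled with a small Python FIFO. This is a sketch of the concept and not the driver's logic: the Mailbox class, its depth of 16, and the 32-bit masking are assumptions; the only thing borrowed from the snippet above is that a message consists of a low word (the sender) and a high word (the data).

```python
from collections import deque


class Mailbox:
    """Toy FIFO mailbox: the chip writes, the ARM side reads."""

    def __init__(self, depth=16):          # depth is an assumed value
        self.depth = depth
        self.fifo = deque()

    def write(self, sender, data):
        """Chip side: queue one message; report failure when full."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append((sender & 0xFFFFFFFF, data & 0xFFFFFFFF))
        return True

    def read(self):
        """ARM side: pop the oldest message as (low word, high word)."""
        if not self.fifo:
            return None
        return self.fifo.popleft()


box = Mailbox()
box.write(0x808, 42)
box.write(0x809, 43)
print(box.read())
print(box.read())
print(box.read())
```

Messages come out in the order they went in, and reading from an empty mailbox yields nothing.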
Epiphany chip
The Epiphany is a scalable multicore architecture with up to 4,095 processors sharing a common 32-bit memory space. The Epiphany combines fully-featured floating-point C/C++ programmable RISC processors, a high bandwidth distributed memory system, a low latency Network-On-Chip, and low overhead off-chip IO to bring an unprecedented level of processing to power-constrained systems.
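The shared 32-bit memory space is easier to picture with a small example. The sketch below assumes the address layout described in the Epiphany architecture reference: the upper 12 bits select a core (6 bits of row, 6 bits of column) and the lower 20 bits are an offset into that core's local memory. The helper names are made up for illustration, and the layout is worth double-checking against the datasheet.

```python
# Split a 32-bit Epiphany global address into core coordinates and offset.
ROW_SHIFT, COL_SHIFT = 26, 20
COORD_MASK = 0x3F          # 6 bits each for row and column
OFFSET_MASK = 0xFFFFF      # 20 bits of core-local address


def split_address(addr):
    """Return (row, column, local offset) for a global address."""
    row = (addr >> ROW_SHIFT) & COORD_MASK
    col = (addr >> COL_SHIFT) & COORD_MASK
    return row, col, addr & OFFSET_MASK


def make_address(row, col, offset):
    """Build a global address from core coordinates and a local offset."""
    return (row << ROW_SHIFT) | (col << COL_SHIFT) | (offset & OFFSET_MASK)


print(split_address(0x80800100))
print(hex(make_address(32, 8, 0x100)))
```

Six bits per coordinate give a 64 x 64 grid of addressable cores, which is in line with the "up to 4,095 processors" figure quoted above.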
Features summary:
- 16 high performance RISC CPU cores
- C/C++ and OpenCL programmable
- 32-bit IEEE floating point support
- 512KB on-chip distributed shared memory
- 32 independent DMA channels
- Up to 1GHz operating frequency
- 32 GFLOPS peak performance
- 512 GB/s local memory bandwidth
- 64 GB/s Network-On-Chip bisection bandwidth
- 8 GB/s off-chip bandwidth
- 1.5ns network per-hop latency
- <2 Watt maximum chip power consumption
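Some of these headline numbers can be sanity-checked with simple arithmetic. The Python sketch below derives per-core figures from the list; the assumption of two floating-point operations per cycle (one fused multiply-add) is mine and is not stated in the list itself.

```python
# Back-of-the-envelope check of the headline numbers in the feature list.
CORES = 16
CLOCK_HZ = 1_000_000_000          # "up to 1GHz"
FLOPS_PER_CYCLE = 2               # assumed: one fused multiply-add per cycle

peak_gflops = CORES * CLOCK_HZ * FLOPS_PER_CYCLE / 1e9
sram_per_core_kb = 512 // CORES   # 512KB of distributed memory
dma_per_core = 32 // CORES        # 32 independent DMA channels
# 512 GB/s spread over 16 cores at 1 GHz, in bytes per core per cycle
bytes_per_cycle = 512e9 / CORES / CLOCK_HZ

print(peak_gflops, sram_per_core_kb, dma_per_core, bytes_per_cycle)
```

Under that assumption the 32 GFLOPS peak follows directly from 16 cores at 1 GHz, and each core ends up with 32 KB of local memory and two DMA channels.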
FPGA Interfacing
The chip can be directly interfaced to an FPGA or ASIC by instantiating the eLink interface. The eLink block converts the high-speed serial link I/O interface to a lower speed parallel interface. To the system, the eLink interface looks like a simple memory-mapped interface.
The block diagram hides the eLink (under the parallella_base block), so a close-up view might look like this:
Possible configurations
Since Epiphany chips can be integrated into a bigger mesh, there are a few ways we can design our system. The Parallella board is a simple development board with a single chip, an eLink, and a small-FPGA SoC. However, it's worthwhile to mention a few other design options you might want to consider:
Extensions
To connect Epiphany chips on separate boards, we can use the Porcupine board with a pair of ribbon cables.
Summary
In this chapter, we skimmed through the hardware design. For some, it might be a lot to take in, especially if you have never touched an FPGA, but here is what I think you should know before you head to the next section:
- what is an FPGA?
- what role does the FPGA play in the Parallella board design?
- what components (or IPs) are "placed" on the FPGA fabric?
- what is eLink?
- what is the Epiphany chip?