In this post we’re going to show you a video filtering demonstration on an Amazon cloud FPGA using LegUp. The specific video filter we will showcase is Canny edge detection as we described in a previous post. This same approach could be used to implement any other image filter that uses convolution (blur, sharpen, emboss, etc.).
We’re from Toronto and fans of the Blue Jays, so we’ll use a slow motion video of Josh Donaldson hitting a home run to demonstrate our filter:
A paper describing the use of LegUp HLS to synthesize hardware cores for floating-point computations from C-language software will appear in the 2018 ACM/IEEE Design Automation and Test in Europe (DATE) conference, to be held at Dresden, Germany, in March 2018. The floating-point cores are fully IEEE 754 compliant, yet through software changes alone, can be tailored to application needs, for example, by reducing precision or eliminating exception checking, saving area and raising performance in non-compliant variants. The IEEE-compliant cores synthesized by LegUp HLS compare favourably to custom RTL cores from FloPoCo and Altera/Intel, despite the LegUp-generated cores being synthesized entirely from software. An advance copy of the paper, jointly authored by the University of Toronto and Altera/Intel, is available: PDF.
In this post we will explain how to implement a Canny edge detector on an FPGA. We will describe the entire algorithm in C code using LegUp, allowing us to avoid the difficulty of implementing this design in Verilog or VHDL.
First, watch this quick video of the the finished edge detector, running on an Altera DE1-SoC board, which costs $249. We also have this demo working on a Lattice HDR-60 board, which costs $425 and includes a built-in 720p camera.
The first thing you’ll notice is the output is black and white, with each pixel representing whether there is an edge at that particular region of the image. The brightness of the pixel represents how distinct the edge is at that location.
The example below shows Canny edge detection performed on the Lenna test image:
You may be asking, why do edge detection on an FPGA instead of a standard microprocessor? The main reason is that Canny edge detection requires significant computation. Typically this is not a problem when working with static images, but for a video application the processor must keep up with the incoming video frame rate, otherwise we would see a choppy output video. Instead, we want the video output to be updated in real-time, which means there is no delay between moving the camera and updating the screen. On an FPGA, we can exploit the parallelism inherent in the Canny edge detection algorithm and stream the incoming video pixels through a series of specialized hardware pipelines that perform the Canny edge algorithm in parallel stages.
The CL_DRAM_DMA example demonstrates lots of the Shell/CL interfaces and functionality. This blog post will walk through the custom logic (CL) portion of the example. You may have found that this example has more than 6000 lines of SystemVerilog code but with very little comments. To help you quickly understand the example from a high level, we created some block diagrams to overview the CL’s hierarchy, interface, connectivity, and functionality. We will also dive into some major modules and go through the implementations.
The “Hello World” example exercises the OCL Shell-to-CL AXI-Lite interface, the Virtual LED outputs and the Virtual DIP switch inputs. This blog post will walk through the custom logic RTL (Verilog), explain the AXI-lite slave logic, and highlight the PCIe APIs that can be used for accessing the registers behind the AXI-lite interface.
Amazon has recently announced the availability of their FPGA cloud, Amazon EC2 F1. We think that this is very exciting news, as it is the first time that FPGAs in the cloud are being available to the general public on a massive scale. This is the first step towards making FPGAs easier to use, as with the EC2 F1, a user no longer has to buy an FPGA and install it on site. FPGAs are typically much more expensive and cumbersome to buy than CPUs or GPUs, hence making them available in the cloud so that one can use them from anywhere around the world makes FPGAs much more accessible.
When the F1 instances first became available, we were excited to try them out, but we found that they were pretty difficult to use at first. Documentation was lacking (and incorrect in some case), and although Amazon provides a few examples with their AWS EC2 FPGA Hardware and Software Development Kit, the instructions to run the examples are scattered in different places and missing some steps. Understandably, they only released this service publicly in April 2017 and documentation may not have been their highest priority. To this end, we decided to write a unified guide which provides step-by-step instructions on how to run the two main examples provided by Amazon, cl_hello_world and cl_dram_dma, on the Amazon EC2 F1. This guide includes the instructions included in their AWS EC2 FPGA Hardware and Software Development Kit as well as information that we have written by trying out the examples ourselves. At the time of writing, we could not find such a step-by-step guide and we ran into issues here and there so we think that this guide will allow one to easily try out the F1 instances without getting stuck in some setup issue. So let’s dive right into it!
A paper describing the synthesis of a deep convolutional neural network (CNN) inference accelerator from C software with LegUp HLS will appear at the 2017 IEEE International System-on-Chip Conference (SOCC), at Munich, Germany, in September 2017. The work showcases the use of LegUp’s unique Pthreads flow to convert parallel software threads into spatial hardware parallelism. The accelerator incorporates a novel approach for zero-weight skipping, leveraging the ability to prune CNN weights with little inference accuracy loss.
J. H. Kim, B. Grady, R. Lian, J. Brothers, J.H. Anderson, “FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software,” IEEE SOCC, 2017 (PDF).
We are pleased to announce that LegUp 5.1 has been released!
This release is a culmination of more than 25 man-years of research and development. Prior to this release, LegUp has had 4 major releases for academic research. During these years, LegUp has been used by thousands of researchers around the world, making it the de-facto standard in high-level synthesis (HLS) research. In 2014, LegUp won the Community Award at the International Conference on Field Programmable Logic (FPL) for contributions to HLS research. LegUp has also been shown to produce state-of-the-art hardware.
We have brought all of the best features from our previous releases, and made it even better by adding new features, as well as improving the quality of the generated hardware.
Here are just some of the highlights of what we have added for this release.
LegUp IDE which provides a complete development environment with a debugger and a profiler.
Support for Xilinx, Lattice, Microsemi, and Achronix FPGAs (LegUp previously only supported Altera FPGAs).
Windows OS support (LegUp previously only supported Linux OS).
Improved memory architecture.
Improved user messages.
LegUp 5.1 comes with a 30-day free trial period so that you can freely try out the tool. Please note that during the trial period, you may only use LegUp for evaluation purposes only, and the generated hardware cannot be used in a product. To purchase a full license, please contact us at email@example.com.