We are excited to announce that we now have a public GitHub repository with LegUp example code, to give our users a better idea of real-life applications built with LegUp: https://github.com/LegUpComputing/legup-examples

The main goal of the repository is to provide more design examples using high-level synthesis, to help users understand the coding style needed to achieve good performance. These examples are all released under a BSD license, so the code can be used for academic or commercial applications. We will add more examples to this repository over the coming months.

We are also excited to announce that LegUp 6.3 has been released! You can download LegUp here.
New features and bug fixes for this release:
Accept LLVM intermediate representation as input, enabling tools such as LeFlow
Various bug fixes
LegUp 6.3 comes with a 30-day free trial period so that you can try out the tool. Please note that during the trial period, you may only use LegUp for evaluation purposes, and the generated hardware cannot be used in a commercial product. To purchase a full license, please contact us at firstname.lastname@example.org.
We are excited to announce that LegUp 6.2 has been released! You can download LegUp here.

We have fixed the bugs reported against the LegUp 6.1 release from August; thanks for your feedback! We have also continued to add new features in this release of LegUp.
New features and bug fixes for this release:
Support for Intel Arria 10 hard floating point operations.
SW/HW Co-simulation now works with floating point operations.
Preliminary support for an AXI slave interface used to control a LegUp accelerator.
Support for integrating a user-defined Verilog module into a LegUp design.
Improved support for C++ classes and structs.
Improved memory partitioning.
LegUp 6.2 comes with a 30-day free trial period so that you can try out the tool. Please note that during the trial period, you may only use LegUp for evaluation purposes, and the generated hardware cannot be used in a commercial product. To purchase a full license, please contact us at email@example.com.
In this post we will give an example of integrating an existing hardware core into a LegUp software project. This can be useful if you already have Verilog for an IP block which you want to call from your C software code. The example we will use is the Canny edge detector, and we will replace the Sobel filter module with a core written in Verilog.
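As a rough illustration of what gets swapped out, the software model of the Sobel stage might look like the sketch below. This is not the post's actual project code; the function name and interface are hypothetical, and in the hardware flow a function like this would be replaced by the hand-written Verilog core.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical software model of the Sobel stage of a Canny edge
// detector. In the hardware flow, LegUp would be told to replace this
// function with the user-provided Verilog module.
static inline int sobel_pixel(const uint8_t win[3][3]) {
    // 3x3 Sobel kernels for horizontal (Gx) and vertical (Gy) gradients
    const int gx = -win[0][0] + win[0][2]
                 - 2 * win[1][0] + 2 * win[1][2]
                 - win[2][0] + win[2][2];
    const int gy = -win[0][0] - 2 * win[0][1] - win[0][2]
                 + win[2][0] + 2 * win[2][1] + win[2][2];
    int mag = abs(gx) + abs(gy);   // |G| approximated as |Gx| + |Gy|
    return mag > 255 ? 255 : mag;  // clamp to the 8-bit pixel range
}
```

A flat 3x3 window produces a gradient magnitude of zero, while a strong vertical edge saturates to 255.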
We are excited to announce that LegUp 6.1 has been released! You can download LegUp here.

Since the previous release, LegUp 5.1, last year, we have seen many users create interesting projects with LegUp, and we have received lots of valuable feedback. Based on your feedback, we have added a number of new features to further improve the HLS design process with LegUp, and we have enhanced the tool's reliability through bug fixes.

Prior to the commercial releases, LegUp had four major releases for academic research and has been used by thousands of researchers around the world, making it the de facto standard in high-level synthesis (HLS) research. In 2014, LegUp won the Community Award at the International Conference on Field Programmable Logic (FPL) for its contributions to HLS research. LegUp has also been shown to produce state-of-the-art hardware.

We have carried forward the best features from our previous releases and made LegUp even better by adding new features and improving the quality of the generated hardware.
Here are just some of the highlights of what we have added for this release:
SW/HW Co-simulation: uses your C-based software test bench to automatically verify the LegUp-generated RTL.
A C++ FIFO template class to allow more flexible definition of FIFO data types.
New FPGA device support for Intel Arria 10, Microsemi PolarFire, and Xilinx Virtex UltraScale+, in addition to the existing device support for Intel, Xilinx, Lattice, Microsemi, and Achronix FPGAs.
Improved control-flow optimization.
Improved memory partitioning.
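To give a feel for the FIFO feature listed above, a templated FIFO might be sketched as below. The class name and method names here are hypothetical, not LegUp's actual API; the point is that templating lets the same FIFO carry any user-defined data type.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch of a templated FIFO; LegUp's actual FIFO class
// and interface may differ. T can be any data type, including structs.
template <typename T, std::size_t Depth = 16>
class Fifo {
public:
    bool write(const T &value) {          // returns false when full
        if (count_ == Depth) return false;
        buf_[tail_] = value;
        tail_ = (tail_ + 1) % Depth;
        ++count_;
        return true;
    }
    bool read(T &value) {                 // returns false when empty
        if (count_ == 0) return false;
        value = buf_[head_];
        head_ = (head_ + 1) % Depth;
        --count_;
        return true;
    }
    bool empty() const { return count_ == 0; }
    bool full() const { return count_ == Depth; }
private:
    std::array<T, Depth> buf_{};
    std::size_t head_ = 0, tail_ = 0, count_ = 0;
};
```

Instantiating, say, `Fifo<PixelStruct, 64>` would then stream a user-defined struct between pipeline stages.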
LegUp 6.1 comes with a 30-day free trial period so that you can try out the tool. Please note that during the trial period, you may only use LegUp for evaluation purposes, and the generated hardware cannot be used in a commercial product. To purchase a full license, please contact us at firstname.lastname@example.org.
We were invited to publish our AWS F1 Memcached acceleration work at the 2018 International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART). Our paper describes in detail the Memcached accelerator on AWS F1 as well as many of the interesting things (good and bad) that we learned about the F1 infrastructure along the way.
J. Choi, R. Lian, Z. Li, A. Canis, J. Anderson, “Accelerating Memcached on AWS Cloud FPGAs”, HEART 2018 (PDF)
We are excited to release a live demo of our Memcached server accelerator on AWS F1. Click here to try out the live demo!
In the demo, we will automatically spin up two AWS EC2 instances for you (for free) so that you can easily try out our FPGA Memcached server accelerator. An F1 instance is programmed as the Memcached server and an M4 CPU instance is used as the client to run the memtier_benchmark. Note that starting the two instances can take a few minutes, so please be patient if the demo page takes some time to load.
As shown below, the demo page shows two terminal windows, one for the client and another for the server. In the client window, you can choose the number of connections and the number of requests to the Memcached server before starting the benchmark. While a test is running, the server window shows the packets-per-second (PPS) received and sent by the F1 instance. As described in our previous blog post, 700K is the maximum PPS on F1. When the test finishes, the client window shows the measured throughput and latency to the Memcached server. For the demo, we use 100-byte data values, a 1:1 set/get ratio, and Memcached pipelining of 16.
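A client run with these parameters could be reproduced with memtier_benchmark along the following lines. The server address is a placeholder, and the connection/thread/request counts are examples only (in the demo they are chosen through the web UI).

```shell
# <server-ip> is a placeholder for the F1 instance's address.
# --data-size=100  -> 100-byte values
# --ratio=1:1      -> 1:1 set/get mix
# --pipeline=16    -> 16 requests in flight per connection
memtier_benchmark -s <server-ip> -p 11211 --protocol=memcache_text \
  --data-size=100 --ratio=1:1 --pipeline=16 \
  -c 50 -t 4 -n 100000
```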
We attended RedisConf last week to learn more about Redis and meet users who are using Redis/Memcached. Many people were interested in our 11M+ Memcached ops/sec result achieved with a single AWS EC2 F1 instance, but the 1-page summary document was not comprehensive enough to answer all the questions. Here we explain in more detail about the hardware architecture and how we set up the experiment.
We had described the architecture in a previous blog post, but we briefly review it here. As shown in the figure below, the entire TCP/IP (and UDP) network stack, as well as the Memcached server logic, is implemented in FPGA hardware. We deeply pipeline the FPGA hardware (if this phrase isn’t clear, please see “What is FPGA Pipelining?” at the end of this post) to process many network packets and Memcached requests in flight. This cannot be done on a CPU, hence we are able to achieve a big speedup vs. CPU Memcached servers. On the F1, the FPGA is not directly connected to the network, so we use the CPU to transfer incoming network packets directly from the NIC to the FPGA and also outgoing network packets from the FPGA back to the NIC. Note that the Memcached accelerator is a prototype and currently only supports get and set commands.
We are pleased to present the world’s fastest cloud-hosted Memcached on AWS using EC2 F1 (FPGA) instances. With a single F1 instance, LegUp’s Memcached server prototype achieves over 11M ops/sec, a 9X improvement over ElastiCache, at <300 μs latency. It offers 10X better throughput/$ and up to 9X lower latency compared to ElastiCache. Please refer to our 1-page handout for more details.
In our last blog post, we wrote about using LegUp to perform network processing on AWS cloud FPGAs (F1). In this post, we describe accelerating Memcached on AWS F1.
Memcached is a high-performance distributed in-memory key-value store, widely deployed by companies such as Flickr, Wikipedia and WordPress. Memcached is typically used to speed up dynamic web applications by caching chunks of data (strings, objects) in RAM, which alleviates the load on the back-end database.
The figure below shows a typical deployment where Memcached is used as a caching layer to provide fast access to data for front-end web servers. When the data is found on a Memcached server, trips to disk storage (i.e. the disk stores attached to the back-end databases) are avoided. Memcached is also used by Facebook, Twitter, Reddit, and YouTube.
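The caching pattern described above can be sketched as the usual cache-aside lookup. This is an illustrative sketch only: `std::unordered_map` stands in for a Memcached client, and `load_from_database` is a hypothetical back-end call.

```cpp
#include <string>
#include <unordered_map>

// Stand-in for a Memcached client: an in-process key-value map.
std::unordered_map<std::string, std::string> cache;

// Hypothetical (slow) back-end fetch that the cache is shielding.
std::string load_from_database(const std::string &key) {
    return "db-value-for-" + key;  // placeholder result
}

std::string cached_get(const std::string &key) {
    auto it = cache.find(key);                    // Memcached "get"
    if (it != cache.end()) return it->second;     // hit: no trip to the DB
    std::string value = load_from_database(key);  // miss: query the back end
    cache.emplace(key, value);                    // Memcached "set" to populate
    return value;
}
```

The first lookup for a key pays the database cost; subsequent lookups are served from RAM, which is what alleviates the load on the back-end database.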
As network bandwidths continue to increase from 10 Gbps to 25 Gbps and beyond, cloud users need to process high-bandwidth network traffic intelligently, whether to perform real-time computation or to gain insight into the traffic for detecting security or service issues.
We live in an exciting time, with FPGA cloud instances now available from Amazon and Alibaba. Traditionally, FPGAs (field programmable gate arrays) have been used in network switches or for high-frequency financial trading because of their superior ability to perform high-bandwidth network processing. We believe many cloud applications that require high-speed network and data stream processing can achieve 10X better latency and throughput on a cloud FPGA compared to a standard commodity server.
LegUp Cloud FPGA Platform
We have developed a cloud FPGA platform that makes it easier for software developers to program high-speed data processing on a cloud FPGA. Behind the scenes, the LegUp platform hides all the low-level details of getting network traffic to and from the FPGA and handling the network layer. We currently support AWS F1 FPGA instances.