Amazon EC2 F1 Tutorial: Understanding the CL_DRAM_DMA example


The CL_DRAM_DMA example demonstrates many of the Shell/CL interfaces and their functionality. This blog post walks through the custom logic (CL) portion of the example. You may have found that the example contains more than 6000 lines of SystemVerilog code with very few comments. To help you quickly understand it at a high level, we created block diagrams that give an overview of the CL’s hierarchy, interfaces, connectivity, and functionality. We will also dive into some of the major modules and go through their implementations.

The legend for the block diagrams is shown in the figure above.
An arrow represents a connection from a master to a slave. There are four major types of signal interfaces in the example. They are defined as SystemVerilog interface constructs, namely axi_bus_t (used for both AXI-4 and AXI-Lite signals), cfg_bus_t, and scrb_bus_t. Signals surrounded by grey hexagons are the input or output ports of the module. Black boxes represent RTL modules, and dashed circles represent logic within the current module.

Now let’s start with the top-level block diagram of CL_DRAM_DMA.

Figure 1: Top-level block diagram of CL_DRAM_DMA

The left side of the diagram shows five major incoming interfaces from the Shell to the CL:

  1. The sda_cl_bus AXI-Lite interface accesses a 1 KiB RAM inside the CL_SDA_SLV module;
  2. The sh_cl_dma_pcis_bus AXI-4 interface provides access to the four DDR DRAMs;
  3. The lower four bits of the sh_cl_ctl0 input port drive the enable ports of the four ddr*_scrb_bus interfaces;
  4. The sh_ocl_bus AXI-Lite interface talks to the CL_OCL_SLV module, which in turn controls six test config buses (*_tst_cfg_bus): one for each of the four DDRs, one for the PCIM master (CL_PCIM_MSTR), and one for the interrupt generator/checker (CL_INT_SLV);
  5. Lastly, the interrupt request acknowledge input (sh_cl_apppf_irq_ack) goes into the interrupt generator/checker (CL_INT_SLV) in response to the interrupt request output (cl_sh_apppf_irq_req).
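To make item 3 concrete, here is a small Python sketch (a behavioural model, not the RTL) of how the lower four bits of sh_cl_ctl0 could fan out to the four scrubber enables. The bit-to-DDR order (bit 0 for DDR A up to bit 3 for DDR D) and the function name are assumptions made for illustration; check the top-level RTL for the actual assignment.

```python
from typing import Dict

# Assumed bit order: bit 0 -> DDR A, bit 1 -> DDR B, bit 2 -> DDR C, bit 3 -> DDR D.
# This ordering is an illustration only; the RTL defines the real mapping.
DDR_NAMES = ("a", "b", "c", "d")


def decode_scrub_enables(sh_cl_ctl0: int) -> Dict[str, bool]:
    """Return the scrub enable for each DDR from the lower four bits of sh_cl_ctl0."""
    low4 = sh_cl_ctl0 & 0xF  # the upper bits play no part in the scrub enables
    return {name: bool((low4 >> bit) & 1) for bit, name in enumerate(DDR_NAMES)}
```

For example, under the assumed ordering, a control value of 0b0101 would enable scrubbing on DDR A and DDR C only.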

Here are the five custom logic modules instantiated by this top-level RTL:

  1. The CL_DMA_PCIS_SLV module (we will dive in later) takes in three sets of inputs: the sh_cl_dma_pcis_bus AXI-4 interface, the ddr*_scrb_bus.enable signals, and the ddr*_tst_cfg_bus buses. It outputs four sets of AXI-4 buses (lcl_cl_sh_ddr(a/b/d) and cl_sh_ddr_bus) to interface with the four DDR DRAMs. The cl_sh_ddr_bus goes out from the CL to the Shell in order to access DDR C, whose controller resides in the Shell. The other three buses, lcl_cl_sh_ddr(a/b/d), are combined into a 2-dimensional bus (lcl_cl_sh_ddr_2d) that feeds the SH_DDR module, which instantiates the three DRAM interfaces in the CL (A, B, D). The CL_DMA_PCIS_SLV module also outputs the memory scrubbing status (ddr*_scrb_bus.addr/state/done) for debugging purposes. Another output of the module, sh_cl_dma_pcis_q, is the pipelined version of sh_cl_dma_pcis_bus, also exposed for debugging purposes.
  2. The CL_SDA_SLV module instantiates an on-FPGA memory (BRAM) along with the AXI-Lite slave logic that is accessed by the sda_cl_bus AXI-Lite master.
  3. The CL_OCL_SLV module implements the slave logic facing the sh_ocl_bus AXI-Lite master and drives the six test config buses (*_tst_cfg_bus) accordingly.
  4. The CL_INT_SLV module receives interrupt test config signals via int_tst_cfg_bus and demonstrates the interrupt request feature.
  5. The CL_PCIM_MSTR module receives PCIM test config signals via pcim_tst_cfg_bus and demonstrates the PCIM master interface for outbound PCIe transactions (CL to Shell).


Figure 2: Block Diagram of CL_DMA_PCIS_SLV

Now let’s take a closer look at the CL_DMA_PCIS_SLV module. As mentioned above, this module takes in three sets of inputs: the sh_cl_dma_pcis_bus AXI-4 interface, the ddr*_scrb_bus, and the ddr*_tst_cfg_bus. The sh_cl_dma_pcis_bus AXI-4 signals first go through an “AXI register slice” module (becoming sh_cl_dma_pcis_q) and then feed into an AXI_CROSSBAR module. The AXI crossbar arbitrates and steers request and response traffic between two incoming AXI-4 interfaces (each connected to a master) and four outgoing AXI-4 interfaces (each connected to a slave). In this example, only one incoming interface is used, and it is connected to sh_cl_dma_pcis_q; the other is unused and tied off. Each of the four outgoing interfaces (lcl_cl_sh_ddr(a/b/d)_q and cl_sh_ddr_q) provides access to one of the four DDR interfaces. Each of them goes through one or two “AXI register slice” cores (namely src_register_slice, dest_register_slice, and axi_register_slice) and then feeds into a CL_TST_SCRB module. Besides the AXI-4 interface input, each CL_TST_SCRB module receives a ddr*_tst_cfg_bus and a ddr*_scrb_bus, both of which are pipelined using lib_pipe modules. With these three inputs, the CL_TST_SCRB module outputs an AXI-4 master interface that eventually connects to the DDR modules.

Figure 3: Block diagram of CL_TST_SCRB

Within the CL_TST_SCRB module, a MEM_SCRB module is instantiated to perform memory scrubbing. The MEM_SCRB module implements an internal FSM that starts when it receives the scrb_bus.enable signal and controls the scrb_* AXI-4 master interface to write zeros to the address range from 0 to MAX_ADDR of the DDR DRAM. The MEM_SCRB module also outputs the FSM state (scrb_bus.state), the scrubbing address (scrb_bus.addr), and the scrubbing completion status (scrb_bus.done), which are eventually propagated to the top level and connected to the cl_sh_status0 and cl_sh_id0/1 output ports (see Figure 1).
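The scrubbing behaviour described above can be approximated with a short Python model. This is only a sketch under stated assumptions: the state names (IDLE, WRITE, DONE), the 64-byte write size, and the class name MemScrubModel are hypothetical and are not taken from the RTL; only the enable/state/addr/done signals and the 0 to MAX_ADDR range come from the description.

```python
from enum import Enum


class ScrubState(Enum):
    # Hypothetical state names; the real FSM encoding lives in MEM_SCRB.
    IDLE = 0
    WRITE = 1
    DONE = 2


class MemScrubModel:
    """Behavioural model: zero addresses 0..max_addr once enable is seen."""

    def __init__(self, max_addr: int, burst_bytes: int = 64):
        self.max_addr = max_addr        # last address to scrub (inclusive)
        self.burst_bytes = burst_bytes  # assumed bytes written per step
        self.state = ScrubState.IDLE    # mirrors scrb_bus.state
        self.addr = 0                   # mirrors scrb_bus.addr
        self.done = False               # mirrors scrb_bus.done

    def step(self, enable: bool, memory: bytearray) -> None:
        """Advance the model by one step (one write burst at most)."""
        if self.state is ScrubState.IDLE:
            if enable:
                self.addr = 0
                self.done = False
                self.state = ScrubState.WRITE
        elif self.state is ScrubState.WRITE:
            end = min(self.addr + self.burst_bytes, self.max_addr + 1)
            memory[self.addr:end] = bytes(end - self.addr)  # write zeros
            self.addr = end
            if self.addr > self.max_addr:
                self.state = ScrubState.DONE
                self.done = True
        # In DONE the model simply holds its outputs (a simplification).
```

Stepping the model with enable held high until done is set leaves the modelled memory all zeros, which is the property the hardware scrubber is meant to guarantee for the DDR.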
Similarly, the CL_TST module performs automatic testing of the DDR DRAMs by controlling the atg_* AXI-4 master interface based on the cfg_bus input.
The third AXI-4 master interface within the CL_TST_SCRB module (slv_*) is connected to one of the AXI_CROSSBAR’s four outgoing interfaces, whose traffic originates from the sh_cl_dma_pcis_bus coming from the Shell.
The output ddr_axi4 interface of the CL_TST_SCRB module is selected from these three AXI-4 interfaces based on the scrb_enable and atg_enable signals.
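The selection of the ddr_axi4 source can be pictured as a three-way multiplexer. The Python sketch below assumes a priority order (scrubber first, then the test generator, then the slv_* path from the crossbar); the post only states that the choice depends on scrb_enable and atg_enable, so this ordering and the function name are assumptions to be checked against the RTL.

```python
def select_ddr_master(scrb_enable: bool, atg_enable: bool) -> str:
    """Return which AXI-4 master drives ddr_axi4 in this simplified model.

    Assumed priority (not confirmed by the source): scrb > atg > slv.
    """
    if scrb_enable:
        return "scrb"  # MEM_SCRB is writing zeros to the DRAM
    if atg_enable:
        return "atg"   # CL_TST is running its automatic test traffic
    return "slv"       # normal path: traffic from the Shell via the crossbar
```

With both enables low, host DMA traffic arriving on sh_cl_dma_pcis_bus reaches the DDR through the slv_* path.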

That is about it for an overview of the example. We hope you now have a better idea of how the interfaces are connected and what functionality is implemented.

Also, we will write another post to explain the software side of the example, which should hopefully give you a better understanding of the complete picture of the system.


2 thoughts on “Amazon EC2 F1 Tutorial: Understanding the CL_DRAM_DMA example”

    1. Ruolong Lian (Post author)

      Hi Tommy,

      I actually have the same question, so I am not sure about the answer.
      I guess scrub_bus can either be 1) controlled by some device initialization logic in the CL that cleans the DDR after the FPGA is programmed, or 2) controlled by the host through an AXI-Lite interface. In either case, an extra RTL module is needed: either the initialization logic or an AXI-Lite slave that lets the host control scrub_bus.

