

# The Benefits of FPGA Coprocessing

The Xilinx ESL Initiative brings the horsepower of an FPGA coprocessor closer to the reach of traditional DSP system designers.

by Tom Hill, DSP Marketing Manager, Xilinx, Inc. (tom.hill@xilinx.com)

High-performance DSP platforms, traditionally based on general-purpose DSP processors running algorithms developed in C, have been migrating towards the use of an FPGA pre-processor or coprocessor. Doing so can provide significant performance, power, and cost advantages (Figure 1).

Even with these considerable advantages, design teams accustomed to working on processor-based systems may avoid using FPGAs because they lack the hardware skills necessary to use one as a coprocessor. Unfamiliarity with traditional hardware design methodologies based on HDLs such as VHDL and Verilog limits or prevents the use of an FPGA, often resulting in more expensive and power-hungry designs. A new group of emerging design tools called ESL (electronic system level) promises to address this methodology issue, allowing processor-based developers to accelerate their designs with programmable logic while maintaining a common design methodology for hardware and software.




Figure 1 – DSP hardware platform

# Boosting Performance with FPGA Coprocessing

You can realize significant improvements in the performance of a DSP system by taking advantage of the flexibility of the FPGA fabric for operations benefiting from parallelism. Common examples include (but are not limited to) FIR filtering, FFTs, digital down conversion, and forward error correction (FEC) blocks.
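To make the parallelism concrete, here is a minimal C sketch of a direct-form FIR filter. The names `NUM_TAPS`, `fir_sample`, and `fir_block` are illustrative and not taken from any Xilinx library. The point to notice is that the multiplications inside `fir_sample` do not depend on one another: a processor steps through them one per loop iteration, whereas FPGA fabric could in principle compute all of them in the same clock cycle.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TAPS 4  /* illustrative filter length (assumption) */

/* Compute one output sample of a direct-form FIR filter.
 * history[0] holds the newest input sample, history[NUM_TAPS - 1] the oldest.
 * Each product coeffs[k] * history[k] is independent of the others, which is
 * what makes this loop a natural candidate for parallel hardware multipliers. */
int32_t fir_sample(const int16_t coeffs[NUM_TAPS],
                   const int16_t history[NUM_TAPS])
{
    int32_t acc = 0;
    for (size_t k = 0; k < NUM_TAPS; k++) {
        acc += (int32_t)coeffs[k] * (int32_t)history[k];
    }
    return acc;
}

/* Filter a block of n samples, maintaining the delay line in software. */
void fir_block(const int16_t coeffs[NUM_TAPS],
               const int16_t *in, int32_t *out, size_t n)
{
    int16_t history[NUM_TAPS] = {0};

    for (size_t i = 0; i < n; i++) {
        /* Shift the delay line by one position and insert the new sample. */
        for (size_t k = NUM_TAPS - 1; k > 0; k--) {
            history[k] = history[k - 1];
        }
        history[0] = in[i];
        out[i] = fir_sample(coeffs, history);
    }
}
```

Feeding a unit impulse through `fir_block` returns the coefficients in order, which is a convenient sanity check for any filter of this form.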

Xilinx<sup>®</sup> Virtex<sup>TM</sup>-4 and Virtex-5 architectures provide as many as 512 parallel multipliers capable of running in excess of 500 MHz, for a peak DSP performance of 256 GMACs per second. By offloading operations that require high-speed parallel processing onto the FPGA and leaving operations that require high-speed serial processing on the DSP, you optimize the performance and cost of the DSP system while lowering its power requirements.
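The peak figure follows directly from the multiplier count and clock rate. As a quick check, assuming each multiplier completes one multiply-accumulate (MAC) per clock cycle at 500 MHz:

$$
512 \times \left(500 \times 10^{6}\ \text{MAC/s}\right) = 2.56 \times 10^{11}\ \text{MAC/s} = 256\ \text{GMAC/s}
$$

This is a theoretical ceiling; sustained throughput depends on how fully a given design keeps every multiplier busy.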

# Lowering Cost with FPGA Embedded Processing

A DSP hardware system that includes an FPGA coprocessor offers numerous implementation options for the operations contained within the C algorithm, such as partitioning the algorithm between the DSP processor, the FPGA configurable logic blocks (CLBs), and the FPGA embedded processor. The Virtex-4 device offers two types of embedded processors: the MicroBlaze<sup>TM</sup> soft-core processor, often used for system control, and the higher-performance PowerPC<sup>TM</sup> hard-core embedded processor. Parallel operations partitioned into the FPGA fabric can be used directly in a DSP datapath or configured as a hardware accelerator to one of these embedded processors.

The challenge facing designers is how to partition DSP system operations into the available hardware resources in the most efficient and cost-effective manner. How best to use FPGA embedded processors is not always obvious, but this hardware resource can have the greatest impact on lowering overall system cost. FPGA embedded processors provide an opportunity to consolidate all non-critical operations into software running on the embedded processors, minimizing the total amount of hardware resources required for the system.

### C to Gates

When targeting an FPGA, the term "C to gates" refers to a C-synthesis design flow that creates one of two implementation options – direct implementation onto the FPGA fabric as a DSP module or the creation of a hardware accelerator for use with the MicroBlaze or PowerPC 405 embedded processor (Figure 2).

Figure 2 – C implementation options for a DSP hardware system

When an operation lies directly in the DSP datapath, the highest performance is achieved by implementing an operation as a DSP module. This involves synthesizing the C code directly into RTL and then instantiating the block into the DSP datapath. You can perform this instantiation using traditional HDL design methodologies or through system-level design tools such as Xilinx System Generator for DSP. Through direct instantiation, you can achieve the highest performance with minimal overhead.

The leading C-synthesis tools are capable of delivering performance approaching that of hand-coded RTL – but achieving this requires detailed knowledge of C-synthesis tool operation and coding styles. Code modifications and the addition of inline synthesis instructions for inserting parallelism and pipeline stages are typically required to achieve the desired performance. Even with these modifications, however, the productivity gains can be significant. The C-system model remains the golden source driving the design flow.
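As an illustration of the kind of code modification involved, the sketch below restructures a dot product so that its parallelism is explicit in the C source. It is a generic example, not the coding style of any particular C-synthesis tool; the names `VEC_LEN`, `LANES`, `dot_serial`, and `dot_partial_sums` are hypothetical. The serial version funnels every product through one accumulator, creating a chain of dependent additions. The restructured version keeps `LANES` independent partial sums that a synthesis tool could map to parallel multiply-accumulate units. Inline synthesis instructions are tool-specific, so the example only marks where one would go.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_LEN 64  /* fixed, compile-time loop bound (assumed multiple of LANES) */
#define LANES   4   /* number of independent accumulators (assumption) */

/* Software-style version: one accumulator, so each addition depends on
 * the result of the previous one. */
int64_t dot_serial(const int16_t a[VEC_LEN], const int16_t b[VEC_LEN])
{
    int64_t acc = 0;
    for (size_t i = 0; i < VEC_LEN; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Restructured version: LANES partial sums with no dependencies between
 * them, followed by a short final reduction. */
int64_t dot_partial_sums(const int16_t a[VEC_LEN], const int16_t b[VEC_LEN])
{
    int64_t partial[LANES] = {0};

    for (size_t i = 0; i < VEC_LEN; i += LANES) {
        /* A tool-specific inline synthesis instruction (for example, an
         * unroll or pipeline hint) would typically be placed here. None is
         * shown because the syntax differs from tool to tool. */
        for (size_t lane = 0; lane < LANES; lane++) {
            partial[lane] += (int32_t)a[i + lane] * (int32_t)b[i + lane];
        }
    }

    int64_t acc = 0;
    for (size_t lane = 0; lane < LANES; lane++) {
        acc += partial[lane];
    }
    return acc;
}
```

Because integer addition is associative, both functions return the same result, so the original routine can continue to serve as the reference model when verifying the restructured one.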

An alternative and often simpler approach is to create a hardware accelerator for one of the Xilinx embedded processors. The processor remains the primary target for the C routines, with the exception that performance-critical operations are pushed to the FPGA logic in the form of a hardware accelerator. This provides a more software-centric design methodology. However, some performance is sacrificed with this approach. C routines are synthesized to RTL, similar to the DSP module approach, except that the top-level entity is wrapped with interface logic to allow it to connect to one of the Xilinx embedded processor buses. This creates a hardware accelerator that can be imported into the Xilinx EDK environment and called through a software-friendly C function call.
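From the software side, such an accelerator can look like an ordinary function. The sketch below is purely illustrative: the register layout, the names `accel_regs_t`, `accel_model_step`, and `accel_multiply`, and the start/done handshake are all assumptions, not the interface generated by EDK or by any C-synthesis tool. So that the example runs on a host machine, a small software model stands in for the hardware behind the registers.

```c
#include <stdint.h>

#define ACCEL_CTRL_START  0x1u  /* hypothetical "start" bit */
#define ACCEL_STATUS_DONE 0x1u  /* hypothetical "done" bit  */

/* Hypothetical register block exposed by the accelerator's bus wrapper. */
typedef struct {
    volatile uint32_t operand_a;
    volatile uint32_t operand_b;
    volatile uint32_t control;
    volatile uint32_t status;
    volatile uint32_t result;
} accel_regs_t;

/* Host-side stand-in for the hardware: when the start bit is set, compute
 * the result, raise the done flag, and clear the start bit. */
static void accel_model_step(accel_regs_t *regs)
{
    if (regs->control & ACCEL_CTRL_START) {
        regs->result  = regs->operand_a * regs->operand_b;
        regs->status  = ACCEL_STATUS_DONE;
        regs->control = 0u;
    }
}

/* The software-friendly entry point: callers see a plain C function and
 * never touch the registers directly. */
uint32_t accel_multiply(accel_regs_t *regs, uint32_t a, uint32_t b)
{
    regs->operand_a = a;
    regs->operand_b = b;
    regs->status    = 0u;               /* assumed: software clears "done" */
    regs->control   = ACCEL_CTRL_START;

    while ((regs->status & ACCEL_STATUS_DONE) == 0u) {
        /* On real hardware the accelerator would set the done flag itself;
         * here the software model is stepped instead. */
        accel_model_step(regs);
    }
    return regs->result;
}
```

On a target system, `regs` would point at the accelerator's memory-mapped address rather than at an ordinary struct, and the call site would remain unchanged.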

The performance requirements for mapping C routines into hardware accelerators are typically less aggressive. Here the objective is to accelerate the performance beyond that of pure software while maintaining a software-friendly design flow. Although the coding techniques and inline synthesis instructions are still available, you can typically achieve your desired performance gains without their use.

#### Design Methodology – The Barrier to Adoption

The effort and breadth of skill required to correctly partition and implement a complex DSP system are formidable. In 2005, the market research firm Forward Concepts conducted a survey to determine the most important FPGA selection criteria for DSP. The published results, shown in Figure 3, identify development tools as the most important.

Figure 3 – DSP chip selection criteria (axis: normalized weighted responses)

Figure 4 – Xilinx ESL Initiative design flows

The survey illustrates that the benefits of a DSP hardware system utilizing an FPGA coprocessor are well understood, but that the current state of development tools represents a significant barrier to adoption for traditional DSP designers.

#### The Xilinx ESL Initiative

ESL design tools are pushing digital design abstraction beyond RTL. A subset of these tool vendors are specifically focused on mapping system models developed in C/C++ into DSP hardware systems that include FPGAs and DSP processors. Their vision is to make the hardware platform transparent to the software developer (Figure 4).

Rather than attempting to solve one piece of this methodology puzzle internally, this year Xilinx launched a partnership program with key providers of ESL tools called the ESL Initiative. The focus of this partnership is to enable designers with software programming skills to implement their ideas in programmable hardware without having to learn traditional hardware design skills. This program is designed to accelerate the development and adoption of world-class design methodologies through innovation within the ESL community.

#### Conclusion

When combined, the collective offerings from Xilinx ESL partners offer a wide spectrum of complementary solutions that are optimized for a range of applications, platforms, and end users. Xilinx too has focused its efforts on complementary technology. For example, AccelDSP Synthesis provides a hardware path for algorithms developed in floating-point MATLAB, while System Generator for DSP allows modules developed using ESL designs to be easily combined with Xilinx IP and embedded processors. The quickest path to realizing a programmer-friendly FPGA design flow is through a motivated and innovative set of partners.

For more information about the ESL Initiative, visit *www.xilinx.com/esl*.
