

# **RF Engines Limited**

# Data Sheet



# The Ventrix range of Polyphase DFT Cores. 8 to over one million points

Reference: D02003-Polyphase DFT Datasheet

Revision: 1.0

Date: 04 Oct 2002

Author: IMV, EJF, JL

Tel: +44 (0)1983 550330

Fax: +44 (0)1983 550340

E-mail: info@rfel.com

Web: www.rfel.com



Copyright © 2002 by RF Engines Limited. ALL RIGHTS RESERVED.

The contents of this document may not be reproduced in whole or in part without the written consent of RF Engines Limited.

RF Engines, Innovation Centre, St Cross Business Park, Newport, Isle of Wight, England, PO30 5WB

#### Introduction

This datasheet describes the 'Ventrix' range of high specification Pipelined Polyphase DFT cores offered by RF Engines Limited (RFEL). These cores process complex input data in CONTINUOUS REAL TIME, with no gaps in the data, at sample rates of up to 800 Msps or more in higher radix designs. The complex 3-tap, 10-bit, twice oversampled 16K-point version fits in a single Xilinx XC2V3000. Architectural designs for a one million point, 5-tap, 14-bit, 20 MHz complex (real-time) version have shown that a single XC2V6000 and 3 banks of 64-bit, 16MB SDRAM would be required. This document provides details of the cores, optional items and design services available from RFEL.

Ventrix cores are intended for use in applications where filter performance and processing speed are critical and optimum use of available silicon is required. They are fully pipelined for maximum data throughput, and complement our range of Pipelined Frequency Transform (PFT) and other high performance products. The cores are available for licence in net-list or bitstream form.

#### Benefits of the Polyphase DFT

The Polyphase DFT can provide vastly superior filter performance compared with a weighted FFT. Figure 1 shows the frequency response of a 5-tap Polyphase DFT bin compared with a Blackman-Harris weighted FFT bin of the same transform length. It can be seen that the Polyphase DFT bin in-band ripple, stopband rejection and roll-off are all significantly better than the Blackman-Harris weighted FFT bin.



Figure 1. 5-tap Polyphase DFT (5120 effective window) Frequency Response vs 1024-point Blackman-Harris Windowed FFT Frequency Response.

The number of polyphase taps can be optimised to match the required filter performance. Figure 2 shows the filter shapes of a range of Polyphase DFTs with different numbers of polyphase taps.



Figure 2. 3-tap, 5-tap and 8-tap Polyphase DFT Bin Frequency Responses.

The improved filter performance of a Polyphase DFT can provide major benefits for systems that require high dynamic ranges, good filter selectivity, precise amplitude accuracy or a combination of these.

# General Description

The RF Engines Polyphase DFT solution is built from a highly optimised pipelined polyphase front-end core followed by a high performance Pipeline FFT core from the Vectis range of cores. Both cores are highly parameterisable by RFEL to allow an optimal solution for the application. Custom filter design techniques allow RFEL to design very high specification filters of almost any length for use within the Polyphase DFT.

Figure 3 shows the Weighted OverLap and Add (WOLA) architecture that is used to implement the Polyphase DFT. This structure is close to the actual hardware architecture used to achieve the Polyphase DFT, and is functionally equivalent to the more widely published Polyphase DFT architecture.



Figure 3. WOLA Architecture.

*L* samples of complex input data are stored in a shift-register of length *L*, where the newest samples replace the oldest ones. After a period of *M* input samples, the *L* stored samples are weighted by the *L* prototype filter coefficients h(n). These weighted samples are then split into *L*/*K* blocks of *K* samples, and the blocks are added sample-wise to form the block of *K* input samples to the DFT.

The DFT part of the Polyphase DFT is implemented using one of the Vectis range of Pipelined FFT cores. The ratio K/M determines the output sample rate of the filter bank (K = M for critical sampling, K = 2M for twice oversampling, etc.). Currently, critical sampling and twice oversampling are supported by the Ventrix range of cores.
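The WOLA steps described above can be sketched in a few lines of Python with NumPy. This is an illustrative floating-point model only, not RFEL's hardware implementation: the function name `wola_frames`, the ordering of the coefficients against the stored samples, the Hamming window used as a stand-in prototype filter and the absence of any per-frame phase correction are all assumptions made for the sketch.

```python
import numpy as np

def wola_frames(x, h, K, M):
    """Illustrative WOLA filter bank: one K-point DFT every M input samples.

    x : complex input samples
    h : L prototype filter coefficients, with L a multiple of K
    K : DFT length
    M : new input samples between transforms
        (K = M is critical sampling, K = 2M is twice oversampling)
    """
    x = np.asarray(x, dtype=complex)
    h = np.asarray(h, dtype=float)
    L = len(h)
    if L % K != 0:
        raise ValueError("window length L must be a multiple of K")
    frames = []
    # Slide an L-sample window along the input in steps of M samples.
    for start in range(0, len(x) - L + 1, M):
        weighted = x[start:start + L] * h      # weight by h(n)
        blocks = weighted.reshape(L // K, K)   # L/K blocks of K samples
        folded = blocks.sum(axis=0)            # add the blocks sample-wise
        frames.append(np.fft.fft(folded))      # K-point DFT
    return np.array(frames)

# Example: 3-tap, twice oversampled, 8-point bank fed with a bin-centred tone.
K, M, taps = 8, 4, 3
h = np.hamming(taps * K)                # stand-in prototype (assumption)
n = np.arange(64)
x = np.exp(2j * np.pi * 2 * n / K)      # complex tone centred on bin 2
frames = wola_frames(x, h, K, M)
print(frames.shape, int(np.argmax(np.abs(frames[0]))))
```

In this example the strongest bin of every output frame is bin 2, the bin on which the test tone is centred. A real design would replace the Hamming window with a prototype filter designed for the required ripple and stopband rejection.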

# Features

- Proven in Xilinx Virtex-E hardware. Placed and routed in Virtex-II and Altera Stratix.
- Continuous real-time processing in excess of 800 Msps, complex data
- 8 to over one million-point versions available
- Bit widths and bit growth adjustable (factory setting)
- Twiddle bit width adjustable (factory setting)
- Internal memory partitioning adjustable (factory setting)
- Fully pipelined design
- Enables many channels to be interleaved through a single high-speed core

#### **Applications**

- Wide-band filter banks
- Communications systems
- Electronic warfare (radar, sonar, surveillance)
- Medical instruments
- Real-time spectral analysis
- Multi-channel systems, where many low speed channels are interleaved through the high-speed core

#### Tested Configurations

Figure 4 shows the configuration used to test a critically sampled version and a twice oversampled version of the Polyphase DFT core within the RFEL development system.



Figure 4. Polyphase DFT System Configuration

The Polyphase DFT design has been implemented and tested in the RFEL development system hardware, which has 4 x Virtex1000E-ehq240 FPGAs connected in series, with the first device fed by an 8-bit A/D with a maximum Fs of 256MHz.

The tested design is a twice oversampled, 5120-point weighted overlap and add (WOLA) front end feeding a 10-stage (1024-point) radix-2 DIF complex Pipeline FFT. Bit widths are as shown in Figure 4.

The Polyphase DFT core is fed by a Distributed Half-band Filter (DHBF) which converts a real IF into a complex baseband signal at up to 110 Msps. The DHBF is implemented in the first FPGA of our development system and utilises less than 10% of the Virtex 1000E with no block RAM usage. The DHBF could be included in the same FPGA as the Polyphase DFT design. RFEL can also supply the DHBF as a Licensable IP core.

The Polyphase DFT core has normally ordered inputs and bit reversed outputs. An optional bit-reverser is implemented in the same device to provide a normally ordered output.

The two complex I/Q outputs are block interleaved, with half (n/2 points) of the spectrum appearing on I/Q output pair 1 and the second half appearing on I/Q output pair 2. The critically sampled core outputs the two complex data streams at a rate of Fs/4, and the twice oversampled core at Fs/2, where Fs is the A/D sample rate.
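As a quick check of the rate relationships above, the following Python sketch computes the rate of each of the two output streams from the A/D sample rate. The function name `output_stream_rate` is hypothetical, and Fs = 220 Msps is chosen here only because it corresponds to the 110 Msps complex rate quoted for the DHBF.

```python
def output_stream_rate(fs_adc, oversampled):
    """Rate of each of the two complex output streams, in the units of fs_adc."""
    complex_input_rate = fs_adc / 2       # real IF at Fs becomes complex data at Fs/2
    if oversampled:
        return complex_input_rate         # twice oversampled: Fs/2 per stream
    return complex_input_rate / 2         # critically sampled: Fs/4 per stream

fs = 220e6  # A/D sample rate in samples per second (illustrative value)
print(output_stream_rate(fs, oversampled=False) / 1e6)  # 55.0 Msps per stream
print(output_stream_rate(fs, oversampled=True) / 1e6)   # 110.0 Msps per stream
```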

# Silicon Size and Speed

The tested twice oversampled design (5120-point weighted overlap and add front end, 1024-point Polyphase DFT) fits into a single Xilinx Virtex1000E-ehq240-6, using 10076 slices (82% of the logic resource) and 71 of the 96 available block RAMs.

| Function    | CLB Slices | Block RAMs |
|-------------|------------|------------|
| 1K Poly DFT | 10076      | 71         |
| DHBF        | 1235       | 0          |

Table 1. Tested Polyphase DFT System Resource Utilisation in XCV1000E FPGAs.

The two tested examples of Polyphase DFT core can sustain a constant pipelined data rate of 110Msps (limited by our demo system clock).

The core is parameterisable at the VHDL level allowing many variants in terms of filter shape, transform size and bit widths. Please refer to Annex A for details.

| Parameter                             | Specification                                                                      |
|---------------------------------------|------------------------------------------------------------------------------------|
| Maximum input rate                    | Fs = 220Msps real from A/D into the DHBF. 110Msps complex data into Polyphase DFT. |
| Maximum output rate = input data rate | 2 complex channels at 110Msps (REAL TIME!)                                         |
| Transform size (n points)             | 1024 points                                                                        |
| Filter dynamic range                  | 80dBc                                                                              |
| Filter ripple                         | +/- 0.1dB                                                                          |
| Filter pass band width                | Fs / 2048 (107kHz for Fs = 220Msps)                                                |
| Filter transition band width          | 75% of pass band width                                                             |
| Filter coefficient bit-width          | 16 bits                                                                            |
| Input bit-width                       | 8 bits I, 8 bits Q                                                                 |
| Output bit-width                      | 16 bits I, 16 bits Q                                                               |
| Bit growth per stage                  | 1 bit per stage until 16 bits                                                      |
| Twiddle bit width                     | 17 bits                                                                            |
Table 2. Performance Specifications of the Implemented Virtex E Design

| Parameter                    | Specification                                            |
|------------------------------|----------------------------------------------------------|
| Transform size (n points)    | Parameterisable at factory, 8 to over one million points |
| Window length                | Parameterisable at factory                               |
| Filter performance           | Factory preset or programmable                           |
| Filter coefficient bit-width | Parameterisable at factory                               |
| Input bit-width              | Parameterisable at factory                               |
| Output bit-width             | Parameterisable at factory                               |
| Bit growth per stage         | Parameterisable at factory                               |
| Twiddle bit-width            | Parameterisable at factory                               |
Table 3. Polyphase DFT Cores Available

# Core implementation in other devices

This core can be targeted at other devices such as the Altera Stratix and the Xilinx Virtex-II and Virtex-II Pro.

#### Power Requirements

The power requirements for the core are highly dependent on the target device, size of implementation, and clock rate. RFEL can provide individual power estimates for a particular design where required.

Please note that RFEL have conducted tests with a semi-custom ASIC supplier, where a VHDL design has been successfully synthesised for production. This test resulted in a reduction in power of around 75%.

# Polyphase DFT output format

The data output format of the Polyphase DFT is 2 pairs of 2's-complement data streams, (I1 & Q1), (I2 & Q2). The pair of complex I/Q outputs are block interleaved, with half (n/2 points) of the spectrum appearing on I/Q output pair 1 and the second half appearing on I/Q output pair 2. Each data stream runs at half the data rate at the input to the core for the critically sampled version, and at the same data rate as the input for the twice oversampled version. RFEL can output the data in other configurations and formats including IEEE floating point.

Figure 5 shows the log magnitude ($10 \log_{10}(I^2 + Q^2)$) of each of the output pairs. One half of the spectrum is shown at the top, and the other half is at the bottom. This view of the block-interleaved outputs was generated using the ModelSim VHDL simulator.



Figure 5. Simulation trace showing block interleaved output format
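For readers who want to reproduce this kind of plot from captured I/Q output data, the log magnitude can be computed as in the following Python sketch. The function name `log_magnitude_db` and the small `floor` value, used to avoid taking the logarithm of zero, are assumptions of the sketch rather than part of the core.

```python
import numpy as np

def log_magnitude_db(i, q, floor=1e-12):
    """Return 10*log10(I^2 + Q^2) for each sample of one output pair."""
    i = np.asarray(i, dtype=float)
    q = np.asarray(q, dtype=float)
    power = i ** 2 + q ** 2
    # Clamp to a small floor so that all-zero samples do not give -inf.
    return 10.0 * np.log10(np.maximum(power, floor))

# Example: a single sample with I = 3, Q = 4 has power 25.
print(log_magnitude_db([3], [4]))
```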

#### Processing Delay

An example of processing delay is shown in the ModelSim VHDL simulation screen shot of Figure 6. This figure shows the outputs of the 5120 sample window, 1024-point twice over-sampled Polyphase DFT. The initial period before filter build-up is equal to the Pipeline FFT latency + bit-reversal latency + polyphase front-end pipeline latency. The polyphase front-end pipeline latency is approximately 12 clock periods (depending on the target technology, number of polyphase taps, bit-width etc), and is generally insignificant compared with the total filter bank latency. The Pipeline FFT and bit-reversal latencies can be calculated accurately from the equations in the Vectis FFT Data Sheet (D02002).

The Pipeline FFT within the twice oversampled architecture does not have a rate conversion front end, so the FFT latency is approximately 512 input sample clocks for this 1024-point transform, and the bit reversal adds a further 512 input sample clocks.

The remaining latency is due to the polyphase filter fill-up time, which is a function of filter shape (and hence window length). The example shown has a window length of 5120 samples, as can be seen from the fill-up time of 10 frames of 512 samples. A comprehensive discussion of filter bank latency is given in the RFEL white paper "Transient response white paper", available at <u>www.rfel.com</u>.



Figure 6. PFFT Input/Output diagram
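The latency contributions described above can be added up as in the following Python sketch. It uses only the approximate figures quoted in this section for the twice oversampled 1024-point example. Treating the FFT and bit-reversal latencies as n/2 clocks each for other transform sizes is an assumption of the sketch; the equations in the Vectis FFT Data Sheet (D02002) should be used for real designs. The function name `startup_latency` is hypothetical.

```python
def startup_latency(n_points, window_length, hop, front_end_clocks=12):
    """Approximate start-up latency in input sample clocks.

    n_points         : transform length (1024 in the example)
    window_length    : polyphase window length in samples (5120)
    hop              : new input samples per transform (512 when twice oversampled)
    front_end_clocks : approximate polyphase front-end pipeline latency
    """
    fft_latency = n_points // 2            # approx. 512 clocks for 1024 points
    bit_reversal_latency = n_points // 2   # a further approx. 512 clocks
    pipeline = fft_latency + bit_reversal_latency + front_end_clocks
    fill_frames = window_length // hop     # frames needed to fill the filter
    fill_up = fill_frames * hop
    return {
        "pipeline": pipeline,
        "fill_frames": fill_frames,
        "fill_up": fill_up,
        "total": pipeline + fill_up,
    }

print(startup_latency(1024, 5120, 512))
```

For the example in Figure 6 this gives a pipeline delay of about 1036 clocks before filter build-up begins, followed by 10 frames (5120 samples) of filter fill-up.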

#### Parameterisability

The architecture of the core has been designed so that many of the parameters can be modified relatively easily for a minimal non-recurring engineering charge. These parameters include filter characteristics, number of stages, memory partitioning, bit widths and bit growth.

An external interface can be provided so that the user can load user-defined window coefficients.

Using our extensive in-house design experience, RF Engines provide a free initial consultancy service to analyse your requirements within a system environment. If you wish to make use of this service please send a brief email to <u>sales@rfel.com</u> outlining your requirements.

| Signal   | Direction | Type             | Width                           | Function |
|----------|-----------|------------------|---------------------------------|----------|
| clk      | IN        | std_logic        | 1 bit                           | Complex input data rate clock at Fs/2. |
| rst      | IN        | std_logic        | 1 bit                           | Active High, resets the control logic of the Polyphase DFT. |
| sync_in  | IN        | std_logic        | 1 bit                           | Active High pulse indicating the first data sample of a new 'n'-point data block. 1 clock wide at Fs/2.<br>'n' = transform length for critically sampled version.<br>'n' = $\frac{1}{2}$ transform length for twice oversampled version. |
| I_in     | IN        | std_logic_vector | (wola_data_width_in-1 downto 0) | In-phase data, 2's complement at complex data rate Fs/2. |
| q_in     | IN        | std_logic_vector | (wola_data_width_in-1 downto 0) | Quadrature phase data, 2's complement at complex data rate Fs/2. |
| sync_out | OUT       | std_logic        | 1 bit                           | Active High pulse indicating the first data sample of each n-point result data block. |
| i1_out   | OUT       | std_logic_vector | (data_width_out-1 downto 0)     | In-phase negative spectrum, 2's complement.<br>Fs/4 for critically sampled version.<br>Fs/2 for twice oversampled version. |
| i2_out   | OUT       | std_logic_vector | (data_width_out-1 downto 0)     | In-phase positive spectrum, 2's complement.<br>Fs/4 for critically sampled version.<br>Fs/2 for twice oversampled version. |
| q1_out   | OUT       | std_logic_vector | (data_width_out-1 downto 0)     | Quadrature phase negative spectrum, 2's complement.<br>Fs/4 for critically sampled version.<br>Fs/2 for twice oversampled version. |
| q2_out   | OUT       | std_logic_vector | (data_width_out-1 downto 0)     | Quadrature phase positive spectrum, 2's complement.<br>Fs/4 for critically sampled version.<br>Fs/2 for twice oversampled version. |

Table 4. Polyphase DFT Interface Specification



Figure 7. Polyphase DFT Symbol (Twice oversampled version shown).

#### Interface Symbol

Figure 7 shows the interface symbol for a standard Polyphase DFT IP core, although interfaces can be tailored to individual requirements.

#### <u>Pinout</u>

Pinouts can be provided to suit customer and FPGA package requirements for total chip designs.

#### Verification Methods

Extensive functional (pre-synthesis) and timing (post place-and-route) simulations have been performed using the ModelSim simulator. Simulation scenarios (including data files) and the test benches used for design verification are provided with the core.

#### Core Delivery

| Supplied Item          | Description                                                                                                                   |
|------------------------|-------------------------------------------------------------------------------------------------------------------------------|
| Documentation          | Specification / Data Sheet, Test bench descriptions.                                                                          |
| Design Format          | EDIF netlist or programming bit-stream.                                                                                       |
| Constraints File       | UCF (user constraints file)                                                                                                   |
| Verification           | VHDL Test bench including ModelSim scripts, Test Data Files, VHDL functional simulation netlist (pre-compiled for ModelSim). |
| Instantiation Template | VHDL                                                                                                                          |

Table 5. Items Provided With Each Core

The cores are delivered as an EDIF file or NGO netlist with a .ucf user constraints file. Alternatively, they can be supplied as a bit-stream if the target FPGA is fully defined and only holds the core. VHDL test benches are provided along with a VHDL simulation model. Supporting documentation, including data sheets and user guides, is included.

Design support services are available to help you incorporate the core into your design. Debugging support can also be provided.

#### What is Available Now?

The 1K-point cores as described above are available for immediate licensing. Other variants of the core in terms of filter characteristics, output sample rate, transform length, input data width, bit growth per stage, twiddle width and blockRAM / distributed memory split can be supplied under contract. RFEL will optimise the core based on exact customer requirements.

Accuracy and precision are system design considerations that affect data bitwidth and twiddle width. RFEL can offer system-engineering advice to aid the selection of the optimal core configuration.

# **Optional Items**

RFEL is an off-the-shelf IP supplier and designer of front-end RF signal processing solutions. Please do not hesitate to contact us for information on any of the optional items listed below:-

- Bit-reverser
- Fixed-point to floating-point output converter
- Distributed Half Band Filters
- NCO down-converters
- Highly optimised FIR filters using canonical signed bit multiplier techniques
- Multi-channel input interleavers
- Multi-channel output de-interleavers
- Mixed radix solutions

#### <u>How to Buy</u>

The standard or modified cores are sold under an application licence. The price is normally made up of an up-front payment followed by royalties. This pricing model is flexible to encompass single use implementations or large volume use. The Licence Agreement and quotations can be provided by contacting sales@rfel.com.

#### What should you do now?

Please send a brief email to <u>sales@rfel.com</u> outlining your basic requirements. Alternatively, if you wish to be kept updated with news of our latest developments and alerted when new cores become available, please ask to be put on our email update list at the address above.

Table 6. Glossary

| Term | Definition                                       |
|------|--------------------------------------------------|
| A/D  | Analogue to Digital Converter                    |
| ASIC | Application Specific Integrated Circuit          |
| CLB  | Configurable Logic Block                         |
| DHBF | Distributed Half Band Filter                     |
| EDIF | Electronic Data Interchange Format               |
| FFT  | Fast Fourier Transform                           |
| FPGA | Field Programmable Gate Array                    |
| HDL  | Hardware Description Language                    |
| LSB  | Least Significant Bit                            |
| MSB  | Most Significant Bit                             |
| Msps | Million Samples Per Second                       |
| PFT  | Pipelined Frequency Transform                    |
| RFEL | RF Engines Limited                               |
| RPM  | Relationally Placed Macro                        |
| UCF  | User Constraints File                            |
| VHDL | Very High Speed IC Hardware Description Language |
| WOLA | Weighted Overlap And Add                         |

# <u>Annex A</u> <u>Core Implementations</u>

# Please read these notes before referring to the tables below:

The tables below provide silicon usage data for the Polyphase DFT designs in various configurations. RFEL have created these tables based on synthesis of the standard design, presuming that a total FPGA solution is required. The following factors should be considered, as they will most certainly reduce the silicon usage significantly.

# External Memory

The figures assume a total FPGA solution with no external memory. However, as the transform length increases, its memory use grows at a faster rate than the logic requirement. Therefore it is possible to use smaller FPGAs with external memory, as in the RFEL Virtex II cPCI card. The use of external memory can also enable the practical implementation of very large transforms (>million points).

#### Internal Memory Partitioning

The figures use a particular internal memory split between internal RAM for large delay elements and logic fabric for the smaller delay elements. The Xilinx Virtex SRL component is especially efficient in this respect. Memory partitioning can be adjusted to free-up either BlockRAM or logic. Optimal partitioning can make the difference between the design fitting into a particular FPGA or not.

#### Bit Width and Bit Growth

As with the Vectis Pipelined FFT range, the Ventrix Polyphase DFT offers the ability to tailor bit widths at all stages, allowing the designer to achieve optimal use of silicon. Over-specifying the bit widths through the FFT would give a sub-optimal solution with regard to size and again could make the difference between fitting the design into a smaller FPGA rather than a bigger, more expensive device. The target architecture also needs to be taken into account. Block RAM widths and multiplier widths have a significant effect on silicon usage and maximum operating frequency.

#### Complex Data Sample Rate

If the complex data sample rate is significantly less than indicated, a different architecture can be used that will significantly reduce the silicon requirements. The actual core speed will generally be controlled by the analogue to digital (A/D) converter device chosen. Several A/D outputs could be multiplexed through one core utilising the core's very fast clock capability. RFEL have already implemented designs that use this multiplexed feature.

#### **Conclusion**

Silicon usage of the Polyphase DFT is highly dependent on the actual configuration. Whilst the usage figures in these tables are for particular cases, they can be reduced significantly by changes to the above-mentioned factors. To obtain a silicon usage estimate more closely matched to your requirements, please contact RFEL.

# Ordering Information

Part Number Description

Example: Ventrix-R2-4096-XVE-14-17-16-4

- Ventrix = Generic name for the Polyphase DFT range
- R2 = Radix 2 (other options: R4 to R16)
- 4096 = 4096 points (other options are 8, 16, 32, etc.)
- XVE = Virtex-E (other options: A2 = Altera 20K, AA = Altera Apex, AS = Altera Stratix, XV2 = Virtex II)
- 14 = Input bit-width
- 17 = Twiddle bit-width
- 16 = Output bit-width
- 4 = Polyphase taps
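The part number fields listed above can be decoded mechanically. The following Python sketch is one way to do so. The names `VentrixPart`, `FAMILIES` and `parse_part_number` are hypothetical, and the sketch assumes that every part number has exactly the eight hyphen-separated fields shown in the example.

```python
from dataclasses import dataclass

# Device family codes as listed in the part number description.
FAMILIES = {
    "XVE": "Virtex-E",
    "XV2": "Virtex II",
    "A2": "Altera 20K",
    "AA": "Altera Apex",
    "AS": "Altera Stratix",
}

@dataclass
class VentrixPart:
    radix: int
    points: int
    family: str
    input_bits: int
    twiddle_bits: int
    output_bits: int
    taps: int

def parse_part_number(part):
    """Split a Ventrix part number into its documented fields."""
    fields = part.split("-")
    if len(fields) != 8 or fields[0] != "Ventrix":
        raise ValueError(f"not a Ventrix part number: {part!r}")
    _, radix, points, family, in_bits, tw_bits, out_bits, taps = fields
    if not radix.startswith("R"):
        raise ValueError(f"unexpected radix field: {radix!r}")
    if family not in FAMILIES:
        raise ValueError(f"unknown device family code: {family!r}")
    return VentrixPart(
        radix=int(radix[1:]),
        points=int(points),
        family=FAMILIES[family],
        input_bits=int(in_bits),
        twiddle_bits=int(tw_bits),
        output_bits=int(out_bits),
        taps=int(taps),
    )

print(parse_part_number("Ventrix-R2-4096-XVE-14-17-16-4"))
```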

#### <u>Models</u>

Models available in Matlab

The example figures shown in the tables below can be cross-referred with Annex B to determine which FPGAs the design could be fitted into.

| 10 bit Input<br>16 bit output | Xilinx Virtex2<br>@ ~150 Msps complex |             |            |
|-------------------------------|---------------------------------------|-------------|------------|
| Stages (Points)               | CLB Slices                            | Multipliers | Block RAMs |
| 1024 points                   | 4731                                  | 42          | 25         |
| 2048 points                   | 5285                                  | 46          | 43         |
| 4096 points                   | 5732                                  | 50          | 66         |

# Critically Sampled Polyphase DFT (5-tap)

| 10 bit Input<br>16 bit output | Altera Stratix<br>@ ~160 Msps complex |             |            |             |           |
|-------------------------------|---------------------------------------|-------------|------------|-------------|-----------|
| Stages (Points)               | LEs                                   | ESBs (512s) | ESBs (4Ks) | ESBs (Mega) | DSP Mults |
| 1024 points                   | 6919                                  | 13          | 71         | 0           | 68        |
| 2048 points                   | 7644                                  | 15          | 132        | 0           | 74        |
| 4096 points                   | 8331                                  | 16          | 152        | 1           | 80        |

# **Twice Over sampled Polyphase DFT**

| 10 bit Input<br>16 bit output | Xilinx Virtex2<br>@ ~150 Msps complex |             |            |
|-------------------------------|---------------------------------------|-------------|------------|
| Stages (Points)               | CLB Slices                            | Multipliers | Block RAMs |
| 1024 points                   | 5366                                  | 44          | 22         |
| 2048 points                   | 5879                                  | 47          | 39         |
| 4096 points                   | 6275                                  | 50          | 64         |

| 10 bit Input<br>16 bit output | Altera Stratix<br>@ ~160 Msps complex |             |            |             |           |
|-------------------------------|---------------------------------------|-------------|------------|-------------|-----------|
| Stages (Points)               | LEs                                   | ESBs (512s) | ESBs (4Ks) | ESBs (Mega) | DSP Mults |
| 1024 points                   | 8313                                  | 14          | 20         | 0           | 88        |
| 2048 points                   | 9005                                  | 11          | 29         | 0           | 94        |
| 4096 points                   | 9683                                  | 12          | 94         | 0           | 100       |

Specifications are subject to change without notice.

# Annex B – FPGA Size Guide

| Device    | System Gates | CLB Slices | Multipliers | BlockRAMs<br>(18Kbits) |
|-----------|--------------|------------|-------------|------------------------|
| XC2V40    | 40K          | 256        | 4           | 4                      |
| XC2V80    | 80K          | 512        | 8           | 8                      |
| XC2V250   | 250K         | 1536       | 24          | 24                     |
| XC2V500   | 500K         | 3072       | 32          | 32                     |
| XC2V1000  | 1M           | 5120       | 40          | 40                     |
| XC2V1500  | 1.5M         | 7680       | 48          | 48                     |
| XC2V2000  | 2M           | 10752      | 56          | 56                     |
| XC2V3000  | 3M           | 14336      | 96          | 96                     |
| XC2V4000  | 4M           | 23040      | 120         | 120                    |
| XC2V6000  | 6M           | 33792      | 144         | 144                    |
| XC2V8000  | 8M           | 46592      | 168         | 168                    |
| XC2V10000 | 10M          | 61440      | 192         | 192                    |

#### Xilinx Virtex-II Device Features
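The cross-reference between the Annex A example figures and the Virtex-II table above can be automated as in the following Python sketch. It compares raw resource counts only and ignores routing, timing, I/O and any other logic sharing the device, so the result is a first estimate rather than a guarantee of fit. The names `VIRTEX2` and `smallest_fit` are hypothetical, and only the mid-range devices from the table are included.

```python
# (device, CLB slices, multipliers, block RAMs) for part of the Virtex-II table.
VIRTEX2 = [
    ("XC2V1000", 5120, 40, 40),
    ("XC2V1500", 7680, 48, 48),
    ("XC2V2000", 10752, 56, 56),
    ("XC2V3000", 14336, 96, 96),
    ("XC2V4000", 23040, 120, 120),
    ("XC2V6000", 33792, 144, 144),
]

def smallest_fit(slices, multipliers, block_rams):
    """Return the first listed device with enough of every resource, or None."""
    for name, dev_slices, dev_mults, dev_rams in VIRTEX2:
        if (slices <= dev_slices
                and multipliers <= dev_mults
                and block_rams <= dev_rams):
            return name
    return None

# 1024-point critically sampled example: 4731 slices, 42 multipliers, 25 block RAMs.
print(smallest_fit(4731, 42, 25))
# 4096-point twice oversampled example: 6275 slices, 50 multipliers, 64 block RAMs.
print(smallest_fit(6275, 50, 64))
```

In the first example the multiplier count, not the slice count, rules out the XC2V1000, which illustrates why all three resources need to be checked together.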

# Altera Stratix Device Features

| Device  | LE's   | DSP<br>Multipliers | M512 RAM<br>(32 x 18) | M4K RAM<br>(128 x 36) | Mega RAM<br>(4K x 144) |
|---------|--------|--------------------|-----------------------|-----------------------|------------------------|
| EP1S10  | 10570  | 48                 | 94                    | 60                    | 1                      |
| EP1S20  | 18460  | 80                 | 194                   | 82                    | 2                      |
| EP1S25  | 25660  | 80                 | 224                   | 138                   | 2                      |
| EP1S30  | 32470  | 96                 | 295                   | 171                   | 4                      |
| EP1S40  | 41250  | 112                | 384                   | 183                   | 4                      |
| EP1S60  | 57120  | 144                | 574                   | 292                   | 6                      |
| EP1S80  | 79040  | 176                | 767                   | 364                   | 9                      |
| EP1S120 | 114140 | 224                | 1118                  | 520                   | 12                     |