FPGA Polyphase Filter Bank Study & Implementation
Raghu Rao, Matthieu Tisserand, Mike Severa, Prof. John Villasenor
Image Communications/Reconfigurable Computing Lab, Electrical Engineering Dept., UCLA


Introduction
This document describes a polyphase filter bank and summarizes the results of a feasibility study of its implementation on FPGA-based architectures with respect to size, timing and bandwidth requirements.
Under the SLAAC program, UCLA and Los Alamos National Labs have collaborated in mapping new adaptive algorithms to configurable computing platforms.

Project Goals
The current portion of the collaboration has involved the feasibility and implementation of a polyphase filter bank using various FPGAs and hardware architectures.
The polyphase implementation is a multi-rate filter structure combined with a DFT, designed to extract subbands from an input signal.
It is an optimization of the standard approaches and offers increased efficiency in both size and speed, aspects that are well suited to reconfigurable computing.
The task has heretofore been implemented only in ASIC; it offers a good opportunity as an example of migration from ASIC to an adaptive platform.

Basic Project Parameters
- 128 Megasamples/sec input signal
- 16 distinct subband outputs
- Implement using polyphase filter and DFT structure
[Figure: 128 Msps input feeding the Poly-DFT block]

Polyphase Filter Architecture
[Figure: input samples -> commutator -> polyphase filter bank -> FFT -> 16 positive frequency bins]
- Commutator: distributes the signal to n lines; reduces clock speed by a factor of n
- Polyphase filter bank: 32 1-input, 1-output polyphase filters, or 16 1-input, 2-output optimized polyphase filters
- FFT: 2n-point real FFT, or n-point complex FFT

System Requirements
[Figure: Poly-DFT block with 8- to 16-bit buses at 32 MHz on input and output]
- 128 MHz system speed. Note: 4 samples at 32 MHz are equivalent to one sample at 128 MHz.
- All lines are buses equal to the sample precision (from 8 to 16 bits).
- Precision has been implemented as a generic in VHDL, which makes precision configurable and allows easy assessment of precision's effect on feasibility.

What Happens Inside?
[Figure: demux from 32 MHz to 4 MHz]
- Data will be sent to 32 filters, i.e., it needs to be latched and further demuxed by a factor of 8.
- Clock speed is also reduced by a factor of 8, to 4 MHz.

And then?
[Figure: Poly-DFT block, 32 MHz in, running at 4 MHz]
- Some work gets done: polyphase filtering and DFT at 4 MHz.
- Note: with resource-sharing filter structures, the initial decimation is only by a factor of 4, the filter bank is smaller, and the work gets done at 8 MHz (slides on this method later).

And finally?
[Figure: re-mux stage; labels 32 MHz, 4 MHz, 16 MHz]
- 16 samples at 4 MHz are available to the remuxing logic.
- 16 samples are required for the system.
- The re-mux runs at 16 MHz and samples 4 DFT outputs at a time.
- Result data has a latency of a minimum of 12 clock cycles due to demux/remux (plus polyphase/DFT latency).

Polyphase Filter Banks
The following slides describe the regular polyphase filter bank, the transpose form FIR filter, and optimizations based on symmetry.
The prototype is a symmetric FIR filter, i.e., the first n/2 and the last n/2 coefficients are the same, albeit in reverse order.
We can exploit this symmetry to implement an optimal form of the filter bank, using resource sharing.
We also describe two methods of exploiting resource sharing. The advantage of these schemes is the reduction in the size of both the filter bank and the commutator.
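
The clock-rate bookkeeping from the demux, filtering and re-mux slides above can be checked with a few lines of arithmetic. This is only a sketch: the constant and variable names are illustrative assumptions, not identifiers from the VHDL, and all rates are in MHz (or Msps).

```python
# Rate bookkeeping for the demux -> filter bank -> re-mux chain.
# Names are illustrative assumptions; the numbers come from the slides.
INPUT_RATE_MSPS = 128   # system input rate
INPUT_LINES = 4         # samples arrive 4 at a time
DEMUX_FACTOR = 8        # further demux inside the chip
SUBBAND_OUTPUTS = 16    # samples available to the re-mux logic
REMUX_LINES = 4         # DFT outputs sampled at a time by the re-mux

line_clock_mhz = INPUT_RATE_MSPS // INPUT_LINES        # clock on each input line
filter_lines = INPUT_LINES * DEMUX_FACTOR              # lines feeding the filter bank
filter_clock_mhz = line_clock_mhz // DEMUX_FACTOR      # clock seen by each filter
output_rate_msps = SUBBAND_OUTPUTS * filter_clock_mhz  # aggregate output rate
remux_clock_mhz = output_rate_msps // REMUX_LINES      # clock needed by the re-mux

print(line_clock_mhz, filter_lines, filter_clock_mhz, remux_clock_mhz)
```

The filter bank sees 32 lines at 4 MHz, and the 16 outputs at 4 MHz fit onto 4 lines at 16 MHz, which matches the figures quoted in the slides.
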

The Polyphase Filter Bank Design
The first step is to design a prototype low-pass FIR filter h(n) with the desired filter parameters.
The I polyphase filters p_k, each of integer length K = M/I, are derived from the length-M FIR filter h(n) via
$p_k(n) = h(k + nI)$, $k = 0, \dots, I-1$, $n = 0, \dots, K-1$
(M is selected to be a multiple of I).

Our Polyphase Design
K = M/I with M = 128, K = 4, I = 32, so $p_k(n) = h(k + nI)$, $k = 0, \dots, 31$, $n = 0, \dots, 3$:
p_0 = h(0), h(32), h(64), h(96)
p_1 = h(1), h(33), h(65), h(97)
p_2 = h(2), h(34), h(66), h(98)
...
p_31 = h(31), h(63), h(95), h(127)
[Figure: polyphase filter bank, 32 filters with 4 taps each: decimate by 32 -> 32 filters -> DFT]

Symmetry - how is it useful?
[Figure: symmetric filter bank. The first four filters have taps A1 B1 C1 D1, A2 B2 C2 D2, A3 B3 C3 D3, A4 B4 C4 D4; after 24 more filters, the last four have the mirrored taps D4 C4 B4 A4, D3 C3 B3 A3, D2 C2 B2 A2, D1 C1 B1 A1]
Given an n-tap filter with coefficients h(0..n-1):
h0 h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 h11 h12 h13 h14 h15
In a symmetric filter of n taps, coefficient h(i) = h(n-1-i), i.e., we can re-label the above filter coefficients as
h0 h1 h2 h3 h4 h5 h6 h7 h7 h6 h5 h4 h3 h2 h1 h0
What does this mean for our polyphase structure? Splitting the 16 taps into 8 two-tap filters gives:

Filter   Taps       Relabeled by symmetry
p0       h0  h8     h0  h7
p1       h1  h9     h1  h6
p2       h2  h10    h2  h5
p3       h3  h11    h3  h4
p4       h4  h12    h4  h3
p5       h5  h13    h5  h2
p6       h6  h14    h6  h1
p7       h7  h15    h7  h0

We can reduce the number of coefficient multipliers. The eight filters and their input samples are:

Taps     Input samples
h0 h7    .. x15 .. x7
h1 h6    .. x14 .. x6
h2 h5    .. x13 .. x5
h3 h4    .. x12 .. x4
h4 h3    .. x11 .. x3
h5 h2    .. x10 .. x2
h6 h1    .. x9 .. x1
h7 h0    .. x8 .. x0

Only four coefficient pairs are distinct, so four filters can each serve two sample streams:

Taps     First stream     Second stream
h0 h7    .. x15 .. x7     x0 .. x8 ..
h1 h6    .. x14 .. x6     x1 .. x9 ..
h2 h5    .. x13 .. x5     x2 .. x10 ..
h3 h4    .. x12 .. x4     x3 .. x11 ..

The Commutator
The commutator is half the size for this architecture.
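
The polyphase split $p_k(n) = h(k + nI)$ and the symmetry pairing from the slides above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the Hamming-windowed sinc prototype and its cutoff are stand-ins chosen here, not the filter designed from Fiore's specifications, and the function names are hypothetical.

```python
import math

def prototype_lowpass(num_taps, cutoff):
    """Hamming-windowed sinc low-pass (illustrative stand-in for the real prototype)."""
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        t = i - mid
        arg = 2 * math.pi * cutoff * t
        ideal = 2 * cutoff * (math.sin(arg) / arg if arg != 0 else 1.0)
        # Hamming window written in terms of t so it is visibly even in t.
        window = 0.54 + 0.46 * math.cos(2 * math.pi * t / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def polyphase_branches(h, num_branches):
    """Split h into branches p_k(n) = h(k + n*I), with I = num_branches."""
    if len(h) % num_branches != 0:
        raise ValueError("filter length M must be a multiple of I")
    taps_per_branch = len(h) // num_branches
    return [[h[k + n * num_branches] for n in range(taps_per_branch)]
            for k in range(num_branches)]

M, I = 128, 32
h = prototype_lowpass(M, cutoff=1 / (2 * I))  # cutoff choice is an assumption
branches = polyphase_branches(h, I)

# Symmetry h(i) = h(M-1-i) makes branch I-1-k the reverse of branch k,
# so only I/2 distinct coefficient sets are needed.
mirrored = all(
    math.isclose(a, b, abs_tol=1e-12)
    for k in range(I)
    for a, b in zip(branches[k], reversed(branches[I - 1 - k]))
)
print(len(branches), len(branches[0]), mirrored)
```

Because each mirrored pair of branches shares one coefficient set, the bank can be built from 16 filters instead of 32, which is the saving the symmetry slides describe.
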
After feeding 8 filters, the commutator reverses direction.
[Figure: commutator sweep for samples 1 to 8, then samples 9 to 16 in the reverse direction]
It goes to filters 1, 2, 3, 4, 5, ... 8 and then 8, 7, 6, 5, 4, 3, 2, 1, and then reverses direction again.

Symmetry - how is it useful?
Hardware Implementation

Transpose Form of the FIR Filter
[Figure: transpose-form FIR filter. Labels: x(n); h0, h1, h2, h3; y(n); register; adder; multiplier]

Resource Sharing Optimization - Scheme 1
[Figure: Scheme 1 block diagram. Labels: x(n); h0, h1, h2, h3; y(n), convolution of even samples; y(n), convolution of odd samples; clocked for even samples; clocked for odd samples; control: 1 = odd, 0 = even]

Resource Sharing Optimization - Scheme 2
[Figure: Scheme 2 block diagram. Labels: x(n); h0, h1, h2, h3; y(n), even sample convolution; y(n), odd sample convolution; clocked for even samples; clocked for odd samples]

Comparison of Schemes
NOTE: Schemes 1 and 2 also reduce the size of the commutator. With these schemes only an N/2 commutator is needed (decimate by 16).

Polyphase filter bank with resource shared filters
[Figure: decimate by 16 -> 16 resource-sharing filters -> 32 outputs -> 32-point real DFT]
Since each filter convolves alternate samples, giving two outputs, one a convolution of even samples and the other a convolution of odd samples, it also acts to decimate by 2. So the initial decimator needs to decimate only by 16.

The Commutator
The commutator is half the size for this architecture. After feeding 16 filters, it reverses direction.
[Figure: commutator sweep for samples 1 to 16, then samples 17 to 32 in the reverse direction]
It goes to filters 1, 2, 3, 4, 5, ... 16 and then 16, ... 6, 5, 4, 3, 2, 1, and then reverses direction again.

A clocking scheme to enable flipflops alternately
The flipflops in different colors need to latch data alternately. When blue is on, green is off. This can be accomplished by a 2-phase clocking scheme.
[Figure: positive-edge DFF, negative-edge DFF, clock divider circuit]

Alternate scheme using enable flipflops
[Figure: flipflops with clock enable]
Instead of positive- and negative-edge DFFs, enable FFs can be used to convolve alternate samples.
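
One possible behavioral reading of the resource-sharing idea is sketched below. The block diagrams are not recoverable from this transcript, so this is an assumption-laden model rather than the implemented circuit: a single bank of coefficient multipliers feeds two transpose-form register chains, and a toggling enable bit selects which chain latches on each sample. The class and function names are hypothetical.

```python
def advance(regs, products):
    """One clock of a transpose-form FIR chain; returns the output sample."""
    y = products[0] + regs[0]
    for i in range(len(regs)):
        carry = regs[i + 1] if i + 1 < len(regs) else 0
        regs[i] = products[i + 1] + carry
    return y

class SharedSymmetricFir:
    """Two mirrored filters sharing one set of multipliers (needs >= 2 taps)."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.even_regs = [0] * (len(coeffs) - 1)
        self.odd_regs = [0] * (len(coeffs) - 1)
        self.enable_even = True  # models the clock enable / mux select

    def step(self, x):
        products = [c * x for c in self.coeffs]  # shared multiplier bank
        if self.enable_even:
            y = advance(self.even_regs, products)        # taps in order
        else:
            y = advance(self.odd_regs, products[::-1])   # taps reversed
        self.enable_even = not self.enable_even
        return y

def direct_fir(coeffs, samples):
    """Reference direct-form convolution, used only for checking."""
    return [sum(c * samples[n - j] for j, c in enumerate(coeffs) if n - j >= 0)
            for n in range(len(samples))]

coeffs = [1, 2, 3, 4]           # arbitrary example taps
samples = list(range(1, 13))    # arbitrary example input
fir = SharedSymmetricFir(coeffs)
outputs = [fir.step(x) for x in samples]
even_out, odd_out = outputs[0::2], outputs[1::2]
```

In this model `even_out` equals an ordinary FIR with the taps in order applied to the even-indexed samples, and `odd_out` equals the FIR with reversed taps applied to the odd-indexed samples, so one multiplier bank serves a mirrored pair of polyphase filters.
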
This clock enable can also be used as the select line to the muxes and demuxes.

Initial Studies
The initial work involved approaching the topic from a theoretical standpoint:
- understand polyphase theory
- implement polyphase structure simulation (DSP Canvas, MATLAB)
- create filter based on design specs from Fiore's paper
- generate initial size estimates based on knowledge of the size of components and the number of CLBs necessary to implement them on an FPGA

Feasibility Experiments
These experiments evaluated the feasibility of implementing the polyphase filter bank on
- an Altera Flex10K250A (part EPF10K250AGC599-1),
- a Xilinx XC40150 (part XC40150XV-09-BG560), and
- a Xilinx Virtex XCV1000 (part XCV1000-4-BG560).
All experiments were synthesized using Synplify 5.1.4 and placed and routed with Maxplus2 9.1.
The filter bank consisted of a decimator at the input, feeding a bank of either 16 or 32 4-tap filters (filters optimized for symmetry have 2 outputs). The outputs of the filter bank feed a commutator that "re-muxes" data onto 4 lines that will feed a DFT (on the assumption that the DFT is on another chip).

Results for non-symmetry optimized filter bank
Flex10K250A, part EPF10K250AGC599-1: does not fit.
The critical resource on an Altera Flex10K is the carry chain (fast interconnect) routing.
This has 32 filters, with 1 output each, not optimized for symmetry.

Results for non-symmetry optimized filter bank
Xilinx Virtex, part XCV1000-4-BG560
This has 32 filters, with 1 output each, not optimized for symmetry.
D - data precision, C - coeff precision

Results for symmetry optimized filter bank
Flex10K250A, part EPF10K250AGC599-1
This has 16 filters, with 2 outputs each, optimized for symmetry.

Results for symmetry optimized filter bank
Xilinx XC40150XV-09-BG560
This has 16 filters, with 2 outputs each, optimized for symmetry.

Results for symmetry optimized filter bank
Xilinx Virtex XCV1000-4-BG560
This has 16 filters, with 2 outputs each, optimized for symmetry.

FFT Implementation
The following slides describe some optimizations of the FFT and how its inclusion into the system logic affects size and speed.
- The goal of the system is 16 distinct positive frequency bins.
- An N-point FFT produces N/2+1 distinct bins.
- Our input sequence is real.
- The FFT of a real-valued sequence of 2N points can be computed efficiently by employing an N-point complex FFT.

32-point Real FFT Implementation
x(n), the 2N-point real sequence, is divided into two N-point sequences as follows:
$h(k) = x(2k)$, $k = 0, 1, \dots, N-1$
$g(k) = x(2k+1)$, $k = 0, 1, \dots, N-1$
i.e., h(k) is equal to the even-numbered samples of x(n), and g(k) is equal to the odd-numbered samples.
An N-point complex-valued sequence y(k) can be written as
$y(k) = h(k) + j\,g(k)$
The DFT of y(k) is then computed.

FFT cont'd.
With Y(k) denoting the 2N-point DFT of x(n), and H(k) and G(k) the N-point DFTs of h(k) and g(k):
$Y(k) = H(k) + W_{2N}^{k} G(k)$, $k = 0, 1, \dots, N-1$
$Y(k+N) = H(k) - W_{2N}^{k} G(k)$, $k = 0, 1, \dots, N-1$
To compute the real and imaginary parts of the output, H(k) and G(k) can be expressed in terms of even and odd components:
$H(k) = R_e(k) + j\,I_o(k)$
$G(k) = I_e(k) - j\,R_o(k)$
where R(k) and I(k) are the real and imaginary parts of the N-point DFT of y(k), and the subscripts e and o mark their even and odd parts.
Substituting this in Y(k), and taking the usual forward-DFT convention $W_{2N}^{k} = \cos(\pi k/N) - j \sin(\pi k/N)$, we get $Y(k) = Y_r(k) + j\,Y_i(k)$, where
$Y_r(k) = R_e(k) + \cos(\pi k/N)\,I_e(k) - \sin(\pi k/N)\,R_o(k)$
$Y_i(k) = I_o(k) - \sin(\pi k/N)\,I_e(k) - \cos(\pi k/N)\,R_o(k)$
[Figure: FFT of a 2N-point real sequence from an N-point complex FFT. Labels: 32-point real sequence; even samples (real); odd samples (imag); 16-point complex FFT; $W_{2N}^{k}$; G(k); G(k + N)]

Area and delay numbers for the 32-point real FFT
Altera Flex10K-250A GC599-1
Xilinx xc40150-09-bg560: Area 2530 out of 5184 (48% of chip), 20.001 MHz.
Xilinx Virtex xcv1000-4-bg560: Area 1754 out of 12288 (14% of chip), 48.96 MHz.
(Virtex precision 13 & 13; XC40150: 8 & 13)

Full System Estimates
The entire polyphase filter bank along with the FFT does not fit on an Altera Flex device.
But it does fit on the Xilinx XC40150 and Virtex.
Decimation factor = 32, 17 positive frequency bins
Data precision = 13, Coeff precision = 13
Xilinx xc40150-09-bg560 (D=8, C=13): 4581 CLBs out of 5184 (88% of chip). Freq: 20.492 MHz.
Xilinx Virtex xcv1000-4-bg560: 7156 CLB slices out of 12288 (58% of chip) (11631 LUTs). Freq: 56.715 MHz.

Polyphase filter bank on a Xilinx XC40150XV-09-BG560

Area and delay estimation flow
Verify the VHDL by checking the RTL-level schematics and the number of adders, multipliers and registers.
[Figure: flow with stages VHDL, synthesis, RTL schematics, place & route, area report, timing analysis, timing report]

RTL level schematics and design browser from Synplify

Future Work
- Simulate and test polyphase VHDL implementation using LANL test vectors
- Work together with LANL to facilitate possible demo of polyphase work
- Implement Scheme 2 of resource sharing symmetrical filter bank
- Study the advantages and disadvantages, with regard to system goals, of replacing the FFT with a DCT
- Look into adaptive filtering techniques
- Modify our current polyphase design to accommodate configurable or even programmable rate

Conclusions
- Very productive initial phase of collaboration between UCLA and LANL
- Our work has resulted in some innovations at the algorithmic level
- Task migration from ASIC to FPGA
- This study has provided useful sizing information for the Altera Flex and Xilinx Virtex families, as well as some initial benchmarks of basic DSP methods used in UWB
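
As a closing illustration of the real-FFT slides, the trick of computing a 2N-point real DFT from one N-point complex DFT can be sketched as follows. This is a minimal software model, not the hardware design: a plain O(N^2) DFT stands in for the 16-point complex FFT, the forward-DFT sign convention is assumed, and the function names and test signal are hypothetical.

```python
import cmath

def dft(seq):
    """Plain O(n^2) DFT, standing in for the complex FFT block."""
    n = len(seq)
    return [sum(seq[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def real_dft_via_half_size(x):
    """2N-point DFT of a real sequence x using one N-point complex DFT."""
    two_n = len(x)
    n = two_n // 2
    # Pack even samples h(k) into the real part, odd samples g(k) into the imaginary part.
    y = [complex(x[2 * k], x[2 * k + 1]) for k in range(n)]
    packed = dft(y)
    out = [0j] * two_n
    for k in range(n):
        mirrored = packed[(n - k) % n].conjugate()
        h_k = (packed[k] + mirrored) / 2    # N-point DFT of the even samples
        g_k = (packed[k] - mirrored) / 2j   # N-point DFT of the odd samples
        w = cmath.exp(-2j * cmath.pi * k / two_n)  # twiddle factor W_2N^k
        out[k] = h_k + w * g_k
        out[k + n] = h_k - w * g_k
    return out

# Arbitrary 32-point real test signal (an assumption for illustration).
x = [float((7 * i) % 11 - 5) for i in range(32)]
spectrum = real_dft_via_half_size(x)
positive_bins = spectrum[: len(x) // 2 + 1]   # N/2 + 1 distinct bins of a real input
```

For the 32-point case this yields 17 distinct bins (DC through Nyquist), consistent with the "17 positive frequency bins" quoted in the full system estimates.
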
