Design and Implementation of Parallel FIR Filter Using High Speed Vedic Multiplier

Vaibhav V. Manusmare, Devendra S. Chaudhari

Abstract: The demand for high-speed processing has been increasing as a result of expanding computer and signal processing applications. High-throughput arithmetic operations are important for achieving the desired performance in many signal processing and image processing applications. One of the key arithmetic operations in such applications is multiplication, which largely determines the performance of the entire system. Optimizing multiplier speed and area is therefore a challenge in many processors. The Vedic multiplier, based on ancient Vedic mathematics, is one way of addressing this challenge. This paper presents the design and implementation of parallel Finite Impulse Response (FIR) filters using the Urdhva Tiryagbhyam algorithm of Vedic mathematics. The system aims to reduce the propagation delay and area of the filter. The proposed system based on the Vedic multiplier is compared with one based on a conventional multiplier in terms of the resources and time required to process given data. The comparison shows reductions of 36.29% and 15.70% in propagation delay for the two-parallel and three-parallel FIR filters, respectively, when the Vedic multiplier is used instead of the conventional multiplier. The architecture is coded in VHDL and synthesized and simulated using Xilinx ISE Design Suite 13.1.

Keywords: Vedic mathematics, Urdhva Tiryagbhyam, Parallel FIR filter.

I. INTRODUCTION

Digital Signal Processing (DSP) is one of the fastest growing areas, posing a large number of challenges to the engineering community. DSP is one of the core technologies used in communication systems. Due to the explosive growth of multimedia applications, the demand for high-performance and low-power DSP is rising [1]. Many DSP-based applications, especially filtering and the Internet of Things (IoT), require extremely fast processing of large amounts of digital data. DSP operations such as convolution, correlation, fast Fourier transform (FFT), discrete Fourier transform (DFT), discrete cosine transform (DCT) and frequency domain filtering all make use of multipliers [2]. Multiplication is an important and time-consuming task among arithmetic operations. Computational speed and execution time are the two factors that determine the efficiency of a multiplication algorithm. The execution time of a processor depends heavily on the operating speed of its multiplier unit. In many DSP algorithms, multiplication consumes more time than other basic operations, so the critical delay path of the complete operation is set by the delay of the multiplication unit, which in turn governs the performance of the algorithm [3]. The speed of a processor is also commonly estimated in terms of the number of multiplications it can handle per unit time. The speed of the multiplier unit is therefore of great importance to processor performance.

An FIR filter is one of the fundamental processing elements in any DSP system. Parallel processing and pipelining are two techniques used in DSP applications, and both can be exploited to reduce power consumption. Parallel processing can be applied to digital FIR filters either to increase the effective throughput or to reduce the power consumption of the original filter [4]. In parallel processing, multiple outputs are computed in parallel in one clock period, so the effective sampling speed is increased by the level of parallelism. Parallel processing of an FIR filter involves replicating the hardware units so that several inputs can be processed in parallel and several outputs can be produced at the same time. In many design situations, the hardware overhead incurred by parallel processing cannot be tolerated due to limitations in design area. A parallel FIR filter that consumes less area than the traditional parallel FIR filter is therefore advantageous. This is achieved by designing parallel FIR structures whose polyphase decomposition exploits symmetric convolutions [5]. This paper is organized into seven sections. The introduction in the first section is followed by the background of Vedic mathematics in the second. The third section describes the Urdhva Tiryagbhyam algorithm. The fourth section reviews related work in this area. The fifth section outlines the proposed system. The sixth section describes the results obtained from the implementation. The conclusion is stated in the seventh section.

II. VEDIC MATHEMATICS

Vedic mathematics is an ancient system of mathematics that has been found efficient with respect to the speed and area of a processor. The word Vedic is derived from the Sanskrit word ‘Veda’, which means knowledge. Vedic mathematics addresses the problem of long computation time by reducing the delay of the operations to be performed. The concept of ancient Vedic Mathematics was brought forward by Jagadguru Swami Shri Bharati Krishna Tirthaji (1884-1960), a scholar of Sanskrit, mathematics, history and philosophy, after eight years of research on the Atharva Veda. Vedic Mathematics is formulated on 16 Sutras (aphorisms) and 13 Sub-Sutras (corollaries). These sutras offer short-cut methods for all basic mathematical operations. Owing to its simplicity and regularity, it finds applications in arithmetic, geometry, trigonometry, quadratic equations, factorization and calculus [6]. Vedic mathematics has also been applied in digital signal processing, chip design, high-speed low-power VLSI arithmetic and algorithms, and encryption systems.

III. URDHVA TIRYAGBHYAM

Urdhva Tiryagbhyam, which means ‘Vertically and Crosswise’, is the most commonly used method of Vedic mathematics for multiplication. This multiplication formula is applicable to all cases of multiplication, including decimal as well as binary numbers. The individual digits of both operands are subjected to vertical and crosswise multiplication [7]. Fig. 1 illustrates a schematic representation of the Urdhva Tiryagbhyam sutra for two decimal numbers. At each step, the digits at the ends of the arrows are multiplied and the carry from the previous step is added. When a step involves more than one multiplication (for example Step 2), the products are added together along with the previous carry. The units digit of the sum is the result digit, and the tens digit becomes the carry for the next step.
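The step-by-step procedure above can be sketched in Python for decimal operands. This is only an illustration of the vertical and crosswise idea, not the VHDL design used in this work; the function name `urdhva_multiply` and the digit-list representation are assumptions introduced for the example.

```python
def urdhva_multiply(a, b):
    """Multiply two non-negative integers digit by digit (illustrative sketch)."""
    # Digits stored least significant first, padded to a common length.
    xs = [int(c) for c in str(a)][::-1]
    ys = [int(c) for c in str(b)][::-1]
    n = max(len(xs), len(ys))
    xs += [0] * (n - len(xs))
    ys += [0] * (n - len(ys))

    digits = []
    carry = 0
    for step in range(2 * n - 1):
        # Vertical/crosswise products: all digit pairs whose positions sum to `step`.
        total = carry
        for i in range(n):
            j = step - i
            if 0 <= j < n:
                total += xs[i] * ys[j]
        digits.append(total % 10)   # units digit is the result digit
        carry = total // 10         # remaining part is carried to the next step

    # The final carry forms the most significant part of the product.
    result = carry
    for d in reversed(digits):
        result = result * 10 + d
    return result


if __name__ == "__main__":
    print(urdhva_multiply(23, 14))
```

In hardware the cross products of one step are formed concurrently; the sequential loop here only mirrors the order in which the carries are consumed.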

Because the partial products and their sums are calculated in parallel, the multiplier is independent of the clock frequency of the processor. Processors therefore need not operate at increasingly high frequencies, which optimizes the processing power. Owing to its regular structure, the multiplier can be laid out easily on a silicon chip. It is thus a time-, power- and space-efficient technique compared to conventional methods of multiplication [8].

IV. RELATED WORK

The FIR filter plays an important role in many DSP applications, and parallel processing is increasingly used to make such systems power efficient. D. A. Kumar et al. [4] presented a performance analysis of Fast FIR Algorithm (FFA) based and symmetric convolution based FIR filter structures, considering 2-parallel and 3-parallel filters. These filter structures were designed with a Carry Save Adder (CSA) and a Ripple Carry Adder (RCA) in place of the existing adders. The two structures were described in Verilog HDL and simulated and synthesized using Xilinx ISE 13.2 for the Spartan 3E. From the results they concluded that the CSA performs better than the RCA in terms of delay for both structures.

B. Divya and A. Pazhani [5] proposed new parallel FIR filter structures based on FFA, which are beneficial for symmetric coefficients when the number of taps is a multiple of 2 or 3. These structures exploit the inherent symmetry of the coefficients to halve the number of multipliers in the subfilter section, at the expense of additional adders in the pre-processing and post-processing blocks. They also compared the non-symmetric and symmetric structures for a 6-parallel FIR filter with 26 taps and observed that the symmetric parallel FIR structure uses less area than the non-symmetric structure.

In filtering, the multiplier is a crucial element that decides the performance of the processor. The design of multipliers with high speed, low power consumption and small area has therefore received special attention for over a decade, and the Vedic multiplier is one approach to meeting these goals. S. P. Pohokar et al. [9] designed a 16×16 Vedic multiplier based on the Urdhva Tiryagbhyam sutra using VHDL. Designing the 16×16 Vedic multiplier requires 2×2 bit, 4×4 bit and 8×8 bit Vedic multipliers to be designed in succession. The 8×8 Vedic multiplier was used as the basic building block of the 16×16 Vedic multiplier, while the 8×8 multiplier was in turn built from 4×4 Vedic multipliers. A Carry Save Adder was used to add the partial products generated in the Vedic multiplication. They reported that as the size of the multiplier was increased from 2×2 bit to 16×16 bit, the time delay and memory requirement were reduced. Later, G. C. Ram et al. [10] developed a system that replaces the Carry Save Adder in the Vedic multiplier with a Binary to Excess-1 Converter (BEC). The aim of using the BEC was to increase the speed of operation and to reduce power consumption and gate count compared to the Carry Save Adder and Ripple Carry Adder.
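The hierarchical construction described above, in which each N×N multiplier is assembled from four (N/2)×(N/2) blocks, can be modelled behaviourally as follows. This is a sketch under stated assumptions, not the VHDL of [9] or of this work: the names `vedic_2x2` and `vedic_multiply` are hypothetical, and the ordinary `+` operator stands in for whichever adder (CSA, RCA, CSLA or BEC based) combines the partial products in hardware.

```python
def vedic_2x2(a, b):
    """2x2-bit vertical-and-crosswise multiplier from AND gates and half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    p0 = a0 & b0                        # vertical product of the low bits
    cross1, cross2 = a1 & b0, a0 & b1   # crosswise products
    p1 = cross1 ^ cross2                # half adder: sum
    c1 = cross1 & cross2                # half adder: carry
    v = a1 & b1                         # vertical product of the high bits
    p2 = v ^ c1                         # second half adder: sum
    p3 = v & c1                         # second half adder: carry
    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3)


def vedic_multiply(a, b, width):
    """Multiply two unsigned `width`-bit values; width must be 2, 4, 8, 16, ..."""
    if width == 2:
        return vedic_2x2(a, b)
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Four half-width multipliers; in hardware these operate concurrently.
    ll = vedic_multiply(a_lo, b_lo, half)
    lh = vedic_multiply(a_lo, b_hi, half)
    hl = vedic_multiply(a_hi, b_lo, half)
    hh = vedic_multiply(a_hi, b_hi, half)
    # Adder stage: combine the partial products with the appropriate weights.
    return ll + ((lh + hl) << half) + (hh << width)


if __name__ == "__main__":
    print(vedic_multiply(12345, 54321, 16) == 12345 * 54321)
```

The recursion makes the regular structure explicit: a 16×16 multiplier uses four 8×8 blocks, each of which uses four 4×4 blocks, and so on down to the 2×2 cell.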

V. PROPOSED IMPLEMENTATION

The proposed system implements parallel FIR filters on an FPGA using the Urdhva Tiryagbhyam algorithm. The speed of a parallel FIR filter can be improved by exploiting coefficient symmetry. The main idea is to manipulate the polyphase decomposition so as to obtain as many subfilter blocks with symmetric coefficients as possible, so that half the number of multipliers within a single subfilter block can serve the multiplications of all its taps. Two-Parallel and Three-Parallel FIR filters are designed using the Vedic algorithm.

1. Two-Parallel FIR Filter

The Two-Parallel FIR filter consists of two polyphase components of the input \( (X_0, X_1) \), two polyphase components of the filter coefficients \( (H_0, H_1) \) and two filter outputs \( (Y_0, Y_1) \), as shown in Fig. 2.

![Fig.2 Symmetric Convolution based Two-Parallel FIR Filter](image)

The output equations for this filter structure are given by
\[
Y_0 = \left\{ \frac{1}{2} \left[ (H_0 + H_1) (X_0 + X_1) + (H_0 - H_1) (X_0 - X_1) \right] - H_1 X_1 \right\} + z^{-2} H_1 X_1
\]
\[
Y_1 = \frac{1}{2} \left[ (H_0 + H_1) (X_0 + X_1) - (H_0 - H_1) (X_0 - X_1) \right]
\]

It contains three subfilter blocks out of which two can be used for symmetrical convolution.
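A quick numerical check of the three-subfilter structure can be written in Python. The sketch assumes the standard two-parallel polyphase relations \(Y_0 = H_0 X_0 + z^{-2} H_1 X_1\) and \(Y_1 = H_0 X_1 + H_1 X_0\), with \(z^{-2}\) acting as a delay of one block; the function names and the list-based signal representation are assumptions made for illustration and do not reflect the VHDL implementation.

```python
def convolve(a, b):
    """Direct linear convolution of two integer sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def two_parallel_fir(h, x):
    """Filter x with h using three subfilters (integer data, even lengths)."""
    if len(h) % 2 or len(x) % 2:
        raise ValueError("h and x must both have even length")
    h0, h1 = h[0::2], h[1::2]          # polyphase components of the filter
    x0, x1 = x[0::2], x[1::2]          # polyphase components of the input

    # The three subfilter blocks.
    s = convolve([p + q for p, q in zip(h0, h1)], [p + q for p, q in zip(x0, x1)])
    d = convolve([p - q for p, q in zip(h0, h1)], [p - q for p, q in zip(x0, x1)])
    m = convolve(h1, x1)

    n = len(s)
    # Y0 = {(s + d)/2 - m} + (one block delay) m ; s + d is always even.
    y0 = [(s[k] + d[k]) // 2 - m[k] for k in range(n)] + [0]
    for k in range(n):
        y0[k + 1] += m[k]
    # Y1 = (s - d)/2 ; s - d is always even.
    y1 = [(s[k] - d[k]) // 2 for k in range(n)]

    # Interleave the two output phases back into a single stream.
    y = []
    for k in range(n):
        y += [y0[k], y1[k]]
    y.append(y0[n])
    return y


if __name__ == "__main__":
    h = [1, 2, 3, 3, 2, 1]             # symmetric example coefficients
    x = [4, -1, 0, 7, 2, 5, -3, 6]
    print(two_parallel_fir(h, x) == convolve(h, x))
```

For the symmetric example coefficients, \(H_0 + H_1\) is `[3, 6, 3]` (symmetric) and \(H_0 - H_1\) is `[-1, 0, 1]` (antisymmetric), which is what allows these two subfilters to share multipliers.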

2. Three-Parallel FIR Filter

The Three-Parallel FIR filter consists of three polyphase components of the input \((X_0, X_1, X_2)\), three polyphase components of the filter coefficients \((H_0, H_1, H_2)\) and three filter outputs \((Y_0, Y_1, Y_2)\), as shown in Fig. 3.

![Fig.3 Symmetric Convolution based Three-Parallel FIR Filter](image)

The output equations for this filter structure are given by
\[
Y_0 = \frac{1}{2} \left[ (H_0 + H_1) (X_0 + X_1) + (H_0 - H_1) (X_0 - X_1) \right] - H_1 X_1 + z^{-3} \left\{ (H_0 + H_1 + H_2) (X_0 + X_1 + X_2) - (H_0 + H_2) (X_0 + X_2) - \frac{1}{2} \left[ (H_0 + H_1) (X_0 + X_1) - (H_0 - H_1) (X_0 - X_1) \right] - H_1 X_1 \right\}
\]
\[
Y_1 = \frac{1}{2} \left[ (H_0 + H_1) (X_0 + X_1) - (H_0 - H_1) (X_0 - X_1) \right] + z^{-3} \left\{ \frac{1}{2} \left[ (H_0 + H_2) (X_0 + X_2) + (H_0 - H_2) (X_0 - X_2) \right] - \frac{1}{2} \left[ (H_0 + H_1) (X_0 + X_1) + (H_0 - H_1) (X_0 - X_1) \right] + H_1 X_1 \right\}
\]
\[
Y_2 = \frac{1}{2} \left[ (H_0 + H_2) (X_0 + X_2) - (H_0 - H_2) (X_0 - X_2) \right] + H_1 X_1
\]

It contains six subfilter blocks out of which four can be used for symmetrical convolution.
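The six-subfilter structure can be checked numerically in the same spirit. The sketch below assumes the standard three-parallel polyphase relations \(Y_0 = H_0 X_0 + z^{-3}(H_1 X_2 + H_2 X_1)\), \(Y_1 = H_0 X_1 + H_1 X_0 + z^{-3} H_2 X_2\) and \(Y_2 = H_0 X_2 + H_1 X_1 + H_2 X_0\), with \(z^{-3}\) acting as a delay of one block. The six subfilters assumed here are \(H_0 + H_1\), \(H_0 - H_1\), \(H_1\), \(H_0 + H_1 + H_2\), \(H_0 + H_2\) and \(H_0 - H_2\); all function names are hypothetical.

```python
def convolve(a, b):
    """Direct linear convolution of two integer sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def add(*seqs):
    """Element-wise sum of equally long sequences."""
    return [sum(t) for t in zip(*seqs)]


def sub(a, b):
    """Element-wise difference of two equally long sequences."""
    return [p - q for p, q in zip(a, b)]


def three_parallel_fir(h, x):
    """Filter x with h using six subfilters (integer data, lengths divisible by 3)."""
    if len(h) % 3 or len(x) % 3:
        raise ValueError("h and x must both have a length divisible by 3")
    h0, h1, h2 = h[0::3], h[1::3], h[2::3]
    x0, x1, x2 = x[0::3], x[1::3], x[2::3]

    # The six subfilter blocks.
    a = convolve(add(h0, h1), add(x0, x1))
    b = convolve(sub(h0, h1), sub(x0, x1))
    c = convolve(h1, x1)
    d = convolve(add(h0, h1, h2), add(x0, x1, x2))
    e = convolve(add(h0, h2), add(x0, x2))
    f = convolve(sub(h0, h2), sub(x0, x2))

    n = len(a)
    y0, y1, y2 = [0] * (n + 1), [0] * (n + 1), [0] * n
    for k in range(n):
        half_sum_ab = (a[k] + b[k]) // 2    # H0X0 + H1X1
        half_diff_ab = (a[k] - b[k]) // 2   # H0X1 + H1X0
        half_sum_ef = (e[k] + f[k]) // 2    # H0X0 + H2X2
        half_diff_ef = (e[k] - f[k]) // 2   # H0X2 + H2X0
        y0[k] += half_sum_ab - c[k]
        y0[k + 1] += d[k] - e[k] - half_diff_ab - c[k]   # delayed by one block
        y1[k] += half_diff_ab
        y1[k + 1] += half_sum_ef - half_sum_ab + c[k]    # delayed by one block
        y2[k] = half_diff_ef + c[k]

    # Interleave the three output phases back into a single stream.
    y = []
    for k in range(n):
        y += [y0[k], y1[k], y2[k]]
    y += [y0[n], y1[n]]
    return y


if __name__ == "__main__":
    h = [1, 2, 3, 3, 2, 1]
    x = [4, -1, 0, 7, 2, 5, -3, 6, 1]
    print(three_parallel_fir(h, x) == convolve(h, x))
```

The sums and differences that are halved are always even for integer data, so integer division is exact in this model.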

VI. RESULTS

A. Area Comparison

Slices are the basic logic resources of the FPGA and are grouped to form configurable logic blocks. A slice contains a set number of look-up tables (LUTs), flip-flops and multiplexers. A LUT stores a predefined list of outputs for every combination of its inputs and thus provides a fast way to retrieve the output of a logic operation. Tables I and II summarize the resource utilization for the Two-Parallel and Three-Parallel FIR filters, respectively.

### Table I: Logic Utilization for Two-Parallel FIR Filter

<table>
<thead>
<tr>
<th>Logic Utilization</th>
<th>Conventional Multiplier</th>
<th>RCA based Vedic Multiplier</th>
<th>CSLA based Vedic Multiplier</th>
</tr>
</thead>
<tbody>
<tr>
<td>No. of Slices</td>
<td>1356</td>
<td>571</td>
<td>612</td>
</tr>
<tr>
<td>No. of Slice Flip-flops</td>
<td>1299</td>
<td>856</td>
<td>865</td>
</tr>
<tr>
<td>No. of 4-input LUTs</td>
<td>1950</td>
<td>820</td>
<td>958</td>
</tr>
</tbody>
</table>

### Table II: Logic Utilization for Three-Parallel FIR Filter

<table>
<thead>
<tr>
<th>Logic Utilization</th>
<th>Conventional Multiplier</th>
<th>RCA based Vedic Multiplier</th>
<th>CSLA based Vedic Multiplier</th>
</tr>
</thead>
<tbody>
<tr>
<td>No. of Slices</td>
<td>2489</td>
<td>1709</td>
<td>1851</td>
</tr>
<tr>
<td>No. of Slice Flip-flops</td>
<td>2451</td>
<td>1842</td>
<td>1852</td>
</tr>
<tr>
<td>No. of 4-input LUTs</td>
<td>3764</td>
<td>3019</td>
<td>3356</td>
</tr>
</tbody>
</table>

In Fig. 4 and Fig. 5, the areas of the Two-Parallel and Three-Parallel FIR filters are compared in terms of the number of slices, flip-flops and LUTs. It is observed that the proposed Vedic multiplier uses fewer resources than the conventional multiplier for both filters.

![Fig.4 Area Comparison for Two-Parallel FIR Filter](image)

![Fig.5 Area Comparison for Three-Parallel FIR Filter](image)

B. Time Comparison

Processing speed is the factor that decides efficiency of the system. Table III shows propagation delay required using different multipliers for both the filters.

### Table III: Time Comparison Analysis
Multipliers | Propagation Delay, Two-Parallel FIR Filter (ns) | Propagation Delay, Three-Parallel FIR Filter (ns)
--- | --- | ---
Conventional Multiplier | 12.240 | 14.259
RCA based Vedic Multiplier | 8.311 | 13.061
CSLA based Vedic Multiplier | 7.798 | 12.019

When these filter structures are compared across the different multipliers in terms of completion time, it is observed that the conventional multiplier takes more time because it requires more iterations, whereas the Vedic multiplier, which operates in parallel, takes less time.
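The relative delay reductions implied by Table III can be reproduced with a few lines of Python. The delay values are taken from Table III; the helper name `percent_reduction` and the dictionary layout are assumptions made for this sketch.

```python
def percent_reduction(baseline, improved):
    """Reduction of `improved` relative to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Propagation delays in ns, copied from Table III.
delay_ns = {
    "Two-Parallel": {"conventional": 12.240, "rca_vedic": 8.311, "csla_vedic": 7.798},
    "Three-Parallel": {"conventional": 14.259, "rca_vedic": 13.061, "csla_vedic": 12.019},
}

if __name__ == "__main__":
    for filter_name, row in delay_ns.items():
        base = row["conventional"]
        for variant in ("rca_vedic", "csla_vedic"):
            reduction = percent_reduction(base, row[variant])
            print(f"{filter_name} {variant}: {reduction:.2f}% lower delay")
```

For the CSLA based Vedic multiplier this gives roughly 36.29% for the Two-Parallel filter and about 15.7% for the Three-Parallel filter, in line with the figures quoted in the abstract.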

C. Simulation Results

The technology view describes the top-level block, showing the set of inputs and outputs, while the Register Transfer Level (RTL) view shows the internal architectural blocks along with the connections between the input and output pins. The timing waveform is generated by writing a test bench program containing the set of input test vectors applied to the design. The RTL schematic view of the proposed 16×16 bit Vedic multiplier is shown in Fig. 7.

![Fig. 7 RTL Schematic View of 16x16 Vedic Multiplier](image)


The technology views of the Two-Parallel and Three-Parallel FIR filters are shown in Fig. 8 and Fig. 9, respectively.

![Fig. 8 Technology View of Two-Parallel FIR Filter](image)


![Fig. 9 Technology View of Three-Parallel FIR Filter](image)


Fig. 10 and Fig. 11 show the timing waveforms of the Two-Parallel and Three-Parallel FIR filters, respectively, which represent the outputs obtained for the various input vectors provided in the test bench program during simulation.

![Fig. 10 Timing Waveform for Two-Parallel FIR Filter](image)

VII. CONCLUSION

The FIR filter is one of the fundamental processing elements used in DSP applications. High-speed realization of FIR filters with low area consumption has become increasingly important over the years. Parallel FIR structures based on symmetric convolutions are designed and implemented on an FPGA using the Urdhva Tiryagbhyam algorithm of Vedic mathematics.

The delay of the two-parallel FIR filter using the CSLA based Vedic multiplier is 7.798 ns, while that using the conventional multiplier is 12.240 ns. For the three-parallel FIR filter, the delay of 14.259 ns with the conventional multiplier is reduced to 12.019 ns with the CSLA based Vedic multiplier. The Vedic multiplier thus improves speed over the conventional multiplier and also reduces the area of the system. The CSLA based Vedic multiplier has an advantage over the RCA based Vedic multiplier in terms of processing speed, whereas the RCA based Vedic multiplier is the better choice when less area is required.

REFERENCES


Vaibhav V. Manusmare received the B.E. degree in Electronics and Communication from Kavikulguru Institute of Technology and Science (KITS), Ramtek, Nagpur in 2014 and is currently pursuing the M. Tech. degree in Electronics System and Communication at Government College of Engineering, Amravati.

Dr. Devendra S. Chaudhari obtained his BE and ME from Marathwada University, Aurangabad, and his PhD from the Indian Institute of Technology, Bombay, Mumbai. He has been engaged in teaching and research for about 25 years and worked on a DST-SERC sponsored Fast Track Project for Young Scientists. He has worked as Head of Electronics and Telecommunication, Instrumentation, and Electrical, and as in-charge Principal, at Government Engineering Colleges. Presently he is working as Head, Department of Electronics and Telecommunication Engineering at Government College of Engineering, Amravati. Dr. Chaudhari has published research papers and presented papers at international conferences abroad in Seattle, USA and in Austria, Europe. He worked as Chairman / Expert Member on different committees of the All India Council for Technical Education and the Directorate of Technical Education for Approval, Graduation, Inspection, and Variation of Intake of diploma and degree Engineering Institutions. As a university recognized PhD research supervisor in Electronics and Computer Science Engineering, he has been supervising research work since 2001. One research scholar received a PhD under his supervision. He has worked as Chairman / Member on different university and college level committees such as Examination, Academic, Senate, and Board of Studies. He chaired one of the technical sessions of an International Conference held at Nagpur. He is a fellow of IE and IETE, a life member of ISTE and BMESI, and a member of IEEE (2007). He is a recipient of the Best Engineering College Teacher Award of ISTE, New Delhi, the Gold Medal Award of IETE, New Delhi, and the Engineering Achievement Award of IE (I), Nashik. He has organized various Continuing Education Programmes and delivered Expert Lectures on research at different places. He has also worked as ISTE Visiting Professor and visiting faculty member at Asian Institute of Technology, Bangkok, Thailand.
His present research and teaching interests are in the field of Biomedical Engineering, Digital Signal Processing and Analogue Integrated Circuits.