# Low Power and Area Efficient Borrow Save Adder for MAC Unit in VLSI Application

# S. Gayathri Priya<sup>[1]</sup>, J. Jagan Babu<sup>[2]</sup>, V. Kumaravel<sup>[3]</sup>, M. Pooja<sup>[4]</sup>

<sup>1234</sup>Assistant Professor, Department of Electronics and Communication Engineering, R.M.D Engineering College, Chennai

sgp.ece@rmd.ac.in<sup>1</sup>, jj.ece@rmd.ac.in<sup>2</sup>, kumaravel.ece@rmd.ac.in<sup>3</sup>, mp.ece@rmd.ac.in<sup>4</sup>

#### Abstract

In the highly competitive embedded-system environment, the particular characteristics of machine learning computation can be exploited to design a more efficient MAC unit with reduced area and power. Multiply-accumulate (MAC) computations account for a large part of machine learning accelerator operations, and a pipelined structure is usually adopted to improve performance by reducing the number of adder circuits. This paper proposes a pipelining method that selectively eliminates some of the flip-flops in the carry look-ahead adder. The Feed forward-Cutset-Free (FCF) pipelining method is applied with a borrow save adder (BSA) in the accumulator (MFCF-PA) to simplify the design and to reduce power dissipation and undesired data transitions. Xilinx FPGA simulation results show that the MAC unit achieves energy savings between 15% and 25% and an area reduction of 15% compared with existing conventional pipelined MAC units based on the carry look-ahead adder (CLA).

Keywords: Multiply Accumulate (MAC), Feed forward-Cutset-Free (FCF), Borrow Save Adder (BSA), Carry Look-ahead Adder (CLA), FPGA.

# **1. Introduction**

The work in [1] helps to reduce overfitting on the ImageNet LSVRC task with 1000 different classes; it achieved a top-five test error rate of 15.3%, compared with 26.2% for the other method. The authors of [2] investigate deep Recurrent Neural Networks (RNNs), which combine the multiple levels of representation of deep networks with the flexibility of RNNs, and demonstrate a test error rate of 17.7% for phoneme recognition on TIMIT. Priyanka Nain and Virdi [3] present a MAC unit for signal processing and microcontroller applications and provide a comparative analysis of MAC units. Their execution unit uses a three-stage pipeline architecture with an optimized 16 × 16 multiplier that supports signed or unsigned and fixed-point fractional input operands. Hoang et al. [4] present a two-cycle MAC architecture with guard bits and a saturation circuit, using a reduction tree in the first stage, and evaluate it with place-and-route results in a 65 nm process. This MAC architecture uses streamlined gates and reduces energy consumption by up to 52% compared with the conventional MAC (DTMAC) unit. The work in [5] investigates the accuracy of convolutional networks in the large-scale setting using 3 × 3 convolution filters, and shows a significant improvement when the depth is increased to 16 to 19 weight layers. The article [6] presents a novel structure based on a bidirectional gated recurrent unit architecture and reports experimental results that demonstrate its effectiveness.

Yu-Hsin Chen et al. [7] present an energy-efficient accelerator that uses a processing dataflow with 168 processing elements and saves energy by reducing data movement; the reported power consumption is 236 mW. The author of [8] developed a fast multiplier unit built from combinational logic circuits using straightforward diode-transistor logic. This design achieves a fastest multiplication iteration of 1 µs and reduces cost in large multiplication units. The author of [9] illustrates a novel high-speed, low-power multiplication algorithm that reduces the power and the number of adders required in the logic circuit. This is achieved with two's complement negation rules, giving an improvement of 90% in delay and 89.9% in power consumption.

# **2. Proposed Method**

### 2.1 Carry Look ahead Adder (CLA)

A fast adder is used in digital systems to reduce the delay of the basic arithmetic circuits; in the proposed system this role is played by the Carry Look-ahead Adder (CLA) shown in Figure 1. The 4-bit CLA determines quickly, at every stage, whether the carry output of the previous stage is 1 or 0. The circuit in Figure 1 takes the inputs A0, A1, A2, A3 and B0, B1, B2, B3 and produces the carries C0, C1, C2 and C3, so the addition is evaluated in less time, which improves the performance of the overall proposed architecture.



Figure 1: Architecture of Carry Look ahead Adder
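The look-ahead principle can be illustrated with a short behavioural model. The following Python sketch is not the VHDL used in this work; it assumes the textbook generate/propagate formulation, and the function name `cla_4bit` and the carry naming (`c1` to `c4` for the carries out of bit positions 0 to 3) are illustrative choices.

```python
def cla_4bit(a, b, c_in=0):
    """Behavioural model of a 4-bit carry look-ahead adder (textbook form).

    Returns (sum, carry_out), where sum is the 4-bit result.
    """
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate: both bits are 1
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # propagate: exactly one bit is 1

    # Each carry is written directly in terms of g, p and c_in,
    # so no carry has to wait for the previous stage to finish.
    c1 = g[0] | (p[0] & c_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c_in))

    carries_in = [c_in, c1, c2, c3]  # carry entering each bit position
    total = sum((p[i] ^ carries_in[i]) << i for i in range(4))
    return total, c4


if __name__ == "__main__":
    print(cla_4bit(0b0111, 0b0001))  # 7 + 1
```

In hardware the four carry expressions are evaluated in parallel, which is where the delay reduction over a ripple-carry adder comes from.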

IT in Industry, Vol. 9, No.2, 2021

### 2.2 Borrow Save Adder (BSA)

The arithmetic logic is described in VHDL. Figure 2 shows the n-bit schematic of the adder, which is built from n full adders (here, a 4-bit full-adder circuit). Its inputs are the carry-in Cin and the operands x and y, and it produces the corresponding carry outputs Cnout and Cpout and the sum outputs Sn and Sp. This 4-bit borrow save adder entity supports the proposed architecture by reducing design complexity and saving energy.



Figure 2: Architecture of Borrow Save Adder
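To make the borrow-save idea concrete, the sketch below models a number as a pair of words (P, N) whose value is P − N, so each digit p_i − n_i lies in {−1, 0, 1}. The addition routine uses one textbook carry-free scheme for radix-2 signed-digit numbers; it is an assumption for illustration and is not taken from the VHDL entity of Figure 2. The names `bs_digits`, `bs_value` and `bs_add` are hypothetical, and the carry-in input is omitted.

```python
def bs_digits(p, n, width):
    """Signed digits (LSB first) of a borrow-save pair: d_i = p_i - n_i."""
    return [((p >> i) & 1) - ((n >> i) & 1) for i in range(width)]


def bs_value(p, n):
    """Integer value of a borrow-save pair: positive word minus negative word."""
    return p - n


def bs_add(xp, xn, yp, yn, width=4):
    """Add two borrow-save numbers without a rippling carry.

    Each position produces an interim digit w and a transfer t that
    travels exactly one position to the left.
    """
    x = bs_digits(xp, xn, width)
    y = bs_digits(yp, yn, width)
    sp = sn = 0
    transfer = 0  # transfer entering the current position
    for i in range(width):
        z = x[i] + y[i]
        # If a digit one position below is negative, the incoming transfer
        # is -1 or 0; otherwise it is 0 or +1. Pick w so that w + transfer
        # always stays within {-1, 0, 1}.
        neg_below = i > 0 and (x[i - 1] < 0 or y[i - 1] < 0)
        if z == 2:
            t, w = 1, 0
        elif z == -2:
            t, w = -1, 0
        elif z == 1:
            t, w = (0, 1) if neg_below else (1, -1)
        elif z == -1:
            t, w = (-1, 1) if neg_below else (0, -1)
        else:
            t, w = 0, 0
        s = w + transfer
        if s == 1:
            sp |= 1 << i
        elif s == -1:
            sn |= 1 << i
        transfer = t
    # The last transfer becomes the most significant digit of the result.
    if transfer == 1:
        sp |= 1 << width
    elif transfer == -1:
        sn |= 1 << width
    return sp, sn


if __name__ == "__main__":
    sp, sn = bs_add(0b0111, 0b0000, 0b0001, 0b0000)
    print(bs_value(sp, sn))  # 7 + 1
```

Because every output digit depends only on its own position and the one below it, the addition time does not grow with the word length, which is the property that makes redundant adders attractive in an accumulator.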

### 2.3 Proposed MFCF-PA architecture power efficiency

Table 1 shows example inputs of the MFCF-PA: 4-bit binary words with a carry input, processed with PG logic in four stages. The carry passed between two four-bit (one hexadecimal digit) numbers is the carry output from the LSB.

| Xp   | Xn   | Carry_in | Yp   | Yn   | Carry_in |
|------|------|----------|------|------|----------|
| 0000 | 0000 | 0        | 0000 | 0000 | 0        |
| 0001 | 0000 | 1        | 0001 | 0000 | 0        |
| 0000 | 0001 | 0        | 0000 | 0001 | 1        |
| 1111 | 1111 | 0        | 1111 | 1111 | 0        |

Table 1: Example input of MFCF-PA
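As a reading aid, the rows of Table 1 can be decoded into ordinary integers. The sketch below assumes the usual borrow-save interpretation, in which an operand equals its positive word minus its negative word; the carry-in columns are carried along but not interpreted, since their exact role is not specified here. The helper name `borrow_save_value` is hypothetical.

```python
# Rows of Table 1 as (Xp, Xn, carry_in, Yp, Yn, carry_in).
TABLE_1 = [
    ("0000", "0000", 0, "0000", "0000", 0),
    ("0001", "0000", 1, "0001", "0000", 0),
    ("0000", "0001", 0, "0000", "0001", 1),
    ("1111", "1111", 0, "1111", "1111", 0),
]


def borrow_save_value(p_bits, n_bits):
    """Assumed decoding: value = positive word - negative word."""
    return int(p_bits, 2) - int(n_bits, 2)


decoded = [
    (borrow_save_value(xp, xn), borrow_save_value(yp, yn))
    for xp, xn, _, yp, yn, _ in TABLE_1
]
print(decoded)  # → [(0, 0), (1, 1), (-1, -1), (0, 0)]
```

The last row shows the redundancy of the representation: 1111 paired with 1111 encodes the same value, zero, as 0000 paired with 0000.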



Figure 3: RTL technology view and RTL schematic for Xp = 0111, Xn = 0000, Yp = 0001, Yn = 0000

# **3. Performance Result and Discussion**

Figure 3 shows the RTL schematic of the proposed architecture with its operand and carry-in inputs; the corresponding simulation outputs are shown in Figure 4 and Figure 5.



Figure 4: Timing diagram of the 32-bit Borrow Save Adder (BSA)



Figure 5: Output of MAC with Borrow Save Adder (BSA)



### 3.1 Discussion



Based on the Xilinx simulation output, Table 2 reports the performance for 16-bit and 32-bit inputs, which were simulated to find the solution that best reduces design complexity in terms of area and power. Figure 4 indicates the power consumption calculated for the proposed method and for the existing MAC-CLA using the Xilinx ISE design suite; this comparison is shown in Figure 5 and indicates higher performance than obtained with the Agilent 1692A logic analyzer used with the MAC unit.



Figure 5: Power Consumption in MAC-CLA vs. MFCF-PA (configurations shown: MAC + CLA, MAC + MFCF-PA, FCF-MAC + CLA, FCF-MAC + MFCF-PA)

Paul F. Stelling et al. [10] demonstrate a multiply-accumulate operation implemented in a parallel multiplier design of the kind used in RISC central processing units. Their analysis shows an optimal delay for the multiply-accumulate circuit compared with existing fast multiplier designs. Tung Thanh Hoang et al. [11] construct a pipelined MAC architecture whose stage contains only the partial product generation and tree circuits, and evaluate the design with place and route in a 65 nm process. Their results show a 32% decrease in energy per operation for operand sizes of 16 and 32 bits compared with the existing two-cycle MAC architecture. In 2020, Rakesh et al. [12] report on application-based DSP system components and the implementation performance of an efficient 32-bit MAC, in which the Weinberger adder circuit method is modified for the MAC unit. Their comparative analysis shows higher speed and lower energy consumption than the existing method.

| Metric     | Operand size | Carry Look-ahead<br>Adder (MAC-CLA) | Proposed MFCF-<br>PA (MAC-BSA) | Delay (ns) |
|------------|--------------|-------------------------------------|--------------------------------|------------|
| Power      | 16 bits      | 234.37 mW                           | 218.25 mW                      | 4.345      |
| Power      | 32 bits      | 432.26 mW                           | 419.67 mW                      | 7.985      |
| Area (µm²) | 16 bits      | 367                                 | 341                            | 5.865      |
| Area (µm²) | 32 bits      | 521                                 | 492                            | 3.432      |

Table 2: Performance comparison between MAC-CLA and the proposed MFCF-PA (MAC-BSA)
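The relative difference between the two designs can be derived directly from the entries of Table 2. The short Python sketch below does only that arithmetic; the dictionary layout and the helper name `relative_saving` are illustrative, and no values other than those in the table are used.

```python
# (baseline MAC-CLA, proposed MFCF-PA) values copied from Table 2.
TABLE_2 = {
    ("power_mW", 16): (234.37, 218.25),
    ("power_mW", 32): (432.26, 419.67),
    ("area_um2", 16): (367, 341),
    ("area_um2", 32): (521, 492),
}


def relative_saving(baseline, proposed):
    """Fraction by which the proposed value is lower than the baseline."""
    return (baseline - proposed) / baseline


savings = {key: relative_saving(*values) for key, values in TABLE_2.items()}

for (metric, bits), saving in savings.items():
    print(f"{metric:9s} {bits:2d} bits: {100 * saving:.2f}% lower than MAC-CLA")
```

Running the script prints one line per table row, which makes it easy to compare the per-row reductions with the aggregate figures quoted in the text.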

# **4. Conclusion**

It is concluded that applying the proposed Feed forward-Cutset-Free (FCF) scheme to the Borrow Save Adder (BSA) architecture reduces the design complexity of the conventional MAC unit. The FCF pipelining method selectively reduces the number of flip-flops in the accumulator, and the proposed MFCF-PA architecture shows a smaller delay variation when evaluated with the Xilinx ISE 13.2 design suite, as shown in Figure 4. Finally, the results show that the method reduces design complexity, reduces area by 15%, and lowers power consumption to 218.25 mW (16 bits) and 419.67 mW (32 bits), compared with 234.37 mW and 432.26 mW for the conventional Carry Look-ahead Adder (MAC-CLA). The proposed approach therefore leads to a more efficient Multiply Accumulate (MAC) unit design.

# **5. References**

- 1. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., pp. 1097–1105, 2012.
- 2. S. Ramesh, C. Yaashuwanth, K. Prathibanandhi, Adam Raja Basha, T. Jayasankar, "An optimized deep neural network based DoS attack detection in wireless video sensor network," Journal of Ambient Intelligence and Humanized Computing, 2020. https://doi.org/10.1007/s12652-020-02763-9
- 3. Priyanka Nain and S. Virdi, "Multiplier-Accumulator (MAC) Unit," International Journal of Digital Application and Contemporary Research, vol. 5(3), pp. 1-4, 2016.
- 4. T. T. Hoang, M. Sjalander, and P. Larsson-Edefors, "A high-speed, energy-efficient two-cycle multiply-accumulate (MAC) architecture and its application to a double-throughput MAC unit," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 12, pp. 3073–3081, Dec. 2010.
- 5. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," ICLR, pp. 1-14, 2015.
- 6. Xiong Luo et al., "Attention-Based Relation Extraction with Bidirectional Gated Recurrent Unit and Highway Network in the Analysis of Geological Data," IEEE Access, vol. 6, pp. 5705-5715, 2020.
- 7. Yu-Hsin Chen and Joel S. Emer, "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," IEEE Journal of Solid-State Circuits, pp. 1-12, 2016.
- 8. C. Thiruvengadam, M. Palanivelan, K. Senthil Kumar, T. Jayasankar, "Low power approximate adder based repetitive iteration cord (LP-ARICO) algorithm for high-speed applications," Microprocessor and Microsystems (Elsevier), vol. 78, October 2020. https://doi.org/10.1016/j.micpro.2020.103260
- 9. Manash Chanda et al., "Implementation of modified low power 8 x 8 signed DADDA multiplier," International Journal of Electronics and Electrical Engineering, vol. 11, no. 01, pp. 8-14, 2010.
- 10. Paul F. Stelling and Vojin G, "Implementing multiply Accumulate operation in multiplication Time," IEEE Xplore, pp. 99-106, 1997.
- 11. Tung Thanh Hoang et al., "A high speed and Energy efficient two cycle multiple accumulate architecture and its application to a double throughput MAC unit," IEEE Transactions on Circuits and Systems-I, vol. 57, no. 12, pp. 3071-3081, 2010.
- 12. Rakesh H.M and G.S. Sunitha, "Design and implementation of Novel 32-bit MAC unit for DSP applications," International Conference for Emerging Technology (INCET), pp. 1-6, 2020.