Wallace tree

A Wallace multiplier is a hardware implementation of a binary multiplier, a digital circuit that multiplies two integers. It uses a selection of full and half adders (the Wallace tree or Wallace reduction) to sum partial products in stages until two numbers are left. Wallace multipliers reduce as much as possible on each layer, whereas Dadda multipliers try to minimize the required number of gates by postponing the reduction to the upper layers.[1]

4 layer Wallace reduction of an 8x8 partial product matrix, using 14 half adders (two dots) and 38 full adders (three dots). The dots in each column are bits of equal weight.

Wallace multipliers were devised by the Australian computer scientist Chris Wallace in 1964.[2]

The Wallace tree has three steps:

1. Multiply each bit of one of the arguments, by each bit of the other.
2. Reduce the number of partial products to two by layers of full and half adders.
3. Group the wires in two numbers, and add them with a conventional adder.[3]

Compared to naively adding partial products with regular adders, the benefit of the Wallace tree is its faster speed. It has ${\displaystyle O(\log n)}$ reduction layers, but each layer has only ${\displaystyle O(1)}$ propagation delay. A naive addition of partial products would require ${\displaystyle O(\log ^{2}n)}$ time. As making the partial products is ${\displaystyle O(1)}$ and the final addition is ${\displaystyle O(\log n)}$, the total multiplication is ${\displaystyle O(\log n)}$, not much slower than addition. From a complexity theoretic perspective, the Wallace tree algorithm puts multiplication in the class NC1. The downside of the Wallace tree, compared to naive addition of partial products, is its much higher gate count.

These computations only consider gate delays and don't deal with wire delays, which can also be very substantial.

The Wallace tree can be also represented by a tree of 3/2 or 4/2 adders.

It is sometimes combined with Booth encoding.[4][5]

Detailed explanation

The Wallace tree is a variant of long multiplication. The first step is to multiply each digit (each bit) of one factor by each digit of the other. Each of this partial products has weight equal to the product of its factors. The final product is calculated by the weighted sum of all these partial products.

The first step, as said above, is to multiply each bit of one number by each bit of the other, which is accomplished as a simple AND gate, resulting in ${\displaystyle n^{2}}$  bits; the partial product of bits ${\displaystyle a_{m}}$  by ${\displaystyle b_{n}}$  has weight ${\displaystyle 2^{(m+n)}}$

In the second step, the resulting bits are reduced to two numbers; this is accomplished as follows: As long as there are three or more wires with the same weight add a following layer:-

• Take any three wires with the same weights and input them into a full adder. The result will be an output wire of the same weight and an output wire with a higher weight for each three input wires.
• If there are two wires of the same weight left, input them into a half adder.
• If there is just one wire left, connect it to the next layer.

In the third and final step, the two resulting numbers are fed to an adder, obtaining the final product.

Example

${\displaystyle n=4}$ , multiplying ${\displaystyle a_{3}a_{2}a_{1}a_{0}}$  by ${\displaystyle b_{3}b_{2}b_{1}b_{0}}$ :

1. First we multiply every bit by every bit:
• weight 1 – ${\displaystyle a_{0}b_{0}}$
• weight 2 – ${\displaystyle a_{0}b_{1}}$ , ${\displaystyle a_{1}b_{0}}$
• weight 4 – ${\displaystyle a_{0}b_{2}}$ , ${\displaystyle a_{1}b_{1}}$ , ${\displaystyle a_{2}b_{0}}$
• weight 8 – ${\displaystyle a_{0}b_{3}}$ , ${\displaystyle a_{1}b_{2}}$ , ${\displaystyle a_{2}b_{1}}$ , ${\displaystyle a_{3}b_{0}}$
• weight 16 – ${\displaystyle a_{1}b_{3}}$ , ${\displaystyle a_{2}b_{2}}$ , ${\displaystyle a_{3}b_{1}}$
• weight 32 – ${\displaystyle a_{2}b_{3}}$ , ${\displaystyle a_{3}b_{2}}$
• weight 64 – ${\displaystyle a_{3}b_{3}}$
2. Reduction layer 1:
• Pass the only weight-1 wire through, output: 1 weight-1 wire
• Add a half adder for weight 2, outputs: 1 weight-2 wire, 1 weight-4 wire
• Add a full adder for weight 4, outputs: 1 weight-4 wire, 1 weight-8 wire
• Add a full adder for weight 8, and pass the remaining wire through, outputs: 2 weight-8 wires, 1 weight-16 wire
• Add a full adder for weight 16, outputs: 1 weight-16 wire, 1 weight-32 wire
• Add a half adder for weight 32, outputs: 1 weight-32 wire, 1 weight-64 wire
• Pass the only weight-64 wire through, output: 1 weight-64 wire
3. Wires at the output of reduction layer 1:
• weight 1 – 1
• weight 2 – 1
• weight 4 – 2
• weight 8 – 3
• weight 16 – 2
• weight 32 – 2
• weight 64 – 2
4. Reduction layer 2:
• Add a full adder for weight 8, and half adders for weights 4, 16, 32, 64
5. Outputs:
• weight 1 – 1
• weight 2 – 1
• weight 4 – 1
• weight 8 – 2
• weight 16 – 2
• weight 32 – 2
• weight 64 – 2
• weight 128 – 1
6. Group the wires into a pair of integers and an adder to add them.

References

1. ^ Townsend, Whitney J.; Swartzlander, Earl E.; Abraham, Jacob A. (2003). "A comparison of Dadda and Wallace multiplier delays". Advanced Signal Processing Algorithms, Architectures, and Implementations XIII. 5205: 552–560. doi:10.1117/12.507012. ISSN 0277-786X.
2. ^ Wallace, Christopher Stewart (February 1964). "A suggestion for a fast multiplier". IEEE Transactions on Electronic Computers. EC-13 (1): 14–17.
3. ^ Bohsali, Mounir; Doan, Michael (2010). "Rectangular Styled Wallace Tree Multipliers" (PDF). Archived from the original (PDF) on 2010-02-15.
4. ^ "Introduction". 8x8 Booth Encoded Wallace-tree multiplier. Tufts university. 2007. Archived from the original on 2010-06-17.
5. ^ Weems Jr., Charles C. (2001) [1995]. "CmpSci 535 Discussion 7: Number Representations". Amherst: University of Massachusetts. Archived from the original on 2011-02-06.