Math: Novel Methods of Integer Multiplication and Division

Also:

24 x 24 Multiplication algorithm

        x5:0 = x2:0 * y2:0,

x2:0 - input1 (it also forms the low half of the output register),
y2:0 - input2,
x5:0 - output,
0 - least significant byte (xN:M denotes bytes N down to M).

1) clear sum x5:3 = 0
2) clear carry
3) shift x2:0 right (the first bit of input1 goes to carry)

4) if carry set, add y2:0 to x5:3 (otherwise carry stays clear)
5) shift x5:0 right (carry from previous addition shifted to MSb and
next bit of input1 to carry)
6) repeat 23 more times from 4; x5:0 then holds the product
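As a cross-check of the steps above, here is a minimal Python sketch that models the registers as integers and the carry flag as an explicit 0/1 value. The name `mul24x24` and the masking constants are illustrative assumptions, not part of the original routine. The step numbers in the comments refer to the list above.

```python
MASK24 = (1 << 24) - 1


def mul24x24(x: int, y: int) -> int:
    """Model of the 24 x 24 steps: x is input1 (x2:0), y is input2 (y2:0)."""
    assert 0 <= x <= MASK24 and 0 <= y <= MASK24
    reg = x                      # x5:0 with x5:3 cleared (step 1)
    carry = 0                    # step 2
    # step 3: shift x2:0 right; its LSb goes to carry, old carry into bit 23
    carry, reg = reg & 1, (reg >> 1) | (carry << 23)
    for _ in range(24):          # step 4 plus 23 repeats (step 6)
        if carry:                # step 4: add y2:0 to x5:3
            high = (reg >> 24) + y
            carry = high >> 24   # carry out of the 24-bit addition
            reg = ((high & MASK24) << 24) | (reg & MASK24)
        # step 5: shift x5:0 right through carry
        carry, reg = reg & 1, (reg >> 1) | (carry << 47)
    return reg
```

For example, `mul24x24(3, 5)` returns 15.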

48 / 24 Division algorithm

        x5:0 = x5:0 / y2:0

1) Check y2:0 for zero
2) Clear temporary Temp2:0 = 0
3) Set up counter Counter = 48

4) Shift left x5:0 (the previous quotient bit enters from carry)
5) Shift left Temp2:0 to get next bit from x5:0
6) Save carry (MSb of Temp might be set), e.g. in Counter<7>; Counter never exceeds 48, so bit 7 is free
7) Subtract y2:0 from Temp2:0
8) If carry=1 (no borrow), set Counter<7>
9) If Counter<7>=0 (borrow and MSb of Temp was zero), add y2:0 to Temp2:0 to restore it
10) carry=Counter<7> (next bit of result)
11) clear Counter<7>
12) decrement Counter, and repeat from 4 if not zero
13) Final shift left x5:0 (shifts in the last quotient bit); x5:0 now holds the quotient and Temp2:0 the remainder
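Here is a minimal Python model of the 48 / 24 steps above. It is an illustrative sketch, and `div48by24` is a hypothetical name. Counter<7> is kept as a separate `flag` variable. The 24-bit register is emulated by masking, so a subtraction that borrows wraps around as it would in hardware.

```python
MASK24 = (1 << 24) - 1
MASK48 = (1 << 48) - 1


def div48by24(x: int, y: int) -> tuple[int, int]:
    """Returns (quotient left in x5:0, remainder left in Temp2:0)."""
    if y == 0:                      # step 1: check y2:0 for zero
        raise ZeroDivisionError("y2:0 is zero")
    assert 0 <= x <= MASK48 and y <= MASK24
    temp = 0                        # step 2
    counter = 48                    # step 3
    carry = 0                       # first carry-in; it is shifted out again later
    while counter:
        # step 4: shift x5:0 left; the previous quotient bit enters at the LSb
        carry, x = x >> 47, ((x << 1) & MASK48) | carry
        # step 5: shift Temp2:0 left, taking the bit that just left x5:0
        carry, temp = temp >> 23, ((temp << 1) & MASK24) | carry
        flag = carry                # step 6: Counter<7> = bit shifted out of Temp
        diff = temp - y             # step 7: subtract y2:0
        if diff >= 0:               # step 8: no borrow, so set Counter<7>
            flag = 1
        temp = diff & MASK24        # the register keeps only the low 24 bits
        if not flag:                # step 9: restore Temp2:0
            temp = (temp + y) & MASK24
        carry = flag                # steps 10-11: next quotient bit to carry
        counter -= 1                # step 12
    x = ((x << 1) & MASK48) | carry  # step 13: shift in the last quotient bit
    return x, temp
```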

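If the result is known to fit in 3 bytes (that is, x5:3 < y2:0), the first 24 loops can be skipped: they would only produce zero quotient bits and leave Temp2:0 equal to x5:3. Using x5:3 directly as the partial remainder also saves the 3 temporary bytes Temp2:0.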
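Here is a Python sketch of the shortened loop. It assumes the quotient fits in 3 bytes (x5:3 < y2:0). Because x5:3 doubles as the partial remainder, steps 4 and 5 merge into a single 48-bit left shift. The final shift is applied to x2:0 alone, so the remainder left in x5:3 is not disturbed. That detail is an assumption of this sketch, not something the text above spells out. `div48by24_short` is a hypothetical name.

```python
MASK24 = (1 << 24) - 1


def div48by24_short(x: int, y: int) -> tuple[int, int]:
    """Returns (quotient in x2:0, remainder in x5:3); quotient must fit in 24 bits."""
    if y == 0:
        raise ZeroDivisionError("y2:0 is zero")
    high, low = x >> 24, x & MASK24          # x5:3 and x2:0
    if high >= y:
        raise OverflowError("quotient does not fit in 3 bytes")
    carry = 0
    for _ in range(24):                      # only 24 loops instead of 48
        # one 48-bit left shift of x5:0: the bit leaving x2:0 enters x5:3
        carry, low = low >> 23, ((low << 1) & MASK24) | carry
        carry, high = high >> 23, ((high << 1) & MASK24) | carry
        flag = carry                         # bit 24 of the partial remainder
        diff = high - y
        if diff >= 0:                        # no borrow
            flag = 1
        high = diff & MASK24
        if not flag:                         # restore
            high = (high + y) & MASK24
        carry = flag                         # next quotient bit
    low = ((low << 1) & MASK24) | carry      # final shift of x2:0 only (assumption)
    return low, high
```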

Novel Methods of Integer Multiplication and Division

G. Reichborn-Kjennerud
Tandberg Data A/S
POB 9, Korsvoll, Oslo 8
Norway

You may be familiar with the method of multiplication, variously alleged to be of Kenyan, Russian, or even Himalayan origin, in which you repeatedly halve the multiplicand and double the multiplier until the multiplicand becomes 1. Then the sum of those multipliers that have a multiplicand counterpart of odd value becomes the product. This sounds complicated, but it's really not; table 1 shows an example.

**Table 1:** An example of the Kenyan double-and-halve algorithm for integer multiplication.

Procedure: Repeatedly halve the multiplicand (discarding remainders) and double the multiplier until the former is 1. For every odd multiplicand, add the respective multiplier.

Example: 44 x 51

| (a) Multiplicand | (b) Multiplier | (c) Partial Sum | (d) Column (c) Expressed in Terms of Original Multiplier | (e) Remainder of Division of Column (a) by 2 |
|---|---|---|---|---|
| 44 | 51 | | | 0 |
| 22 | 102 | | | 0 |
| 11 | 204 | 204 | 4 x 51 | 1 |
| 5 | 408 | 408 | 8 x 51 | 1 |
| 2 | 816 | | | 0 |
| 1 | 1632 | 1632 | 32 x 51 | 1 |
| | Total | 2244 | = 44 x 51 | 101100 is binary for 44 |
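The procedure of table 1 can be sketched in a few lines of Python. This is an illustration only: `kenyan_multiply` is a hypothetical name, and non-negative integers are assumed.

```python
def kenyan_multiply(multiplicand: int, multiplier: int) -> int:
    """Double-and-halve multiplication as in table 1."""
    total = 0
    while multiplicand > 0:
        if multiplicand % 2 == 1:     # odd multiplicand: keep this row's multiplier
            total += multiplier
        multiplicand //= 2            # halve, discarding the remainder
        multiplier *= 2               # double
    return total
```

With the example of table 1, `kenyan_multiply(44, 51)` returns 2244.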

This algorithm readily lends itself to coding, as exemplified by the sequence in 8080 code shown in listing 1. Halving is done by shifting to the right, and the odd/even test is performed by checking the carry. Doubling is done by adding to itself using the DAD instruction, which is also used for summing up the output terms.

Repeated halving of a number and then noting the odd/even results is a nice way of finding the binary form of the number (the last bit found being the most significant one). It also tells something of the binary nature of the Kenyan method.
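The observation about repeated halving can be sketched in Python as well; `binary_by_halving` is a hypothetical helper. The odd/even remainders are collected in the order they are found and then read in reverse, because the last one found is the most significant bit.

```python
def binary_by_halving(n: int) -> str:
    """Binary form of n from the odd/even remainders of repeated halving."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))     # odd -> 1, even -> 0
        n //= 2                            # halve, discarding the remainder
    return "".join(reversed(remainders))   # last remainder found is the MSb
```

For 44 this yields "101100", the same digits as column (e) of table 1 read from the bottom up.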

Listing 1: An implementation of the Kenyan algorithm for integer multiplication for the 8080 microprocessor.

;multiplication program MULT

;input multiplication factors in HL and DE, one of which must
;necessarily be an 8-bit number; if not, carry is set
;output product in DE, carry set if overflow.

;********************** Initial test to find 8-bit factor
MULT:	xra	a	;clear A
	ora	d	;is D zero?
	jz	found	;yes, DE number is 8-bit factor
	xra	a	;no, DE number was not 8-bit factor
	ora	h	;is H zero then?
	stc
	rnz		;no, return with carry set
	xchg		;yes, place 8-bit factor in DE
found:	mov	a,e	;transfer multiplicand to A

;********************** Multiplication starts in earnest
	lxi	d,0	;clear DE to receive output terms
	ana	a	;8-bit factor now in A; clear carry.
next:	rar		;halve the multiplicand; result odd?
	jnc	even	;no, don't add multiplier term
	xchg		;yes, therefore,
	dad	d	;add multiplier (now in DE) to output
	rc		;overflow, carry set on return
	xchg		;put multiplier back in HL
even:	ana	a	;already reached 1 by halving?
	rz		;yes, return with result, carry cleared
	dad	h	;no, double the multiplier and
	jnc	next	;continue the process.
	ret		;overflow, carry set on return
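The register logic of listing 1 can be checked without an 8080. The following Python sketch models HL, DE, and A as plain integers and maps each instruction group to a line. It assumes only the DE result and the carry flag matter. The carry-set exits are represented by `None`, and `mult_8080` is a hypothetical name.

```python
from typing import Optional


def mult_8080(hl: int, de: int) -> Optional[int]:
    """Model of MULT: returns the product left in DE, or None for carry set."""
    if de >> 8 == 0:            # ORA D; JZ found: DE holds the 8-bit factor
        a = de
    elif hl >> 8 == 0:          # ORA H: HL holds it, so XCHG the pairs
        a, hl = hl, de
    else:
        return None             # STC; RNZ: no 8-bit factor
    de = 0                      # LXI D,0: output accumulator
    while True:
        odd = a & 1             # RAR with carry cleared: low bit to carry
        a >>= 1
        if odd:                 # XCHG; DAD D; XCHG: add multiplier to output
            de += hl
            if de > 0xFFFF:     # RC: overflow
                return None
        if a == 0:              # ANA A; RZ: multiplicand used up
            return de
        hl += hl                # DAD H: double the multiplier
        if hl > 0xFFFF:         # JNC next not taken: overflow
            return None
```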

Some time ago I became intrigued by the possibility of finding a procedure for division that was similar to the Kenyan method of multiplication. I came up with the following scheme: The divisor is repeatedly doubled until the next doubling would exceed the dividend; the doubled divisors are then subtracted from the dividend in turn, starting with the largest. Every time a subtraction is possible (the result is not negative), a 1 is noted and the subtraction is performed; otherwise a 0 is recorded. Remarkably enough, the resultant sequence of 0's and 1's constitutes the quotient directly in binary form, as shown in table 2.

**Table 2:** An example of a new method of integer division suitable for implementation on microprocessors without a divide instruction.

Procedure: Double the divisor until it is just less than the dividend. Then try to subtract the doubled divisors, starting with the largest, from the dividend. Note a 1 if the subtraction is possible; otherwise, note a 0 and do not perform the subtraction. The 1s and 0s constitute the binary form of the quotient. To obtain the decimal form, multiply the latter digits with the corresponding terms in a power of 2 series, arranged in reverse order. The quotient is the sum of the resultant terms. To obtain decimal accuracy, multiply the dividend initially by an Nth power of 10. Then, after the division is complete, divide the quotient by the same power of 10 (moving the decimal point N places).

Example: 2246/51

Double:

| Counter | Doubled divisor |
|---|---|
| 0 | 51 |
| 1 | 102 |
| 2 | 204 |
| 3 | 408 |
| 4 | 816 |
| 5 | 1632 |

Subtract:

| Counter | Dividend so far | Subtract | Result | Quotient bit | Bit x power of 2 |
|---|---|---|---|---|---|
| 5 | 2246 | 1632 | 614 | 1 | 1 x 32 = 32 |
| 4 | 614 | 816 | (not possible) | 0 | 0 x 16 = 0 |
| 3 | 614 | 408 | 206 | 1 | 1 x 8 = 8 |
| 2 | 206 | 204 | 2 | 1 | 1 x 4 = 4 |
| 1 | 2 | 102 | (not possible) | 0 | 0 x 2 = 0 |
| 0 | 2 | 51 | (not possible) | 0 | 0 x 1 = 0 |

Remainder: 2

Quotient: 101100 = 44
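The procedure of table 2 can be sketched in Python as follows. This is illustrative only: `divide_by_doubling` is a hypothetical name, and a positive divisor and non-negative dividend are assumed. The doubled divisors are kept in a list, which plays the role of the stack in listing 2.

```python
def divide_by_doubling(dividend: int, divisor: int) -> tuple[int, int]:
    """Table 2 method. Returns (quotient, remainder)."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    doubled = [divisor]
    while doubled[-1] * 2 <= dividend:   # keep doubling while it still fits
        doubled.append(doubled[-1] * 2)
    quotient = 0
    for d in reversed(doubled):           # largest doubled divisor first
        quotient <<= 1                    # make room for the next quotient bit
        if d <= dividend:                 # subtraction possible: note a 1
            dividend -= d
            quotient |= 1
    return quotient, dividend             # what is left is the remainder
```

For the example above, `divide_by_doubling(2246, 51)` returns `(44, 2)`.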

Notice that the procedure is quite mechanical, with none of the trial-and-error search for the next correct quotient digit that is characteristic of the conventional method. Furthermore, it lends itself beautifully to coding (see listing 2). There need be no 8-bit restrictions on any of the numbers; the dividend, divisor, quotient, and remainder can all be entered as 16-bit numbers.

Listing 2: An implementation of the author's integer-division algorithm for the 8080 microprocessor.

;division program DIVIDE

;input dividend in BC and input divisor in HL.
;output quotient in HL and output remainder in DE.
;carry set if division by zero

;********************** Test for division by zero and prepare for reverse-polarity subtraction
DIVIDE:	mov	a,h	;is the divisor in HL zero?
	ora	l
	stc
	rz		;division by zero; abort operation; carry set
	mov	a,b	;put 2's complement of BC +1 into DE for
	cma		;purposes of subtraction. (BC will be
	mov	d,a	;incremented to enable subtraction when minuend
	mov	a,c	;and subtrahend are having equal values).
	cma
	mov	e,a	;dividend in negative form now in DE
	inx	b	;BC +1; dividend incremented
	xra	a	;reset counter A and
	sta	quot	;clear the quotient buffer
	sta	quot+1	;(high-order part of quotient buffer)
	jmp	double	;start the division in earnest

;*********************** First phase: Doubling the divisor
restore:dad	b	;add back
double:	inr	a	;increment counter
	push	h	;save divisor
	dad	h	;double it, but go to second phase if
	jc	change	;it overflowed (HL then larger than dividend in BC)
	dad	d	;comparison with dividend by subtraction
	jnc	restore	;keep doubling unless HL now is larger than BC

;*********************** Second phase: Subtracting from the dividend
;                                      and accumulating quotient bits.

change: mov	b,a	;transfer count to new counter
subtrct:pop	h	;Fetch halved divisor as positive subtrahend
	dad	d	;subtract by using negative dividend as minuend
	jc	shiftc	;the carry bit becomes the quotient bit
	xchg		;equivalent of adding back if subtraction fails
shiftc:	cmc		;invert quotient bit from reverse polarity
	lda	quot	;shift quotient bits
	ral
	sta	quot	;and place into temporary storage
	lda	quot+1
	ral
	sta	quot+1
	dcr	b	;count-down finished?
	jnz	subtrct	;no, continue process
	lhld	quot	;yes, place output quotient in HL.
	mov	a,e	;change remainder in DE into proper polarity
	cma
	mov	e,a
	mov	a,d
	cma
	mov	d,a
	ret		;division operation completed

quot:	ds	2	;buffer for evolving quotient

To handle 16-bit numbers, the add-to-itself DAD H instruction is used for doubling the divisor, and the necessary comparison with the dividend is accomplished by reverse-polarity addition, using the negative value of the dividend (in the DE register pair) and testing the carry. Because DE actually holds the complement of the dividend, that is, -(dividend + 1), the carry is set only when the doubled divisor is strictly larger than the dividend. Care is taken to restore the divisor before the next doubling by adding back the positive value (the incremented dividend in the BC register pair). The doubled divisors are put in temporary storage by pushing them to the stack.

For the necessary subtraction of the doubled divisors from the dividend, reverse-polarity addition is used again. Luckily, the dividend is already present in negative form (in the DE register pair), and the divisors can be used in their existing positive form as they are popped from the stack for subtraction. The carry is then set if the subtraction is not possible and clear if it is; it is inverted by CMC and, for every subtraction, shifted into the two-byte QUOT buffer, which is finally loaded into HL as the quotient. A counter sees to it that there are no more subtractions than there were doubling operations. The contents of the DE register pair constitute the remainder (in complemented form), so DE is complemented once more before returning.
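The reverse-polarity trick can be checked with a Python model of listing 2. This is a sketch, not the author's code, and `divide_8080` is a hypothetical name. All arithmetic is done modulo 2^16, and a sum above 0xFFFF stands for the 8080 carry flag after DAD.

```python
from typing import Optional, Tuple

M16 = 0xFFFF


def divide_8080(bc: int, hl: int) -> Optional[Tuple[int, int]]:
    """Dividend in BC, divisor in HL; (quotient, remainder) or None on zero."""
    if hl == 0:                     # MOV A,H; ORA L; STC; RZ
        return None
    de = ~bc & M16                  # CMA on both bytes: DE = -(BC + 1)
    bc = (bc + 1) & M16             # INX B
    stack = []
    while True:                     # first phase: doubling the divisor
        stack.append(hl)            # PUSH H (INR A counts the pushes)
        doubled = hl + hl           # DAD H
        if doubled > M16:           # JC change: 16-bit overflow
            break
        if doubled + de > M16:      # DAD D sets carry: doubled > dividend
            break
        hl = (doubled + de + bc) & M16  # restore: DAD B undoes the DAD D
    quotient = 0
    for _ in range(len(stack)):     # second phase, one step per push
        total = stack.pop() + de    # POP H; DAD D
        fits = total <= M16         # no carry: subtraction possible
        if fits:
            de = total              # XCHG: keep the new negative remainder
        quotient = ((quotient << 1) | int(fits)) & M16  # CMC; RAL into QUOT
    return quotient, ~de & M16      # CMA on both bytes gives the remainder
```

The model also covers the corner case where the dividend is 0xFFFF. There, BC + 1 wraps to 0 and DE becomes 0, which still acts as -65536 modulo 2^16.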

As we have seen, odd ways of multiplying and dividing can lead to useful code algorithms. But the reverse can also be true. Machine-code algorithms can lead to odd but perhaps not so useful manual methods.

First, consider a table used for multiplying by a fixed number K, based on using the 8080 DAD instruction (see table 3). The multiplicand is loaded into two register pairs (HL and DE), and the product is obtained by executing a sequence of DAD H and DAD D commands in the order given beneath each value of K (operand sequences for K=2 to K=32 have been included). DAD H doubles the accumulated multiplicand in the HL pair, and DAD D adds the original multiplicand to the HL pair.

Procedure: Input the multiplicand in both the HL and DE register pairs. Constant K is the multiplier. Then perform a series of DAD D and DAD H instructions in the order given by the sequence of Ds and Hs under the given value of K. The final product will be in the HL register pair. If every DAD instruction is followed by a test of carry (JC or RC), carry will be set in case of overflow.

K = 2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
DAD H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H  H
DAD    D  H  H  D  D  H  H  H  H  D  D  D  D  H  H  H  H  H  H  H  H  D  D  D  D  D  D  D  D  H
DAD          D  H  H  H  H  D  D  H  H  H  H  H  H  H  H  D  D  D  D  H  H  H  H  H  H  H  H  H
DAD                D     D  H  H  H  H  D  D  H  H  D  D  H  H  H  H  H  H  H  H  D  D  D  D  H
DAD                            D     D  H  H     D  H  H  H  H  D  D  H  H  D  D  H  H  H  H  H
DAD                                        D           D     D  H  H     D  H  H  H  H  D  D
DAD                                                                D           D     D  H  H
DAD                                                                                        D

**Table 3:** An algorithm for integer multiplication for 8080 microprocessors.

It seems natural to look for a general algorithm based on DAD Hs and DAD Ds. If you look hard at table 3, you'll see a familiar pattern emerge: the Hs and Ds actually represent K in binary form. The 0s are represented by H, whereas the 1s are represented by H and D as a group. True, the most significant bit is missing, but that will always be a 1 anyway (it corresponds to the multiplicand already loaded into HL). As an example, consider K=19. The sequence is H H (H D) (H D), which translates into (1) 0 0 1 1.
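The pattern can be checked mechanically. This Python sketch derives the DAD sequence for any K >= 2 from its binary form and then executes it with HL = DE = multiplicand. The names `dad_sequence` and `run_dad` are hypothetical, and 16-bit overflow is deliberately ignored.

```python
def dad_sequence(k: int) -> list[str]:
    """Skip the leading 1 bit of K; emit H for a 0 bit and H, D for a 1 bit."""
    if k < 2:
        raise ValueError("table 3 covers K >= 2")
    seq = []
    for bit in bin(k)[3:]:       # bin(19) == '0b10011'; [3:] drops '0b1'
        seq.append("H")          # DAD H: double the accumulated value
        if bit == "1":
            seq.append("D")      # DAD D: add the original multiplicand
    return seq


def run_dad(seq: list[str], x: int) -> int:
    """Execute the sequence with HL = DE = x (no 16-bit overflow check)."""
    hl, de = x, x
    for op in seq:
        hl = hl + hl if op == "H" else hl + de
    return hl
```

For K=19 it produces H H H D H D, the same sequence as the (H D) grouping in the text, and `run_dad(dad_sequence(19), 51)` gives 969.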

Thus, we can multiply by shifting the multiplier and examining the carry. When the carry is clear, we perform a DAD H operation, and when it is set, we do both a DAD H and a DAD D. This gives us the code in listing 3.

Listing 3: An implementation in 8080 assembly language of the integer-multiplication algorithm given in tables 3 and 4.

;multiplication program DADDY

;input multiplicand in DE and input multiplier in A
;output product in HL, carry set if overflow

;********************** Test for zero and leading zeroes (8-bit
;                         factor already determined and placed in A)

DADDY:	lxi	h,0	;clear output product register
	mvi	b,8+1	;set bit counter
	ana	a	;is multiplier in A zero? (carry cleared)
	rz		;yes, skip multiplication operation; 0 in HL
skip:	dcr	b	;check multiplier bit
	ral		;leading zero?
	jnc	skip	;yes, ignore it and check next bit
	dad	d	;no, load HL with multiplicand in DE; carry cleared
;********************** Multiplication starts in earnest
next:	dcr	b	;more multiplier bits?
	rz		;no, return with result in HL
	dad	h	;yes, do a DAD H, doubling the multiplicand
	rc		;overflow, carry set on return
	ral		;is the multiplier bit a 1?
	jnc	next	;no, check the next bit
	dad	d	;yes, do a DAD D too, adding the initial
	jnc	next	;multiplicand, and check the next bit
	ret		;overflow, carry set on return
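A Python model of listing 3 makes the leading-zero skip explicit. This is a sketch, and `daddy` is a hypothetical name. The first 1 bit becomes the initial DAD D. Each following bit becomes a DAD H, plus a DAD D when the bit is 1. A carry-set exit is represented by `None`, and the multiplier is assumed to be 8 bits and DE 16 bits.

```python
from typing import Optional


def daddy(de: int, a: int) -> Optional[int]:
    """Multiplicand in DE, 8-bit multiplier in A; returns HL or None on overflow."""
    if a == 0:                  # ANA A; RZ: product is 0
        return 0
    hl = de                     # leading 1 found: DAD D into the cleared HL
    for bit in bin(a)[3:]:      # bits after the leading 1 (leading zeros skipped)
        hl += hl                # DAD H
        if hl > 0xFFFF:         # RC: overflow
            return None
        if bit == "1":
            hl += de            # DAD D: add the initial multiplicand
            if hl > 0xFFFF:     # JNC next not taken: overflow
                return None
    return hl
```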

Now for the manual method that can be derived from this: Repeatedly halve the multiplier until it becomes 1 (in order to find the binary form). Reverse the sequence of halved multipliers and ignore the 1. Repeatedly double the multiplicand. Whenever the corresponding halved multiplier is odd, also add the original multiplicand to the accumulated doubled multiplicands; table 4 gives an example of this method. Oh well, not everything is progress. But then, progress isn't everything.

**Table 4:** *An example of manual implementation of the algorithm of table 3.*
Procedure: Repeatedly halve the multiplier (discarding remainders) until you reach 1. Ignore the 1 and arrange the resultant halved multipliers vertically in reverse order. For each halved multiplier, double the multiplicand. Also add the initial multiplicand if the halved multiplier is an odd number.
Example: 44 x 51. Repeatedly halve the multiplier: 51 25 12 6 3

| Halved multiplier (odd/even) | Double by adding to itself | Add initial multiplicand? | Resultant |
|---|---|---|---|
| (start) | | | 44 |
| 3 (odd) | 44 + 44 | + 44 | 132 |
| 6 (even) | 132 + 132 | + 0 | 264 |
| 12 (even) | 264 + 264 | + 0 | 528 |
| 25 (odd) | 528 + 528 | + 44 | 1100 |
| 51 (odd) | 1100 + 1100 | + 44 | 2244 |

Final product: 2244
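The manual procedure of table 4 can be sketched in Python too. `halve_double_manual` is a hypothetical name, and non-negative integers are assumed. It keeps the decimal halving of the table instead of bit shifts.

```python
def halve_double_manual(multiplicand: int, multiplier: int) -> int:
    """Table 4 procedure: halve the multiplier, then double-and-add in reverse."""
    if multiplier == 0:
        return 0
    halved = []
    while multiplier > 1:             # stop at 1, which is ignored
        halved.append(multiplier)
        multiplier //= 2              # discard the remainder
    result = multiplicand             # stands for the ignored leading 1
    for h in reversed(halved):        # e.g. 3, 6, 12, 25, 51 for 51
        result += result              # double by adding to itself
        if h % 2 == 1:
            result += multiplicand    # odd: add the initial multiplicand
    return result
```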

Questions:

-Remove-curiousKILLspam at KILLspambwv190.internetdsl.tpnet.pl asks:
How about 'Vedic' multiplication methods? While not so beautiful without parallelism (e.g. an FPGA array), they are still interesting, and allow multiplication of large numbers in just a few steps (e.g. 6 clock cycles for a 64-bit multiply if an FPGA is used).
The simplest way to perform it is to multiply bit by bit, the way one multiplies on paper, e.g.
```
      101011
*       0010
------------
      000000
+    101011
+   000000
-------------
    01010110
```
Notice that while a CPU has to repeat the 'shift and add' for each bit of the multiplier, an FPGA can do it in parallel in just one cycle (the shift is just addressing the destination register and moving the data there, including masking by the multiplier bit; this takes just one cycle, actually one 'slope', as not even a full cycle is needed).
Then we have an array of sums to make, with one term per multiplier bit.
For a 64-bit multiplier this would amount to as many as 64 operations, so 64 cycles, but we can once again try to be smart the 'Vedic' way: addition operations can be split into parallel operations:
a+b+c+d=(a+b)+(c+d)
which means that when we have the addition list resulting from a multiply:
```
        0110
*       0101
------------
        0110 (a)
       0000  (b)
      0110   (c)
+    0000    (d)
------------
```
we can add it in pretty much any order, e.g. (0110 + 00000) + (011000 + 0000000), according to the above rule. This means we can add the pairs at the same time, which makes it 32 parallel operations in the first clock cycle, then 16 in the 2nd, 8 in the 3rd, 4 in the 4th, 2 in the 5th, 1 in the 6th, and voila.
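The pairwise scheme can be simulated in software. The Python sketch below builds one masked, shifted partial product per multiplier bit. It then adds adjacent pairs level by level and counts the levels. The names `partial_products` and `tree_sum` are hypothetical, and the sketch only models the order of additions, not FPGA timing.

```python
def partial_products(a: int, b: int, width: int) -> list[int]:
    """One shifted copy of a per multiplier bit, masked by that bit of b."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def tree_sum(terms: list[int]) -> tuple[int, int]:
    """Add adjacent pairs level by level, as an adder tree would.
    Returns (sum, number_of_levels)."""
    levels = 0
    while len(terms) > 1:
        paired = [terms[i] + terms[i + 1] for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:          # an odd term out passes to the next level
            paired.append(terms[-1])
        terms = paired
        levels += 1
    return (terms[0] if terms else 0), levels
```

For the 4-bit example, `tree_sum(partial_products(6, 5, 4))` gives `(30, 2)`. For a 64-bit multiplier it reports 6 levels, because the 64 terms halve to 32, 16, 8, 4, 2 and 1.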
Also, if extra logic were used to detect 'pairs' that cannot cause any carry in the addition (e.g. 001 + 010, which share no 1 bits), they could be merged instantly to perform (a+b)+c additions, as that does not require an additional carry cycle. We can group our additions into ones that will not influence each other.
The order can be even more free if we have logic that detects zeros and picks only the non-zero terms, further reducing the number of operations (though at the cost of an unpredictable operation length and more logic complexity).
Including the masked preloading of the array mentioned earlier, this all amounts to just 6 cycles, and of those, just two involve the ALU bus (fetching the number and multiplier is one, and placing the final addition result in the destination is another).

So, assuming the MUL array is clocked 6x faster than the ALU bus (which is quite doable in CMOS, assuming we are talking about speeds up to ~1 GHz), we can practically deliver a MUL instruction in just 2 ALU bus clock cycles, or, if registers can be independent (a separate ALU bus to the result register), in just one clock cycle.

In asm it unfortunately takes a bit more looping, but it is still a practical method that makes multiplying insanely large (64-bit and more) numbers quite a breeze.

file: /Techref/method/math/muldiv.htm, 29KB, , updated: 2024/9/1 14:42, local time: 2025/4/25 23:16,

©2025 These pages are served without commercial sponsorship. (No popup ads, etc...).Bandwidth abuse increases hosting cost forcing sponsorship or shutdown. This server aggressively defends against automated copying for any reason including offline viewing, duplication, etc... Please respect this requirement and DO NOT RIP THIS SITE. Questions?
