Also:
See also
DST = ((SRC ) + SRC) ; /* times 2 */ DST = ((SRC << 1) + SRC) ; /* times 3 */ DST = ((SRC ) + SRC) << 1; /* times 4 */ DST = ((SRC << 2) + SRC) ; /* times 5 */ DST = ((SRC << 1) + SRC) << 1; /* times 6 */ DST = ((SRC ) + SRC) << 2; /* times 8 */ DST = ((SRC << 3) + SRC) ; /* times 9 */ DST = ((SRC << 2) + SRC) << 1; /* times 10 */ DST = ((SRC << 1) + SRC) << 2; /* times 12 */The following added for completness
DST = ((((SRC << 1) + SRC) << 1) + SRC); /* times 7 */
Notation:
Nikolai Golovchenko says:
24 x 24 Multiplication algorithm
x5:0 = x2:0 * y2:0, x2:0 - input1, y2:0 - input2, x5:0 - output, 0 - least significant byte. 1) clear sum x5:3 = 0 2) clear carry 3) shift x2:0 right 4) if carry set, add y2:0 to x5:3 5) shift x5:0 right (carry from previous addition shifted to MSb and next bit of input to carry) 6) repeat 23 more times from 448 / 24 Division algorithm
x5:0 = x5:0 / y2:0 1) Check y2:0 for zero 2) Clear temporary Temp2:0 = 0 3) Set up counter Counter = 48 4) Shift left x5:0 5) Shift left Temp2:0 to get next bit from x5:0 6) Save carry (MSb of Temp might be set), e.g. in Counter<7> 7) Substract y2:0 from Temp2:0 8) If carry=1 (no borrow), set Counter<7> 9) If Counter<7>=0 (borrow and Msb of Temp was zero), add y2:0 to Temp2:0 to restore it 10) carry=Counter<7> (next bit of result) 11) clear Counter<7> 12) decrement Counter, and repeat from 4 if not zero 13) Final shift left x5:0Since result is known to fit in 3 bytes, then first 24 loops can be skipped. It would save also 3 temporary bytes Temp2:0.
G. Reichborn-Kjennerud
Tandberg Data A/S
POB 9, Korsvoll, Oslo 8
Norway
You may be familiar with the method of multiplication, variously alleged to be of Kenyan, Russian, or even Himalayan origin, in which you repeatedly halve the multiplicand and double the multiplier until the multiplicand becomes 1. Then the sum of those multipliers that have a multiplicand counterpart of odd value becomes the product. This sounds complicated, but it's really not; table 1 shows an example.
Procedure: Repeatedly halve the multiplicand (discarding re- mainders) and double the multiplier until the former is 1. For every odd multiplicand, add the respective multiplier. |
|||||||
Example: 44 x 51 | |||||||
Multiplicand | Multiplier | Partial Sum |
Column (c) Expressed in Terms of Original Multiplier |
Remainder of Division of Column (a) by 2 |
|||
---|---|---|---|---|---|---|---|
(a) |
(b) |
(c) |
(d) |
(e) |
|||
44 |
51 |
|
|
0 |
|||
22 |
102 |
|
|
0 |
|||
11 |
204 |
204 |
4 x 51 |
1 |
|||
5 |
408 |
408 |
8x51 |
1 |
|||
2 |
816 |
|
|
0 |
|||
1 |
1632 |
1632 |
32 x 51 |
1 |
|||
Total |
2244 = |
44 x 51 |
101100 is binary for 44 |
||||
|
|
|
This algorithm readily lends itself to coding, as exemplified by the sequence in 8080 code shown in listing 1. Halving is done by shifting to the right, and the odd/even test is performed by checking the carry. Doubling is done by adding to itself using the DAD instruction, which is also used for summing up the output terms.
Repeated halving of a number and then noting the odd/even results is a nice way of finding the binary form of the number (the last bit found being the most significant one). It also tells something of the binary nature of the Kenyan method.
Listing 1: An implementation of the Kenyan algorithm for integer multiplication for the 8080 microprocessor.
;multiplibation program MULT ;input multiplication factors in HL and DE, one of which must. ;necessarily be an 8-bit number; if not, carry is set ;output product in DE, carry set if overflow. ;********************** Initial test to find 8-bit factor MULT: xra a ;clear A ora d ;is D zero? jz found ;yes, DE number is 8-bit fabtor xra a ;no, DE number was not 8-bit factor ora h ;is H zero then? stc rnz ;no, return with carry set xchg ;yes, place 8-bit factor in DE found: mov a,e ;transfer multiplicand to A ;********************** Multiplication starts in earnest lxi d,0 ;clear DE to receive output terms ana a ;8-bit factor now in A; clear carry. next: rar ;halve the multiplicand;- result odd? jnc even ;no, don't add multiplier term xchg ;yes, therefore, dad d ;add multiplier (now in DE) to output rb ;overflow, carry set on return xchg ;put multiplier back. in HL even: ana a ;already reached 1 by halving? rz ;Yes, retuPn with result, carry cleared dad h ;no, double the multip.lier and jnc next ;continue the process. ret ;overflow, carry set on return
Some time ago I became intrigued by the possibility of finding a procedure for division that was similar to the Kenyan method of multiplication. I came up with the following scheme: The divisor is repeatedly doubled until just less than the dividend, then successively subtracted from the dividend. Every time the subtraction operation gives a positive result, a 1 is noted; otherwise a O is recorded. Remarkably enough, the resultant sequence of 0's and 1's constitutes the quotient directly in binary form, as shown in table 2.
Procedure: Double the divisor until it is just less than the
divi- dend. Then try to subtract the doubled divisors, starting with the largest, from the dividend. Note a 1 if the subtraction is possible otherwise, note a zero and do not perform the subtraction.
The 1s and Os constitute the binary form of the quotient. To ob-
To obtain decimal accuracy, multiply the dividend initially by an |
||||
Example: 2246/51 | Counter | |||
Double: | 51 |
0 | ||
102 |
1 | |||
204 |
2 | |||
408 |
3 | |||
816 |
4 | |||
1632 |
5 | |||
Subtract: | 2246 |
|||
-1632 |
||||
614 |
1 | X 32 = 32 | 5 | |
-816 |
||||
0 | x 16 = 0 | 4 | ||
614 |
||||
-408 |
||||
206 |
1 | x 8 = 8 | 3 | |
-204 |
||||
2 |
1 | x 4 = 4 | 2 | |
-102 |
||||
0 | x 2 = 0 | 1 | ||
2 |
||||
-51 |
||||
0 | x 1 = O | 0 | ||
Remainder: | 2 |
|||
Quotient: | 101100 = 44 | |||
Notice that the procedure is quite mechanical, with none of the trial-and-error search for the next correct quotient digit that is characteristic of the conventional method Furthermore, it lends itself beautifully to coding (see listing 2). There need be no 8-bit restrictions on any of the numbers; the dividend, divisor, quotient, and remainder can all be entered as 16-bit numbers.
Listing 2: An implementation of the author's integer-division algorithm for the 8080 microprocessor.
;division program DIVIDE ;input dividend in BC and input divisor in HL. :output quotient in HL and output remainder in DE. ;carry set if division by zero ;********************** Test for division by zero and prepare DIVIDE: mov a,h ;for reverse polarity subtraction ora stc rz ;division by zero; abort operation; carry set mov a,b ;put 2's complement of BC +1 into DE for cma ;purposes of subtraction. (BC will be mov d,a ;incremented to enable subtraction when minuend mov a,c ;and subtrahend are having equal values). cma mov e,a ;dividend in negative form now in DE inx b ;BC +1; dividend incremented xra a ;reset counter A and sta quot ;clear the quotient buffer sta quot+1 ;(high-order part of quotient buffer) jmp double ;start the division in earnest ;*********************** First phase: Doubling the divisor restore:dad b ;add back double: inr a ;increment counter push h ;save divisor dad h ;double it, but go to second phase if jc change ;HL now is larger than dividend in B dad d ;comparison with dividend by subtraction jnc restore ;keep doubling unless HL now is larger than BC ;*********************** Second phase: Subtracting from the dividend ; and accumulating quotient bits. change: mov b,a ;transfer count to new counter subtrct:pop h ;Fetch halved divisor as positive subtrahend dad d ;subtract by using negative dividend as minuend jc shiftc ;the carry bit becomes the quotient bit xchg ;equivalent of adding back if subtraction fails shiftc: cmc ;invert quotient bit from reverse polarity lda quot ;shift quotient bits ral sta quot ;and place into temporary storage lda quot+l ral sta quot+1 dcr b ;count-down finished? jnz subtrct ;no, continue process lhld quot ;yes, place output quotient in HL. mov a,e ;change remainder in DE into proper polarity cma mov e,a mov a,d cma mov d,a ret ;division operation completed quot: d 2 ;buffer for evolving quotient
To handle 16-bit numbers, the add-to-itself DAD H instruction is used for doubling the divisor, and the necessary comparison with the dividend is accomplished by reverse-polarity addition, using the negative value of the dividend (in the DE register pair) and testing on the carry. Care is taken to restore the divisor before the next doubling by adding back the positive value (in the BC register pair). The doubled divisors are put in temporary storage by pushing them to the stack.
For the necessary subtraction of the doubled divisors from the dividend, reverse-polarity addition is used again. Luckily, the dividend is already present in negative form (in the DE register pair), and the divisors can be used in their existing positive form as they are popped from the stack for subtraction. The carry is then indicative of a positive or negative result, and for every subtraction, it is shifted into a register pair to form the final quotient. A counter sees to it that there are no more subtractions than there were doubling operations. The contents of the DE register pair constitute the remainder (in complemented form).
As we have seen, odd ways of multiplying and dividing can lead to useful code algorithms. But the reverse can also be true. Machine-code algorithms can lead to odd but perhaps not so useful manual methods.
First, consider a table used for multiplying by a fixed number K, based on using the 8080 DAD instruction (see table 3). The multiplicand is loaded into two register pairs (HL and DE), and the product is obtained by executing a sequence of DAD H and DAD D commands in the order given beneath each value of K (operand sequences for K=2 to K=32 have been included). DAD H doubles the accumulated multiplicand in the HL pair, and DAD D adds the original multiplicand to the HL pair.
Procedure: Input multiplicand in both HL and DE register pairs. Constant K is the multiplier. Then perform a series of DAD D and DAD H Instructions in the order given by the sequence of Ds and Hs under the given value of K. The final product will be in the HL register pair If every DAD instruction is followed by a test of carry (JC or RC), carry will be set in case of overflow.
K = 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 DAD H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H H DAD D H H D D H H H H D D D D H H H H H H H D D D D D D D D H H DAD D H H H H D D H H H H H H H H D D D D H H H H H H H H H DAD D D H H H H D D H H D D H H H H H H H H D D D D H DAD D D H H D H H H H D D H H D D H H H H H DAD D D D H H D H H H H D D DAD D D D H H DAD D
Table 3: An algorithm for integer multiplication for 8080 microporcessors.
It seems natural to look for a general algorithm based on DAD Hs and DAD Ds. If you look hard at table 3, you'll see a familiar pattern emerge: the Hs and Ds actually represent K in binary form. The Os are represented by H, whereas the 1's are represented by H and D as a group. True, the most significant bit is missing, but that will always be a 1 anyway. As an example, consider K=19. The sequence is H H (H D) (H D), which translates into (1) 0 0 1 1.
Thus, we can multiply by shifting the multiplier and examining the carry. When carry is cleared, we perform a DAD H operation, and when it is set, we do both a DAD H and a DAD D. This gives us the code in listing 3
Listing 3: an implementation in 8080 assembly language of the integer-multiplication algorithm given in tables 3 and 4.
;multiplication program DADDY ;input multiplicand in DE and input multiplier in A ;output product in HL, carry set if overflow ;********************** Test for zero and leading zeroes, (8-bit ; factor already determined and placed in A) DADDY: lxi h,0 ;clear output product register mvi b,8+1 ;set bit counter ana a ;is multiplier in A zero? (carry cleared) rz ;yes, skip multiplication operation; O in HL skip: dcr b ;check multiplier bit ral ;leading zero? jnc skip ;yes, ignore it and check next bit dad d ;no, load HL with multiplicand in DE; carry ; cleared ;********************** Multiplication starts in earnest next: dcr b ;more multiplier bits? rz ;no, return with result in HL dad h ;yes, do a DAD H, doubling the multiplicand rc ;overflow, carry set on return ral ;is the multiplier bit a l? jnc next ;no, check the next bit dad d ;yes, do a DAD D too, adding the initial jnc next ;check the next bit multiplicand ret ;overflow, carry set on return
Now for the manual method that can be derived from this: Repeatedly halve the multipler until it becomes 1 (in order to find the binary form). Reverse the sequence of halved multipliers and ignore the 1. Repeatedly double the multiplicand. Whenever the corresponding halved multiplier is odd, add also the original multiplicand to the accumulated doubled multiplicands; table 4 gives us an example of this method. Oh well, not everything is progress. But then, progress isn't everything.
Procedure: Repeatedly halve the multiplier (discarding remainders)
until you reach 1. Ignore the 1 and arrange the resultant halved multipliers vertically in reverse order. For each halved multiplier, double the multiplicand. Add also the initial multiplicand if the halved multiplier is an odd number. |
||
Example: 44 x 51
Repeatedly halve the multiplier: 51 25 12 6 3 |
||
odd/even halved multipliers: |
Resultant |
Comment |
|
44 |
|
|
44 |
Double the multiplicand by adding to itself |
3 |
+44 |
Add initial multiplicand |
|
132 |
|
132 |
||
6 |
+ 0 |
Don't add initial multiplicand |
264 |
||
264 |
||
12 |
+0 |
|
528 |
||
528 |
||
25 |
+ 44 |
|
1100 |
||
1100 |
||
51 |
+ 44 |
|
Final product: | 2244 |
|
Comments:
Interested:
Questions:
how about 'vedic' multiplication methods? while not so beautifull without parallelism (i.e. fpga array) , they are still interesting, and allow performing multiplication of large numbers in just few steps (i.e. 6 clock cycles for 64bit mul. if fpga is used)+
simplest way to perform it is to multiply bit like one multiplies on paper, so i.e.
101011 * 0010 ------------ 000000 + 101011 + 000000 ------------- 01010110
notice that while CPU has to repeat the 'shift and multiply'
for each bit of the multiplier , fpga can do it in parallel in just one cycle (shift is just adressing to destination register, moving data there (including masking by multiplier) - takes just one cycle , actually one 'slope' , as not even full 'cycle' is needed.)
then we have $multiplier_bit_count_size array of sums to make.
for 64bit multiplier, this would equal to as many as 64 operations, so 64 cycles, but we can once again try to be smart the 'vedic' way; addition operations can be split into parallel operations :a+b+c+d=(a+b)+(c+d)
which means when we have addition list resulting from multiply:
0110 * 0101 ------------ 0110 (a) 0000 (b) 0110 (c) + 0000 (d) ------------we can add it in pretty much any order i.e. (0110 + 00000) + (011000 + 0000000) according to the above rule . This mean we can quickly add each pair , which makes it 32 parallel operations per first clock cyle, then 16 for 2nd, 8 for 3rd, 4, for 4th , 2 for 5th , and voila.
Also if extra logic would be use to detect 'pairs' which will unlikely cause any overflow in the addition (i.e. 001 + 010) they could be instantly merged to peform (a+b)+c additions, as it does not require involving additional 'overflow sum' cycle. We can group our adds to ones which will unlikely influence each other.
The order can be even more free, if we have logic allowing us to detect zeros, and choose only 'non zero' substrate, allowing to further reduce amount of operations (though at cost of non-predictable lenght of operation and more logic complexity)
Including the masked preloading of the array mentioned earlier, this all equals to just 6 cycles, and of those, just two involve ALU bus (fetching number and multiplier is one, and placing final addition result in destination is another)
so assuming clocking the MUL array 6x faster than the ALU bus (which is quite doable in cmos, assuming we talk about up to ~1ghz speeds), we can practicaly deliver MUL instruction in just 2 ALU bus clock cycles, while if registers can be independent (separate ALU bus to result register) - just one clock cycle.
In asm it takes bit more looping unfortunatelly , but it still makes practical method for multiplying of insanely large (64bit and more) numbers quite a breeze.
file: /Techref/method/math/muldiv.htm, 29KB, , updated: 2024/9/1 13:42, local time: 2025/1/27 07:30,
18.217.118.156:LOG IN
|
©2025 These pages are served without commercial sponsorship. (No popup ads, etc...).Bandwidth abuse increases hosting cost forcing sponsorship or shutdown. This server aggressively defends against automated copying for any reason including offline viewing, duplication, etc... Please respect this requirement and DO NOT RIP THIS SITE. Questions? <A HREF="http://linistepper.com/Techref/method/math/muldiv.htm"> Novel Methods of Integer Multiplication and Division</A> |
Did you find what you needed? |