ramblings

Byron Lathi
2025-07-14 11:10:43 -07:00
parent 2b57079205
commit 80e3faeae6
6 changed files with 131 additions and 21 deletions

@@ -105,4 +105,49 @@ quarter round module can have 2 different blocks going through it at once.
The new one multiplexes 4 quarter rounds between 1 QR module which reduces the
logic usage down to only 46k le, of which the vast majority is flops (2k ff per round,
0.5k lut)
# Modulo 2^130-5
We can use the trick here to do modulo reduction much faster.
If we split the bits at 2^130, leaving 129 high bits and 130 low bits, we now
have a 129 bit value multiplied by 2^130, plus the 130 bit value. We know that
2^130 mod 2^130-5 is 5, so we can replace that 2^130 with 5 and add, then
repeat that step again.
Ex.
x = x1*2^130 + x2
x mod 2^130-5 = (x1*5 + x2) mod 2^130-5, so let y = x1*5 + x2
y = x3*2^130 + x4
x mod 2^130-5 = (x3*5 + x4) mod 2^130-5
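
The two folding rounds above can be sketched in Python, whose arbitrary-precision integers make the wide values easy to handle. This is a minimal model for checking the arithmetic, not the hardware implementation; the names `P`, `MASK130`, `fold`, and `reduce_two_rounds` are made up for illustration.

```python
# Illustrative model only; all names here are hypothetical, not from the design.
P = (1 << 130) - 5          # the Poly1305 prime 2^130 - 5
MASK130 = (1 << 130) - 1    # selects the low 130 bits


def fold(x: int) -> int:
    """One round: split at bit 130 and replace the 2^130 weight with 5."""
    high = x >> 130
    low = x & MASK130
    return high * 5 + low


def reduce_two_rounds(x: int) -> int:
    """Apply the fold twice, as in the example above."""
    return fold(fold(x))


if __name__ == "__main__":
    x = (1 << 259) - 1  # largest 259-bit input
    # The folded value is congruent to x modulo P.
    assert reduce_two_rounds(x) % P == x % P
```

Note that the result is only congruent to x; it is not necessarily the canonical residue. For example, `reduce_two_rounds(P)` returns `P` itself rather than 0, so a final compare-and-subtract would still be needed wherever a fully reduced value is required.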
Let's do the math to verify that we only need two rounds. The maximum value
that we could possibly get is 2^131-1 and the maximum value for R is
0x0ffffffc0ffffffc0ffffffc0fffffff. Multiplying these together gives us
0x7fffffe07fffffe07fffffe07ffffff7f0000003f0000003f0000003f0000001.
Applying the first round to this we get
0x1ffffff81ffffff81ffffff81ffffffd * 5 + 0x3f0000003f0000003f0000003f0000001
= 0x48fffffdc8fffffdc8fffffdc8ffffff2
Applying the second round to this we get
1 * 5 + 0x8fffffdc8fffffdc8fffffdc8ffffff2 = 0x8fffffdc8fffffdc8fffffdc8ffffff7
and this is indeed the correct answer. The bottom part is 130 bits wide, but since
we put in the max values and it didn't overflow, I don't think it will overflow here.
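
The worked example can be replayed with Python integers as a sanity check. This is a sketch; `h_max`, `r_max`, and the other names are illustrative and do not come from the design.

```python
# Replays the max-value example; variable names are hypothetical.
P = (1 << 130) - 5
MASK130 = (1 << 130) - 1

h_max = (1 << 131) - 1                       # largest 131-bit value
r_max = 0x0ffffffc0ffffffc0ffffffc0fffffff   # largest clamped R

product = h_max * r_max
assert product == 0x7fffffe07fffffe07fffffe07ffffff7f0000003f0000003f0000003f0000001

# First round: high part times 5 plus the low 130 bits.
round1 = (product >> 130) * 5 + (product & MASK130)
assert round1 == 0x48fffffdc8fffffdc8fffffdc8ffffff2

# Second round: the high part is now just 1.
round2 = (round1 >> 130) * 5 + (round1 & MASK130)
assert round2 == 0x8fffffdc8fffffdc8fffffdc8ffffff7

# Matches a direct modulo of the full product.
assert round2 == product % P
```

A bound argument backs up the no-overflow guess for these input widths: the product is below 2^255, so the high part after the first split is below 2^125 and the first-round sum is below 2^130 + 5*2^125. Its high part is therefore at most 1, and when it is 1 the low part is below 5*2^125, so the second-round result stays under 2^130.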
131+128 = 259 bits, only have to do this once
0xb83fe991ca75d7ef2ab5cba9cccdfd938b73fff384ac90ed284034da565ecf
0x19471c3e3e9c1bfded81da3736e96604a
Kind of curious now: at what point does a ripple carry adder using dedicated
CI/CO ports become slower than a more complex adder like carry lookahead or
carry save (Wallace tree)?