Compare commits
10 Commits
2b8286d180...2fd1136154
| Author | SHA1 | Date |
|---|---|---|
| | 2fd1136154 | |
| | 06d5949aa7 | |
| | 003527ee0d | |
| | fd50ecc4f0 | |
| | faef39c4d3 | |
| | 5e3b7be854 | |
| | d9651e9074 | |
| | 80e3faeae6 | |
| | 2b57079205 | |
| | 7f91a8af32 | |
@@ -1,108 +1,36 @@
|
||||
# Notes
|
||||
# Overall Notes
|
||||
|
||||
Since we are designing this for a 64 bit datapath, we need to be able to
|
||||
compute 64 bits every cycle. The ChaCha20 hash works on groups of 16x32, or
|
||||
512-bit blocks at a time. Logically it might make more sense to have a datapath
|
||||
of 128 bits.
|
||||
We need to support 25Gbps, and we will have 2 datapaths, tx and rx
|
||||
|
||||
On the other hand, each operation is a 32 bit operation. It might make more
|
||||
sense for timing reasons then to have each operation registered. But will this
|
||||
be able to match the throughput that we need?
|
||||
|
||||
Each quarter round generates 4 words. Each cycle updates all 128 bits at once.
|
||||
We can do 4 of the quarter rounds at once, so at the end of each cycle we will
|
||||
generate 512 bits.
|
||||
|
||||
At full speed then, the core would generate 512 bits per cycle, but we would
|
||||
only need to generate 64 bits per cycle. We could do only 1 quarter round at
|
||||
once, which would only generate 128 bits per cycle, but we would need some sort
|
||||
of structure to reorder the state such that it is ready to xor with the
|
||||
incoming data. We could even make this parameterizable, but that would be the
|
||||
next step if we actually need to support 100Gbps encryption.
|
||||
|
||||
So in summary, we will have a single QuarterRound module which generates 128
|
||||
bits of output. We will have a scheduling block which schedules which 4 words
|
||||
of state go into the quarter round module, and a de-interleaver which takes the
|
||||
output from the quarter round module and re-orders it to be in the correct
|
||||
order to combine with the incoming data. There is also an addition in there
|
||||
somewhere.
|
||||
|
||||
To support AEAD, the first block becomes the key for the Poly1305 block. This
|
||||
can be done in parallel with the second block, which becomes the cipher, at the
|
||||
expense of double the gates. Otherwise, there would be a delay in between
|
||||
packets as this is generated.
|
||||
At a 128 bit datapath, this is about 200MHz (25Gbps / 128 bits), but let's aim for 250MHz
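
The clock figure here follows from dividing the line rate by the datapath width. A minimal sketch of that arithmetic, assuming the 25Gbps target stated in these notes; the function name is illustrative and not part of the repository:

```python
def required_clock_mhz(line_rate_gbps: float, datapath_bits: int) -> float:
    """Clock (MHz) needed to move line_rate_gbps through datapath_bits per cycle."""
    return line_rate_gbps * 1e9 / datapath_bits / 1e6


# Compare the two datapath widths discussed in the notes.
for width in (64, 128):
    print(f"{width}-bit datapath: {required_clock_mhz(25, width):.1f} MHz")
```

A 128 bit datapath needs a little under 200MHz, so a 250MHz target leaves some margin for idle cycles.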
|
||||
|
||||
|
||||
Okay so we did some timing tests and we can easily do 1 round of ChaCha20 in a
|
||||
single cycle on a Titanium FPGA at 250MHz (~350-400 MHz)
|
||||
# ChaCha20 Notes
|
||||
|
||||
So then it will take 20 cycles to calculate 512 bits, or 25.6 bits/cycle, or
|
||||
6.4Gbps. So then we will need 2 of these for 10Gbps.
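
The throughput estimate can be reproduced with a few lines of arithmetic. This is a sketch that assumes one round per cycle, 20 cycles per 512-bit block and a 250MHz clock, as in the notes; the helper names are hypothetical:

```python
import math


def core_throughput_gbps(clock_mhz: float, cycles_per_block: int,
                         block_bits: int = 512) -> float:
    """Keystream rate of one iterative core that emits block_bits every cycles_per_block."""
    return block_bits / cycles_per_block * clock_mhz * 1e6 / 1e9


def cores_needed(target_gbps: float, clock_mhz: float, cycles_per_block: int) -> int:
    """Smallest number of identical cores whose combined rate reaches target_gbps."""
    return math.ceil(target_gbps / core_throughput_gbps(clock_mhz, cycles_per_block))


print(f"{core_throughput_gbps(250, 20):.2f} Gbps per core")
print(cores_needed(10, 250, 20), "cores for 10 Gbps")
```

The same helpers can be re-run with a different clock or cycle count when the timing assumptions change.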
|
||||
|
||||
So in order to use multiple cores, we would calculate 1024 bits in 20 cycles.
|
||||
Then we would put those bits into a memory or something and start calculating
|
||||
the next 1024 bits. Those bits would all be used up in 16 cycles (but the
|
||||
throughput still checks out). Once they are used, we load the memory with the
|
||||
new output.
|
||||
|
||||
This puts a 20 cycle minimum on small packets since the core is not completely
|
||||
pipelined. This puts a hard cap at 12.5Mpps. At 42 byte packets, this is
|
||||
4.2Gbps, and for 64 byte packets is 6.4Gbps. In order to saturate the link, you
|
||||
would need packets of at least 100 bytes.
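
The small-packet numbers come from treating the 20 cycle minimum as a per-packet cost. A sketch of that calculation, assuming a 250MHz clock and a 10Gbps link; the function names are illustrative:

```python
def max_packet_rate_mpps(clock_mhz: float, min_cycles_per_packet: int) -> float:
    """Packet rate cap (Mpps) when every packet occupies the core for a minimum cycle count."""
    return clock_mhz / min_cycles_per_packet


def small_packet_gbps(packet_bytes: int, clock_mhz: float,
                      min_cycles_per_packet: int) -> float:
    """Throughput (Gbps) for back-to-back packets of packet_bytes at the rate cap."""
    return packet_bytes * 8 * max_packet_rate_mpps(clock_mhz, min_cycles_per_packet) / 1000


def saturating_packet_bytes(line_rate_gbps: float, clock_mhz: float,
                            min_cycles_per_packet: int) -> float:
    """Packet size (bytes) at which the rate cap stops limiting the link."""
    return line_rate_gbps * 1000 / (8 * max_packet_rate_mpps(clock_mhz, min_cycles_per_packet))


for size in (42, 64):
    print(f"{size} bytes: {small_packet_gbps(size, 250, 20):.1f} Gbps")
print(f"link saturates at {saturating_packet_bytes(10, 250, 20):.0f} bytes")
```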
|
||||
|
||||
This is with the 20 cycle minimum, though in reality it would be more like 25
|
||||
or 30 with the final addition, scheduling, pipelining etc. Adding more cores
|
||||
increases the throughput for larger packets, but does nothing for small packets
|
||||
since the latency is the same. To solve this, we could instantiate the entire
|
||||
core twice, such that we could handle 2 minimum size packets at the same time.
|
||||
|
||||
If we say there is a 30 cycle latency, the worst case is 2.8Gbps. Doubling the
|
||||
number of cores gives 5.6, quadrupling the number of cores gives 11.2Gbps. This
|
||||
would of course more than quadruple the area since we need 4x the cores as
|
||||
well as the mux and demux between them.
|
||||
|
||||
This could be configurable at compile time though. The number of ChaChas per
|
||||
core would also be configurable, but at the moment I choose 2.
|
||||
|
||||
Just counting the quarter rounds, there are 4\*2\*4 = 32 QR modules, or 64 if
|
||||
we want 8 QRs per core instead of 4 for timing reasons.
|
||||
|
||||
Each QR is 322 XLR, so just the QR would be either 10k or 20k XLR. That's kind
|
||||
of a lot. A fully pipelined design would use 322\*20\*4 or 25k XLR. If we can
|
||||
pass timing using 10k LUTs then that would be nice. We get a peak throughput
|
||||
of 50Gbps, it's just that the latency kills our packet rate. If we reduce the
|
||||
latency to 25 cycles and have 2 alternating cores, our packet rate would be
|
||||
20Mpps, increasing with every cycle we take off. I think that is good. This
|
||||
would result in 5k XLR which is not so bad.
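
The replication trade-off above can be tabulated directly. This sketch uses the figures from the notes (322 XLR per quarter round, 42 byte minimum packets, 250MHz); the function names are hypothetical, and the 16-QR configuration used for the 5k figure is an assumed reading (2 cores x 2 ChaChas x 4 QRs), not something the notes state outright:

```python
XLR_PER_QR = 322  # per-quarter-round area quoted in the notes


def packet_rate_mpps(cores: int, clock_mhz: float, latency_cycles: int) -> float:
    """Packet rate (Mpps) when each core accepts one packet per latency_cycles."""
    return cores * clock_mhz / latency_cycles


def worst_case_gbps(cores: int, clock_mhz: float, latency_cycles: int,
                    packet_bytes: int = 42) -> float:
    """Throughput (Gbps) for back-to-back minimum-size packets."""
    return packet_rate_mpps(cores, clock_mhz, latency_cycles) * packet_bytes * 8 / 1000


def qr_area_xlr(qr_modules: int) -> int:
    """Logic area of the quarter-round modules alone."""
    return qr_modules * XLR_PER_QR


for cores in (1, 2, 4):
    print(f"{cores} core(s), 30 cycles: {worst_case_gbps(cores, 250, 30):.1f} Gbps")
print(f"2 cores, 25 cycles: {packet_rate_mpps(2, 250, 25):.0f} Mpps")
for qrs in (16, 32, 64, 20 * 4):
    print(f"{qrs} QR modules: {qr_area_xlr(qrs)} XLR")
```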
|
||||
Chacha20 operates on 512 bit blocks. Each round is made of 4 quarter
|
||||
rounds, which are the same except for which 32 bit words are used. We can
|
||||
use the same 32 bit quarter round 4 times in a row, but we need to
|
||||
store the rest of the round between operations, so memory usage
|
||||
might be similar to if we just did all 4 at once, but the logic
|
||||
would only be 25% as much. Because we switch between odd and even
|
||||
rounds, the data used in one round is not the data used in the other
|
||||
round.
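
As a software reference for the paragraph above, the following sketch shows the quarter round and the two word schedules (columns for odd rounds, diagonals for even rounds) as defined in RFC 8439. The helper `round_with_one_qr` is a hypothetical name that models reusing one quarter-round unit four times per round; it is not taken from the RTL in this repository:

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a: int, b: int, c: int, d: int):
    """ChaCha20 quarter round on four 32-bit words (RFC 8439, section 2.1)."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d


# Word indices fed to the quarter round in odd (column) and even (diagonal) rounds.
COLUMN_ROUND = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONAL_ROUND = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def round_with_one_qr(state, schedule):
    """Apply one round by passing four word groups through the same quarter round in turn."""
    state = list(state)
    for idx in schedule:
        out = quarter_round(*(state[i] for i in idx))
        for i, value in zip(idx, out):
            state[i] = value
    return state
```

Because the four groups within a round touch disjoint words, running them one after another gives the same result as running them in parallel; only the twelve words not currently in the quarter round need to be held in registers.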
|
||||
|
||||
|
||||
Okay so starting over now, our clock speed cannot be 250MHz, the best we can do
|
||||
is 200MHz. If we assume this same 25 cycle latency, that's 4Gbps per block, so
|
||||
we would need 3 of them to surpass 10Gbps (each is 4096 Mbps), so now we need 3 blocks
|
||||
instead of 2.
|
||||
# Poly1305
|
||||
|
||||
We are barely going to be able to pass at 180MHz. Maybe the fully pipelined
|
||||
core is a better idea, but we can just fully pipeline a quarter stage, and
|
||||
generate 512 bits every 4 clock cycles. This would give us a theoretical
|
||||
throughput of 32Gbps, and we would not have to worry about latency and small
|
||||
packets slowing us down. Let's experiment with what that would look like.
|
||||
## Parallel Operation
|
||||
|
||||
For our single round it's using 1024 adders, which almost sounds like it is
|
||||
instantiating 8 quarter rounds instead of just 4. Either way, we can say that
|
||||
a quarter round is 128ff + 128add + 250lut.
|
||||
We can calculate in parallel but we need to calculate r^n, where n is the number of
|
||||
parallel stages. Ideally we would have the number of parallel stages be equal to the
|
||||
latency of the full stage, that way we could have it be fully pipelined. For
|
||||
example, if it took 8 cycles per block, we would have 8 parallel calculations. This
|
||||
requires you to calculate r^n, as well as every intermediate value. If we do 8,
|
||||
|
||||
So pipelining 20 of these gives 10k luts. Not so bad.
|
||||
then we need to calculate r^1, r^2, r^3, etc. This takes n-1 multiplies in total, arranged in log2(n) dependent steps.
|
||||
|
||||
we need
|
||||
|
||||
Actually it's 88k LUTs... it's 512ff * 4 * 20 = 40k ff
|
||||
|
||||
Let's just leave it for now even if it's overkill. The hardware would support up to
|
||||
40Gbps, and technically the FPGA has 16 lanes so could do 160Gbps in total, if
|
||||
we designed a custom board for it (or 120 if we used FMC connectors).
|
||||
|
||||
If we only use a single quarter round multiplexed between all 4, then the same
|
||||
quarter round module can have 2 different blocks going through it at once.
|
||||
|
||||
The new one multiplexes 4 quarter rounds between 1 QR module which reduces the
|
||||
logic usage down to only 46k le, of which the vast majority is flops (2k ff per round,
|
||||
0.5k lut)
|
||||
r\*r = r^2
|
||||
r\*r^2 = r^3 r^2\*r^2 = r^4
|
||||
r^4\*r = r^5 r^2\*r^4 = r^6 r^3\*r^4 = r^7 r^4\*r^4 = r^8
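
The power table above and the parallel accumulation it enables can be checked in software. This is a sketch under simplifying assumptions: blocks are plain integers that already include the 2^128 pad bit, the block count is a multiple of the lane count, n is a power of two, and clamping of r and the final addition of s are omitted. All function names are illustrative:

```python
P = (1 << 130) - 5  # Poly1305 prime


def power_ladder(r: int, n: int):
    """Build r^1..r^n mod P level by level; each level doubles the highest known power."""
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of two"
    powers = {1: r % P}
    top, levels = 1, 0
    while top < n:
        # All multiplies in this level depend only on earlier levels,
        # so they could run in parallel in hardware.
        for k in range(1, top + 1):
            powers[top + k] = powers[k] * powers[top] % P
        top *= 2
        levels += 1
    return powers, levels


def poly_serial(blocks, r: int) -> int:
    """Reference accumulation: acc = (acc + block) * r for each block in order."""
    acc = 0
    for m in blocks:
        acc = (acc + m) * r % P
    return acc


def poly_parallel(blocks, r: int, n: int) -> int:
    """n interleaved lanes, each stepping by r^n, combined with r^n..r^1 at the end."""
    assert len(blocks) % n == 0
    powers, _ = power_ladder(r, n)
    lanes = [0] * n
    for base in range(0, len(blocks), n):
        for j in range(n):
            lanes[j] = (lanes[j] * powers[n] + blocks[base + j]) % P
    return sum(lanes[j] * powers[n - j] for j in range(n)) % P
```

For n = 8 the ladder performs 7 multiplies in 3 levels, matching the table: r^2 first, then r^3 and r^4, then r^5 through r^8.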
|
||||
146
ChaCha20_Poly1305_64/doc/poly1305.drawio
Normal file
@@ -0,0 +1,146 @@
|
||||
<mxfile host="Electron" agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/26.2.2 Chrome/134.0.6998.178 Electron/35.1.2 Safari/537.36" version="26.2.2">
|
||||
<diagram name="Page-1" id="gIy_vrPza4QP03Kn0wfk">
|
||||
<mxGraphModel dx="655" dy="442" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">
|
||||
<root>
|
||||
<mxCell id="0" />
|
||||
<mxCell id="1" parent="0" />
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-24" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=0.25;exitDx=0;exitDy=0;entryX=0.5;entryY=1;entryDx=0;entryDy=0;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-1" target="GA09nmFLpfHeItamLD5O-21">
|
||||
<mxGeometry relative="1" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-25" value="r" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" vertex="1" connectable="0" parent="GA09nmFLpfHeItamLD5O-24">
|
||||
<mxGeometry x="0.5579" y="-1" relative="1" as="geometry">
|
||||
<mxPoint x="9" y="5" as="offset" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-35" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;exitX=1;exitY=0.75;exitDx=0;exitDy=0;entryX=0.5;entryY=1;entryDx=0;entryDy=0;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-1" target="GA09nmFLpfHeItamLD5O-34">
|
||||
<mxGeometry relative="1" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-38" value="s" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" vertex="1" connectable="0" parent="GA09nmFLpfHeItamLD5O-35">
|
||||
<mxGeometry x="-0.6624" y="1" relative="1" as="geometry">
|
||||
<mxPoint x="-9" y="-9" as="offset" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-1" value="r/s" style="rounded=0;whiteSpace=wrap;html=1;" vertex="1" parent="1">
|
||||
<mxGeometry x="360" y="200" width="80" height="40" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-2" value="" style="endArrow=classic;html=1;rounded=0;entryX=0;entryY=0.25;entryDx=0;entryDy=0;" edge="1" parent="1" target="GA09nmFLpfHeItamLD5O-1">
|
||||
<mxGeometry width="50" height="50" relative="1" as="geometry">
|
||||
<mxPoint x="320" y="210" as="sourcePoint" />
|
||||
<mxPoint x="410" y="270" as="targetPoint" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-3" value="otk" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" vertex="1" connectable="0" parent="GA09nmFLpfHeItamLD5O-2">
|
||||
<mxGeometry x="-0.3946" y="1" relative="1" as="geometry">
|
||||
<mxPoint x="-22" as="offset" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-10" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.5;entryDx=0;entryDy=0;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-4" target="GA09nmFLpfHeItamLD5O-6">
|
||||
<mxGeometry relative="1" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-4" value="64-&gt;128" style="rounded=0;whiteSpace=wrap;html=1;" vertex="1" parent="1">
|
||||
<mxGeometry x="175" y="130" width="50" height="20" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-5" value="" style="endArrow=classic;html=1;rounded=0;entryX=0;entryY=0.5;entryDx=0;entryDy=0;" edge="1" parent="1" target="GA09nmFLpfHeItamLD5O-4">
|
||||
<mxGeometry width="50" height="50" relative="1" as="geometry">
|
||||
<mxPoint x="120" y="140" as="sourcePoint" />
|
||||
<mxPoint x="290" y="110" as="targetPoint" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-15" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.5;entryDx=0;entryDy=0;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-6" target="GA09nmFLpfHeItamLD5O-14">
|
||||
<mxGeometry relative="1" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-40" value="data_one_extended" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" vertex="1" connectable="0" parent="GA09nmFLpfHeItamLD5O-15">
|
||||
<mxGeometry x="-0.3532" y="-1" relative="1" as="geometry">
|
||||
<mxPoint x="7" y="29" as="offset" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-6" value="bit add" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;" vertex="1" parent="1">
|
||||
<mxGeometry x="240" y="120" width="40" height="40" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-8" value="" style="endArrow=classic;html=1;rounded=0;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" edge="1" parent="1" target="GA09nmFLpfHeItamLD5O-6">
|
||||
<mxGeometry width="50" height="50" relative="1" as="geometry">
|
||||
<mxPoint x="260" y="80" as="sourcePoint" />
|
||||
<mxPoint x="290" y="70" as="targetPoint" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-9" value="tkeep" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" vertex="1" connectable="0" parent="GA09nmFLpfHeItamLD5O-8">
|
||||
<mxGeometry x="-0.699" relative="1" as="geometry">
|
||||
<mxPoint y="-16" as="offset" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-11" value="P" style="rounded=0;whiteSpace=wrap;html=1;" vertex="1" parent="1">
|
||||
<mxGeometry x="540" y="180" width="80" height="20" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-18" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-12" target="GA09nmFLpfHeItamLD5O-14">
|
||||
<mxGeometry relative="1" as="geometry">
|
||||
<Array as="points">
|
||||
<mxPoint x="340" y="100" />
|
||||
</Array>
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-36" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-12" target="GA09nmFLpfHeItamLD5O-34">
|
||||
<mxGeometry relative="1" as="geometry">
|
||||
<Array as="points">
|
||||
<mxPoint x="400" y="60" />
|
||||
<mxPoint x="660" y="60" />
|
||||
</Array>
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-12" value="acc" style="rounded=0;whiteSpace=wrap;html=1;" vertex="1" parent="1">
|
||||
<mxGeometry x="360" y="80" width="80" height="40" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-14" value="+" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;" vertex="1" parent="1">
|
||||
<mxGeometry x="320" y="120" width="40" height="40" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-32" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-21" target="GA09nmFLpfHeItamLD5O-31">
|
||||
<mxGeometry relative="1" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-21" value="*" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;" vertex="1" parent="1">
|
||||
<mxGeometry x="440" y="120" width="40" height="40" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-22" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.5;entryDx=0;entryDy=0;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-14" target="GA09nmFLpfHeItamLD5O-21">
|
||||
<mxGeometry relative="1" as="geometry">
|
||||
<mxPoint x="460" y="140" as="targetPoint" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-41" value="data_post_add" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" vertex="1" connectable="0" parent="GA09nmFLpfHeItamLD5O-22">
|
||||
<mxGeometry x="-0.1925" y="-1" relative="1" as="geometry">
|
||||
<mxPoint x="8" y="19" as="offset" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-26" value="%" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;" vertex="1" parent="1">
|
||||
<mxGeometry x="560" y="120" width="40" height="40" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-29" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.51;entryY=1.007;entryDx=0;entryDy=0;entryPerimeter=0;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-11" target="GA09nmFLpfHeItamLD5O-26">
|
||||
<mxGeometry relative="1" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-30" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=1.022;entryY=0.482;entryDx=0;entryDy=0;entryPerimeter=0;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-26" target="GA09nmFLpfHeItamLD5O-12">
|
||||
<mxGeometry relative="1" as="geometry">
|
||||
<Array as="points">
|
||||
<mxPoint x="580" y="99" />
|
||||
</Array>
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-33" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-31" target="GA09nmFLpfHeItamLD5O-26">
|
||||
<mxGeometry relative="1" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-31" value="reg" style="rounded=0;whiteSpace=wrap;html=1;" vertex="1" parent="1">
|
||||
<mxGeometry x="500" y="130" width="40" height="20" as="geometry" />
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-37" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="1" source="GA09nmFLpfHeItamLD5O-34">
|
||||
<mxGeometry relative="1" as="geometry">
|
||||
<mxPoint x="720" y="140" as="targetPoint" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-39" value="tag" style="edgeLabel;html=1;align=center;verticalAlign=middle;resizable=0;points=[];" vertex="1" connectable="0" parent="GA09nmFLpfHeItamLD5O-37">
|
||||
<mxGeometry x="0.6531" y="2" relative="1" as="geometry">
|
||||
<mxPoint x="17" y="2" as="offset" />
|
||||
</mxGeometry>
|
||||
</mxCell>
|
||||
<mxCell id="GA09nmFLpfHeItamLD5O-34" value="+" style="ellipse;whiteSpace=wrap;html=1;aspect=fixed;" vertex="1" parent="1">
|
||||
<mxGeometry x="640" y="120" width="40" height="40" as="geometry" />
|
||||
</mxCell>
|
||||
</root>
|
||||
</mxGraphModel>
|
||||
</diagram>
|
||||
</mxfile>
|
||||
@@ -0,0 +1 @@
|
||||
create_clock -period 2.5 -name clk [get_ports i_clk]
|
||||
@@ -0,0 +1,42 @@
|
||||
module mult_timing_test(
|
||||
input i_clk,
|
||||
|
||||
input logic [132:0] data_a,
|
||||
input logic [127:0] data_b,
|
||||
|
||||
output logic [260:0] data_z
|
||||
);
|
||||
|
||||
logic [132:0] data_a_reg;
|
||||
logic [127:0] data_b_reg;
|
||||
|
||||
|
||||
logic [260:0] partial_result [7];
|
||||
|
||||
logic [260:0] data_z_temp_1[4];
|
||||
logic [260:0] data_z_temp_2_0, data_z_temp_2_1;
|
||||
|
||||
always @(posedge i_clk) begin
|
||||
data_a_reg <= data_a;
|
||||
data_b_reg <= data_b;
|
||||
|
||||
for (int i = 0; i < 7; i++) begin
|
||||
partial_result[i] <= data_a_reg[i*19 +: 19] * data_b_reg; // 7 limbs x 19 bits = 133 bits, matching the 19-bit shifts below
|
||||
end
|
||||
|
||||
|
||||
data_z_temp_1[0] <= (partial_result[0] << (19*0)) + (partial_result[1] << (19*1));
|
||||
data_z_temp_1[1] <= (partial_result[2] << (19*0)) + (partial_result[3] << (19*1));
|
||||
data_z_temp_1[2] <= (partial_result[4] << (19*0)) + (partial_result[5] << (19*1));
|
||||
data_z_temp_1[3] <= (partial_result[6] << (19*0));
|
||||
|
||||
data_z_temp_2_0 <= data_z_temp_1[0] + (data_z_temp_1[1] << (19*2));
|
||||
data_z_temp_2_1 <= data_z_temp_1[2] + (data_z_temp_1[3] << (19*2));
|
||||
|
||||
data_z <= data_z_temp_2_0 + (data_z_temp_2_1 << (19*4)); // upper limbs 4..6 start at bit 76
|
||||
|
||||
// data_z <= data_z_temp_2[0] + (data_z_temp_2[1] << (19*4));
|
||||
|
||||
end
|
||||
|
||||
endmodule
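
To make the structure of the adder tree easier to follow, here is a software model of the same decomposition. It assumes the intent is a full 133x128-bit product built from seven 19-bit limbs of `data_a`, each placed at bit offset 19*i; that reading is inferred from the shift amounts and port widths, and the function name is hypothetical. It models the arithmetic only, not the pipeline registers:

```python
LIMB_BITS = 19
NUM_LIMBS = 7  # 7 * 19 = 133 bits, the width of data_a


def limb_multiply(a: int, b: int) -> int:
    """Multiply a (up to 133 bits) by b (up to 128 bits) using 19-bit limbs of a."""
    assert 0 <= a < (1 << 133) and 0 <= b < (1 << 128)
    mask = (1 << LIMB_BITS) - 1

    # One partial product per limb (partial_result in the module).
    partial = [((a >> (LIMB_BITS * i)) & mask) * b for i in range(NUM_LIMBS)]

    # First adder stage: combine neighbouring limbs (data_z_temp_1).
    t1 = [
        partial[0] + (partial[1] << LIMB_BITS),
        partial[2] + (partial[3] << LIMB_BITS),
        partial[4] + (partial[5] << LIMB_BITS),
        partial[6],
    ]

    # Second adder stage: combine pairs of pairs (data_z_temp_2_0 / data_z_temp_2_1).
    t2_0 = t1[0] + (t1[1] << (LIMB_BITS * 2))
    t2_1 = t1[2] + (t1[3] << (LIMB_BITS * 2))

    # Final stage: the upper half starts at limb 4, i.e. bit 76.
    return t2_0 + (t2_1 << (LIMB_BITS * 4))
```

The result fits in 261 bits, which matches the width of `data_z`.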
|
||||
@@ -0,0 +1,122 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<efxpt:design_db name="poly1305_timing_test" device_def="Ti375N1156" version="2025.1.110" db_version="20251999" last_change_date="Sat Jul 5 07:15:12 2025" xmlns:efxpt="http://www.efinixinc.com/peri_design_db" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.efinixinc.com/peri_design_db peri_design_db.xsd ">
|
||||
<efxpt:device_info>
|
||||
<efxpt:iobank_info>
|
||||
<efxpt:iobank name="2A" iostd="1.8 V LVCMOS" is_dyn_voltage="false" mode_sel_name="2A_MODE_SEL"/>
|
||||
<efxpt:iobank name="2B" iostd="1.8 V LVCMOS" is_dyn_voltage="false" mode_sel_name="2B_MODE_SEL"/>
|
||||
<efxpt:iobank name="2C" iostd="1.8 V LVCMOS" is_dyn_voltage="false" mode_sel_name="2C_MODE_SEL"/>
|
||||
<efxpt:iobank name="2D" iostd="1.8 V LVCMOS" is_dyn_voltage="false" mode_sel_name="2D_MODE_SEL"/>
|
||||
<efxpt:iobank name="2E" iostd="1.8 V LVCMOS" is_dyn_voltage="false" mode_sel_name="2E_MODE_SEL"/>
|
||||
<efxpt:iobank name="4A" iostd="1.8 V LVCMOS" is_dyn_voltage="false" mode_sel_name="4A_MODE_SEL"/>
|
||||
<efxpt:iobank name="4B" iostd="1.8 V LVCMOS" is_dyn_voltage="false" mode_sel_name="4B_MODE_SEL"/>
|
||||
<efxpt:iobank name="4C" iostd="1.8 V LVCMOS" is_dyn_voltage="false" mode_sel_name="4C_MODE_SEL"/>
|
||||
<efxpt:iobank name="4D" iostd="1.8 V LVCMOS" is_dyn_voltage="false" mode_sel_name="4D_MODE_SEL"/>
|
||||
<efxpt:iobank name="BL0" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="BL0_MODE_SEL"/>
|
||||
<efxpt:iobank name="BL1" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="BL1_MODE_SEL"/>
|
||||
<efxpt:iobank name="BL2" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="BL2_MODE_SEL"/>
|
||||
<efxpt:iobank name="BL3" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="BL3_MODE_SEL"/>
|
||||
<efxpt:iobank name="BR0" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="BR0_MODE_SEL"/>
|
||||
<efxpt:iobank name="BR1" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="BR1_MODE_SEL"/>
|
||||
<efxpt:iobank name="BR3" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="BR3_MODE_SEL"/>
|
||||
<efxpt:iobank name="BR4" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="BR4_MODE_SEL"/>
|
||||
<efxpt:iobank name="TL0" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="TL0_MODE_SEL"/>
|
||||
<efxpt:iobank name="TL1_TL5" iostd="3.3 V LVCMOS" is_dyn_voltage="false">
|
||||
<efxpt:mode_sel_name>
|
||||
<efxpt:pin_name bank_name="TL1" value="TL1_MODE_SEL"/>
|
||||
<efxpt:pin_name bank_name="TL5" value="TL5_MODE_SEL"/>
|
||||
</efxpt:mode_sel_name>
|
||||
</efxpt:iobank>
|
||||
<efxpt:iobank name="TR0" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="TR0_MODE_SEL"/>
|
||||
<efxpt:iobank name="TR1" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="TR1_MODE_SEL"/>
|
||||
<efxpt:iobank name="TR2" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="TR2_MODE_SEL"/>
|
||||
<efxpt:iobank name="TR3" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="TR3_MODE_SEL"/>
|
||||
<efxpt:iobank name="TR5" iostd="3.3 V LVCMOS" is_dyn_voltage="false" mode_sel_name="TR5_MODE_SEL"/>
|
||||
</efxpt:iobank_info>
|
||||
<efxpt:ctrl_info>
|
||||
<efxpt:ctrl name="cfg" ctrl_def="CONFIG_CTRL0" clock_name="" is_clk_invert="false" cbsel_bus_name="cfg_CBSEL" config_ctrl_name="cfg_CONFIG" ena_capture_name="cfg_ENA" error_status_name="cfg_ERROR" um_signal_status_name="cfg_USR_STATUS" is_remote_update_enable="false" is_user_mode_enable="false">
|
||||
<efxpt:gen_param>
|
||||
<efxpt:param name="remote_update_retries" value="0" value_type="int"/>
|
||||
</efxpt:gen_param>
|
||||
</efxpt:ctrl>
|
||||
</efxpt:ctrl_info>
|
||||
<efxpt:seu_info>
|
||||
<efxpt:seu name="seu" block_def="CONFIG_SEU0" mode="auto" ena_detect="false" wait_interval="16500000">
|
||||
<efxpt:gen_pin>
|
||||
<efxpt:pin name="seu_START" type_name="START" is_bus="false"/>
|
||||
<efxpt:pin name="seu_INJECT_ERROR" type_name="INJECT_ERROR" is_bus="false"/>
|
||||
<efxpt:pin name="seu_RST" type_name="RST" is_bus="false"/>
|
||||
<efxpt:pin name="seu_CONFIG" type_name="CONFIG" is_bus="false"/>
|
||||
<efxpt:pin name="seu_ERROR" type_name="ERROR" is_bus="false"/>
|
||||
<efxpt:pin name="seu_DONE" type_name="DONE" is_bus="false"/>
|
||||
</efxpt:gen_pin>
|
||||
</efxpt:seu>
|
||||
</efxpt:seu_info>
|
||||
<efxpt:clkmux_info>
|
||||
<efxpt:clkmux name="GCLKMUX_B" block_def="GCLKMUX_B" is_mux_bot0_dyn="false" is_mux_bot7_dyn="false">
|
||||
<efxpt:gen_pin>
|
||||
<efxpt:pin name="" type_name="ROUTE0" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE1" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE2" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE3" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_OUT_0" is_bus="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_OUT_7" is_bus="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_SEL_0" is_bus="true"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_SEL_7" is_bus="true"/>
|
||||
</efxpt:gen_pin>
|
||||
</efxpt:clkmux>
|
||||
<efxpt:clkmux name="GCLKMUX_L" block_def="GCLKMUX_L" is_mux_bot0_dyn="false" is_mux_bot7_dyn="false">
|
||||
<efxpt:gen_pin>
|
||||
<efxpt:pin name="" type_name="ROUTE0" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE1" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE2" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE3" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_OUT_0" is_bus="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_OUT_7" is_bus="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_SEL_0" is_bus="true"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_SEL_7" is_bus="true"/>
|
||||
</efxpt:gen_pin>
|
||||
</efxpt:clkmux>
|
||||
<efxpt:clkmux name="GCLKMUX_R" block_def="GCLKMUX_R" is_mux_bot0_dyn="false" is_mux_bot7_dyn="false">
|
||||
<efxpt:gen_pin>
|
||||
<efxpt:pin name="" type_name="ROUTE0" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE1" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE2" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE3" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_OUT_0" is_bus="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_OUT_7" is_bus="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_SEL_0" is_bus="true"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_SEL_7" is_bus="true"/>
|
||||
</efxpt:gen_pin>
|
||||
</efxpt:clkmux>
|
||||
<efxpt:clkmux name="GCLKMUX_T" block_def="GCLKMUX_T" is_mux_bot0_dyn="false" is_mux_bot7_dyn="false">
|
||||
<efxpt:gen_pin>
|
||||
<efxpt:pin name="" type_name="ROUTE0" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE1" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE2" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="ROUTE3" is_bus="false" is_clk="true" is_clk_invert="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_OUT_0" is_bus="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_OUT_7" is_bus="false"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_SEL_0" is_bus="true"/>
|
||||
<efxpt:pin name="" type_name="DYN_MUX_SEL_7" is_bus="true"/>
|
||||
</efxpt:gen_pin>
|
||||
</efxpt:clkmux>
|
||||
</efxpt:clkmux_info>
|
||||
</efxpt:device_info>
|
||||
<efxpt:gpio_info>
|
||||
<efxpt:global_unused_config state="input with weak pullup"/>
|
||||
</efxpt:gpio_info>
|
||||
<efxpt:pll_info/>
|
||||
<efxpt:osc_info/>
|
||||
<efxpt:lvds_info/>
|
||||
<efxpt:mipi_info/>
|
||||
<efxpt:jtag_info/>
|
||||
<efxpt:ddr_info/>
|
||||
<efxpt:mipi_dphy_info/>
|
||||
<efxpt:pll_ssc_info/>
|
||||
<efxpt:quad_lane_info/>
|
||||
<efxpt:quad_pcie_info/>
|
||||
<efxpt:lane_10g_info/>
|
||||
<efxpt:lane_1g_info/>
|
||||
<efxpt:raw_serdes_info/>
|
||||
<efxpt:soc_info/>
|
||||
</efxpt:design_db>
|
||||
@@ -0,0 +1,110 @@
|
||||
<?xml version="1.0" encoding="UTF-8"?>
|
||||
<efx:project name="poly1305_timing_test" description="" last_change="1752448578" sw_version="2025.1.110" last_run_state="pass" last_run_flow="bitstream" config_result_in_sync="true" design_ood="sync" place_ood="sync" route_ood="sync" xmlns:efx="http://www.efinixinc.com/enf_proj" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.efinixinc.com/enf_proj enf_proj.xsd">
|
||||
<efx:device_info>
|
||||
<efx:family name="Titanium"/>
|
||||
<efx:device name="Ti375N1156"/>
|
||||
<efx:timing_model name="C4"/>
|
||||
</efx:device_info>
|
||||
<efx:design_info def_veri_version="sv_09" def_vhdl_version="vhdl_2008" unified_flow="false">
|
||||
<efx:top_module name="mult_timing_test"/>
|
||||
<efx:design_file name="../src/poly1305_core.sv" version="default" library="default"/>
|
||||
<efx:design_file name="../../common/sim/sub/taxi/src/axis/rtl/taxi_axis_if.sv" version="default" library="default"/>
|
||||
<efx:design_file name="../sim/poly1305_core_wrapper.sv" version="default" library="default"/>
|
||||
<efx:design_file name="mult_timing_test.sv" version="default" library="default"/>
|
||||
<efx:top_vhdl_arch name=""/>
|
||||
</efx:design_info>
|
||||
    <efx:constraint_info>
        <efx:sdc_file name="constraints.sdc"/>
        <efx:inter_file name=""/>
    </efx:constraint_info>
    <efx:sim_info/>
    <efx:misc_info/>
    <efx:ip_info/>
    <efx:synthesis tool_name="efx_map">
        <efx:param name="work_dir" value="work_syn" value_type="e_string"/>
        <efx:param name="write_efx_verilog" value="on" value_type="e_bool"/>
        <efx:param name="allow-const-ram-index" value="0" value_type="e_option"/>
        <efx:param name="blackbox-error" value="1" value_type="e_option"/>
        <efx:param name="blast_const_operand_adders" value="1" value_type="e_option"/>
        <efx:param name="bram_output_regs_packing" value="1" value_type="e_option"/>
        <efx:param name="bram-push-tco-outreg" value="0" value_type="e_option"/>
        <efx:param name="create-onehot-fsms" value="0" value_type="e_option"/>
        <efx:param name="fanout-limit" value="0" value_type="e_integer"/>
        <efx:param name="hdl-compile-unit" value="1" value_type="e_option"/>
        <efx:param name="hdl-loop-limit" value="20000" value_type="e_integer"/>
        <efx:param name="infer-clk-enable" value="3" value_type="e_option"/>
        <efx:param name="infer-sync-set-reset" value="1" value_type="e_option"/>
        <efx:param name="enable-mark-debug" value="1" value_type="e_option"/>
        <efx:param name="max_ram" value="-1" value_type="e_integer"/>
        <efx:param name="max_mult" value="-1" value_type="e_integer"/>
        <efx:param name="max-bit-blast-mem-size" value="10240" value_type="e_integer"/>
        <efx:param name="min-sr-fanout" value="0" value_type="e_integer"/>
        <efx:param name="min-ce-fanout" value="0" value_type="e_integer"/>
        <efx:param name="mode" value="speed" value_type="e_option"/>
        <efx:param name="mult-auto-pipeline" value="1" value_type="e_integer"/>
        <efx:param name="mult-decomp-retime" value="1" value_type="e_option"/>
        <efx:param name="operator-sharing" value="1" value_type="e_option"/>
        <efx:param name="optimize-adder-tree" value="1" value_type="e_option"/>
        <efx:param name="optimize-zero-init-rom" value="1" value_type="e_option"/>
        <efx:param name="peri-syn-instantiation" value="0" value_type="e_option"/>
        <efx:param name="peri-syn-inference" value="0" value_type="e_option"/>
        <efx:param name="ram-decomp-mode" value="0" value_type="e_option"/>
        <efx:param name="retiming" value="2" value_type="e_option"/>
        <efx:param name="seq_opt" value="1" value_type="e_option"/>
        <efx:param name="seq-opt-sync-only" value="0" value_type="e_option"/>
        <efx:param name="use-logic-for-small-mem" value="64" value_type="e_integer"/>
        <efx:param name="use-logic-for-small-rom" value="64" value_type="e_integer"/>
        <efx:param name="max_threads" value="-1" value_type="e_integer"/>
        <efx:param name="dsp-input-regs-packing" value="1" value_type="e_option"/>
        <efx:param name="dsp-output-regs-packing" value="1" value_type="e_option"/>
        <efx:param name="dsp-mac-packing" value="1" value_type="e_option"/>
        <efx:param name="insert-carry-skip" value="1" value_type="e_option"/>
        <efx:param name="pack-luts-to-comb4" value="0" value_type="e_option"/>
        <efx:dynparam name="asdf" value="asdf"/>
    </efx:synthesis>
    <efx:place_and_route tool_name="efx_pnr">
        <efx:param name="work_dir" value="work_pnr" value_type="e_string"/>
        <efx:param name="verbose" value="off" value_type="e_bool"/>
        <efx:param name="load_delaym" value="on" value_type="e_bool"/>
        <efx:param name="optimization_level" value="TIMING_3" value_type="e_option"/>
        <efx:param name="seed" value="1" value_type="e_integer"/>
        <efx:param name="placer_effort_level" value="5" value_type="e_option"/>
        <efx:param name="max_threads" value="-1" value_type="e_integer"/>
        <efx:param name="print_critical_path" value="10" value_type="e_integer"/>
        <efx:param name="classic_flow" value="off" value_type="e_noarg"/>
        <efx:param name="beneficial_skew" value="on" value_type="e_option"/>
    </efx:place_and_route>
    <efx:bitstream_generation tool_name="efx_pgm">
        <efx:param name="mode" value="active" value_type="e_option"/>
        <efx:param name="width" value="1" value_type="e_option"/>
        <efx:param name="enable_roms" value="smart" value_type="e_option"/>
        <efx:param name="spi_low_power_mode" value="on" value_type="e_bool"/>
        <efx:param name="io_weak_pullup" value="on" value_type="e_bool"/>
        <efx:param name="oscillator_clock_divider" value="DIV8" value_type="e_option"/>
        <efx:param name="bitstream_compression" value="on" value_type="e_bool"/>
        <efx:param name="enable_external_master_clock" value="off" value_type="e_bool"/>
        <efx:param name="active_capture_clk_edge" value="negedge" value_type="e_option"/>
        <efx:param name="jtag_usercode" value="0xFFFFFFFF" value_type="e_string"/>
        <efx:param name="release_tri_then_reset" value="on" value_type="e_bool"/>
        <efx:param name="four_byte_addressing" value="off" value_type="e_bool"/>
        <efx:param name="generate_bit" value="on" value_type="e_bool"/>
        <efx:param name="generate_bitbin" value="off" value_type="e_bool"/>
        <efx:param name="generate_hex" value="on" value_type="e_bool"/>
        <efx:param name="generate_hexbin" value="off" value_type="e_bool"/>
        <efx:param name="cold_boot" value="off" value_type="e_bool"/>
        <efx:param name="cascade" value="off" value_type="e_option"/>
    </efx:bitstream_generation>
    <efx:debugger>
        <efx:param name="work_dir" value="work_dbg" value_type="e_string"/>
        <efx:param name="auto_instantiation" value="off" value_type="e_bool"/>
        <efx:param name="profile" value="NONE" value_type="e_string"/>
    </efx:debugger>
    <efx:security>
        <efx:param name="randomize_iv_value" value="on" value_type="e_bool"/>
        <efx:param name="iv_value" value="" value_type="e_string"/>
        <efx:param name="enable_bitstream_encrypt" value="off" value_type="e_bool"/>
        <efx:param name="enable_bitstream_auth" value="off" value_type="e_bool"/>
        <efx:param name="encryption_key_file" value="NONE" value_type="e_string"/>
        <efx:param name="auth_key_file" value="NONE" value_type="e_string"/>
    </efx:security>
</efx:project>
121
ChaCha20_Poly1305_64/sim/do_poly_1305.py
Normal file
@@ -0,0 +1,121 @@
from typing import List

from modulo_theory import friendly_modular_mult, friendly_modulo

def mask_r(r: int) -> int:
    r_bytes = r.to_bytes(16, "little")

    r_masked = bytearray(r_bytes)
    r_masked[3] &= 15
    r_masked[7] &= 15
    r_masked[11] &= 15
    r_masked[15] &= 15
    r_masked[4] &= 252
    r_masked[8] &= 252
    r_masked[12] &= 252

    r_masked = int.from_bytes(r_masked, "little")

    return r_masked


def poly1305(message: bytes, r: int, s: int):
    r = mask_r(r)
    p = 2**130-5
    acc = 0

    # Keep the raw chunks so the pad position comes from the chunk length
    chunks = [message[i:i+16] for i in range(0, len(message), 16)]

    for chunk in chunks:
        block = int.from_bytes(chunk, "little")

        # Set the pad bit just above the last message byte. Deriving the
        # length from block.bit_length() misplaces it when the top byte(s)
        # of a chunk are zero.
        block += 1 << (8*len(chunk))

        acc = ((acc+block)*r) % p

    acc += s

    return acc & (2**128-1)

def parallel_poly1305(message: bytes, r: int, s: int, lanes: int):
    r = mask_r(r)
    p = 2**130-5

    # r_powers[k] == r**k mod p for k = 0..8, so at most 8 lanes are supported
    r_powers = [1, r]

    for l_pow_log2 in range(3):
        l_pow = 2**l_pow_log2
        for r_pow in range(1,l_pow+1):
            r_powers.append(friendly_modular_mult(r_powers[l_pow], r_powers[r_pow]))

    acc = [0]*lanes

    chunks = [message[i:i+16] for i in range(0, len(message), 16)]

    lane_blocks = [chunks[i:i+lanes] for i in range(0, len(chunks), lanes)]

    for i, lane_block in enumerate(lane_blocks):
        for j, chunk in enumerate(lane_block):
            idx = i*lanes + j
            power = min(lanes, len(chunks) - idx)

            # Byte count comes from the chunk length, so no division is needed
            lane = int.from_bytes(chunk, "little")
            lane += 1 << (8*len(chunk))

            # friendly_modular_mult splits its first argument into five
            # 26-bit limbs (130 bits). acc[j] + lane can need 131 bits, so
            # the power of r goes first and the sum goes second.
            acc[j] = friendly_modular_mult(r_powers[power], acc[j] + lane)

    combined_acc = friendly_modulo(sum(acc), 0)
    combined_acc += s

    return combined_acc & (2**128-1)


def test_regular():
    r = 0xa806d542fe52447f336d555778bed685
    s = 0x1bf54941aff6bf4afdb20dfb8a800301

    golden_result = 0xa927010caf8b2bc2c6365130c11d06a8

    msg = b"Cryptographic Forum Research Group"

    result = poly1305(msg, r, s)

    print(f"{golden_result:x}")
    print(f"{result:x}")

def test_parallel():
    r = 0xa806d542fe52447f336d555778bed685
    s = 0x1bf54941aff6bf4afdb20dfb8a800301

    golden_result = 0xa927010caf8b2bc2c6365130c11d06a8

    msg = b"Cryptographic Forum Research Group"

    result = parallel_poly1305(msg, r, s, 8)

    print(f"{golden_result:x}")
    print(f"{result:x}")


def test_on_long_string():
    r = 0xa806d542fe52447f336d555778bed685
    s = 0x1bf54941aff6bf4afdb20dfb8a800301

    msg = b"Very long message with lots of words that is very long and requires a lot of cycles to complete because of how long it is"

    regular_result = poly1305(msg, r, s)
    parallel_result = parallel_poly1305(msg, r, s, 8)

    print(f"{regular_result:x}")
    print(f"{parallel_result:x}")


def main():
    test_regular()
    test_parallel()
    test_on_long_string()

if __name__ == "__main__":
    main()
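A minimal standalone sketch of the serial Poly1305 loop in `do_poly_1305.py`, useful as a cross-check when changing the model. The names `pad_block`, `poly1305_reference`, `P1305`, and `R_CLAMP` are hypothetical and do not exist in the repository; the key, message, and expected tag are the ones already used by `test_regular()`. The sketch takes the pad position from the chunk length, which matters when a chunk ends in zero bytes.

```python
P1305 = 2**130 - 5
# Same constant as R_MASK in poly1305_core.sv
R_CLAMP = 0x0ffffffc0ffffffc0ffffffc0fffffff


def pad_block(chunk: bytes) -> int:
    """Little-endian value of the chunk with a 1 bit just above its last byte."""
    return int.from_bytes(chunk, "little") | (1 << (8 * len(chunk)))


def poly1305_reference(message: bytes, r: int, s: int) -> int:
    """Serial Poly1305 over 16-byte chunks; r and s are little-endian integers."""
    r &= R_CLAMP
    acc = 0
    for i in range(0, len(message), 16):
        acc = ((acc + pad_block(message[i:i + 16])) * r) % P1305
    return (acc + s) & (2**128 - 1)


if __name__ == "__main__":
    r = 0xa806d542fe52447f336d555778bed685
    s = 0x1bf54941aff6bf4afdb20dfb8a800301
    tag = poly1305_reference(b"Cryptographic Forum Research Group", r, s)
    print(f"{tag:x}")
```

A chunk such as `b"\x01" + bytes(15)` shows why the length has to come from the chunk: its integer value is 1, so a length derived from `bit_length()` would be one byte instead of sixteen.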
87
ChaCha20_Poly1305_64/sim/modulo_theory.py
Normal file
@@ -0,0 +1,87 @@
import random

PRIME = 2**130-5

def modulo_theory_simple(loops: int):
    prime = 97

    for _ in range(loops):
        value_a = random.randint(1,97)
        value_b = random.randint(1,97)

        value_a_high = value_a // 10
        value_a_low = value_a % 10 # Ignore this modulo; in base 2 it is just a mask

        prod_high = value_a_high * value_b
        prod_low = value_a_low * value_b

        mod_high = (prod_high*10) % prime
        mod_low = prod_low % prime

        mod_sum = (mod_high + mod_low) % prime

        mod_conventional = (value_a * value_b) % prime

        if mod_sum != mod_conventional:
            print(f"{value_a}")
            print(f"{value_b}")
            print(f"{mod_sum=}")
            print(f"{mod_conventional=}")

def modulo_theory_full(loops: int):
    for _ in range(loops):
        value_a = random.randint(1,PRIME)
        value_b = random.randint(1,2**128)

        a_partials = [(value_a >> 26*i) & (2**26-1) for i in range(5)]

        prods = [a_partial * value_b for a_partial in a_partials]

        mods = [friendly_modulo(prod, 26*i) for i, prod in enumerate(prods)]


        mod_sum = friendly_modulo(sum(mods), 0)

        mod_conventional = (value_a * value_b) % PRIME

        if mod_sum != mod_conventional:
            print(f"{value_a}")
            print(f"{value_b}")
            print(f"{mod_sum=}")
            print(f"{mod_conventional=}")

def friendly_modular_mult(value_a: int, value_b: int) -> int:
    # value_a is split into five 26-bit limbs, so it must fit in 130 bits
    a_partials = [(value_a >> 26*i) & (2**26-1) for i in range(5)]

    prods = [a_partial * value_b for a_partial in a_partials]

    mods = [friendly_modulo(prod, 26*i) for i, prod in enumerate(prods)]


    mod_sum = friendly_modulo(sum(mods), 0)

    return mod_sum

def friendly_modulo(val: int, shift_amount: int) -> int:
    # Computes (val << shift_amount) mod PRIME using 2**130 == 5 (mod PRIME)
    high_part = val >> (130-shift_amount)
    low_part = (val << shift_amount) & (2**130-1)

    high_part *= 5

    val = high_part + low_part

    high_part = val >> 130
    low_part = val & (2**130-1)

    high_part *= 5

    val = high_part + low_part

    if val >= PRIME:
        val -= PRIME

    return val

if __name__ == "__main__":
    #modulo_theory_simple(10000000)
    modulo_theory_full(100000)
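The reduction in `friendly_modulo` relies on the identity 2^130 ≡ 5 (mod 2^130 - 5): bits above position 130 can be multiplied by 5 and added back into the low 130 bits. The sketch below shows that folding step in isolation, as a loop rather than the fixed two folds used in the file. `fold_reduce` and `MASK130` are hypothetical names introduced for illustration only.

```python
import random

PRIME = 2**130 - 5
MASK130 = 2**130 - 1


def fold_reduce(val: int) -> int:
    """Reduce a non-negative integer modulo 2**130 - 5 by repeated folding."""
    # Each pass replaces high * 2**130 + low with high * 5 + low, which is
    # congruent modulo PRIME and strictly smaller while high is non-zero.
    while val >> 130:
        val = (val >> 130) * 5 + (val & MASK130)
    # val is now below 2**130, so at most one subtraction is needed.
    return val - PRIME if val >= PRIME else val


if __name__ == "__main__":
    for _ in range(1000):
        x = random.getrandbits(260)
        assert fold_reduce(x) == x % PRIME
    print("fold_reduce agrees with % on 1000 random 260-bit values")
```

A fixed number of folds, as in `friendly_modulo`, is the hardware-friendly form of the same idea; the loop here only makes the invariant easier to see.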
13
ChaCha20_Poly1305_64/sim/poly1305.yaml
Normal file
@@ -0,0 +1,13 @@
tests:
  - name: "poly1305_core"
    toplevel: "poly1305_core_wrapper"
    modules:
      - "poly1305_core"
    sources: "sources.list"
    waves: True
  - name: "friendly_modulo"
    toplevel: "poly1305_friendly_modulo"
    modules:
      - "poly1305_friendly_modulo"
    sources: "sources.list"
    waves: True
75
ChaCha20_Poly1305_64/sim/poly1305_core.py
Normal file
@@ -0,0 +1,75 @@
import logging


import cocotb
from cocotb.clock import Clock
from cocotb.triggers import Timer, RisingEdge, FallingEdge
from cocotb.queue import Queue

from cocotbext.axi import AxiStreamBus, AxiStreamSource

CLK_PERIOD = 4


class TB:
    def __init__(self, dut):
        self.dut = dut

        self.log = logging.getLogger("cocotb.tb")
        self.log.setLevel(logging.INFO)

        cocotb.start_soon(Clock(self.dut.i_clk, CLK_PERIOD, units="ns").start())

        self.s_data_axis = AxiStreamSource(AxiStreamBus.from_prefix(dut, ""), dut.i_clk, dut.i_rst)

    async def cycle_reset(self):
        await self._cycle_reset(self.dut.i_rst, self.dut.i_clk)

    async def _cycle_reset(self, rst, clk):
        rst.setimmediatevalue(0)
        await RisingEdge(clk)
        await RisingEdge(clk)
        rst.value = 1
        await RisingEdge(clk)
        await RisingEdge(clk)
        rst.value = 0
        await RisingEdge(clk)
        await RisingEdge(clk)

@cocotb.test
async def test_sanity(dut):
    tb = TB(dut)

    await tb.cycle_reset()

    s = 0x1bf54941aff6bf4afdb20dfb8a800301
    r = 0xa806d542fe52447f336d555778bed685
    r_masked = 0x0806d5400e52447c036d555408bed685

    result = 0xa927010caf8b2bc2c6365130c11d06a8

    msg = b"Cryptographic Forum Research Group"


    tb.dut.i_otk.value = ((r << 128) | s)
    tb.dut.i_otk_valid.value = 1
    await RisingEdge(tb.dut.i_clk)
    tb.dut.i_otk_valid.value = 0
    await RisingEdge(tb.dut.i_clk)

    dut_s = tb.dut.u_dut.poly1305_s.value.integer
    dut_r = tb.dut.u_dut.poly1305_r.value.integer

    assert dut_s == s
    assert dut_r == r_masked

    await tb.s_data_axis.send(msg)

    await RisingEdge(tb.dut.o_tag_valid)
    tag = tb.dut.o_tag.value.integer

    tb.log.info(f"tag: {tag:x}")

    assert tag == result

    await Timer(1, "us")
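The testbench above packs the one-time key as `(r << 128) | s` and expects the DUT to store `r` with the clamp applied. The sketch below models that split in plain Python so the `r_masked` constant can be checked without a simulator. `split_otk` and `clamp_r` are hypothetical helper names; the mask value is the `R_MASK` constant from `poly1305_core.sv`.

```python
R_MASK = 0x0ffffffc0ffffffc0ffffffc0fffffff


def clamp_r(r: int) -> int:
    """Clear the bits of r that Poly1305 requires to be zero."""
    return r & R_MASK


def split_otk(otk: int) -> tuple[int, int]:
    """Split a 256-bit one-time key into (clamped r, s), with r in the upper half."""
    return clamp_r(otk >> 128), otk & (2**128 - 1)


if __name__ == "__main__":
    r = 0xa806d542fe52447f336d555778bed685
    s = 0x1bf54941aff6bf4afdb20dfb8a800301
    r_clamped, s_out = split_otk((r << 128) | s)
    print(f"r = {r_clamped:032x}")
    print(f"s = {s_out:032x}")
```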
40
ChaCha20_Poly1305_64/sim/poly1305_core_wrapper.sv
Normal file
@@ -0,0 +1,40 @@
module poly1305_core_wrapper(
    input           i_clk,
    input           i_rst,

    input [255:0]   i_otk,
    input           i_otk_valid,

    output [127:0]  o_tag,
    output          o_tag_valid,

    input [127:0]   tdata,
    input [15:0]    tkeep,
    input [15:0]    tstrb,
    input           tlast,
    input           tvalid,
    output          tready
);

taxi_axis_if #(.DATA_W(128)) s_data_axis();

assign s_data_axis.tdata = tdata;
assign s_data_axis.tkeep = tkeep;
assign s_data_axis.tstrb = tstrb;
assign s_data_axis.tlast = tlast;
assign s_data_axis.tvalid = tvalid;
assign tready = s_data_axis.tready;

poly1305_core u_dut (
    .i_clk          (i_clk),
    .i_rst          (i_rst),
    .i_otk          (i_otk),
    .i_otk_valid    (i_otk_valid),

    .o_tag          (o_tag),
    .o_tag_valid    (o_tag_valid),

    .s_data_axis    (s_data_axis)
);

endmodule
91
ChaCha20_Poly1305_64/sim/poly1305_friendly_modulo.py
Normal file
@@ -0,0 +1,91 @@
import logging


import cocotb
from cocotb.clock import Clock
from cocotb.triggers import Timer, RisingEdge, FallingEdge
from cocotb.queue import Queue

from cocotbext.axi import AxiStreamBus, AxiStreamSource

import random

PRIME = 2**130-5

CLK_PERIOD = 4


class TB:
    def __init__(self, dut):
        self.dut = dut

        self.log = logging.getLogger("cocotb.tb")
        self.log.setLevel(logging.INFO)

        self.input_queue = Queue()

        self.expected_queue = Queue()
        self.output_queue = Queue()

        cocotb.start_soon(Clock(self.dut.i_clk, CLK_PERIOD, units="ns").start())

        cocotb.start_soon(self.run_input())
        cocotb.start_soon(self.run_output())

    async def cycle_reset(self):
        await self._cycle_reset(self.dut.i_rst, self.dut.i_clk)

    async def _cycle_reset(self, rst, clk):
        rst.setimmediatevalue(0)
        await RisingEdge(clk)
        await RisingEdge(clk)
        rst.value = 1
        await RisingEdge(clk)
        await RisingEdge(clk)
        rst.value = 0
        await RisingEdge(clk)
        await RisingEdge(clk)

    async def write_input(self, value: int, shift_amount: int):
        await self.input_queue.put((value, shift_amount))
        await self.expected_queue.put((value << (shift_amount*26)) % PRIME)

    async def run_input(self):
        while True:
            value, shift_amount = await self.input_queue.get()
            self.dut.i_valid.value = 1
            self.dut.i_val.value = value
            self.dut.i_shift_amount.value = shift_amount
            await RisingEdge(self.dut.i_clk)
            self.dut.i_valid.value = 0
            self.dut.i_shift_amount.value = 0
            self.dut.i_val.value = 0

    async def run_output(self):
        while True:
            await RisingEdge(self.dut.i_clk)
            if self.dut.o_valid.value:
                await self.output_queue.put(self.dut.o_result.value.integer)

@cocotb.test
async def test_sanity(dut):
    tb = TB(dut)

    await tb.cycle_reset()

    count = 1024

    for _ in range(count):
        await tb.write_input(random.randint(1,2**(130+16)), random.randint(0, 4))

    fail = False

    for _ in range(count):
        sim_val = await tb.expected_queue.get()
        dut_val = await tb.output_queue.get()

        if sim_val != dut_val:
            tb.log.info(f"{sim_val:x} -> {dut_val:x}")
            fail = True

    assert not fail
@@ -1 +1,4 @@
-../src/sources.list
+poly1305_core_wrapper.sv
+
+../src/sources.list
+../../common/sim/sub/taxi/src/axis/rtl/taxi_axis_if.sv
24
ChaCha20_Poly1305_64/src/chacha20_poly1305_64.sv
Normal file
@@ -0,0 +1,24 @@
module chacha20_poly1305_64 (
    input i_clk,
    input i_rst,

    taxi_axis_if.snk s_ctrl_axis,
    taxi_axis_if.snk s_data_axis,
    taxi_axis_if.src m_data_axis
);

//TODO the rest of this

// control axis decoder.

localparam R_MASK = 128'h0ffffffc0ffffffc0ffffffc0fffffff;

chacha20_pipelined_block u_chacha20_pipelined_block (

);

poly1305 u_poly1305 (

);

endmodule
101
ChaCha20_Poly1305_64/src/poly1305_core.sv
Normal file
@@ -0,0 +1,101 @@
module poly1305_core #(

) (
    input           i_clk,
    input           i_rst,

    input [255:0]   i_otk,
    input           i_otk_valid,

    output [127:0]  o_tag,
    output          o_tag_valid,

    taxi_axis_if.snk s_data_axis
);

// Incoming data must be 128 bits wide and packed: tkeep is all ones on every
// beat except the last, and the valid bytes of the last beat have no gaps.


localparam R_MASK = 128'h0ffffffc0ffffffc0ffffffc0fffffff;
localparam P130M5 = 258'h3fffffffffffffffffffffffffffffffb;

logic [127:0] poly1305_r, poly1305_s;
logic [129:0] accumulator, accumulator_next;

logic [129:0] data_one_extended;
logic [130:0] data_post_add, data_post_add_reg;

logic [257:0] data_post_mul, data_post_mul_reg;

logic [257:0] modulo_stage, modulo_stage_next;

logic [2:0] phase;

logic [3:0] valid_sr;

function logic [129:0] tkeep_expand (input [15:0] tkeep);
    tkeep_expand = '0;
    for (int i = 0; i < 16; i++) begin
        tkeep_expand[i*8 +: 8] = {8{tkeep[i]}};
    end
endfunction

// only ready in phase 0
assign s_data_axis.tready = phase == 0;
assign o_tag_valid = valid_sr[3];

always_ff @(posedge i_clk) begin
    valid_sr <= {valid_sr[2:0], s_data_axis.tlast & s_data_axis.tvalid & s_data_axis.tready & (phase == 0)};
    data_post_add_reg <= data_post_add;
    data_post_mul_reg <= data_post_mul;
    modulo_stage <= modulo_stage_next;

    if (i_otk_valid) begin
        poly1305_r <= i_otk[255:128] & R_MASK;
        poly1305_s <= i_otk[127:0];
    end

    if (s_data_axis.tvalid && phase == 0) begin
        phase <= 1;
    end

    if (phase == 1) begin
        phase <= 2;
    end

    if (phase == 2) begin
        phase <= 3;
    end

    if (phase == 3) begin
        accumulator <= accumulator_next;
        phase <= '0;
    end

    // Reset is checked last so that it overrides the assignments above
    if (i_rst) begin
        phase <= '0;
        valid_sr <= '0;
    end
end

always_comb begin
    accumulator_next = accumulator;
    data_post_mul = '0;

    // phase == 0
    // The +1 sets the pad bit just above the last valid byte; this assumes
    // tdata is zero in the bytes where tkeep is low
    data_one_extended = (tkeep_expand(s_data_axis.tkeep) + 1) | {2'b0, s_data_axis.tdata};
    data_post_add = data_one_extended + accumulator;

    // phase == 1
    data_post_mul = data_post_add_reg * poly1305_r;

    // phase == 2
    modulo_stage_next = (data_post_mul_reg[257:130] * 5) + 258'(data_post_mul_reg[129:0]);

    // phase == 3
    accumulator_next = 130'((modulo_stage[257:130] * 5) + 258'(modulo_stage[129:0]));
end

assign o_tag = accumulator[127:0] + poly1305_s;

endmodule
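The `data_one_extended` expression in `poly1305_core.sv` appends the Poly1305 pad bit by adding 1 to the expanded `tkeep` mask. A small Python model of that expression, offered as a sketch for checking expected values in a testbench: `tkeep_expand` mirrors the SystemVerilog function of the same name, while `one_extend` is a hypothetical name. It assumes `tkeep` is contiguous from bit 0 and that `tdata` is zero in bytes where `tkeep` is low, as the module's comment requires.

```python
def tkeep_expand(tkeep: int) -> int:
    """Expand each of the 16 tkeep bits into a full byte of ones."""
    mask = 0
    for i in range(16):
        if (tkeep >> i) & 1:
            mask |= 0xFF << (8 * i)
    return mask


def one_extend(tdata: int, tkeep: int) -> int:
    """Model of data_one_extended: tdata with a 1 bit above the last kept byte."""
    # For a contiguous mask such as 0x00FFFF, adding 1 yields 0x010000,
    # which is exactly the pad bit position.
    return ((tkeep_expand(tkeep) + 1) | tdata) & (2**130 - 1)


if __name__ == "__main__":
    # Two valid bytes: pad bit lands at bit 16
    print(hex(one_extend(0x1234, 0x0003)))
    # Full beat: pad bit lands at bit 128
    print(hex(one_extend(2**128 - 1, 0xFFFF)))
```

For a full beat this matches `int.from_bytes(chunk, "little") + (1 << 128)` in the Python model, and for a short last beat it matches `1 << (8 * len(chunk))`.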
48
ChaCha20_Poly1305_64/src/poly1305_friendly_modulo.sv
Normal file
@@ -0,0 +1,48 @@
module poly1305_friendly_modulo #(
    parameter WIDTH = 130,
    parameter MDIFF = 5, // modulo difference
    parameter SHIFT_SIZE = 26
) (
    input   logic               i_clk,
    input   logic               i_rst,

    input   logic               i_valid,
    input   logic [2*WIDTH-1:0] i_val,
    input   logic [2:0]         i_shift_amount,

    output  logic               o_valid,
    output  logic [WIDTH-1:0]   o_result
);

localparam WIDE_WIDTH = WIDTH + $clog2(MDIFF);
localparam [WIDTH-1:0] PRIME = (1 << WIDTH) - MDIFF;

logic [WIDE_WIDTH-1:0] high_part_1, high_part_2;
logic [WIDTH-1:0] low_part_1, low_part_2;

logic [WIDE_WIDTH-1:0] intermediate_val;
logic [WIDTH-1:0] final_val;

logic [2:0] unused_final;

logic [2:0] valid_sr;

assign intermediate_val = high_part_1 + WIDE_WIDTH'(low_part_1);

assign o_result = (final_val >= PRIME) ? final_val - PRIME : final_val;

assign o_valid = valid_sr[2];

always_ff @(posedge i_clk) begin
    valid_sr <= {valid_sr[1:0], i_valid};

    // Stage 1: split the shifted input at bit WIDTH and fold the high part
    high_part_1 <= WIDTH'({3'b0, i_val} >> (WIDTH - (i_shift_amount*SHIFT_SIZE))) * MDIFF;
    low_part_1 <= WIDTH'(i_val << (i_shift_amount*SHIFT_SIZE));

    // Stage 2: fold the carry-out of stage 1 once more
    high_part_2 <= (intermediate_val >> WIDTH) * MDIFF;
    low_part_2 <= intermediate_val[WIDTH-1:0];

    // Stage 3: final sum; the conditional subtract happens on o_result
    {unused_final, final_val} <= high_part_2 + WIDE_WIDTH'(low_part_2);
end

endmodule
@@ -1,4 +1,7 @@
 chacha20_qr.sv
 chacha20_block.sv
 chacha20_pipelined_round.sv
-chacha20_pipelined_block.sv
+chacha20_pipelined_block.sv
+
+poly1305_core.sv
+poly1305_friendly_modulo.sv