add notes

Byron Lathi
2025-06-23 00:02:14 -07:00
parent 4b67e7aa5a
commit 369e29557c

@@ -0,0 +1,33 @@
# Notes

Since we are designing this for a 64-bit datapath, we need to be able to
produce 64 bits of keystream every cycle. The ChaCha20 block function works on
a state of 16 32-bit words, or one 512-bit block at a time. Logically it might
make more sense to have a datapath of 128 bits.

On the other hand, each operation is a 32-bit operation. For timing reasons it
might then make more sense to register each operation. But will this be able
to match the throughput that we need?

Each quarter round takes 4 words and produces 4 words, so it updates 128 bits
at once. We can do 4 of the quarter rounds at once (one full round), so at the
end of each cycle all 512 bits of state are updated.
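
As a reference for the hardware, here is a minimal software sketch of one quarter round and of four quarter rounds applied to disjoint columns of the state. The names `rotl32`, `quarter_round`, `COLUMNS` and `column_round` are illustrative and not taken from the design.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """One ChaCha quarter round: 4 words (128 bits) in, 4 words out."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Word indices for the four column quarter rounds. The groups are disjoint,
# so in hardware all four could be evaluated in the same cycle.
COLUMNS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]


def column_round(state):
    """Apply four independent quarter rounds, touching all 16 words."""
    out = list(state)
    for idx in COLUMNS:
        results = quarter_round(*(state[i] for i in idx))
        for i, word in zip(idx, results):
            out[i] = word
    return out
```

Each step inside `quarter_round` is a single 32-bit add, xor, or rotate, which is the granularity at which registers could be placed if timing requires it.
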

At full speed then, the core would update 512 bits of state per cycle, but we
only need to generate 64 bits per cycle. We could do only 1 quarter round at
a time, which would update only 128 bits per cycle, but we would need some
sort of structure to reorder the state such that it is ready to xor with the
incoming data. Note that a finished 512-bit keystream block takes 20 rounds
(80 quarter rounds), so these per-cycle figures count state updates, not
finished keystream, unless the rounds are also unrolled or pipelined. We could
even make this parameterizable, but that would be the next step if we actually
need to support 100 Gbps encryption.
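
The throughput question can be checked with a little arithmetic. This is a rough sketch that assumes an iterative core in which every quarter round unit completes one quarter round per cycle and is never idle. The function names are hypothetical.

```python
from fractions import Fraction

BLOCK_BITS = 512               # one ChaCha20 keystream block
QUARTER_ROUNDS_PER_BLOCK = 80  # 20 rounds x 4 quarter rounds per round


def keystream_bits_per_cycle(qr_units):
    """Average keystream bits per cycle for an iterative core.

    Assumption: each of the qr_units quarter round units finishes one
    quarter round every cycle and is always busy.
    """
    return Fraction(BLOCK_BITS * qr_units, QUARTER_ROUNDS_PER_BLOCK)


def qr_units_needed(target_bits_per_cycle):
    """Smallest number of quarter round units that meets the target."""
    per_unit = keystream_bits_per_cycle(1)
    # Ceiling division written with floor division on the negated value.
    return -(-Fraction(target_bits_per_cycle) // per_unit)


for units in (1, 4):
    print(units, float(keystream_bits_per_cycle(units)))
print("units for 64 bits/cycle:", qr_units_needed(64))
```

Under these assumptions a single unit averages 6.4 bits of finished keystream per cycle, so a sustained 64 bits per cycle would need the equivalent of 10 quarter rounds completing every cycle, whether through parallel units or through pipelining across blocks.
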

So in summary, we will have a single QuarterRound module which generates 128
bits of output. We will have a scheduling block which selects which 4 words of
state go into the quarter round module, and a de-interleaver which takes the
output from the quarter round module and re-orders it to be in the correct
order to combine with the incoming data. After the final round, the original
input state is also added word by word to the working state before it is used
as keystream.
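
The sketch below models that structure in software: a single quarter round function, a schedule table standing in for the scheduling block, a write-back step standing in for the de-interleaver, and the final addition. It follows the block function as specified in RFC 8439. The names `SCHEDULE`, `initial_state`, `chacha20_block` and `keystream_beats` are illustrative and not taken from the design.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """The single shared quarter round unit (128 bits in, 128 bits out)."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Scheduler table: the 4 state words fed to the quarter round on each of
# the 8 steps of a double round (4 column steps, then 4 diagonal steps).
SCHEDULE = [
    (0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
    (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14),
]


def initial_state(key, counter, nonce):
    """Build the 16-word state: constants, 256-bit key, counter, 96-bit nonce."""
    constants = struct.unpack("<4I", b"expand 32-byte k")
    return list(constants + struct.unpack("<8I", key)
                + (counter,) + struct.unpack("<3I", nonce))


def chacha20_block(key, counter, nonce):
    """Produce one 64-byte keystream block using one quarter round per step."""
    init = initial_state(key, counter, nonce)
    state = list(init)
    for _ in range(10):              # 10 double rounds = 20 rounds
        for idx in SCHEDULE:         # scheduler picks 4 words
            results = quarter_round(*(state[i] for i in idx))
            for i, word in zip(idx, results):
                state[i] = word      # de-interleave back into word order
    # The final addition of the original state to the working state.
    out = [(s + i) & MASK32 for s, i in zip(state, init)]
    return struct.pack("<16I", *out)


def keystream_beats(block, beat_bytes=8):
    """Split a keystream block into 64-bit beats for a 64-bit datapath."""
    return [block[i:i + beat_bytes] for i in range(0, len(block), beat_bytes)]
```

Because the state is written back by word index after every step, the serialized block already comes out in the order in which it is xored with the data, so in this model the de-interleaver reduces to the index write-back.
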

To support AEAD, the first ChaCha20 block (block counter 0) provides the
one-time key for the Poly1305 block, and the following blocks (counter 1
onward) become the cipher keystream. The key block can be computed in parallel
with the first keystream block at the expense of double the gates. Otherwise,
there would be a delay between packets while the key is generated.
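
A small sketch of how the block counters divide between Poly1305 key generation and encryption, following the ChaCha20-Poly1305 construction in RFC 8439. Here `block_fn` is a placeholder for whatever produces a 64-byte ChaCha20 block from `(key, counter, nonce)`. Both function names are hypothetical.

```python
def poly1305_key_gen(block_fn, key, nonce):
    """Derive the one-time Poly1305 key from the counter-0 block.

    Only the first 32 bytes of that block are used as the key.
    """
    return block_fn(key, 0, nonce)[:32]


def payload_counters(payload_len, block_bytes=64):
    """Block counters used for encryption; counter 0 is reserved for the key."""
    num_blocks = -(-payload_len // block_bytes)  # ceiling division
    return list(range(1, num_blocks + 1))
```

Every packet therefore costs one block beyond what its payload needs, which is the work that either a second core absorbs or that appears as a gap between packets.
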