2025-10-27 19:19:43 -07:00

# Overall Notes
We need to support 25 Gbps, and we will have 2 datapaths, TX and RX.
With a 128-bit datapath this works out to about 200 MHz (25 Gbps / 128 bits ≈ 195 MHz), but let's aim for 250 MHz.
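
The clock estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: the function names are made up for illustration, and it assumes one full datapath word is consumed every cycle with no framing or stall overhead.

```python
def required_clock_mhz(line_rate_gbps, datapath_bits):
    """Minimum clock (MHz) to sustain the line rate, assuming one
    full-width word per cycle and no overhead (an idealization)."""
    return line_rate_gbps * 1e9 / datapath_bits / 1e6


def throughput_gbps(clock_mhz, datapath_bits):
    """Peak throughput (Gbps) at a given clock, under the same assumption."""
    return clock_mhz * 1e6 * datapath_bits / 1e9


if __name__ == "__main__":
    need = required_clock_mhz(25, 128)
    print(f"required clock: {need:.4f} MHz")
    print(f"throughput at 250 MHz: {throughput_gbps(250, 128):.1f} Gbps")
    print(f"headroom at 250 MHz: {250 / need:.2f}x")
```

Under these assumptions, 250 MHz leaves some slack for cycles in which the datapath cannot accept a word.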
# ChaCha20 Notes
ChaCha20 operates on 512-bit blocks. Each round is made of 4 quarter
rounds, which are identical except for which 32-bit words of the state
they operate on. We can reuse a single quarter-round unit 4 times in a
row, but we need to store the rest of the state between operations, so
memory usage might be similar to doing all 4 at once, while the logic
would be only about 25% as much. Because rounds alternate between odd
(column) and even (diagonal) rounds, the words grouped together in one
round are not the same groups used in the next round.
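
As a software reference for the reuse idea, here is a minimal Python sketch of one quarter-round function applied 4 times per round, with only the word indices changing between column and diagonal rounds. The quarter round and index sets follow RFC 8439. The names `apply_round`, `COLUMN_ROUND`, and `DIAGONAL_ROUND` are illustrative, and this models behavior only, not the hardware structure.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """ChaCha quarter round on four 32-bit words (RFC 8439, section 2.1)."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Word indices into the 16-word state for each quarter round.
COLUMN_ROUND = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONAL_ROUND = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def apply_round(state, index_sets):
    """Run the single quarter-round function once per index set.
    The words not selected in a given step just sit in storage."""
    state = list(state)
    for ia, ib, ic, id_ in index_sets:
        state[ia], state[ib], state[ic], state[id_] = quarter_round(
            state[ia], state[ib], state[ic], state[id_]
        )
    return state


if __name__ == "__main__":
    state = list(range(16))  # placeholder state, not a real key setup
    state = apply_round(state, COLUMN_ROUND)
    state = apply_round(state, DIAGONAL_ROUND)
    print([hex(w) for w in state])
```

Each index set touches every state word exactly once, so the four quarter rounds inside a round are independent of each other. The regrouping between the column and diagonal sets is what forces the full 512-bit state to be held between rounds.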
# Poly1305
## Parallel Operation
We can calculate in parallel, but we need to calculate r^n, where n is the number of
parallel stages. Ideally the number of parallel stages would equal the latency of the
full stage, so that it is fully pipelined. For example, if it took 8 cycles per block,
we would have 8 parallel calculations. This requires calculating r^n as well as every
intermediate power. If we do 8, then we need r^1, r^2, r^3, etc., up to r^8. This takes
n-1 multiplies in total, but only log2(n) sequential multiply steps, since every
multiply in a step depends only on results from earlier steps.
We need:

- r\*r = r^2
- r\*r^2 = r^3, r^2\*r^2 = r^4
- r^4\*r = r^5, r^2\*r^4 = r^6, r^3\*r^4 = r^7, r^4\*r^4 = r^8
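
The power schedule above, and how the powers feed n parallel accumulators, can be sketched in Python as below. This is a behavioral sketch under stated assumptions, not the planned hardware: `powers_of_r`, `poly_serial`, and `poly_parallel` are hypothetical names, n is assumed to be a power of two, the block count is assumed to be a multiple of n, blocks are plain integers with the padding bit already applied, and the final addition of `s` and truncation to 128 bits are left out.

```python
P = (1 << 130) - 5  # Poly1305 prime


def powers_of_r(r, n):
    """Return ([r^1, ..., r^n] mod P, number of sequential levels).
    Each level multiplies every known power by the highest known power,
    so all multiplies within a level are independent of each other."""
    powers = {1: r % P}
    have, levels = 1, 0
    while have < n:
        top = powers[have]
        for k in range(1, have + 1):
            powers[have + k] = (powers[k] * top) % P
        have *= 2
        levels += 1
    return [powers[k] for k in range(1, n + 1)], levels


def poly_serial(blocks, r):
    """Reference Horner evaluation: acc = (acc + m) * r mod P per block."""
    acc = 0
    for m in blocks:
        acc = ((acc + m) * r) % P
    return acc


def poly_parallel(blocks, r, n):
    """n interleaved accumulators. Lane j handles blocks j, j+n, j+2n, ...
    and multiplies by r^n each step, except on its last block, where it
    uses r^(n-j) so that the lane sums line up."""
    assert len(blocks) % n == 0, "sketch assumes a whole number of groups"
    pw, _ = powers_of_r(r, n)
    lanes = [0] * n
    groups = len(blocks) // n
    for g in range(groups):
        last = g == groups - 1
        for j in range(n):
            mult = pw[n - 1 - j] if last else pw[n - 1]
            lanes[j] = ((lanes[j] + blocks[g * n + j]) * mult) % P
    return sum(lanes) % P


if __name__ == "__main__":
    r = 0x0806D5400E52447C036D555408BED685  # arbitrary example value
    blocks = [((i * 0x1234567 + 1) | (1 << 128)) for i in range(16)]
    pw, levels = powers_of_r(r, 8)
    print("levels:", levels, "multiplies:", len(pw) - 1)
    print("match:", poly_serial(blocks, r) == poly_parallel(blocks, r, 8))
```

For n = 8 the schedule uses 3 levels and 7 multiplies, in the same order as the list above. A real design would also need to handle messages whose block count is not a multiple of n, which this sketch does not attempt.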