# Overall Notes

We need to support 25 Gbps, and we will have two datapaths, TX and RX.

With a 128-bit datapath, this works out to about 195 MHz (25 Gbps / 128 bits),
so call it 200 MHz, but let's aim for 250 MHz.
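As a quick sanity check of the clock numbers, the minimum frequency is just the line rate divided by the datapath width. The sketch below assumes one full word moves per cycle and ignores any framing or protocol overhead; the function name `required_clock_mhz` is made up for illustration.

```python
def required_clock_mhz(line_rate_gbps: float, datapath_bits: int) -> float:
    """Minimum clock in MHz to carry line_rate_gbps over a bus that is
    datapath_bits wide, assuming one full word per cycle."""
    bits_per_second = line_rate_gbps * 1e9
    return bits_per_second / datapath_bits / 1e6


if __name__ == "__main__":
    f_min = required_clock_mhz(25, 128)
    print(f"minimum clock: {f_min:.1f} MHz")
    # Relative margin gained by running at the 250 MHz target instead.
    print(f"headroom at 250 MHz: {250 / f_min - 1:.0%}")
```

Under these assumptions the minimum is 195.3 MHz, so the 250 MHz target leaves roughly 28% of headroom for stalls and overhead.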
# ChaCha20 Notes

ChaCha20 operates on 512-bit blocks (sixteen 32-bit words). Each round is
made of 4 quarter rounds, which are the same except for which 32-bit words
they operate on. We can use the same quarter-round logic 4 times in a row,
but we need to store the rest of the state between operations, so memory
usage might be similar to doing all 4 at once, while the logic would only
be about 25% as much. Because we alternate between odd (column) and even
(diagonal) rounds, the words a given quarter round uses in one kind of
round are not the words it uses in the other.
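To make the reuse idea concrete, here is a small Python model (a behavioural sketch, not RTL) of one quarter-round unit applied four times per round. The quarter-round arithmetic and the column/diagonal word indices follow RFC 8439; the names `quarter_round`, `COLUMN_ROUND`, `DIAGONAL_ROUND`, and `apply_round` are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """One ChaCha quarter round on four 32-bit words."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d


# Which state words feed each of the 4 quarter rounds in each kind of round.
COLUMN_ROUND = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONAL_ROUND = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def apply_round(state, indices):
    """Run one round by reusing a single quarter-round unit 4 times.
    The full 16-word state stays stored between the 4 steps."""
    state = list(state)
    for i, j, k, l in indices:
        state[i], state[j], state[k], state[l] = quarter_round(
            state[i], state[j], state[k], state[l])
    return state


if __name__ == "__main__":
    # Quarter-round test vector from RFC 8439, section 2.1.1.
    out = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    assert out == (0xEA2A92F4, 0xCB1CF8CE, 0x4581472E, 0x5881C4BB)
    print("quarter round matches the RFC 8439 test vector")
```

The four quarter rounds within a round touch disjoint words, so running them one after another gives the same result as running them side by side. Only the index table changes between column and diagonal rounds, which in hardware would presumably become a mux in front of the shared unit.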
# Poly1305

## Parallel Operation

We can calculate in parallel, but we need to calculate r^n, where n is the number of
parallel stages. Ideally the number of parallel stages would equal the latency of the
full stage, so that it is fully pipelined. For example, if it took 8 cycles per block,
we would have 8 parallel calculations. This requires calculating r^n as well as every
intermediate power: if we do 8, then we need r^1, r^2, r^3, and so on up to r^8. This
takes n-1 multiplies in total (7 for n = 8), but only log2(n) sequential steps if the
multiplies within each step run in parallel.

We need (one step per line):

r\*r = r^2

r\*r^2 = r^3, r^2\*r^2 = r^4

r^4\*r = r^5, r^2\*r^4 = r^6, r^3\*r^4 = r^7, r^4\*r^4 = r^8
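The power schedule and the parallel accumulation can be checked with a short Python model. This is a sketch under assumptions: it covers only the polynomial part of Poly1305 (no clamping, no block padding, no final addition of s), it assumes n is a power of two and the block count is a multiple of n, and the names `powers_of_r`, `poly_serial`, and `poly_parallel` are made up for illustration.

```python
P = (1 << 130) - 5  # the Poly1305 prime


def powers_of_r(r, n):
    """Return ([r^1 .. r^n] mod P, multiply count, sequential steps)."""
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of two"
    powers = {1: r % P}
    multiplies = steps = 0
    top = 1
    while top < n:
        # One step: r^(top+k) = r^k * r^top, using only powers already known,
        # so all multiplies in this step could run in parallel.
        for k in range(1, top + 1):
            powers[top + k] = (powers[k] * powers[top]) % P
            multiplies += 1
        top *= 2
        steps += 1
    return [powers[i] for i in range(1, n + 1)], multiplies, steps


def poly_serial(blocks, r):
    """Reference: one block per iteration, acc = (acc + m) * r."""
    acc = 0
    for m in blocks:
        acc = ((acc + m) * r) % P
    return acc


def poly_parallel(blocks, r, n):
    """n independent lanes, each stepping by r^n, combined at the end."""
    assert len(blocks) % n == 0
    powers, _, _ = powers_of_r(r, n)
    r_n = powers[n - 1]
    lanes = [0] * n
    for base in range(0, len(blocks), n):
        for j in range(n):
            lanes[j] = (lanes[j] * r_n + blocks[base + j]) % P
    # Lane j is still missing a factor of r^(n-j); this is where the
    # intermediate powers are used.
    return sum(lanes[j] * powers[n - j - 1] for j in range(n)) % P


if __name__ == "__main__":
    r = 0x0123456789ABCDEF0123456789ABCDEF  # arbitrary value, not a clamped key
    blocks = [pow(3, i, P) for i in range(1, 25)]  # 24 arbitrary "blocks"
    _, multiplies, steps = powers_of_r(r, 8)
    print(multiplies, steps)
    assert poly_parallel(blocks, r, 8) == poly_serial(blocks, r)
```

For n = 8 the model counts 7 multiplies in 3 steps, and its multiply order matches the three lines listed above. Each lane depends only on its own previous value, so a multiplier with 8 cycles of latency could, in principle, accept a new block every cycle.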