Overall Notes
We need to support 25 Gbps, and we will have two datapaths, TX and RX.
With a 128-bit datapath this requires about 200 MHz (25 Gbps / 128 bits = 195.3 MHz), but let's aim for 250 MHz.
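The clock arithmetic is easy to redo for other widths or targets. The following is a minimal Python sketch, where `required_clock_mhz` and `capacity_gbps` are hypothetical helper names. It assumes each datapath moves one full word every cycle with no idle cycles.

```python
def required_clock_mhz(line_rate_gbps: float, width_bits: int) -> float:
    """Minimum clock (MHz) for a datapath moving width_bits every cycle."""
    return line_rate_gbps * 1e3 / width_bits


def capacity_gbps(clock_mhz: float, width_bits: int) -> float:
    """Throughput (Gbps) of a datapath moving width_bits every cycle."""
    return clock_mhz * width_bits / 1e3


# Assumed: 128-bit datapath, one full word accepted every cycle.
print(required_clock_mhz(25, 128))  # 195.3125
print(capacity_gbps(250, 128))      # 32.0
```

At 250 MHz the 128-bit datapath has 32 Gbps of capacity, 28% above the 25 Gbps target.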
ChaCha20 Notes
ChaCha20 operates on 512-bit blocks (a state of sixteen 32-bit words). Each round is made of 4 quarter rounds, which are identical except for which four of the sixteen 32-bit words they operate on. We can use the same quarter-round logic 4 times in a row, but we need to store the rest of the state between operations. Memory usage might therefore be similar to doing all 4 at once, but the logic would be only 25% as much. Because we alternate between odd (column) and even (diagonal) rounds, the words grouped into each quarter round differ from one round type to the other.
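The reuse argument can be made concrete with a small reference model (not RTL). This is a minimal Python sketch of the quarter round as defined in RFC 8439, plus the column and diagonal word schedule, with a single quarter-round function invoked four times per round. `chacha20_rounds_serial` and its `reverse` flag are hypothetical names. The model covers only the 20 rounds. It omits building the input state (constants, key, counter, nonce) and the final addition of the input state.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a: int, b: int, c: int, d: int):
    """ChaCha20 quarter round on four 32-bit words (RFC 8439, section 2.1)."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d


# State word indices fed to the quarter round in each round type.
COLUMN_ROUND = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONAL_ROUND = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def chacha20_rounds_serial(state, reverse=False):
    """Run 20 rounds with ONE quarter-round unit used 4 times per round.

    The 16-word state is held in storage between invocations.
    reverse=True processes the four groups of each round in the opposite
    order, to show that their order does not matter.
    Returns (new_state, number_of_quarter_round_invocations).
    """
    x = list(state)
    invocations = 0
    for _ in range(10):  # 10 double rounds = 20 rounds
        for schedule in (COLUMN_ROUND, DIAGONAL_ROUND):
            groups = list(reversed(schedule)) if reverse else schedule
            for ia, ib, ic, i_d in groups:  # same unit, four times in a row
                x[ia], x[ib], x[ic], x[i_d] = quarter_round(x[ia], x[ib], x[ic], x[i_d])
                invocations += 1
    return x, invocations


state = list(range(16))  # arbitrary illustrative input state
out, count = chacha20_rounds_serial(state)
assert out == chacha20_rounds_serial(state, reverse=True)[0]
```

Within each round the four groups touch disjoint words, so the single unit can process them in any order and the reversed run gives the same state. Serialized this way, the model performs 80 quarter-round invocations per 512-bit block.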
Poly1305
Parallel Operation
We can calculate in parallel, but we need to calculate r^n, where n is the number of parallel stages. Ideally the number of parallel stages would equal the latency of the full stage, so that it could be fully pipelined. For example, if it took 8 cycles per block, we would have 8 parallel calculations. This requires r^n as well as every intermediate power: each lane advances by r^n per block, and at the end the lanes are recombined using r^n down to r^1. If we do 8, we need r^1 through r^8.
Computing r^n alone takes only log2(n) squarings, but computing every power up to r^n takes n - 1 multiplies. These can be arranged into log2(n) dependent stages (3 for n = 8):
Stage 1: r^2 = r * r
Stage 2: r^3 = r * r^2, r^4 = r^2 * r^2
Stage 3: r^5 = r^4 * r, r^6 = r^2 * r^4, r^7 = r^3 * r^4, r^8 = r^4 * r^4
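The lane decomposition and the r^n schedule can be checked numerically. This is a minimal Python sketch under several assumptions: blocks are already-formed integers (each 16-byte chunk with its extra 1 bit appended, so below 2^129), clamping of r and the final addition of s are omitted, and the block count is a multiple of n (tail handling is not modeled). All function names are hypothetical. Each lane computes h_j = h_j * r^n + m, and the result is the sum of h_j * r^(n - j). This matches the one-block-at-a-time h = (h + m) * r mod p.

```python
import random

P = (1 << 130) - 5  # Poly1305 prime


def poly_sequential(blocks, r):
    """Reference: h = (h + m) * r mod p, one block at a time."""
    h = 0
    for m in blocks:
        h = (h + m) * r % P
    return h


def power_schedule(n):
    """List of (i, a, b, stage) meaning r^i = r^a * r^b, computed in that stage.

    b is the largest power of two below i, so stage s produces
    r^(2^(s-1) + 1) .. r^(2^s): n - 1 multiplies in ceil(log2 n) stages.
    """
    sched = []
    for i in range(2, n + 1):
        b = 1 << ((i - 1).bit_length() - 1)
        sched.append((i, i - b, b, (i - 1).bit_length()))
    return sched


def powers_of_r(r, n):
    """pw[i] = r^i mod p for i = 0..n, built with power_schedule(n)."""
    pw = [1, r % P] + [0] * (n - 1)
    for i, a, b, _ in power_schedule(n):
        pw[i] = pw[a] * pw[b] % P
    return pw


def poly_parallel(blocks, r, n):
    """n-lane evaluation: lane j takes blocks j, j + n, j + 2n, ...

    Each lane's update depends only on its own previous value, so the n
    updates can overlap in a multiplier with n cycles of latency.
    """
    assert len(blocks) % n == 0, "tail handling not modeled"
    pw = powers_of_r(r, n)
    lanes = [0] * n
    for start in range(0, len(blocks), n):
        for j in range(n):
            lanes[j] = (lanes[j] * pw[n] + blocks[start + j]) % P
    # Recombine: lane j is multiplied by r^(n - j), i.e. r^n down to r^1.
    return sum(lanes[j] * pw[n - j] for j in range(n)) % P


# Usage: 32 random 129-bit "blocks" and an unclamped random r (illustration only).
rng = random.Random(0)
r = rng.getrandbits(124)
blocks = [rng.getrandbits(129) for _ in range(32)]
assert poly_parallel(blocks, r, 8) == poly_sequential(blocks, r)
```

For n = 8, `power_schedule` reproduces the seven multiplies listed above in three stages. The lane form makes it visible why every intermediate power is needed: r^n alone advances the lanes, but the final recombination uses each of r^n down to r^1 once.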