# Overall Notes

We need to support 25 Gbps, and we will have 2 datapaths, TX and RX.

With a 128-bit datapath, this needs about 195 MHz (25 Gbps / 128 bits), so roughly 200 MHz, but let's aim for 250 MHz.

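A quick sanity check of the clock numbers above, as a minimal sketch. It assumes one full 128-bit word is consumed every cycle and ignores any per-packet overhead; the constant names are only illustrative.

```python
# Back-of-the-envelope check of the datapath clock.
# Assumption: one full 128-bit word per cycle, no per-packet overhead.
LINE_RATE_BPS = 25e9       # 25 Gbps target from the notes
DATAPATH_BITS = 128        # datapath width from the notes
TARGET_CLOCK_HZ = 250e6    # clock we are aiming for

# Slowest clock that still sustains the line rate.
min_clock_hz = LINE_RATE_BPS / DATAPATH_BITS

# What the target clock buys us.
throughput_at_target_bps = TARGET_CLOCK_HZ * DATAPATH_BITS
headroom = throughput_at_target_bps / LINE_RATE_BPS - 1

print(f"minimum clock:        {min_clock_hz / 1e6:.1f} MHz")
print(f"throughput at target: {throughput_at_target_bps / 1e9:.1f} Gbps")
print(f"headroom:             {headroom:.0%}")
```

Under these assumptions, 250 MHz leaves some slack over the roughly 195 MHz minimum for cycles where the datapath cannot accept a full word.
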
# ChaCha20 Notes

ChaCha20 operates on 512-bit blocks. Each round is made of 4 quarter rounds, which are the same except for which 32-bit words they use. We can use the same quarter round 4 times in a row, but we need to store the rest of the state between operations, so memory usage might be similar to doing all 4 at once, while the logic would only be 25% as much. Because we alternate between odd and even rounds, the words grouped together in one round are not the words grouped together in the other round.

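The quarter-round reuse can be sketched in software as below. This is a reference model only, not the hardware design: `apply_round` is a hypothetical helper that runs one quarter-round function four times over a 16-word state, and the two index tables show how the odd (column) and even (diagonal) rounds group different words.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(a, b, c, d):
    """ChaCha quarter round on four 32-bit words."""
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d

# Word indices used by each of the 4 quarter rounds in a round.
COLUMNS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONALS = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]

def apply_round(state, groups):
    """Hypothetical helper: one round using a single quarter-round unit
    four times in a row. The other 12 words just sit in storage."""
    state = list(state)
    for i, j, k, l in groups:
        state[i], state[j], state[k], state[l] = quarter_round(
            state[i], state[j], state[k], state[l])
    return state

# Quarter-round test vector from RFC 8439, section 2.1.1.
print([hex(w) for w in quarter_round(0x11111111, 0x01020304,
                                     0x9b8d6f43, 0x01234567)])
```

Within one round the four quarter rounds touch disjoint words, so running them one after another gives the same result as running them side by side; only the word selection changes between column and diagonal rounds.
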
# Poly1305

## Parallel Operation

We can calculate in parallel, but we need to calculate r^n, where n is the number of parallel stages. Ideally the number of parallel stages would equal the latency of the full stage, so that it is fully pipelined. For example, if it took 8 cycles per block, we would have 8 parallel calculations. This requires calculating r^n, as well as every intermediate power. If we do 8, then we need to calculate r^1, r^2, r^3, and so on up to r^8. This takes log2(n) steps of multiplies if the multiplies within a step run in parallel (n - 1 multiplies in total).

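To see why the lanes need r^n and the intermediate powers, here is a small software model that compares the usual serial accumulation with an n-lane version. It is a sketch under assumptions: blocks are already converted to integers, r is an arbitrary unclamped value, and the final addition of s is left out. `poly_serial` and `poly_parallel` are illustrative names.

```python
P = (1 << 130) - 5  # Poly1305 prime

def poly_serial(blocks, r):
    """Reference: acc = (acc + m) * r for every block, in order."""
    acc = 0
    for m in blocks:
        acc = ((acc + m) * r) % P
    return acc

def poly_parallel(blocks, r, n):
    """n interleaved lanes. Each lane multiplies by r^n, except on its
    last block, where it uses the lower power that block still needs."""
    r_n = pow(r, n, P)
    lanes = [0] * n
    total = len(blocks)
    for i, m in enumerate(blocks):
        remaining = total - i  # exponent this block must end up with
        mult = r_n if remaining > n else pow(r, remaining, P)
        lane = i % n
        lanes[lane] = ((lanes[lane] + m) * mult) % P
    return sum(lanes) % P

# Arbitrary example data (assumption, not a real key or message).
R_EXAMPLE = 0x0123456789ABCDEF0123456789ABCDEF
BLOCKS = [((i * 0x9E3779B97F4A7C15 + 1) % (1 << 128)) | (1 << 128)
          for i in range(21)]

print(poly_parallel(BLOCKS, R_EXAMPLE, 8) == poly_serial(BLOCKS, R_EXAMPLE))
```

The last block in each lane is the only place where a power other than r^n is used, which is why every intermediate power from r^1 to r^n has to be available.
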
For n = 8, the multiplies we need are:

1. r\*r = r^2
2. r\*r^2 = r^3, r^2\*r^2 = r^4
3. r^4\*r = r^5, r^2\*r^4 = r^6, r^3\*r^4 = r^7, r^4\*r^4 = r^8

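The schedule above can be modelled as a table that doubles each step. A minimal sketch, assuming n is a power of two; `powers_of_r` is an illustrative name, and the step and multiplier counts it returns are only bookkeeping for this model.

```python
P = (1 << 130) - 5  # Poly1305 prime

def powers_of_r(r, n):
    """Return ([r^1 .. r^n] mod P, steps, peak multipliers).

    Each step multiplies every power known so far by the highest one,
    which doubles the table. Assumes n is a power of two."""
    powers = [r % P]
    steps = 0
    peak_multipliers = 0
    while len(powers) < n:
        top = powers[-1]
        # These products are independent of each other, so in hardware
        # each one could go to its own multiply block.
        new = [(top * p) % P for p in powers]
        peak_multipliers = max(peak_multipliers, len(new))
        powers.extend(new)
        steps += 1
    return powers[:n], steps, peak_multipliers

R_EXAMPLE = 0x0123456789ABCDEF0123456789ABCDEF  # arbitrary value
table, steps, peak = powers_of_r(R_EXAMPLE, 8)
print(steps, peak, len(table))
```

For n = 8 this takes 3 steps and never needs more than 4 multiplies at once, which matches the three rows listed above.
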
All the multiplies within one step can be done in parallel, so we need 4 (n/2) multiply blocks that feed back on themselves, with some kind of FSM to control them. This can be done while another block is being hashed, but there will be a delay between when the key is ready from the ChaCha block and when the powers are ready, so there needs to be a FIFO in between.

Basically, we have to wait until we see that the accumulator was written with our index. At reset, though, the accumulator is unwritten? So we need to pretend that it was written.

Let's just write out what we want to happen:

1. The index starts at 0. We accept new data and send it through the pipeline.
2. We increment the index to 1.
3. We accept new data and send it through the pipeline.
4. We increment the index to 2.
5. We need to wait until index 0 is written before we can say we are ready.
6. If index 1 is written, then we still need to say we are ready, though.
7. We can just use the 1 to indicate that it is a valid write, then?

So in the shift register we just need to say whether it is a valid write or not, so always 1?

But if we send in 0, then send in 1, then the current index will be 0, and eventually the final index will always be 0. We need to store which index was written last.

We can just say the last written one was 2, I guess.

We also need an input that tells it to reset the accumulator.