# Overall Notes

We need to support 25 Gbps, and we will have 2 datapaths, TX and RX, each 128 bits wide. At 128 bits per cycle, 25 Gbps needs about 195 MHz (25 Gbps / 128 bits), so roughly 200 MHz, but let's aim for 250 MHz.

# ChaCha20 Notes

ChaCha20 operates on 512-bit blocks (a state of sixteen 32-bit words). Each round is made of 4 quarter rounds, which are identical except for which four 32-bit words they operate on. We can use the same quarter-round unit 4 times in a row, but we need to store the rest of the state between operations. The memory usage might therefore be similar to doing all 4 at once, while the quarter-round logic would be only 25% as much.

Because the rounds alternate between column (odd) and diagonal (even) rounds, the words grouped into one quarter round in one round are spread across all four quarter rounds of the next round. A quarter round of the next round therefore cannot start until all 4 quarter rounds of the current round have finished.

# Poly1305

## Parallel Operation

We can calculate Poly1305 in parallel, but we need r^n, where n is the number of parallel stages. Ideally the number of parallel stages equals the latency of the full stage, so that it can be fully pipelined. For example, if it took 8 cycles per block, we would have 8 parallel calculations.

This requires r^n as well as every intermediate power, because the parallel accumulators have to be combined with different powers of r at the end of a message. If we do 8, we need r^1, r^2, r^3, ..., r^8 (all mod 2^130 - 5). That is n - 1 = 7 multiplies in total, but only log2(n) = 3 levels of dependent multiplies, since each level only uses powers from earlier levels:

- Level 1: r\*r = r^2
- Level 2: r\*r^2 = r^3, r^2\*r^2 = r^4
- Level 3: r^4\*r = r^5, r^2\*r^4 = r^6, r^3\*r^4 = r^7, r^4\*r^4 = r^8

The multiplies within a level can run in parallel, so we need 4 (n/2) multiply blocks that feed back on themselves, with an FSM to control them. This can be done while another message is being hashed, but there will be a delay between when the key is ready from the ChaCha20 block and when the powers are ready, so there needs to be a FIFO in between.

Basically, we have to wait until we see that the accumulator was written back for our index. At reset, though, no accumulator has been written yet, so we need to pretend that it was written.

Let's just write out what we want to happen:

1. The index starts at 0. We accept new data and send it through the pipeline.
2. We increment the index to 1.
3. We accept new data and send it through the pipeline.
4. We increment the index to 2.
5. We need to wait until index 0 is written before we can say we are ready.
6. If index 1 is written, then we still need to say we are ready, though.
7. Can we then just use the 1 to indicate that it is a valid write?

So in the shift register we just need to say whether it is a valid write or not, so always 1? But if we send in 0 and then 1, the current index will be 0, and eventually the final index will always be 0. We need to store which index was written last. We can just say the last written index was 2, I guess.

We also need an input that tells it to reset the accumulator.
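One way to model the index/ready handshake described in the Poly1305 notes is a small cycle-level Python sketch. It is an illustration under assumptions, not the final design: the class name `InterleavedAccumulator`, its `lanes`/`latency`/`op` parameters, and the `clear()` method (standing in for the "reset the accumulator" input) are hypothetical. Instead of storing the last written index, it keeps one "written" flag per index and sets all of them at reset, which is one way to "pretend that it was written". The pipeline is a shift register of `(index, result)` pairs; the result is computed on entry and written back when it falls off the end, and a write-back can be consumed by an accept in the same cycle (an assumed bypass path).

```python
class InterleavedAccumulator:
    """Hypothetical cycle-level model of the index/ready logic.

    lanes   -- number of accumulator indices (parallel calculations)
    latency -- cycles from accepting a block to writing its index back
    op      -- lane update, called as op(previous_accumulator, data)
    """

    def __init__(self, lanes, latency, op):
        self.lanes = lanes
        self.op = op
        self.index = 0                   # index the next accepted block will use
        self.written = [True] * lanes    # at reset, pretend every index was written
        self.fresh = [True] * lanes      # next block on this index starts from 0
        self.acc = [0] * lanes           # last written-back value per index
        self.pipe = [None] * latency     # shift register of (index, result) in flight

    def ready(self):
        """Ready only if the current index's previous result has been written back."""
        return self.written[self.index]

    def clear(self):
        """Hypothetical 'reset the accumulator' input; call once the pipeline is drained."""
        self.index = 0
        self.fresh = [True] * self.lanes

    def clock(self, valid, data=0):
        """Advance one cycle and return True if data was accepted."""
        done = self.pipe[-1]
        if done is not None:             # write-back at the end of the shift register
            lane, value = done
            self.acc[lane] = value
            self.written[lane] = True
        # A write-back in this cycle is visible to the accept below (assumed bypass).
        accepted = valid and self.ready()
        entry = None
        if accepted:
            lane = self.index
            prev = 0 if self.fresh[lane] else self.acc[lane]
            entry = (lane, self.op(prev, data))
            self.fresh[lane] = False
            self.written[lane] = False
            self.index = (lane + 1) % self.lanes
        self.pipe = [entry] + self.pipe[:-1]
        return accepted
```

With `lanes` equal to `latency` the model accepts a block every cycle, which is the fully pipelined case. With fewer lanes than the latency (for example `lanes=2, latency=4`), the input stalls whenever the index wraps back to a lane whose result is still in flight, which is the wait described in step 5.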
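For the Parallel Operation notes, here is a behavioral Python sketch of the power schedule and of the n-lane evaluation it enables. `power_schedule` builds the level-by-level products (each level computes r^(top+j) = r^top \* r^j from powers that already exist), and `poly1305_parallel` splits the blocks across n lanes that each multiply by r^n, then combines lane j with r^(n-j), which is where the intermediate powers are used. It assumes blocks are already encoded as integers (with the appended 1 bit) and omits clamping of r and the final addition of s; all function names are hypothetical. The serial Horner version is only a reference to check against.

```python
P = (1 << 130) - 5  # Poly1305 prime


def power_schedule(n):
    """Group the products needed for r^2..r^n into levels of independent multiplies.
    Each entry (k, a, b) means r^k = r^a * r^b, using only powers from earlier levels."""
    levels = []
    top = 1  # highest power available so far
    while top < n:
        level = [(top + j, top, j) for j in range(1, min(top, n - top) + 1)]
        levels.append(level)
        top += len(level)
    return levels


def compute_powers(r, n):
    """Return [r^1, r^2, ..., r^n] mod P, evaluating one level at a time."""
    powers = {1: r % P}
    for level in power_schedule(n):
        # Products in one level read only earlier powers, so hardware could
        # issue them to separate multipliers at the same time.
        powers.update({k: powers[a] * powers[b] % P for k, a, b in level})
    return [powers[k] for k in range(1, n + 1)]


def poly1305_serial(blocks, r):
    """Reference: acc = (acc + m) * r mod P for each encoded block m."""
    acc = 0
    for m in blocks:
        acc = (acc + m) * r % P
    return acc


def poly1305_parallel(blocks, r, n):
    """Same result using n interleaved lanes that each multiply by r^n."""
    powers = compute_powers(r, n)  # powers[k - 1] == r^k mod P
    r_n = powers[n - 1]
    # Leading zero blocks do not change the serial result, so pad to a multiple of n.
    padded = [0] * (-len(blocks) % n) + list(blocks)
    lanes = [0] * n
    for i, m in enumerate(padded):
        lane = i % n
        lanes[lane] = (lanes[lane] * r_n + m) % P
    # Lane j's last block is n - j positions from the end, so it still needs r^(n - j).
    return sum(lanes[j] * powers[n - j - 1] for j in range(n)) % P
```

For n = 8, `power_schedule(8)` gives levels of 1, 2 and 4 products (7 multiplies in 3 levels), with the same operand pairs as the list in the notes.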
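For the ChaCha20 notes, a Python sketch of reusing a single quarter-round function four times per round. The 16-word list `s` plays the role of the storage that holds the rest of the state between quarter rounds, and the index groups follow the column and diagonal rounds of RFC 8439. The names `double_round_serial`, `chacha20_rounds`, `COLUMN_GROUPS` and `DIAGONAL_GROUPS` are illustrative, not taken from any existing RTL, and the final addition of the input state is left out.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a, b, c, d):
    """ChaCha20 quarter round (RFC 8439, section 2.1) on four 32-bit words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Word indices (into the 16-word state) used by each quarter round.
COLUMN_GROUPS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
# Each diagonal group takes one word from every column group, so no diagonal
# quarter round can start before all four column quarter rounds are done.
DIAGONAL_GROUPS = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def double_round_serial(state):
    """Column round then diagonal round, pushing one quarter round at a time
    through the same quarter_round function (one shared unit)."""
    s = list(state)  # holds the rest of the state while one group is processed
    for groups in (COLUMN_GROUPS, DIAGONAL_GROUPS):
        for a, b, c, d in groups:
            s[a], s[b], s[c], s[d] = quarter_round(s[a], s[b], s[c], s[d])
    return s


def chacha20_rounds(state):
    """20 rounds = 10 double rounds (the final add of the input state is omitted)."""
    s = list(state)
    for _ in range(10):
        s = double_round_serial(s)
    return s
```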