crypto/ChaCha20_Poly1305_64/doc/notes.md
Byron Lathi 369e29557c add notes
2025-06-23 00:02:14 -07:00

Notes

Since we are designing this for a 64-bit datapath, we need to be able to compute 64 bits every cycle. The ChaCha20 block function works on a state of 16 32-bit words, i.e. 512-bit blocks at a time. Logically it might make more sense to have a datapath of 128 bits.

On the other hand, each operation is a 32-bit operation. For timing reasons it might then make more sense to register each operation. But will this be able to match the throughput that we need?

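Each quarter round updates 4 words, so each cycle updates all 128 bits of its input at once. We can do 4 of the quarter rounds at once (the four columns, or the four diagonals, are independent of each other), so at the end of each cycle all 512 bits of state have been updated. A finished block still takes 20 rounds, so this is 512 bits of state per cycle, not 512 bits of keystream, unless the rounds are unrolled and pipelined.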
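As a reference model for the quarter round, here is a minimal Python sketch. The function names `rotl32` and `quarter_round` are just illustrative and are not taken from the RTL. Each line is one add, one XOR and one rotate on 32-bit words, which is where registers could go if each operation ends up being registered.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """One ChaCha quarter round on four 32-bit words (128 bits in, 128 bits out)."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d
```

The quarter-round test vector from RFC 8439 (section 2.1.1) is a convenient check for both this model and the hardware module.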

At full speed, then, the core would update 512 bits per cycle, but we only need to generate 64 bits per cycle. We could do only 1 quarter round at a time, which would update 128 bits per cycle, but then we would need some sort of structure to reorder the state so that it is ready to XOR with the incoming data. We could even make the number of quarter rounds parameterizable, but that would be the next step if we actually need to support 100 Gbps encryption.
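A rough way to check these numbers against the 64 bits per cycle target is to count quarter rounds per block: 20 rounds of 4 quarter rounds is 80 per 512-bit block. The sketch below assumes each quarter-round unit finishes one quarter round per cycle and is kept busy every cycle, and it ignores the final addition and state loading. The names are made up for illustration.

```python
from fractions import Fraction

STATE_BITS = 512
ROUNDS = 20
QR_PER_ROUND = 4
QR_PER_BLOCK = ROUNDS * QR_PER_ROUND  # 80 quarter rounds per block


def keystream_bits_per_cycle(qr_units):
    """Average finished keystream bits per cycle for qr_units quarter rounds per cycle."""
    return Fraction(STATE_BITS * qr_units, QR_PER_BLOCK)


def qr_units_needed(bits_per_cycle):
    """Smallest number of quarter rounds per cycle that sustains bits_per_cycle."""
    return -(-bits_per_cycle * QR_PER_BLOCK // STATE_BITS)  # ceiling division


for units in (1, 4):
    print(units, float(keystream_bits_per_cycle(units)))
print(qr_units_needed(64))
```

Under these assumptions the quarter-round count is the knob to parameterize: it trades gates directly against sustained keystream rate.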

So in summary, we will have a single QuarterRound module which generates 128 bits of output. We will have a scheduling block which selects which 4 words of state go into the QuarterRound module, and a de-interleaver which takes the output from the QuarterRound module and puts it back in the correct order to combine with the incoming data. There is also the final addition of the original input state to the round output, which still needs a place somewhere in this datapath.
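The scheduling, write-back and final addition can be modelled in a few lines of Python. This is only a behavioural sketch, not the RTL: `SCHEDULE`, `initial_state` and `chacha20_block_beats` are hypothetical names, and packing word 2k into the low half of 64-bit beat k is an assumption about how the output lines up with the incoming data.

```python
import struct

MASK32 = 0xFFFFFFFF

# Which four state words feed the single quarter-round unit at each step.
COLUMNS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONALS = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]
SCHEDULE = (COLUMNS + DIAGONALS) * 10  # 10 double rounds = 80 steps


def rotl32(x, n):
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d


def initial_state(key, counter, nonce):
    """16-word state: 4 constants, 8 key words, block counter, 3 nonce words."""
    consts = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]
    return consts + list(struct.unpack("<8I", key)) + [counter] + list(struct.unpack("<3I", nonce))


def chacha20_block_beats(key, counter, nonce):
    """Return one 512-bit block as eight 64-bit beats, using one quarter round per step."""
    init = initial_state(key, counter, nonce)
    state = list(init)
    for idx in SCHEDULE:
        out = quarter_round(*(state[i] for i in idx))
        for i, word in zip(idx, out):  # write back to the original positions
            state[i] = word
    words = [(s + i) & MASK32 for s, i in zip(state, init)]  # final addition
    return [words[2 * k] | (words[2 * k + 1] << 32) for k in range(8)]
```

Writing each result back to the index it came from is what the de-interleaver has to achieve in hardware; here the Python list does it for free. The block function test vector in RFC 8439 (section 2.3.2) can be used to check the schedule.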

To support AEAD, the first block (block counter 0) supplies the key for the Poly1305 block: its first 256 bits are the one-time Poly1305 key. This can be computed in parallel with the following blocks (counter 1 onward), which become the cipher keystream, at the expense of double the gates. Otherwise, there would be a delay between packets while this key is generated.
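To get a feel for how much that per-packet key block costs when there is only one core, here is a small sketch of the block plan for a packet. It follows the counter convention of the RFC 8439 AEAD construction (counter 0 for the Poly1305 key, counter 1 onward for keystream); the function names are hypothetical.

```python
import math

BLOCK_BYTES = 64  # one ChaCha20 block is 512 bits


def block_plan(payload_len):
    """List the (counter, purpose) pairs of ChaCha20 blocks needed for one AEAD packet."""
    n_keystream = math.ceil(payload_len / BLOCK_BYTES)
    plan = [(0, "poly1305_key")]
    plan += [(ctr, "keystream") for ctr in range(1, n_keystream + 1)]
    return plan


def key_block_overhead(payload_len):
    """Fraction of block computations spent on the Poly1305 key for this packet."""
    return 1 / len(block_plan(payload_len))


for size in (64, 512, 1500):
    print(size, len(block_plan(size)), round(key_block_overhead(size), 3))
```

For small packets the key block is a large share of the work, which is the case where computing it in parallel would matter most.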