--------------------------------------------------------------------------------
Readback mux layer
--------------------------------------------------------------------------------
Use a large always_comb block + many if statements that select the read data
based on the cpuif address.
Loops are handled the same way as address decode.

Other options that were considered:
    - Flat case statement
        con: Difficult to represent arrays. Essentially requires unrolling
        con: Complicates retiming strategies
        con: Representing a range (required for externals) is cumbersome.
             Possible with stacked casez wildcards.
    - AND field data with strobe, then massive OR reduce
        This was the strategy prior to v1.3, but turned out to infer more overhead
        than originally anticipated
    - Assigning data to a flat register array, then directly indexing via address
        con: Would work fine, but scales poorly for sparse regblocks.
             Namely, simulators would likely allocate memory for the entire array
    - Assign to a flat array that is packed sequentially, then directly indexing
      using a derived packed index
        Concern that for sparse regfiles, the translation of addr --> packed index
        becomes a nontrivial logic function

Pros:
    - Scales well for arrays since loops can be used
    - Externals work well, as address ranges can be compared
    - Synthesis results show more efficient logic inference

Example:
    logic [7:0] out;
    always_comb begin
        out = '0;
        for(int i=0; i<64; i++) begin
            if(i == addr) out = data[i];
        end
    end
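
Extending the example above, the sketch below shows how a register array, a
single register and an external block could sit side by side in one if-based
mux. It is only an illustration: the module name, port names, widths and all
addresses (0x00..0x03, 0x10, 0x40..0x7F) are made up for this note and are not
generator output.

```systemverilog
// Hypothetical sketch: if-based readback with a register array and an
// external block. Addresses, widths and port names are illustrative only.
module readback_if_sketch (
    input  logic [7:0] addr,
    input  logic [7:0] regs [4],     // array of 4 registers at 0x00..0x03
    input  logic [7:0] single_reg,   // one register at 0x10
    input  logic [7:0] ext_rd_data,  // read data returned by an external block
    output logic [7:0] rd_data,
    output logic       rd_hit        // high if any address matched
);
    always_comb begin
        rd_data = '0;
        rd_hit  = 1'b0;

        // Array: one comparison per element, expressed as a loop
        for(int i=0; i<4; i++) begin
            if(addr == 8'(i)) begin
                rd_data = regs[i];
                rd_hit  = 1'b1;
            end
        end

        // Single register: exact address match
        if(addr == 8'h10) begin
            rd_data = single_reg;
            rd_hit  = 1'b1;
        end

        // External block: match a whole address range (0x40..0x7F)
        if(addr >= 8'h40 && addr <= 8'h7F) begin
            rd_data = ext_rd_data;
            rd_hit  = 1'b1;
        end
    end
endmodule
```

The range comparison is the part that a flat case statement struggles to
express without stacked wildcards.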

How to implement retiming:
    Ideally this would partition the design into several equal sub-regions, but
    with loop structures, this is pretty difficult.
    What if instead, it is partitioned into equal address ranges?

    First stage compares the lower half of the address bits.
    Values are assigned to the appropriate output "bin":
        logic [7:0] out[8];
        always_comb begin
            for(int i=0; i<8; i++) out[i] = '0;
            for(int i=0; i<64; i++) begin
                automatic bit [5:0] this_addr = i;
                if(this_addr[2:0] == addr[2:0]) out[this_addr[5:3]] = data[i];
            end
        end
    (not showing retiming ff for `out` and `addr`)

    The second stage muxes down the resulting bins using the high address bits.
    If the user up-sizes the address bits, the upper bits need to be checked to
    prevent aliasing.
    Assuming the minimum address bit range is [5:0], but it was padded up to
    [8:0], do the following:
        logic [7:0] rd_data;
        always_comb begin
            if(addr[8:6] != '0) begin
                // Invalid read range
                rd_data = '0;
            end else begin
                rd_data = out[addr[5:3]];
            end
        end
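
For reference, the two stages could be joined with the retiming flops that the
snippets above leave out. This is a minimal sketch, not the generated code:
`clk`, the module name, and the `bin_d`/`bin_q`/`addr_q` names are assumptions,
and a single retiming stage with no reset is assumed.

```systemverilog
// Hypothetical sketch of the two-stage retimed readback mux.
// clk, the module name and the bin_d/bin_q/addr_q names are assumptions.
module readback_retimed_sketch (
    input  logic       clk,
    input  logic [8:0] addr,       // padded address; only [5:0] is populated
    input  logic [7:0] data [64],
    output logic [7:0] rd_data     // valid one cycle after addr
);
    logic [7:0] bin_d [8];
    logic [7:0] bin_q [8];
    logic [8:0] addr_q;

    // Stage 1: compare the low address bits and sort matches into bins
    always_comb begin
        for(int i=0; i<8; i++) bin_d[i] = '0;
        for(int i=0; i<64; i++) begin
            automatic bit [5:0] this_addr = 6'(i);
            if(this_addr[2:0] == addr[2:0]) bin_d[this_addr[5:3]] = data[i];
        end
    end

    // Retiming flops for the bins and the address
    always_ff @(posedge clk) begin
        bin_q  <= bin_d;
        addr_q <= addr;
    end

    // Stage 2: select a bin with the high bits, rejecting aliased addresses
    always_comb begin
        if(addr_q[8:6] != '0) begin
            rd_data = '0;
        end else begin
            rd_data = bin_q[addr_q[5:3]];
        end
    end
endmodule
```

Note that the address has to be delayed together with the bins, otherwise the
second stage would select using the next request's high bits.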

    Retiming with external blocks
        One minor downside is the above scheme does not work well for external blocks
        that span a range of addresses. Depending on the range, it may span multiple
        retiming bins which complicates how this would be assigned cleanly.
        This would be complicated even further with arrays of externals since the
        span of bins could change depending on the iteration.

        Since externals can already be retimed, and large fanin of external blocks
        is likely less of a concern, implement these as a separate readback mux on
        the side that does not get retimed at all.

WARNING:
    Beware of read/write flop stage asymmetry & race conditions.
    E.g. if a field is rclr, we don't want to sample it after it has already
    been cleared by the read:
        addr --> strb --> clear
        addr --> loooong...retime --> sample rd value
    Should guarantee that read-sampling happens in the same cycle as any
    read-modify.

Forwards response strobe back up to cpu interface layer
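
To make the rclr hazard in the warning concrete, here is a minimal sketch of a
single-bit rclr field whose read value is captured in the same cycle that the
read strobe clears it, so any later retiming stages only carry the already
sampled value. Everything here is an assumption for illustration: the module
and signal names, the synchronous reset, and the choice that a hardware set
takes priority over the clear.

```systemverilog
// Hypothetical sketch: an rclr field whose read value is captured in the same
// cycle that the read strobe clears it. All names are illustrative.
module rclr_sample_sketch (
    input  logic clk,
    input  logic rst,
    input  logic rd_strb,      // decoded read strobe for this field
    input  logic hw_set,       // hardware sets the sticky bit
    output logic rd_value_q    // sampled read value, one stage later
);
    logic field;

    always_ff @(posedge clk) begin
        if(rst) begin
            field      <= 1'b0;
            rd_value_q <= 1'b0;
        end else begin
            // Sample the pre-clear value on the strobe cycle...
            if(rd_strb) rd_value_q <= field;

            // ...while the same strobe clears the field.
            // Assumption: a simultaneous hardware set wins over the clear.
            if(hw_set)       field <= 1'b1;
            else if(rd_strb) field <= 1'b0;
        end
    end
endmodule
```

Because both assignments read `field` before the clock edge, the value returned
to the CPU is the one that existed when the read happened, not the cleared one.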

Variables:
    From decode:
        decoded_addr
        decoded_req
        decoded_req_is_wr

    Response:
        readback_done
        readback_err
        readback_data
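
As a rough illustration of how the variables listed above might relate to each
other, the sketch below wires them up for a tiny block with three registers in
a four-word address space. The widths, the register count, the `regs` input,
the same-cycle response and the policy of flagging an error on an unmatched
read are all assumptions made for this example, not a description of the
generated logic.

```systemverilog
// Hypothetical wiring of the listed signals for a 3-register block.
// Widths, register count and the error policy are assumptions.
module readback_signals_sketch (
    input  logic [1:0]  decoded_addr,
    input  logic        decoded_req,
    input  logic        decoded_req_is_wr,
    input  logic [31:0] regs [3],        // only addresses 0..2 are populated
    output logic        readback_done,
    output logic        readback_err,
    output logic [31:0] readback_data
);
    logic rd_req;
    assign rd_req = decoded_req & ~decoded_req_is_wr;

    always_comb begin
        readback_data = '0;
        readback_err  = rd_req;  // assume an error until an address matches
        for(int i=0; i<3; i++) begin
            if(rd_req && decoded_addr == 2'(i)) begin
                readback_data = regs[i];
                readback_err  = 1'b0;
            end
        end
    end

    // Respond in the same cycle as the read request (no retiming here)
    assign readback_done = rd_req;
endmodule
```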