--------------------------------------------------------------------------------
Readback mux layer
--------------------------------------------------------------------------------

Use a large always_comb block with many if statements that select the read data
based on the cpuif address.
Loops are handled the same way as in address decode.

Other options that were considered:
    - Flat case statement
        con: Difficult to represent arrays. Essentially requires unrolling.
        con: Complicates retiming strategies.
        con: Representing a range (required for externals) is cumbersome.
             Possible with stacked casez wildcards.
    - AND field data with strobe, then a massive OR reduce
        This was the strategy prior to v1.3, but it turned out to infer more
        overhead than originally anticipated.
    - Assign data to a flat register array, then index it directly via address
        con: Would work fine, but scales poorly for sparse regblocks.
             Namely, simulators would likely allocate memory for the entire
             array, including unused addresses.
    - Assign to a flat array that is packed sequentially, then index it directly
      using a derived packed index
        con: For sparse regfiles, the translation of addr --> packed index
             becomes a nontrivial logic function.

Pros of the chosen approach:
    - Scales well for arrays since loops can be used
    - Externals work well, as address ranges can be compared
    - Synthesis results show more efficient logic inference

Example:
    logic [7:0] out;
    always_comb begin
        out = '0;
        for(int i=0; i<64; i++) begin
            if(i == addr) out = data[i];
        end
    end


How to implement retiming:
    Ideally this would partition the design into several equal sub-regions, but
    with loop structures this is difficult.
    What if instead it is partitioned into equal address ranges?

    The first stage compares the lower half of the address bits.
    Values are assigned to the appropriate output "bin", which is selected by
    the upper address bits:
        logic [7:0] out[8];
        always_comb begin
            for(int i=0; i<8; i++) out[i] = '0;
            for(int i=0; i<64; i++) begin
                automatic bit [5:0] this_addr = i;
                if(this_addr[2:0] == addr[2:0]) out[this_addr[5:3]] = data[i];
            end
        end
    (Retiming flops for `out` and `addr` are not shown.)

    The second stage muxes down the resulting bins using the upper address bits.

    If the user up-sizes the address width, the extra upper bits must also be
    checked to prevent aliasing.
    E.g., assuming the minimum address range is [5:0] but it was padded up to
    [8:0], do the following:
        logic [7:0] rd_data;
        always_comb begin
            if(addr[8:6] != '0) begin
                // Invalid read range
                rd_data = '0;
            end else begin
                rd_data = out[addr[5:3]];
            end
        end


Retiming with external blocks:
    One minor downside is that the above scheme does not work well for external
    blocks that span a range of addresses. Depending on its size, such a range
    may span multiple retiming bins, which complicates assigning it cleanly.
    Arrays of externals complicate this further, since the set of spanned bins
    can change from one iteration to the next.

    Since externals can already be retimed, and the large fan-in of external
    blocks is likely less of a concern, implement them as a separate readback
    mux on the side that is not retimed at all.


WARNING:
    Beware of read/write flop stage asymmetry & race conditions.
    E.g., if a field is rclr, its value must not be sampled after the read has
    already cleared it:
        addr --> strb --> clear
        addr --> loooong...retime --> sample rd value
    Read sampling must be guaranteed to happen in the same cycle as any
    read-modify side effect.

The readback layer forwards the response strobe back up to the cpu interface
layer.

Variables:
    From decode:
        decoded_addr
        decoded_req
        decoded_req_is_wr

    Response:
        readback_done
        readback_err
        readback_data
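For illustration only, here is a minimal sketch of the non-retimed loop-based readback mux wired to the decode/response variable names listed in these notes. The module name `readback_mux_sketch`, the parameters `N_REGS`/`ADDR_W`/`DATA_W`, the `reg_data` input, and the choice to use `readback_err` to flag reads of unmapped addresses are all hypothetical assumptions, not the generator's actual output.

```systemverilog
// Hypothetical sketch: flat (non-retimed) readback mux.
// Assumptions: word-addressed registers 0..N_REGS-1, data provided via the
// illustrative reg_data input, and readback_err flagging unmapped reads.
module readback_mux_sketch #(
    parameter int N_REGS = 64,
    parameter int ADDR_W = 6,
    parameter int DATA_W = 8
) (
    input  logic [ADDR_W-1:0] decoded_addr,
    input  logic              decoded_req,
    input  logic              decoded_req_is_wr,
    input  logic [DATA_W-1:0] reg_data [N_REGS],
    output logic              readback_done,
    output logic              readback_err,
    output logic [DATA_W-1:0] readback_data
);
    logic hit;

    always_comb begin
        readback_data = '0;
        hit = 1'b0;
        // One address comparator per register; the loop keeps arrays compact.
        for (int i = 0; i < N_REGS; i++) begin
            if (decoded_addr == ADDR_W'(i)) begin
                readback_data = reg_data[i];
                hit = 1'b1;
            end
        end
    end

    // Respond in the same cycle, and only for read requests.
    assign readback_done = decoded_req & ~decoded_req_is_wr;
    // Assumption: a read that matched no register is reported as an error.
    assign readback_err  = readback_done & ~hit;
endmodule
```

Because every comparison is an independent `if`, an external block covering an address range could be handled by swapping the equality test for a range compare.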
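The two-stage retiming scheme in these notes can be put together into one self-contained module as a rough sketch: 64 registers, a minimum address range of [5:0] padded to [8:0], stage 1 binning by `addr[2:0]`, one flop stage, then stage 2 selecting by `addr[5:3]` with the upper-bit alias check. The module name, the `clk`/`rst`/`rd_req`/`rd_valid` ports, and the `reg_data` input are hypothetical. The sketch deliberately does not model read side effects such as rclr; per the WARNING, any such clear would have to be aligned with the sampling stage.

```systemverilog
// Hypothetical sketch of the two-stage retimed readback mux.
// Assumptions: 64 word-addressed registers, address padded from [5:0] to [8:0],
// one retiming flop stage, no read side effects (e.g. rclr) modeled.
module readback_retimed_sketch (
    input  logic       clk,
    input  logic       rst,
    input  logic [8:0] addr,
    input  logic       rd_req,
    input  logic [7:0] reg_data [64],
    output logic       rd_valid,
    output logic [7:0] rd_data
);
    // Stage 1: within each bin of 8 registers, select by addr[2:0].
    logic [7:0] bin_comb [8];
    always_comb begin
        for (int b = 0; b < 8; b++) bin_comb[b] = '0;
        for (int i = 0; i < 64; i++) begin
            automatic bit [5:0] this_addr = 6'(i);
            if (this_addr[2:0] == addr[2:0]) bin_comb[this_addr[5:3]] = reg_data[i];
        end
    end

    // Retiming flops: the bins plus the address bits stage 2 still needs.
    logic [7:0] bin_q [8];
    logic [8:3] addr_q;
    logic       rd_req_q;
    always_ff @(posedge clk) begin
        if (rst) rd_req_q <= 1'b0;
        else     rd_req_q <= rd_req;
        bin_q  <= bin_comb;
        addr_q <= addr[8:3];
    end

    // Stage 2: pick a bin with addr[5:3]; nonzero padding bits must not alias.
    always_comb begin
        if (addr_q[8:6] != '0) rd_data = '0;  // Invalid read range
        else                   rd_data = bin_q[addr_q[5:3]];
    end
    assign rd_valid = rd_req_q;
endmodule
```

In this sketch, read data for a request issued in cycle N appears in cycle N+1, so the response strobe is delayed by the same single flop stage as the data.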