basic framework
doc/logbooks/template-layers/1-port-declaration
@@ -0,0 +1,51 @@
--------------------------------------------------------------------------------
Port Declaration
--------------------------------------------------------------------------------
Generates the port declaration of the module:
    - Parameters
        - rd/wr error response/data behavior
            Do missed accesses cause a SLVERR?
            Do missed reads respond with a magic value?
        - Pipeline enables
            Enable reg stages in various places

    - RDL-derived Parameters:
        Someday in the future if I ever get around to this: https://github.com/SystemRDL/systemrdl-compiler/issues/58

    - Clock/Reset
        Single clk
        One or more resets

    - CPU Bus Interface
        Given the bus interface object, emits the IO
        This can be flattened ports, or a SV Interface
        Regardless, it shall be malleable so that the user can use their favorite
        declaration style

    - Hardware interface
        Two options:
        - 2-port struct interface
            Everything is rolled into two unpacked structs - inputs and outputs
        - Flattened --> NOT DOING
            Flatten/Unroll everything
            No. Not doing. I hate this and don't want to waste time implementing this.
            This will NEVER be able to support parameterized regmaps, and just
            creates a ton of corner cases I don't care to deal with.

Other IO Signals I need to be aware of:
    any signals declared, and used in any references:
        field.resetsignal
        field.next
        ... etc ...
    any signals declared and marked as cpuif_reset, or field_reset
        These override the default rst
        If both are defined, be sure to not emit the default
        Pretty straightforward (see 17.1)
        Also have some notes on this in my general Logbook
        Will have to make a call on how these propagate if multiple defined
        in different hierarchies
    interrupt/halt outputs
        See "Interrupts" logbook for explanation
    addrmap.errextbus, regfile.errextbus, reg.errextbus
        ???
        Apparently these are inputs
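
The reset override rule above can be sketched as a small Python helper. This is only an illustration: the port names cpuif_reset_signal and field_reset_signal are hypothetical, and the sketch assumes each override replaces the default rst only for its own domain (bus logic vs. field storage), which the notes do not spell out.

```python
from typing import List


def reset_ports(has_cpuif_reset: bool, has_field_reset: bool) -> List[str]:
    """Return the reset inputs that the port list should contain."""
    ports = []
    # The default reset is still needed by whichever domain has no override.
    # Only when BOTH overrides exist is the default dropped entirely.
    if not (has_cpuif_reset and has_field_reset):
        ports.append("rst")
    if has_cpuif_reset:
        ports.append("cpuif_reset_signal")  # hypothetical port name
    if has_field_reset:
        ports.append("field_reset_signal")  # hypothetical port name
    return ports
```

How overrides declared at different hierarchy levels propagate is still the open question noted above; the helper does not try to answer it.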
doc/logbooks/template-layers/1.1.hardware-interface
@@ -0,0 +1,77 @@
================================================================================
Summary
================================================================================

RTL interface that provides access to per-field context signals

Regarding signals:
    I think RDL-declared signals should actually be part of the hwif input
    structure.
    Exceptions:
        - if the signal instance is at the top-level, it will get promoted to the
          top level port list for convenience, and therefore omitted from the struct

================================================================================
Naming Scheme
================================================================================

hwif_out
    .my_regblock
        .my_reg[X][Y]
            .my_field
                .value
                .anded

hwif_in
    .my_regblock
        .my_reg[X][Y]
            .my_field
                .value
                .we
        .my_signal
        .my_fieldreset_signal
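
A minimal Python sketch of how a generator could build references that follow this naming scheme. The function name hwif_path and the loop index names i0, i1, ... are assumptions made for illustration; they are not taken from the notes.

```python
from typing import List, Tuple


def hwif_path(direction: str, hier: List[Tuple[str, int]], member: str) -> str:
    """Build a struct reference such as hwif_out.blk.reg[i0][i1].fld.value.

    hier is a list of (instance name, number of array dimensions).
    """
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    parts = ["hwif_" + direction]
    idx = 0
    for name, n_dims in hier:
        suffix = ""
        for _ in range(n_dims):
            # One loop index per array dimension, numbered across the hierarchy
            suffix += "[i%d]" % idx
            idx += 1
        parts.append(name + suffix)
    parts.append(member)
    return ".".join(parts)


print(hwif_path("out", [("my_regblock", 0), ("my_reg", 2), ("my_field", 0)], "anded"))
```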

================================================================================
Flattened mode? --> NO
================================================================================
If user wants a flattened list of ports,
still use the same hwif_in/out struct internally.
Rather than declaring hwif_in and hwif_out in the port list, declare them internally

Add a mapping layer in the body of the module that performs a ton of assign statements
to map flat signals <-> struct

Alternatively, don't do this at all.
If I want to add a flattened mode, generate a wrapper module instead.

Marking this as YAGNI for now.


================================================================================
IO Signals
================================================================================

Outputs:
    field value
        If hw readable
    bitwise reductions
        if anded, ored, xored == True, output a signal
    swmod/swacc
        event strobes

Inputs:
    field value
        If hw writable
    we/wel
        if either is boolean, and true
        not part of external hwif if reference
        mutually exclusive
    hwclr/hwset
        if either is boolean, and true
        not part of external hwif if reference
    incr/decr
        if counter=true, generate BOTH
    incrvalue/decrvalue
        if either incrwidth/decrwidth are set
    signals!
        any signal instances instantiated in the scope
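
The per-field rules above can be sketched as one Python function. This is a rough illustration: the props dict stands in for resolved RDL field properties, the keys hw_readable and hw_writable are hypothetical stand-ins for whatever the hw property resolves to, and emitting incrvalue/decrvalue independently per width is my reading of the rule. Signal instances in the scope are left out.

```python
from typing import Dict, List, Tuple


def hwif_members(props: Dict[str, object]) -> Tuple[List[str], List[str]]:
    """Return (inputs, outputs) hwif member names for one field."""

    def enabled(key: str) -> bool:
        # Only a literal True creates a port. A property assigned a reference
        # is driven from inside the block, so it gets no external hwif member.
        return props.get(key) is True

    outputs = []
    if enabled("hw_readable"):
        outputs.append("value")
    outputs += [k for k in ("anded", "ored", "xored", "swmod", "swacc") if enabled(k)]

    inputs = []
    if enabled("hw_writable"):
        inputs.append("value")
    if enabled("we") and enabled("wel"):
        raise ValueError("we and wel are mutually exclusive")
    inputs += [k for k in ("we", "wel", "hwclr", "hwset") if enabled(k)]
    if enabled("counter"):
        inputs += ["incr", "decr"]  # a counter always gets both
    if props.get("incrwidth") is not None:
        inputs.append("incrvalue")
    if props.get("decrwidth") is not None:
        inputs.append("decrvalue")
    return inputs, outputs
```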
doc/logbooks/template-layers/2-CPUIF
@@ -0,0 +1,72 @@
--------------------------------------------------------------------------------
CPU Bus interface layer
--------------------------------------------------------------------------------
Provides an abstraction layer between the outside SoC's bus interface, and the
internal register block's implementation.
Converts a user-selectable bus protocol to generic register file signals.

Upstream Signals:
    Signal names are defined in the bus interface class and shall be malleable
    to the user.
    User can choose a flat signal interface, or a SV interface.
    SV interface shall be easy to tweak since various orgs will use different
    naming conventions in their library of interface definitions

Downstream Signals:
    - cpuif_req
        - Single-cycle pulse
        - Qualifies the following child signals:
            - cpuif_req_is_wr
                1 denotes this is a write transfer
            - cpuif_addr
                Byte address
            - cpuif_wr_data
            - cpuif_wr_bitstrb
                per-bit strobes
                some protocols may opt to tie this to all 1's
    - cpuif_rd_ack
        - Single-cycle pulse
        - Qualifies the following child signals:
            - cpuif_rd_data
            - cpuif_rd_err

    - cpuif_wr_ack
        - Single-cycle pulse
        - Qualifies the following child signals:
            - cpuif_wr_err
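
To make the per-bit strobe semantics of cpuif_wr_bitstrb concrete, here is a small Python model. Both helpers are illustrative only: byte_to_bit_strb shows one way a byte-strobe protocol might fill the per-bit strobes (an assumption, the notes only say some protocols tie them to all 1's), and the 32-bit width is likewise assumed.

```python
def byte_to_bit_strb(byte_strb: int, n_bytes: int = 4) -> int:
    """Expand one strobe bit per byte lane into one strobe bit per data bit."""
    bits = 0
    for lane in range(n_bytes):
        if (byte_strb >> lane) & 1:
            bits |= 0xFF << (8 * lane)
    return bits


def apply_wr_bitstrb(current: int, wr_data: int, wr_bitstrb: int, width: int = 32) -> int:
    """Value after a write: strobed bits take wr_data, the rest keep current."""
    mask = (1 << width) - 1
    strb = wr_bitstrb & mask
    return (current & ~strb & mask) | (wr_data & strb)
```

For example, a write of all 1's with only byte lane 1 enabled changes bits 15:8 and leaves the other bits alone.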


Misc thoughts
    - Internal cpuif_* signals use a strobe-based protocol:
        - Unknown, but fixed latency
        - Makes for easy pipelining if needed
    - Decided to keep cpuif_req signals common for read/write:
        This will allow address decode logic to be shared for read/write
        Downside is split protocols like axi-lite can't have totally separate rd/wr
        access lanes, but who cares?
    - separate response strobes
        Not necessary to use, but this lets me independently pipeline read/write paths.
        read path will need more time if readback mux is large
    - On multiple outstanding transactions
        Currently, cpuif doesn't really support this. Goal was to make it easily pipelineable
        without having to backfeed stall logic.
        Could still be possible to do a "fly-by" pipeline with a more intelligent cpuif layer
        Not worrying about this now.


Implementation:
    Implement this mainly as a Jinja template.
    Upstream bus intf signals are fetched via busif class properties. Ex:
        {{busif.signal('pready')}} <= '1;
    This allows the actual SV or flattened signal to be emitted
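
A minimal sketch of what the busif lookup could look like on the Python side. The class names FlatBusif and SVInterfaceBusif, the s_apb prefix, and the rename map are all hypothetical; plain string formatting stands in for the Jinja line above so the example has no dependencies.

```python
from typing import Dict, Optional


class FlatBusif:
    """Flattened ports: every bus signal is its own port with a common prefix."""

    def __init__(self, prefix: str = "s_apb_") -> None:
        self.prefix = prefix

    def signal(self, name: str) -> str:
        return self.prefix + name


class SVInterfaceBusif:
    """SV interface port: signals are members of one interface instance."""

    def __init__(self, inst_name: str = "s_apb",
                 rename: Optional[Dict[str, str]] = None) -> None:
        self.inst_name = inst_name
        # Lets an org map the generic names onto its own interface definition
        self.rename = rename or {}

    def signal(self, name: str) -> str:
        return self.inst_name + "." + self.rename.get(name, name)


def render_pready(busif) -> str:
    # Stand-in for the template line: {{busif.signal('pready')}} <= '1;
    return "%s <= '1;" % busif.signal("pready")


print(render_pready(FlatBusif()))
print(render_pready(SVInterfaceBusif(rename={"pready": "PREADY"})))
```

The template body stays the same for both styles; only the object handed to it changes.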

What protocols do I care about?
    - AXI4 Lite
        - Ignore AxPROT?
    - APB3
    - APB4
        - Ignore pprot?
    - AHB?
    - Wishbone
    - Generic
        breakout the above signals as-is (reassign with a prefix or something)
doc/logbooks/template-layers/3-address-decode
@@ -0,0 +1,51 @@

--------------------------------------------------------------------------------
Address Decode layer
--------------------------------------------------------------------------------
A bunch of combinational address decodes that generate individual register
req strobes

Possible decode logic styles:
    - Big case statement
        + Probably more sim-efficient
        - Hard to do loop parameterization
        - More annoying to do multiple regs per address
    - Big always_comb + One if/else chain
        + Easy to nest loops & parameterize if needed
        - sim has a lot to evaluate each time
        - More annoying to do multiple regs per address
        - implies precedence? Synth tools should be smart enough?
    - Big always_comb + inline conditionals <---- DO THIS
        + Easy to nest loops & parameterize if needed
        - sim has a lot to evaluate each time
        + Multiple regs per address possible
        + implies address decode parallelism.
        ?? Should I try using generate loops + assigns?
            This would be more explicit parallelism, however some tools may
            get upset at multiple assignments to a common struct

Implementation:
    Jinja is inappropriate here
        Very logic-heavy. Jinja may end up being annoying
        Also, not much need for customization here
    This may even make sense as a visitor that dumps lines
        - visit each reg
        - upon entering an array, create for loops
        - upon exiting an array, emit 'end'
    Make the strobe struct declared locally
        No need for it to leave the block
    Error handling
        If no strobe generated, respond w error?
        This is actually pretty expensive to do for writes.
        Hold off on this for now.
        Reads get this effectively for free in the readback mux.
    Implement write response strobes back upstream to cpuif
    Eventually allow for optional register stage for strobe struct
        Will need to also pipeline the other cpuif signals
        ok to discard the cpuif_addr. no longer needed
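
A rough Python sketch of the "visitor that dumps lines" idea, emitting the always_comb + inline conditionals style. The Node class, the decoded_reg_strb struct name, and the i0/i1 loop index names are assumptions for illustration; a real implementation would walk the compiled RDL tree instead.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    offset: int          # byte offset relative to the parent
    dim: int = 0         # 0 = not an array, otherwise the element count
    stride: int = 0      # byte stride between array elements
    children: List["Node"] = field(default_factory=list)  # empty => register


def _visit(node: Node, path: str, base: int, terms: List[str],
           n_loops: int, lines: List[str]) -> None:
    pad = "    " * (n_loops + 1)
    base += node.offset
    ref = "%s.%s" % (path, node.name)
    if node.dim:
        # Entering an array: open a for loop and add an index term
        idx = "i%d" % n_loops
        lines.append("%sfor(int %s=0; %s<%d; %s++) begin" % (pad, idx, idx, node.dim, idx))
        terms = terms + ["%s*'h%x" % (idx, node.stride)]
        ref += "[%s]" % idx
        n_loops += 1
    if node.children:
        for child in node.children:
            _visit(child, ref, base, terms, n_loops, lines)
    else:
        # A register: one inline conditional, no if/else precedence implied
        addr = " + ".join(["'h%x" % base] + terms)
        lines.append("%s%s = cpuif_req & (cpuif_addr == %s);"
                     % ("    " * (n_loops + 1), ref, addr))
    if node.dim:
        # Exiting the array: close the loop
        lines.append(pad + "end")


def emit_decode(nodes: List[Node]) -> List[str]:
    lines = ["always_comb begin"]
    for node in nodes:
        _visit(node, "decoded_reg_strb", 0, [], 0, lines)
    lines.append("end")
    return lines


top = [Node("ctrl", 0x0),
       Node("blk", 0x100, dim=2, stride=0x40,
            children=[Node("r", 0x8, dim=4, stride=0x4)])]
print("\n".join(emit_decode(top)))
```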


Downstream Signals:
    - access strobes
        Encase these into a struct datatype
    - is_write + wr_data/wr_bitstrobe
doc/logbooks/template-layers/4-fields
@@ -0,0 +1,35 @@
--------------------------------------------------------------------------------
Field storage / next value layer
--------------------------------------------------------------------------------
Where all the magic happens!!

Any field that implements storage is defined here.
Bigass struct that only contains storage elements

Each field consists of:
    - an always_ff block
    - series of if/else statements that assign the next value in the storage element
        Think of this as a flat list of "next state" conditions, ranked by their precedence as follows:
        - reset
        - sw access (if sw precedence)
            - onread/onwrite
        - hw access
            - Counter
            - next
            - etc
        - sw access (if hw precedence)
            - onread/onwrite
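
The precedence ranking above could be sketched in Python as an ordered list that is then dumped as one if/else chain. The names rank_conditions, emit_field and field_storage, plus the condition strings in the example, are hypothetical; reset is written as an ordinary synchronous condition purely to keep the sketch short.

```python
from typing import List, Tuple

Cond = Tuple[str, str]  # (condition expression, next-value expression)


def rank_conditions(reset: List[Cond], sw: List[Cond], hw: List[Cond],
                    sw_precedence: bool) -> List[Cond]:
    """Order next-state conditions, highest precedence first."""
    return reset + (sw + hw if sw_precedence else hw + sw)


def emit_field(name: str, conditions: List[Cond]) -> List[str]:
    """Dump the ranked conditions as a single if/else chain."""
    if not conditions:
        raise ValueError("a storage field needs at least one condition")
    lines = ["always_ff @(posedge clk) begin"]
    for i, (cond, value) in enumerate(conditions):
        keyword = "if" if i == 0 else "end else if"
        lines.append("    %s(%s) begin" % (keyword, cond))
        lines.append("        field_storage.%s <= %s;" % (name, value))
    lines.append("    end")
    lines.append("end")
    return lines


conds = rank_conditions(reset=[("rst", "'0")],
                        sw=[("sw_wr", "wr_data")],
                        hw=[("hw_we", "hw_value")],
                        sw_precedence=False)
print("\n".join(emit_field("my_field", conds)))
```

A flat chain like this does not by itself handle cases where two conditions must both take effect in the same cycle.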

TODO:
    What about stuff like read-clear counters that can't lose a count?
    In a traditional if/else chain, I need to be aware of the fact that it's a counter
    when handling the swaccess case
        Is it possible to code this in a way where I can isolate the need to know every nuanced case here?
        this may actually only apply to counters...
        This is trivial in a 2-process implementation, but I'd rather avoid the overheads


Implementation
    Makes sense to use a listener class

    Be sure to skip alias registers
doc/logbooks/template-layers/5-readback-mux
@@ -0,0 +1,65 @@
--------------------------------------------------------------------------------
Readback mux layer
--------------------------------------------------------------------------------

Implementation:
    - Big always_comb block
    - Initialize default rd_data value
    - Lotsa if statements that operate on reg strb to assign rd_data
    - Merges all fields together into reg
    - pulls value from storage element struct, or input struct
    - Provision for optional flop stage?

Mux Strategy:
    Flat case statement:
        -- Can't parameterize
        + better performance?

    Flatten array then mux:
        - First, flatten ALL readback values into an array
            Round up the size of the array to the next power of 2
                needs to be fully addressable anyways!
            This can be in a combinational block
            Initialize the array to the default readback value
            then, assign all register values. Use loops where necessary.
            Append an extra 'is-valid' bit if I need to slverr on bad reads
        - Next, use the read address as an index into this array
        - If needed, I can do a staged decode!
            Compute the most balanced fanin staging in Python. eg:
                64 regs --mux--> 8x8 --mux--> 1
                128 regs --mux--> 8x16 --mux--> 1
                    Favor smaller fanin first. Latter stage should have more fanin since routing congestion will be easier
                256 regs --mux--> 16x16 --mux--> 1
        - Potential sparseness of this makes me uncomfortable,
          but its synthesis SEEMS like it would be really efficient!
        - TODO: Rethink this
            I feel like people will complain about this
            It will likely also be pretty sim-inefficient?
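
The balanced staging calculation mentioned above can be sketched in a few lines of Python. The function name two_stage_fanin is an assumption, and the sketch only covers the two-stage, power-of-2 case used in the examples.

```python
from typing import Tuple


def two_stage_fanin(n_regs: int) -> Tuple[int, int]:
    """Return (first stage fanin, second stage fanin) for a 2-stage mux.

    The array is rounded up to a power of 2, then the index bits are split
    as evenly as possible. Any odd bit goes to the second stage so that the
    first stage keeps the smaller fanin.
    """
    if n_regs < 1:
        raise ValueError("need at least one register")
    index_bits = (n_regs - 1).bit_length()  # ceil(log2(n_regs))
    first_bits = index_bits // 2
    second_bits = index_bits - first_bits
    return 1 << first_bits, 1 << second_bits


for n in (64, 128, 256):
    print(n, two_stage_fanin(n))
```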

    Flat 1-hot array then OR reduce: <-- DO THIS
        - Create a bus-wide flat array
            eg: 32-bits x N readable registers
        - Assign each element:
            the readback value of each register
            ... masked by the register's access strobe
        - I could also stuff an extra bit into the array that denotes the read is valid
            A missed read will OR reduce down to a 0
        - Finally, OR reduce all the elements in the array down to a flat 32-bit bus
        - Retiming the large OR fanin can be done by chopping up the array into stages
            for 2 stages, sqrt(N) gives each stage's fanin size. Round to favor
            more fanin on 2nd stage
            3 stages uses cube-root. etc...
        - This has the benefit of re-using the address decode logic.
            synth can choose to replicate logic if fanout is bad
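
A small behavioral model of the 1-hot OR-reduce strategy, written in Python only to check the idea. The function name readback and the 32-bit default width are assumptions; the extra valid bit is stored above the data bits as suggested in the notes.

```python
import functools
import operator
from typing import List, Tuple


def readback(strobes: List[bool], values: List[int], width: int = 32) -> Tuple[int, bool]:
    """Return (rd_data, rd_valid) for one read request."""
    mask = (1 << width) - 1
    # Each element is the register value masked by its access strobe,
    # with one extra bit on top that marks the read as valid.
    elems = [((1 << width) | (val & mask)) if strb else 0
             for strb, val in zip(strobes, values)]
    # OR reduce the whole array down to a single (width + 1)-bit word
    acc = functools.reduce(operator.or_, elems, 0)
    return acc & mask, bool(acc >> width)


print(readback([False, True, False], [0x11, 0x22, 0x33]))
print(readback([False, False, False], [0x11, 0x22, 0x33]))
```

A missed read leaves every element at 0, so both the data and the valid bit reduce to 0, which is what an error response on bad reads would key off.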


WARNING:
    Beware of read/write flop stage asymmetry & race conditions.
    Eg. If a field is rclr, don't want to sample it after it gets read:
        addr --> strb --> clear
        addr --> loooong...retime --> sample rd value
    Should guarantee that read-sampling happens at the same cycle as any read-modify


Forwards response strobe back up to cpu interface layer

Don't forget about alias registers here
doc/logbooks/template-layers/6-output-port-mapping
@@ -0,0 +1,9 @@
--------------------------------------------------------------------------------
Output Port mapping layer
--------------------------------------------------------------------------------

Assign to output struct port

Still TBD if this will actually be a distinct layer.
Cosmetically, this might be nicer to interleave with the field section above
Assign storage element & other derived values as requested by properties