Initial Commit - Forked from PeakRDL-regblock @ a440cc19769069be831d267505da4f3789a26695

This commit is contained in:
Arnav Sacheti
2025-10-10 22:28:36 -07:00
commit 9bf5cd1e68
308 changed files with 19414 additions and 0 deletions

View File

@@ -0,0 +1,51 @@
--------------------------------------------------------------------------------
Port Declaration
--------------------------------------------------------------------------------
Generates the port declaration of the module:
- Parameters
- rd/wr error response/data behavior
Do missed accesses cause a SLVERR?
Do reads respond with a magic value?
- Pipeline enables
Enable reg stages in various places
- RDL-derived Parameters:
Someday in the future if i ever get around to this: https://github.com/SystemRDL/systemrdl-compiler/issues/58
- Clock/Reset
Single clk
One or more resets
- CPU Bus Interface
Given the bus interface object, emits the IO
This can be flattened ports, or a SV Interface
Regardless, it shall be malleable so that the user can use their favorite
declaration style
- Hardware interface
Two options:
- 2-port struct interface
Everything is rolled into two unpacked structs - inputs and outputs
- Flattened --> NOT DOING
Flatten/Unroll everything
No. not doing. I hate this and dont want to waste time implementing this.
This will NEVER be able to support parameterized regmaps, and just
creates a ton of corner cases i dont care to deal with.
Other IO Signals I need to be aware of:
any signals declared, and used in any references:
field.resetsignal
field.next
... etc ...
any signals declared and marked as cpuif_reset, or field_reset
These override the default rst
If both are defined, be sure to not emit the default
Pretty straightforward (see 17.1)
Also have some notes on this in my general Logbook
Will have to make a call on how these propagate if multiple defined
in different hierarchies
interrupt/halt outputs
See "Interrupts" logbook for explanation
addrmap.errextbus, regfile.errextbus, reg.errextbus
???
Apparently these are inputs

View File

@@ -0,0 +1,103 @@
================================================================================
Summary
================================================================================
RTL interface that provides access to per-field context signals
Regarding signals:
RDL-declared signals are part of the hwif input structure.
Only include them if they are referenced by the design (need to scan the
full design anyways, so may as well filter out unreferenced ones)
It is possible to use signals declared in a parent scope.
This means that not all signals will be discovered by a hierarchical listener alone
Need to scan ALL assigned properties for signal references too.
- get signal associated with top node's cpuif_reset helper property, if any
- collect all field_resets
X check all signal instances in the hier tree
- search parents of top node for the first field_reset signal, if any
This is WAY less expensive than querying EACH field's resetsignal property
X Check all explicitly assigned properties
only need to do this for fields
Collect all of these into the following:
- If inside the hier, add to a list of paths
- if outside the hier, add to a dict of path:SignalNode
These are all the signals in-use by the design
Pass list into the hwif generator
If the hwif generator encounters a signal during traversal:
check if it exists in the signal path list
out-of-hier signals are inserted outside of the hwif_in as standalone signals.
For now, just use their plain inst names. If I need to uniquify them i can add that later.
I should at least check against a list of known "dirty words". Seems very likely someone will choose
a signal called "rst".
Prefix with usersig_ if needed
================================================================================
Naming Scheme
================================================================================
hwif_out
.my_regblock
.my_reg[X][Y]
.my_field
.value
.anded
hwif_in
.my_regblock
.my_reg[X][Y]
.my_field
.value
.we
.my_signal
.my_fieldreset_signal
================================================================================
Flattened mode? --> NO
================================================================================
If user wants a flattened list of ports,
still use the same hwif_in/out struct internally.
Rather than declaring hwif_in and hwif_out in the port list, declare it internally
Add a mapping layer in the body of the module that performs a ton of assign statements
to map flat signals <-> struct
Alternatively, don't do this at all.
If I want to add a flattened mode, generate a wrapper module instead.
Marking this as YAGNI for now.
================================================================================
IO Signals
================================================================================
Outputs:
field value
If hw readable
bitwise reductions
if anded, ored, xored == True, output a signal
swmod/swacc
event strobes
Inputs:
field value
If hw writable
we/wel
if either is boolean, and true
not part of external hwif if reference
mutually exclusive
hwclr/hwset
if either is boolean, and true
not part of external hwif if reference
incr/decr
if counter=true, generate BOTH
incrvalue/decrvalue
if either incrwidth/decrwidth are set
signals!
any signal instances instantiated in the scope

View File

@@ -0,0 +1,72 @@
--------------------------------------------------------------------------------
CPU Bus interface layer
--------------------------------------------------------------------------------
Provides an abstraction layer between the outside SoC's bus interface, and the
internal register block's implementation.
Converts a user-selectable bus protocol to generic register file signals.
Upstream Signals:
Signal names are defined in the bus interface class and shall be malleable
to the user.
User can choose a flat signal interface, or a SV interface.
SV interface shall be easy to tweak since various orgs will use different
naming conventions in their library of interface definitions
Downstream Signals:
- cpuif_req
- Single-cycle pulse
- Qualifies the following child signals:
- cpuif_req_is_wr
1 denotes this is a write transfer
- cpuif_addr
Byte address
- cpuif_wr_data
- cpuif_wr_biten
per-bit strobes
some protocols may opt to tie this to all 1's
- cpuif_rd_ack
- Single-cycle pulse
- Qualifies the following child signals:
- cpuif_rd_data
- cpuif_rd_err
- cpuif_wr_ack
- Single-cycle pulse
- Qualifies the following child signals:
- cpuif_wr_err
Misc thoughts
- Internal cpuif_* signals use a strobe-based protocol:
- Unknown, but fixed latency
- Makes for easy pipelining if needed
- Decided to keep cpuif_req signals common for read write:
This will allow address decode logic to be shared for read/write
Downside is split protocols like axi-lite can't have totally separate rd/wr
access lanes, but who cares?
- separate response strobes
Not necessary to use, but this lets me independently pipeline read/write paths.
read path will need more time if readback mux is large
- On multiple outstanding transactions
Currently, cpuif doesnt really support this. Goal was to make it easily pipelineable
without having to backfeed stall logic.
Could still be possible to do a "fly-by" pipeline with a more intelligent cpuif layer
Not worrying about this now.
Implementation:
Implement this mainly as a Jinja template.
Upstream bus intf signals are fetched via busif class properties. Ex:
{{busif.signal('pready')}} <= '1;
This allows the actual SV or flattened signal to be emitted
What protocols do I care about?
- AXI4 Lite
- Ignore AxPROT?
- APB3
- APB4
- Ignore pprot?
- AHB?
- Wishbone
- Generic
breakout the above signals as-is (reassign with a prefix or something)

View File

@@ -0,0 +1,51 @@
--------------------------------------------------------------------------------
Address Decode layer
--------------------------------------------------------------------------------
A bunch of combinational address decodes that generate individual register
req strobes
Possible decode logic styles:
- Big case statement
+ Probably more sim-efficient
- Hard to do loop parameterization
- More annoying to do multiple regs per address
- Big always_comb + One if/else chain
+ Easy to nest loops & parameterize if needed
- sim has a lot to evaluate each time
- More annoying to do multiple regs per address
- implies precedence? Synth tools should be smart enough?
- Big always_comb + inline conditionals <---- DO THIS
+ Easy to nest loops & parameterize if needed
- sim has a lot to evaluate each time
+ Multiple regs per address possible
+ implies address decode parallelism.
?? Should I try using generate loops + assigns?
This would be more explicit parallelism, however some tools may
get upset at multiple assignments to a common struct
Implementation:
Jinja is inappropriate here
Very logic-heavy. Jinja may end up being annoying
Also, not much need for customization here
This may even make sense as a visitor that dumps lines
- visit each reg
- upon entering an array, create for loops
- upon exiting an array, emit 'end'
Make the strobe struct declared locally
No need for it to leave the block
Error handling
If no strobe generated, respond w error?
This is actually pretty expensive to do for writes.
Hold off on this for now.
Reads get this effectively for free in the readback mux.
Implement write response strobes back upstream to cpuif
Eventually allow for optional register stage for strobe struct
Will need to also pipeline the other cpuif signals
ok to discard the cpuif_addr. no longer needed
Downstream Signals:
- access strobes
Encase these into a struct datatype
- is_write + wr_data/wr_bitstrobe

View File

@@ -0,0 +1,163 @@
--------------------------------------------------------------------------------
Field storage / next value layer
--------------------------------------------------------------------------------
Where all the magic happens!!
Any field that implements storage is defined here.
Bigass struct that only contains storage elements
Each field consists of:
- Entries in the storage element struct
- if implements storage - field value
- user extensible values?
- Entries in the combo struct
- if implements storage:
- Field's "next" value
- load-enable strobe
- If counter
various event strobes (overflow/overflow).
These are convenient to generate alongside the field next state logic
- user extensible values?
- an always_comb block:
- generates the "next value" combinational signal
- May generate other intermediate strobes?
incr/decr?
- series of if/else statements that assign the next value in the storage element
Think of this as a flat list of "next state" conditons, ranked by their precedence as follows:
- reset
Actually, handle this in the always_ff
- sw access (if sw precedence)
- onread/onwrite
- hw access
- Counter
beware of clear events and incr/decr events happening simultaneously
- next
- etc
- sw access (if hw precedence)
- onread/onwrite
- always_comb block to also generate write-enable strobes for the actual
storage element
This is better for low-power design
- an always_ff block
Implements the actual storage element
Also a tidy place to abstract the specifics of activehigh/activelow field reset
selection.
TODO:
Scour the RDL spec.
Does this "next state" precedence model hold true in all situations?
TODO:
Think about user-extensibility
Provide a mechanism for users to extend/override field behavior
TODO:
Does the endianness the user sets matter anywhere?
Implementation
Makes sense to use a listener class
Be sure to skip alias registers
--------------------------------------------------------------------------------
NextStateConditional Class
Describes a single conditional action that determines the next state of a field
Provides information to generate the following content:
if(<conditional>) begin
<assignments>
end
- is_match(self, field: FieldNode) -> bool:
Returns True if this conditional is relevant to the field. If so,
it instructs the FieldBuider that code for this conditional shall be emitted
TODO: better name than "is_match"? More like "is this relevant"
- get_predicate(self, field: FieldNode) -> str:
Returns the rendered conditional text
- get_assignments(self, field: FieldNode) -> List[str]:
Returns a list of rendered assignment strings
This will basically always be two:
<field>.next = <next value>
<field>.load_next = '1;
- get_extra_combo_signals(self, field: FieldNode) -> List[TBD]:
Some conditionals will need to set some extra signals (eg. counter underflow/overflow strobes)
Compiler needs to know to:
- declare these inthe combo struct
- initialize them in the beginning of always_comb
Return something that denotes the following information: (namedtuple?)
- signal name: str
- width: int
- default value assignment: str
Multiple NextStateConditional can declare the same extra combo signal
as long as their definitions agree
--> Assert this
FieldBuilder Class
Describes how to build fields
Contains NextStateConditional definitions
Nested inside the class namespace, define all the NextStateConditional classes
that apply
User can override definitions or add own to extend behavior
NextStateConditional objects are stored in a dictionary as follows:
_conditionals {
assignment_precedence: [
conditional_option_1,
conditional_option_2,
conditional_option_3,
]
}
add_conditional(self, conditional, assignment_precedence):
Inserts the NextStateConditional into the given assignment precedence bin
The first one added to a precedence bin is first in that bin's search order
init_conditionals(self) -> None:
Called from __init__.
loads all possible conditionals into self.conditionals list
This function is to provide a hook for the user to add their own.
Do not do fancy class introspection. Load them explicitly by name like so:
self.add_conditional(MyNextState(), AssignmentPrecedence.SW_ACCESS)
If user wants to extend this class, they can pile onto the bins of conditionals freely!
--------------------------------------------------------------------------------
Misc
--------------------------------------------------------------------------------
What about complex behaviors like a read-clear counter?
if({{software read}})
next = 0
elif({{increment}})
next = prev + 1
--> Implement this by stacking multiple NextStateConditional in the same assignment precedence.
In this case, there would be a special action on software read that would be specific to read-clear counters
this would get inserted ahead of the search order.
Precedence & Search order
There are two layers of priority I need to keep track of:
- Assignment Precedence
RTL precedence of the assignment conditional
- Search order (sp?)
Within an assignment precedence, order in which the NextStateConditional classes are
searched for a match
For assignment precedence, it makes sense to use an integer enumeration for this
since there really aren't too many precedence levels that apply here.
Space out the integer enumerations so that user can reliably insert their own actions, ie:
my_precedence = AssignmentPrecedence.SW_ACCESS + 1
For search order, provide a user API to load a NextStateConditional into
a precedence 'bin'. Pushing into a bin always inserts into the front of the search order
This makes sense since user overrides will always want to be highest priority - and
rule themselves out before falling back to builtin behavior

View File

@@ -0,0 +1,49 @@
--------------------------------------------------------------------------------
Readback mux layer
--------------------------------------------------------------------------------
Implementation:
- Big always_comb block
- Initialize default rd_data value
- Lotsa if statements that operate on reg strb to assign rd_data
- Merges all fields together into reg
- pulls value from storage element struct, or input struct
- Provision for optional flop stage?
Mux Strategy:
Flat case statement:
-- Cant parameterize
+ better performance?
Flat 1-hot array then OR reduce:
- Create a bus-wide flat array
eg: 32-bits x N readable registers
- Assign each element:
the readback value of each register
... masked by the register's access strobe
- I could also stuff an extra bit into the array that denotes the read is valid
A missed read will OR reduce down to a 0
- Finally, OR reduce all the elements in the array down to a flat 32-bit bus
- Retiming the large OR fanin can be done by chopping up the array into stages
for 2 stages, sqrt(N) gives each stage's fanin size. Round to favor
more fanin on 2nd stage
3 stages uses cube-root. etc...
- This has the benefit of re-using the address decode logic.
synth can choose to replicate logic if fanout is bad
WARNING:
Beware of read/write flop stage asymmetry & race conditions.
Eg. If a field is rclr, dont want to sample it after it gets read:
addr --> strb --> clear
addr --> loooong...retime --> sample rd value
Should guarantee that read-sampling happens at the same cycle as any read-modify
Forwards response strobe back up to cpu interface layer
TODO:
Dont forget about alias registers here
TODO:
Does the endinness the user sets matter anywhere?

View File

@@ -0,0 +1,9 @@
--------------------------------------------------------------------------------
Output Port mapping layer
--------------------------------------------------------------------------------
Assign to output struct port
Still TBD if this will actually be a distinct layer.
Cosmetically, this might be nicer to interleave with the field section above
Assign storage element & other derived values as requested by properties