move docs

Alex Mykyta
2021-12-12 17:10:32 -08:00
parent 3dee090467
commit ee8d74b455
37 changed files with 32 additions and 114 deletions

@@ -0,0 +1,51 @@
--------------------------------------------------------------------------------
Port Declaration
--------------------------------------------------------------------------------
Generates the port declaration of the module:
    - Parameters
        - rd/wr error response/data behavior
            Do missed accesses cause a SLVERR?
            Do reads respond with a magic value?
        - Pipeline enables
            Enable reg stages in various places
        - RDL-derived Parameters:
            Someday in the future, if I ever get around to this: https://github.com/SystemRDL/systemrdl-compiler/issues/58
    - Clock/Reset
        Single clk
        One or more resets
    - CPU Bus Interface
        Given the bus interface object, emits the IO
        This can be flattened ports, or an SV interface
        Regardless, it shall be malleable so that the user can use their favorite
        declaration style
    - Hardware interface
        Two options:
            - 2-port struct interface
                Everything is rolled into two unpacked structs: inputs and outputs
            - Flattened --> NOT DOING
                Flatten/unroll everything
                No, not doing this. I hate it and don't want to waste time implementing it.
                It will NEVER be able to support parameterized regmaps, and it just
                creates a ton of corner cases I don't care to deal with.
Other IO signals I need to be aware of:
    Any signals declared and used in any references:
        field.resetsignal
        field.next
        ... etc ...
    Any signals declared and marked as cpuif_reset or field_reset
        These override the default rst
        If both are defined, be sure not to emit the default
        Pretty straightforward (see 17.1)
        Also have some notes on this in my general Logbook
        Will have to make a call on how these propagate if multiple are defined
        in different hierarchies
    Interrupt/halt outputs
        See "Interrupts" logbook for explanation
    addrmap.errextbus, regfile.errextbus, reg.errextbus
        ???
        Apparently these are inputs
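The reset-override rule above can be sketched as a small port-selection helper. This is a minimal sketch that assumes at most one top-level `cpuif_reset` signal and one `field_reset` signal, and it leaves open how resets propagate across hierarchies. `ResetSignal`, `reset_port_names`, and the default name `rst` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResetSignal:
    """Hypothetical stand-in for an RDL signal instance used as a reset."""
    name: str


def reset_port_names(
    cpuif_reset: Optional[ResetSignal],
    field_reset: Optional[ResetSignal],
    default_name: str = "rst",
) -> List[str]:
    """Return the reset port names to emit in the module's port list."""
    ports: List[str] = []
    # The default reset is only needed if at least one domain still relies on it
    if cpuif_reset is None or field_reset is None:
        ports.append(default_name)
    for sig in (cpuif_reset, field_reset):
        # Avoid emitting the same port twice if one signal serves both roles
        if sig is not None and sig.name not in ports:
            ports.append(sig.name)
    return ports
```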

@@ -0,0 +1,77 @@
================================================================================
Summary
================================================================================
RTL interface that provides access to per-field context signals
Regarding signals:
    I think RDL-declared signals should actually be part of the hwif input
    structure.
    Exceptions:
        - If the signal instance is at the top level, it gets promoted to the
          top-level port list for convenience, and is therefore omitted from the struct
================================================================================
Naming Scheme
================================================================================
hwif_out
    .my_regblock
        .my_reg[X][Y]
            .my_field
                .value
                .anded
hwif_in
    .my_regblock
        .my_reg[X][Y]
            .my_field
                .value
                .we
        .my_signal
        .my_fieldreset_signal
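As a rough illustration of the naming scheme, the helper below builds a dotted hwif reference from a hierarchy path. It is only a sketch. `hwif_path` and its argument, a list of (name, indexes) tuples, are hypothetical and do not belong to any existing API.

```python
from typing import Sequence, Tuple


def hwif_path(direction: str, hier: Sequence[Tuple[str, Sequence[str]]], member: str) -> str:
    """Build an hwif struct reference such as hwif_out.blk.reg[X][Y].field.value.

    hier is a list of (instance_name, array_indexes) pairs from the top of the
    regblock down to the field; member is the leaf signal (value, anded, we, ...).
    """
    if direction not in ("in", "out"):
        raise ValueError(f"direction must be 'in' or 'out', got {direction!r}")
    parts = [f"hwif_{direction}"]
    for name, indexes in hier:
        # Each array dimension becomes its own [idx] suffix
        parts.append(name + "".join(f"[{i}]" for i in indexes))
    parts.append(member)
    return ".".join(parts)
```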
================================================================================
Flattened mode? --> NO
================================================================================
If the user wants a flattened list of ports,
    still use the same hwif_in/out structs internally.
    Rather than declaring hwif_in and hwif_out in the port list, declare them internally
    Add a mapping layer in the body of the module that performs a ton of assign statements
    to map flat signals <-> struct
Alternatively, don't do this at all.
    If I want to add a flattened mode, generate a wrapper module instead.
Marking this as YAGNI for now.
================================================================================
IO Signals
================================================================================
Outputs:
    field value
        If hw readable
    bitwise reductions
        If anded, ored, or xored is True, output the corresponding signal
    swmod/swacc
        event strobes
Inputs:
    field value
        If hw writable
    we/wel
        If either is boolean and true
        Not part of the external hwif if it is a reference
        Mutually exclusive
    hwclr/hwset
        If either is boolean and true
        Not part of the external hwif if it is a reference
    incr/decr
        If counter=true, generate BOTH
    incrvalue/decrvalue
        If incrwidth/decrwidth is set
    signals!
        Any signal instances instantiated in the scope
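The IO rules above map naturally onto a per-field function. Below is a minimal sketch that assumes field properties arrive as a plain dict, a hypothetical stand-in for querying the compiled RDL field. A property that holds a reference rather than a boolean is represented by any non-bool value. The `hw_readable`/`hw_writable` keys are assumptions, and signal instances are not handled.

```python
from typing import Any, Dict, List, Tuple


def hwif_members(props: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (hwif_out members, hwif_in members) for one field."""
    outputs: List[str] = []
    inputs: List[str] = []

    # Outputs
    if props.get("hw_readable"):
        outputs.append("value")
    for prop in ("anded", "ored", "xored", "swmod", "swacc"):
        if props.get(prop) is True:
            outputs.append(prop)

    # Inputs
    if props.get("hw_writable"):
        inputs.append("value")
    if props.get("we") and props.get("wel"):
        raise ValueError("we and wel are mutually exclusive")
    for prop in ("we", "wel", "hwclr", "hwset"):
        # Only a boolean True creates a port; a reference is wired internally
        if props.get(prop) is True:
            inputs.append(prop)
    if props.get("counter") is True:
        inputs += ["incr", "decr"]
    if props.get("incrwidth"):
        inputs.append("incrvalue")
    if props.get("decrwidth"):
        inputs.append("decrvalue")
    return outputs, inputs
```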

@@ -0,0 +1,72 @@
--------------------------------------------------------------------------------
CPU Bus interface layer
--------------------------------------------------------------------------------
Provides an abstraction layer between the outside SoC's bus interface and the
internal register block's implementation.
Converts a user-selectable bus protocol to generic register file signals.
Upstream Signals:
    Signal names are defined in the bus interface class and shall be malleable
    to the user.
    User can choose a flat signal interface, or an SV interface.
    The SV interface shall be easy to tweak, since various orgs will use different
    naming conventions in their library of interface definitions
Downstream Signals:
    - cpuif_req
        - Single-cycle pulse
        - Qualifies the following child signals:
            - cpuif_req_is_wr
                1 denotes this is a write transfer
            - cpuif_addr
                Byte address
            - cpuif_wr_data
            - cpuif_wr_biten
                per-bit strobes
                some protocols may opt to tie this to all 1's
    - cpuif_rd_ack
        - Single-cycle pulse
        - Qualifies the following child signals:
            - cpuif_rd_data
            - cpuif_rd_err
    - cpuif_wr_ack
        - Single-cycle pulse
        - Qualifies the following child signals:
            - cpuif_wr_err
Misc thoughts
    - Internal cpuif_* signals use a strobe-based protocol:
        - Unknown, but fixed latency
        - Makes for easy pipelining if needed
    - Decided to keep cpuif_req signals common for read/write:
        This allows address decode logic to be shared for reads and writes
        Downside is that split protocols like AXI4-Lite can't have totally separate rd/wr
        access lanes, but who cares?
    - Separate response strobes
        Not necessary to use, but this lets me independently pipeline read/write paths.
        The read path will need more time if the readback mux is large
    - On multiple outstanding transactions
        Currently, cpuif doesn't really support this. The goal was to make it easily pipelinable
        without having to backfeed stall logic.
        It could still be possible to do a "fly-by" pipeline with a more intelligent cpuif layer
        Not worrying about this now.
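The "unknown, but fixed latency" strobe protocol can be modeled cycle by cycle. This is a behavioral sketch, not RTL. Every `cpuif_req` pulse is acknowledged exactly `latency` cycles later, and there is no stall path. `CpuifReq`, `FixedLatencyModel`, the dict-backed register store, and the 32-bit bus width are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class CpuifReq:
    """One cpuif_req pulse and the child signals it qualifies."""
    is_wr: bool
    addr: int                     # byte address
    wr_data: int = 0
    wr_biten: int = 0xFFFF_FFFF   # per-bit write strobes; all 1's if the protocol has none


class FixedLatencyModel:
    """Every request is acknowledged exactly `latency` cycles after it is issued."""

    def __init__(self, latency: int):
        if latency < 1:
            raise ValueError("latency must be at least 1 cycle")
        self.regs: Dict[int, int] = {}
        # One slot per pipeline stage; None means no request in that stage
        self.pipe: Deque[Optional[CpuifReq]] = deque([None] * latency)

    def step(self, req: Optional[CpuifReq]) -> Dict[str, int]:
        """Advance one clock cycle and return this cycle's response strobes."""
        self.pipe.append(req)
        done = self.pipe.popleft()
        # Error reporting is left out of this sketch, so *_err stay 0
        resp = {"rd_ack": 0, "rd_data": 0, "rd_err": 0, "wr_ack": 0, "wr_err": 0}
        if done is None:
            return resp
        if done.is_wr:
            old = self.regs.get(done.addr, 0)
            self.regs[done.addr] = (old & ~done.wr_biten) | (done.wr_data & done.wr_biten)
            resp["wr_ack"] = 1
        else:
            resp["rd_ack"] = 1
            resp["rd_data"] = self.regs.get(done.addr, 0)
        return resp
```

Latency is fixed and nothing stalls, so a new request can be issued every cycle. The model just shifts requests through the deque.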
Implementation:
    Implement this mainly as a Jinja template.
    Upstream bus intf signals are fetched via the busif class. Ex:
        {{busif.signal('pready')}} <= '1;
    This allows the actual SV or flattened signal to be emitted
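The sketch below shows how one template line can serve both declaration styles: a bus interface object exposes `signal()`. The class names, the `s_apb` interface name, and the prefix scheme are all hypothetical. A plain f-string stands in for Jinja so the example has no dependencies. A Jinja template would receive the same object as `busif`.

```python
class SVInterfaceBusif:
    """Hypothetical busif that emits members of an SV interface port."""

    def __init__(self, port_name: str = "s_apb"):
        self.port_name = port_name

    def signal(self, name: str) -> str:
        return f"{self.port_name}.{name}"


class FlatBusif:
    """Hypothetical busif that emits flattened, prefixed ports."""

    def __init__(self, prefix: str = "s_apb_"):
        self.prefix = prefix

    def signal(self, name: str) -> str:
        return f"{self.prefix}{name}"


def render_pready(busif) -> str:
    # Equivalent of the template line: {{busif.signal('pready')}} <= '1;
    return f"{busif.signal('pready')} <= '1;"
```

An org with its own naming convention could subclass one of these and override `signal()`, and the template would not change.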
What protocols do I care about?
    - AXI4-Lite
        - Ignore AxPROT?
    - APB3
    - APB4
        - Ignore pprot?
    - AHB?
    - Wishbone
    - Generic
        Break out the above signals as-is (reassign with a prefix or something)

@@ -0,0 +1,51 @@
--------------------------------------------------------------------------------
Address Decode layer
--------------------------------------------------------------------------------
A bunch of combinational address decodes that generate individual register
req strobes
Possible decode logic styles:
    - Big case statement
        + Probably more sim-efficient
        - Hard to do loop parameterization
        - More annoying to do multiple regs per address
    - Big always_comb + one if/else chain
        + Easy to nest loops & parameterize if needed
        - Sim has a lot to evaluate each time
        - More annoying to do multiple regs per address
        - Implies precedence? Synth tools should be smart enough?
    - Big always_comb + inline conditionals <---- DO THIS
        + Easy to nest loops & parameterize if needed
        - Sim has a lot to evaluate each time
        + Multiple regs per address possible
        + Implies address decode parallelism.
    ?? Should I try using generate loops + assigns?
        This would be more explicit parallelism, however some tools may
        get upset at multiple assignments to a common struct
Implementation:
    Jinja is inappropriate here
        Very logic-heavy. Jinja may end up being annoying
        Also, not much need for customization here
    This may even make sense as a visitor that dumps lines
        - visit each reg
        - upon entering an array, create for loops
        - upon exiting an array, emit 'end'
    Make the strobe struct declared locally
        No need for it to leave the block
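Here is a minimal sketch of the line-dumping visitor. It emits inline conditionals inside one `always_comb`, opens a `for` loop per array dimension on entry, and emits `end` on exit. It assumes a simplified hierarchy: each `Node` is either a register (no children) or a regfile, and arrays are row-major with a single byte stride. `Node`, `emit_decode`, and the `decoded_reg_strb` struct name are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a register or regfile in the RDL hierarchy."""
    name: str
    offset: int                                           # byte offset within parent
    dims: List[int] = field(default_factory=list)         # array dimensions; empty if not an array
    stride: int = 0                                       # byte stride between array elements
    children: List["Node"] = field(default_factory=list)  # empty => this node is a register


def emit_decode(nodes: List[Node]) -> List[str]:
    lines = ["always_comb begin"]
    loop_count = 0

    def visit(node: Node, addr: str, path: str, depth: int) -> None:
        nonlocal loop_count
        addr = f"{addr} + 'h{node.offset:x}"
        path = f"{path}.{node.name}" if path else node.name
        # Entering an array: open one for loop per dimension
        for k, dim in enumerate(node.dims):
            var = f"i{loop_count}"
            loop_count += 1
            dim_stride = node.stride * prod(node.dims[k + 1:])
            lines.append("    " * depth + f"for(int {var}=0; {var}<{dim}; {var}++) begin")
            addr = f"{addr} + {var}*'h{dim_stride:x}"
            path = f"{path}[{var}]"
            depth += 1
        if node.children:
            for child in node.children:
                visit(child, addr, path, depth)
        else:
            # Inline conditional: each strobe is an independent, parallel compare
            lines.append("    " * depth + f"decoded_reg_strb.{path} = cpuif_req & (cpuif_addr == {addr});")
        # Exiting an array: close each loop
        for _ in node.dims:
            depth -= 1
            lines.append("    " * depth + "end")

    for n in nodes:
        visit(n, "0", "", 1)
    lines.append("end")
    return lines
```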
Error handling
    If no strobe is generated, respond with an error?
        This is actually pretty expensive to do for writes.
        Hold off on this for now.
        Reads get this effectively for free in the readback mux.
Implement write response strobes back upstream to cpuif
Eventually allow for an optional register stage for the strobe struct
    Will need to also pipeline the other cpuif signals
    OK to discard cpuif_addr; it is no longer needed
Downstream Signals:
    - access strobes
        Encase these into a struct datatype
    - is_write + wr_data/wr_bitstrobe

@@ -0,0 +1,163 @@
--------------------------------------------------------------------------------
Field storage / next value layer
--------------------------------------------------------------------------------
Where all the magic happens!!
Any field that implements storage is defined here.
Bigass struct that only contains storage elements
Each field consists of:
    - Entries in the storage element struct
        - if implements storage - field value
        - user extensible values?
    - Entries in the combo struct
        - if implements storage:
            - Field's "next" value
            - load-enable strobe
        - If counter
            various event strobes (overflow/underflow).
            These are convenient to generate alongside the field next-state logic
        - user extensible values?
    - an always_comb block:
        - generates the "next value" combinational signal
        - May generate other intermediate strobes?
            incr/decr?
        - series of if/else statements that assign the next value in the storage element
            Think of this as a flat list of "next state" conditions, ranked by their precedence as follows:
                - reset
                    Actually, handle this in the always_ff
                - sw access (if sw precedence)
                    - onread/onwrite
                - hw access
                    - Counter
                        beware of clear events and incr/decr events happening simultaneously
                    - next
                    - etc
                - sw access (if hw precedence)
                    - onread/onwrite
        - always_comb block to also generate write-enable strobes for the actual
          storage element
            This is better for low-power design
    - an always_ff block
        Implements the actual storage element
        Also a tidy place to abstract the specifics of activehigh/activelow field reset
        selection.
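For the `always_ff` piece, here is a sketch of how the activehigh/activelow reset selection could be abstracted in one place. It assumes an asynchronous reset. The struct names `field_storage` and `field_combo` are hypothetical stand-ins for the storage and combo structs. `load_next` doubles as the storage element's write-enable.

```python
from typing import List


def emit_field_ff(name: str, width: int, reset_value: int,
                  rst: str = "rst", activelow: bool = False) -> List[str]:
    """Render the storage flop for one field (async reset assumed)."""
    edge = f"negedge {rst}" if activelow else f"posedge {rst}"
    asserted = f"~{rst}" if activelow else rst
    return [
        f"always_ff @(posedge clk or {edge}) begin",
        f"    if({asserted}) begin",
        f"        field_storage.{name} <= {width}'d{reset_value};",
        # Only update when the next-state logic requested a load (low-power friendly)
        f"    end else if(field_combo.{name}.load_next) begin",
        f"        field_storage.{name} <= field_combo.{name}.next;",
        "    end",
        "end",
    ]
```

A synchronous-reset variant would drop the reset edge from the sensitivity list. The rest of the emitted block would stay the same.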
TODO:
    Scour the RDL spec.
    Does this "next state" precedence model hold true in all situations?
TODO:
    Think about user-extensibility
    Provide a mechanism for users to extend/override field behavior
TODO:
    Does the endianness the user sets matter anywhere?
Implementation
    Makes sense to use a listener class
    Be sure to skip alias registers
--------------------------------------------------------------------------------
NextStateConditional Class
    Describes a single conditional action that determines the next state of a field
    Provides information to generate the following content:
        if(<conditional>) begin
            <assignments>
        end
    - is_match(self, field: FieldNode) -> bool:
        Returns True if this conditional is relevant to the field. If so,
        it instructs the FieldBuilder that code for this conditional shall be emitted
        TODO: better name than "is_match"? More like "is this relevant"
    - get_predicate(self, field: FieldNode) -> str:
        Returns the rendered conditional text
    - get_assignments(self, field: FieldNode) -> List[str]:
        Returns a list of rendered assignment strings
        This will basically always be two:
            <field>.next = <next value>;
            <field>.load_next = '1;
    - get_extra_combo_signals(self, field: FieldNode) -> List[TBD]:
        Some conditionals will need to set some extra signals (e.g. counter underflow/overflow strobes)
        Compiler needs to know to:
            - declare these in the combo struct
            - initialize them at the beginning of always_comb
        Return something that denotes the following information: (namedtuple?)
            - signal name: str
            - width: int
            - default value assignment: str
        Multiple NextStateConditionals can declare the same extra combo signal
        as long as their definitions agree
            --> Assert this
FieldBuilder Class
    Describes how to build fields
    Contains NextStateConditional definitions
        Nested inside the class namespace, define all the NextStateConditional classes
        that apply
        User can override definitions or add their own to extend behavior
    NextStateConditional objects are stored in a dictionary as follows:
        _conditionals {
            assignment_precedence: [
                conditional_option_3,
                conditional_option_2,
                conditional_option_1,
            ]
        }
    add_conditional(self, conditional, assignment_precedence):
        Inserts the NextStateConditional into the given assignment precedence bin
        The last one added to a precedence bin is first in that bin's search order
    init_conditionals(self) -> None:
        Called from __init__.
        Loads all possible conditionals into the self._conditionals bins
        This function is to provide a hook for the user to add their own.
        Do not do fancy class introspection. Load them explicitly by name like so:
            self.add_conditional(MyNextState(), AssignmentPrecedence.SW_ACCESS)
        If the user wants to extend this class, they can pile onto the bins of conditionals freely!
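Here is a sketch of the NextStateConditional interface. A small renderer turns matching conditionals into the if/else chain. It also collects extra combo signals and rejects conflicting definitions, which implements the "assert this" note as an explicit check. `FakeField`, `SwWrite`, `CounterIncr`, and the `field_combo`/`field_storage`/`decoded_*` names are hypothetical stand-ins, not an existing API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, NamedTuple, Tuple


class ExtraComboSignal(NamedTuple):
    name: str
    width: int
    default_assignment: str


class FakeField:
    """Hypothetical stand-in for a compiled FieldNode."""

    def __init__(self, path: str, **props):
        self.path = path
        self.props = props


class NextStateConditional(ABC):
    @abstractmethod
    def is_match(self, field: FakeField) -> bool: ...

    @abstractmethod
    def get_predicate(self, field: FakeField) -> str: ...

    @abstractmethod
    def get_assignments(self, field: FakeField) -> List[str]: ...

    def get_extra_combo_signals(self, field: FakeField) -> List[ExtraComboSignal]:
        return []


class SwWrite(NextStateConditional):
    def is_match(self, field):
        return bool(field.props.get("sw_writable"))

    def get_predicate(self, field):
        return f"decoded_strb.{field.path} && decoded_is_wr"

    def get_assignments(self, field):
        return [f"field_combo.{field.path}.next = decoded_wr_data;",
                f"field_combo.{field.path}.load_next = '1;"]


class CounterIncr(NextStateConditional):
    def is_match(self, field):
        return bool(field.props.get("counter"))

    def get_predicate(self, field):
        return f"hwif_in.{field.path}.incr"

    def get_assignments(self, field):
        return [f"field_combo.{field.path}.next = field_storage.{field.path} + 1'b1;",
                f"field_combo.{field.path}.load_next = '1;"]

    def get_extra_combo_signals(self, field):
        # Declared here; the overflow logic itself is omitted from this sketch
        return [ExtraComboSignal("overflow", 1, "'0")]


def render_next_state(field, conditionals) -> Tuple[List[str], List[ExtraComboSignal]]:
    """Emit the if/else chain for a field and collect its extra combo signals."""
    extras: Dict[str, ExtraComboSignal] = {}
    lines: List[str] = []
    for cond in conditionals:
        if not cond.is_match(field):
            continue
        for sig in cond.get_extra_combo_signals(field):
            prev = extras.setdefault(sig.name, sig)
            if prev != sig:  # the same name is fine only if the definitions agree
                raise ValueError(f"conflicting definitions for extra signal {sig.name!r}")
        keyword = "if" if not lines else "end else if"
        lines.append(f"{keyword}({cond.get_predicate(field)}) begin")
        lines += ["    " + a for a in cond.get_assignments(field)]
    if lines:
        lines.append("end")
    return lines, list(extras.values())
```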
--------------------------------------------------------------------------------
Misc
--------------------------------------------------------------------------------
What about complex behaviors like a read-clear counter?
    if({{software read}})
        next = 0
    elif({{increment}})
        next = prev + 1
    --> Implement this by stacking multiple NextStateConditionals in the same assignment precedence.
        In this case, there would be a special action on software read that would be specific to read-clear counters;
        this would get inserted ahead in the search order.
Precedence & Search order
    There are two layers of priority I need to keep track of:
        - Assignment Precedence
            RTL precedence of the assignment conditional
        - Search order
            Within an assignment precedence, the order in which the NextStateConditional classes are
            searched for a match
    For assignment precedence, it makes sense to use an integer enumeration
    since there really aren't too many precedence levels that apply here.
    Space out the integer enumerations so that users can reliably insert their own actions, i.e.:
        my_precedence = AssignmentPrecedence.SW_ACCESS + 1
    For search order, provide a user API to load a NextStateConditional into
    a precedence 'bin'. Pushing into a bin always inserts at the front of the search order.
    This makes sense since user overrides will always want to be highest priority, and
    rule themselves out before falling back to built-in behavior
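The two priority layers can be sketched with a spaced `IntEnum` and a dict of bins. This sketch rests on several assumptions. The enum values are hypothetical, and a larger value means higher RTL precedence. Within a bin, every matching conditional is emitted in search order, so the read-clear counter's read action and the increment can stack. `Conditional` is a simplified stand-in for a NextStateConditional.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, List


class AssignmentPrecedence(IntEnum):
    # Hypothetical levels, spaced out so users can slot in their own (e.g. SW_ACCESS + 1).
    # Assumption: a larger value means higher RTL precedence (emitted earlier in the chain).
    SW_ACCESS = 3000
    HW_ACCESS = 2000
    COUNTER = 1000


@dataclass
class Conditional:
    """Simplified stand-in for a NextStateConditional."""
    name: str
    is_match: Callable[[dict], bool]
    predicate: str
    assignment: str


class FieldBuilder:
    def __init__(self) -> None:
        self._conditionals: Dict[int, List[Conditional]] = {}
        self.init_conditionals()

    def add_conditional(self, conditional: Conditional, assignment_precedence: int) -> None:
        # The most recently added conditional goes to the front of its bin's search order
        self._conditionals.setdefault(int(assignment_precedence), []).insert(0, conditional)

    def init_conditionals(self) -> None:
        # Built-ins are loaded explicitly by name; a subclass can extend this hook
        self.add_conditional(
            Conditional("sw_write", lambda f: f.get("sw_writable", False), "sw_write_strb", "next = wr_data;"),
            AssignmentPrecedence.SW_ACCESS,
        )
        self.add_conditional(
            Conditional("counter_incr", lambda f: f.get("counter", False), "incr", "next = prev + 1;"),
            AssignmentPrecedence.COUNTER,
        )

    def matching(self, field: dict) -> List[Conditional]:
        """All matching conditionals: by precedence (high to low), then search order."""
        return [
            cond
            for prec in sorted(self._conditionals, reverse=True)
            for cond in self._conditionals[prec]
            if cond.is_match(field)
        ]

    def render(self, field: dict) -> List[str]:
        lines: List[str] = []
        for cond in self.matching(field):
            keyword = "if" if not lines else "end else if"
            lines += [f"{keyword}({cond.predicate}) begin", f"    {cond.assignment}"]
        return lines + ["end"] if lines else lines
```

With this structure, the read-clear counter is just one more conditional pushed into the `COUNTER` bin. It lands ahead of `counter_incr`, and its `is_match` rules it out for ordinary counters.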

@@ -0,0 +1,69 @@
--------------------------------------------------------------------------------
Readback mux layer
--------------------------------------------------------------------------------
Implementation:
    - Big always_comb block
        - Initialize default rd_data value
        - Lotsa if statements that operate on reg strb to assign rd_data
            - Merges all fields together into reg
            - Pulls value from storage element struct, or input struct
    - Provision for optional flop stage?
Mux Strategy:
    Flat case statement:
        - Can't parameterize
        + Better performance?
    Flatten array then mux:
        - First, flatten ALL readback values into an array
            Round up the size of the array to the next power of 2
                It needs to be fully addressable anyway!
            This can be in a combinational block
                Initialize the array to the default readback value,
                then assign all register values. Use loops where necessary.
            Append an extra 'is-valid' bit if I need to SLVERR on bad reads
        - Next, use the read address as an index into this array
        - If needed, I can do a staged decode!
            Compute the most balanced fanin staging in Python, e.g.:
                64 regs --mux--> 8x8 --mux--> 1
                128 regs --mux--> 8x16 --mux--> 1
                    Favor smaller fanin first. The latter stage should have more fanin since routing congestion will be easier
                256 regs --mux--> 16x16 --mux--> 1
        - Potential sparseness of this makes me uncomfortable,
          but its synthesis SEEMS like it would be really efficient!
        - TODO: Rethink this
            I feel like people will complain about this
            It will likely also be pretty sim-inefficient?
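The staging rule from the examples above can be computed in a few lines: split the address bits across stages and put the smaller fanin first. This sketch assumes the readback array has already been rounded up to a power of 2, so each stage's fanin is also a power of 2. `mux_stage_fanins` is a hypothetical helper name.

```python
from typing import List


def mux_stage_fanins(n_regs: int, stages: int) -> List[int]:
    """Split the read-address bits across mux stages, smallest fanin first."""
    if n_regs < 1 or stages < 1:
        raise ValueError("n_regs and stages must be positive")
    # Index bits needed after rounding the array up to the next power of 2
    bits = (n_regs - 1).bit_length()
    base, extra = divmod(bits, stages)
    # Leftover bits go to the later stages so they get the larger fanin
    stage_bits = [base] * (stages - extra) + [base + 1] * extra
    return [1 << b for b in stage_bits]
```

This reproduces the 64 -> 8x8, 128 -> 8x16, and 256 -> 16x16 splits listed above.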
    Flat 1-hot array then OR reduce: <-- DO THIS
        - Create a bus-wide flat array
            e.g.: 32-bits x N readable registers
        - Assign each element:
            the readback value of each register
            ... masked by the register's access strobe
        - I could also stuff an extra bit into the array that denotes the read is valid
            A missed read will OR reduce down to a 0
        - Finally, OR reduce all the elements in the array down to a flat 32-bit bus
        - Retiming the large OR fanin can be done by chopping up the array into stages
            For 2 stages, sqrt(N) gives each stage's fanin size. Round to favor
            more fanin on the 2nd stage
            3 stages uses the cube root, etc...
        - This has the benefit of re-using the address decode logic.
            Synth can choose to replicate logic if fanout is bad
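The chosen strategy can be modeled in software to sanity-check both the masking and the stage sizing. This is a minimal sketch. `readback` is a behavioral model of the AND-OR mux, not RTL. `or_stage_fanins` applies the k-th-root rule, rounding down so later stages get more fanin. Unlike the power-of-2 mux staging, OR groups need not be powers of 2. All names are hypothetical.

```python
from functools import reduce
from operator import or_
from typing import List, Sequence, Tuple


def readback(values: Sequence[int], strobes: Sequence[bool]) -> Tuple[int, bool]:
    """Model of the 1-hot AND-OR readback mux; returns (rd_data, read_valid)."""
    # Mask each register's readback value by its access strobe
    masked = [v if s else 0 for v, s in zip(values, strobes)]
    rd_data = reduce(or_, masked, 0)
    # The extra 'valid' bit: a missed read ORs down to 0
    read_valid = any(strobes)
    return rd_data, read_valid


def _iroot(n: int, k: int) -> int:
    """Largest integer r with r**k <= n (avoids float rounding in n ** (1/k))."""
    r = round(n ** (1 / k))
    while r ** k > n:
        r -= 1
    while (r + 1) ** k <= n:
        r += 1
    return r


def or_stage_fanins(n: int, stages: int) -> List[int]:
    """Per-stage OR fanin for N elements; rounding down favors later stages."""
    fanins: List[int] = []
    remaining = n
    for s in range(stages, 1, -1):
        f = max(2, _iroot(remaining, s))
        fanins.append(f)
        remaining = -(-remaining // f)  # ceil division: groups feeding the next stage
    fanins.append(remaining)
    return fanins
```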
WARNING:
    Beware of read/write flop stage asymmetry & race conditions.
    E.g. if a field is rclr, don't sample it after it has already been cleared by the read:
        addr --> strb --> clear
        addr --> loooong...retime --> sample rd value
    Should guarantee that read-sampling happens in the same cycle as any read-modify
Forwards response strobe back up to cpu interface layer
TODO:
    Don't forget about alias registers here
TODO:
    Does the endianness the user sets matter anywhere?

@@ -0,0 +1,9 @@
--------------------------------------------------------------------------------
Output Port mapping layer
--------------------------------------------------------------------------------
Assign to output struct port
Still TBD if this will actually be a distinct layer.
Cosmetically, this might be nicer to interleave with the field section above
Assign storage element & other derived values as requested by properties