diff --git a/docs/api.rst b/docs/api.rst new file mode 100644 index 0000000..8101436 --- /dev/null +++ b/docs/api.rst @@ -0,0 +1,5 @@ +Exporter API +============ + +.. autoclass:: peakrdl.regblock.RegblockExporter + :members: diff --git a/docs/architecture.rst b/docs/architecture.rst index 8492ad9..4de1188 100644 --- a/docs/architecture.rst +++ b/docs/architecture.rst @@ -49,6 +49,10 @@ fanin re-timing stage can be enabled. This stage is automatically inserted at a balanced point in the read-data reduction so that fanin and logic-levels are optimally reduced. +.. figure:: diagrams/readback.png + :width: 65% + :align: center + A second optional read response retiming register can be enabled in-line with the path back to the CPU interface layer. This can be useful if the CPU interface protocol used has a fully combinational response path, and the design's complexity requires diff --git a/docs/conf.py b/docs/conf.py index ccb7faf..82fc076 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -29,6 +29,8 @@ author = 'Alex Mykyta' # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # ones. 
extensions = [ + 'sphinx.ext.autodoc', + 'sphinx.ext.napoleon', "sphinxcontrib.wavedrom", ] render_using_wavedrompy = True diff --git a/docs/cpuif/addressing.rst b/docs/cpuif/addressing.rst deleted file mode 100644 index b1b758f..0000000 --- a/docs/cpuif/addressing.rst +++ /dev/null @@ -1,9 +0,0 @@ -CPU Interface Addressing -======================== - -TODO: write about the following: - -* cpuif addressing is always 0-based (aka relative to the block's root) -* It is up to the decoder to handle the offset -* Address bus width is pruned down -* recommend that the decoder/interconnect reserve a full ^2 block of addresses to simplify decoding diff --git a/docs/cpuif/apb3.rst b/docs/cpuif/apb3.rst index 4c44e71..351f674 100644 --- a/docs/cpuif/apb3.rst +++ b/docs/cpuif/apb3.rst @@ -1,11 +1,31 @@ -AMBA APB3 -========= +AMBA 3 APB +========== -TODO: Describe the following +Implements the register block using an +`AMBA 3 APB `_ +CPU interface. -* List of interface signals +The APB3 CPU interface comes in two i/o port flavors: - * interface name & modports (link to advanced topics in case user wants to override) - * flattened equivalents +SystemVerilog Interface + Class: :class:`peakrdl.regblock.cpuif.apb3.APB3_Cpuif` -* Download link to SV interface definition + Interface Definition: :download:`apb3_intf.sv <../../test/lib/cpuifs/apb3/apb3_intf.sv>` + +Flattened inputs/outputs + Flattens the interface into discrete input and output ports. + + Class: :class:`peakrdl.regblock.cpuif.apb3.APB3_Cpuif_flattened` + + +.. warning:: + Some IP vendors will incorrectly implement the address signalling + assuming word-addresses (that is, each increment of ``PADDR`` is the next word). + + For this exporter, values on the interface's ``PADDR`` input are interpreted + as byte-addresses (a 32-bit APB bus increments ``PADDR`` in steps of 4). + Although the APB protocol does not allow for unaligned transfers, this is in + accordance with the official AMBA bus specification. 
+ + Be sure to double-check the interpretation of your interconnect IP. A simple + bit-shift operation can be used to correct this if necessary. diff --git a/docs/cpuif/axi4lite.rst b/docs/cpuif/axi4lite.rst index 465626b..a8861f7 100644 --- a/docs/cpuif/axi4lite.rst +++ b/docs/cpuif/axi4lite.rst @@ -1,11 +1,29 @@ AMBA AXI4-Lite ============== -TODO: Describe the following +Implements the register block using an +`AMBA AXI4-Lite `_ +CPU interface. -* List of interface signals +The AXI4-Lite CPU interface comes in two i/o port flavors: - * interface name & modports (link to advanced topics in case user wants to override) - * flattened equivalents +SystemVerilog Interface + Class: :class:`peakrdl.regblock.cpuif.axi4lite.AXI4Lite_Cpuif` -* Download link to SV interface definition + Interface Definition: :download:`axi4lite_intf.sv <../../test/lib/cpuifs/axi4lite/axi4lite_intf.sv>` + +Flattened inputs/outputs + Flattens the interface into discrete input and output ports. + + Class: :class:`peakrdl.regblock.cpuif.axi4lite.AXI4Lite_Cpuif_flattened` + + +Pipelined Performance +--------------------- +This implementation of the AXI4-Lite interface supports transaction pipelining, +which can significantly improve performance of back-to-back transfers. + +In order to support transaction pipelining, the CPU interface will accept multiple +concurrent transactions. The number of outstanding transactions allowed is automatically +determined based on the register file pipeline depth (affected by retiming options), +and influences the depth of the internal transaction response skid buffer. diff --git a/docs/cpuif/introduction.rst b/docs/cpuif/introduction.rst new file mode 100644 index 0000000..3d622de --- /dev/null +++ b/docs/cpuif/introduction.rst @@ -0,0 +1,26 @@ +Introduction +============ + +The CPU interface logic layer provides an abstraction between the +application-specific bus protocol and the internal register file logic. 
+When exporting a design, you can select from a variety of popular CPU interface +protocols. These are described in more detail in the pages that follow. + + +Addressing +^^^^^^^^^^ + +The regblock exporter will always generate its address decoding logic using local +address offsets. The absolute address offset of your device shall be +handled by your system interconnect, which shall present addresses to the regblock that +only include the local offset. + +For example, consider a fictional AXI4-Lite device: + +- It consumes 4 kB of address space (``0x000``-``0xFFF``). +- It is instantiated in your system at global addresses ``0x80_0000``-``0x80_0FFF``. +- After decoding transactions destined to the device, the system interconnect shall + ensure that AxADDR values are presented to the device as relative addresses within + the range of ``0x000``-``0xFFF``. +- If care is taken to align the global address offset to the size of the device, + creating a relative address is as simple as pruning down address bits. diff --git a/docs/cpuif/passthrough.rst b/docs/cpuif/passthrough.rst new file mode 100644 index 0000000..9b2640b --- /dev/null +++ b/docs/cpuif/passthrough.rst @@ -0,0 +1,9 @@ +CPUIF Passthrough +================= + +This CPUIF mode bypasses the protocol converter stage and directly exposes the +internal CPUIF handshake signals to the user. + +Class: :class:`peakrdl.regblock.cpuif.passthrough.PassthroughCpuif` + +For more details on the protocol itself, see: :ref:`cpuif_protocol`. 
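Reviewer sketch (not part of the patch): the Addressing section in docs/cpuif/introduction.rst and the ``PADDR`` warning in docs/cpuif/apb3.rst both describe simple address arithmetic that an interconnect would perform. The Python model below illustrates that arithmetic only. The function names ``to_local_address`` and ``word_to_byte_address`` are hypothetical and do not exist in PeakRDL-regblock; the power-of-two size and aligned base are assumptions taken from the documented example.

```python
def to_local_address(global_addr: int, base: int, size: int) -> int:
    """Return the block-relative offset for a global address.

    Hypothetical helper: assumes `size` is a power of two and `base`
    is aligned to `size`, as in the 4 kB example in the docs.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    if base % size:
        raise ValueError("base must be aligned to size")
    if not base <= global_addr < base + size:
        raise ValueError("address is outside the device's window")
    # With an aligned base, subtracting the base is the same as
    # pruning (masking off) the upper address bits.
    return global_addr & (size - 1)


def word_to_byte_address(word_addr: int, data_width_bits: int = 32) -> int:
    """Convert a word address to the byte address the regblock expects.

    Hypothetical helper: models the "simple bit-shift" correction for
    interconnect IP that drives PADDR as a word index.
    """
    bytes_per_word = data_width_bits // 8
    shift = bytes_per_word.bit_length() - 1  # log2 for power-of-two widths
    return word_addr << shift


# Device mapped at 0x80_0000-0x80_0FFF, accessed at global 0x80_0ABC
print(hex(to_local_address(0x80_0ABC, base=0x80_0000, size=0x1000)))
# Third word on a 32-bit bus
print(hex(word_to_byte_address(3)))
```

In hardware the same operations reduce to dropping upper address bits and appending low-order zero bits, so neither needs an adder.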
diff --git a/docs/diagrams/diagrams.odg b/docs/diagrams/diagrams.odg index 5d500fe..fdc1d82 100644 Binary files a/docs/diagrams/diagrams.odg and b/docs/diagrams/diagrams.odg differ diff --git a/docs/diagrams/readback.png b/docs/diagrams/readback.png new file mode 100644 index 0000000..9508650 Binary files /dev/null and b/docs/diagrams/readback.png differ diff --git a/docs/hwif.rst b/docs/hwif.rst index c464832..d7f3ada 100644 --- a/docs/hwif.rst +++ b/docs/hwif.rst @@ -1,8 +1,51 @@ Hardware Interface ------------------ -TODO: Describe the following +The generated register block will present the entire hardware interface to the user +using two struct ports: -* hwif_in / hwif_out structs and their contents -* shorthand notation used in this reference: ``hwif_in..xyz`` -* Example of how to peel back a sub-hierarchy struct +* ``hwif_in`` +* ``hwif_out`` + +All field inputs and outputs as well as signals are consolidated into these +struct ports. The presence of each depends on the specific contents of the design +being exported. + + +Using structs for the hardware interface has the following benefits: + +* Preserves register map component grouping, arrays, and hierarchy. +* Avoids naming collisions and cumbersome signal name flattening. +* Allows for more natural mapping and distribution of register block signals to a design's hardware components. +* Use of unpacked arrays/structs prevents common assignment mistakes as they are enforced by the compiler. + + +Structs are organized as follows: ``hwif_out..`` + +For example, a simple design such as: + +.. code-block:: systemrdl + + addrmap my_design { + reg { + field { + sw = rw; + hw = rw; + we; + } my_field; + } my_reg[2]; + }; + +... results in the following struct members: + +.. 
code-block:: text + + hwif_out.my_reg[0].my_field.value + hwif_in.my_reg[0].my_field.next + hwif_in.my_reg[0].my_field.we + hwif_out.my_reg[1].my_field.value + hwif_in.my_reg[1].my_field.next + hwif_in.my_reg[1].my_field.we + +For brevity in this documentation, hwif features will be described using shorthand +notation that omits the hierarchical path: ``hwif_out..`` diff --git a/docs/index.rst b/docs/index.rst index 0165210..2b0b814 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,17 +1,25 @@ -PeakRDL-regblock -================ +Introduction +============ -.. important:: +PeakRDL-regblock is a free and open-source control & status register (CSR) compiler. +This code generator translates your SystemRDL register description into +a synthesizable SystemVerilog RTL module that can be easily instantiated into +your hardware design. - This project has no official releases yet and is still under active development! +* Generates fully synthesizable SystemVerilog RTL (IEEE 1800-2012) +* Options for many popular CPU interface protocols (AMBA APB, AXI4-Lite, and more) +* Configurable pipelining options for designs with fast clock rates. +* Broad support for SystemRDL 2.0 features -TODO: Intro text - Installing ---------- +.. important:: + + This project has no official releases yet and is still under active development! + Install from `PyPi`_ using pip .. code-block:: bash @@ -22,6 +30,45 @@ Install from `PyPi`_ using pip .. _PyPi: https://pypi.org/project/peakrdl-regblock + +Quick Start +----------- + +Below is a simple example that demonstrates how to generate a SystemVerilog +implementation from SystemRDL source. + +.. 
code-block:: python + :emphasize-lines: 2-3, 23-27 + + from systemrdl import RDLCompiler, RDLCompileError + from peakrdl.regblock import RegblockExporter + from peakrdl.regblock.cpuif.apb3 import APB3_Cpuif + + input_files = [ + "PATH/TO/my_register_block.rdl" + ] + + # Create an instance of the compiler + rdlc = RDLCompiler() + try: + # Compile your RDL files + for input_file in input_files: + rdlc.compile_file(input_file) + + # Elaborate the design + root = rdlc.elaborate() + except RDLCompileError: + # A compilation error occurred. Exit with error code + raise SystemExit(1) + + # Export a SystemVerilog implementation + exporter = RegblockExporter() + exporter.export( + root, "path/to/output_dir", + cpuif_cls=APB3_Cpuif + ) + + Links ----- @@ -39,17 +86,19 @@ Links self architecture hwif + api limitations .. toctree:: :hidden: :caption: CPU Interfaces - cpuif/addressing + cpuif/introduction cpuif/apb3 cpuif/axi4lite - cpuif/advanced + cpuif/passthrough cpuif/internal_protocol + cpuif/advanced .. 
toctree:: :hidden: diff --git a/peakrdl/regblock/cpuif/axi4lite/axi4lite_tmpl.sv b/peakrdl/regblock/cpuif/axi4lite/axi4lite_tmpl.sv index 5ca4753..287cc3a 100644 --- a/peakrdl/regblock/cpuif/axi4lite/axi4lite_tmpl.sv +++ b/peakrdl/regblock/cpuif/axi4lite/axi4lite_tmpl.sv @@ -1,5 +1,4 @@ -// LATENCY = {{cpuif.regblock_latency}} -// MAX OUTSTANDING = {{cpuif.max_outstanding}} +// Max Outstanding Transactions: {{cpuif.max_outstanding}} logic [{{clog2(cpuif.max_outstanding+1)-1}}:0] axil_n_in_flight; logic axil_prev_was_rd; logic axil_arvalid; @@ -11,6 +10,8 @@ logic axil_wvalid; logic [{{cpuif.data_width-1}}:0] axil_wdata; logic axil_aw_accept; logic axil_resp_acked; + +// Transaction request acceptance always_ff {{get_always_ff_event(cpuif.reset)}} begin if({{get_resetsignal(cpuif.reset)}}) begin axil_prev_was_rd <= '0; diff --git a/peakrdl/regblock/exporter.py b/peakrdl/regblock/exporter.py index dbb677d..e4e075e 100644 --- a/peakrdl/regblock/exporter.py +++ b/peakrdl/regblock/exporter.py @@ -16,7 +16,7 @@ from .utils import get_always_ff_event from .scan_design import DesignScanner class RegblockExporter: - def __init__(self, **kwargs): + def __init__(self, **kwargs) -> None: user_template_dir = kwargs.pop("user_template_dir", None) # Check for stray kwargs @@ -57,7 +57,53 @@ class RegblockExporter: ) - def export(self, node: Union[RootNode, AddrmapNode], output_dir:str, **kwargs): + def export(self, node: Union[RootNode, AddrmapNode], output_dir:str, **kwargs) -> None: + """ + Parameters + ---------- + node: AddrmapNode + Top-level SystemRDL node to export. + output_dir: str + Path to the output directory where generated SystemVerilog will be written. + Output includes two files: a module definition and package definition. + cpuif_cls: :class:`peakrdl.regblock.cpuif.CpuifBase` + Specify the class type that implements the CPU interface of your choice. + Defaults to AMBA APB3. + module_name: str + Override the SystemVerilog module name. 
By default, the module name + is the top-level node's name. + package_name: str + Override the SystemVerilog package name. By default, the package name + is the top-level node's name with a "_pkg" suffix. + reuse_hwif_typedefs: bool + By default, the exporter will attempt to re-use hwif struct definitions for + nodes that are equivalent. This allows for better modularity and type reuse. + Struct type names are derived using the SystemRDL component's type + name and declared lexical scope path. + + If this is not desirable, set this parameter to ``False`` and structs + will be generated more naively using their hierarchical paths. + retime_read_fanin: bool + Set this to ``True`` to enable additional read path retiming. + For large register blocks that operate at demanding clock rates, this + may be necessary in order to manage large readback fan-in. + + The retiming flop stage is automatically placed at the optimal point in the + readback path so that logic-levels and fanin are minimized. + + Enabling this option will increase read transfer latency by 1 clock cycle. + retime_read_response: bool + Set this to ``True`` to enable an additional retiming flop stage between + the readback mux and the CPU interface response logic. + This option may be beneficial for some CPU interfaces that implement the + response logic fully combinationally. Enabling this stage can better + isolate timing paths in the register file from the rest of your system. + + Enabling this when using CPU interfaces that already implement the + response path sequentially may not result in any meaningful timing improvement. + + Enabling this option will increase read transfer latency by 1 clock cycle. + """ # If it is the root node, skip to top addrmap if isinstance(node, RootNode): self.top_node = node.top