PCIe DMA #1

bslathi19 · 2025-11-09T00:03:10-08:00

bslathi19 commented

2025-11-09 00:03:10 -08:00

We need to generate DMA requests based on register reads and writes. The structure of a DMA request is as follows:

    modport req_src (
        output req_src_addr,
        output req_src_sel,
        output req_dst_addr,
        output req_dst_sel,
        output req_imm,
        output req_imm_en,
        output req_len,
        output req_tag,
        output req_id,
        output req_dest,
        output req_user,
        output req_valid,
        input  req_ready
    );

For read requests:

req_src_addr: The address that goes into the TLP. The bottom 2 bits (the Address Type field) are hardcoded to 0, meaning untranslated.
req_src_sel: Not sure what this does.
req_dst_addr: The address in the DMA RAM where the data is written. This address gets shifted right by RAM_ADDR_W-RAM_SEG_ADDR_W, which simplifies to $clog2(RAM_SEGS*RAM_SEG_BE_W). RAM_SEGS is 2 and RAM_SEG_BE_W is 8, so $clog2(16) = 4 and the address is shifted right by 4 (divided by 16). In other words, the RAM has 2 segments of 64 bits each, so every RAM address holds 128 bits (16 bytes), and the byte address is divided by 16.
req_dst_sel: This seems to come out on the DMA interface as cmd_sel when it writes to the memory. It does not look like sel is used at all.
req_imm/req_imm_en: Not sure what these are used for.
req_len: Self-explanatory.
req_tag: As far as I can tell, this only lets the requester match status responses with requests. It doesn't go into the PCIe core at all.
req_id: Does not look to be used.
req_dest: Not used in the PCIe core.
req_user: Not used in the PCIe core.
req_valid: Self-explanatory.
req_ready: Self-explanatory.
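
The req_dst_addr arithmetic above can be checked with a small Python sketch. It only reuses the two parameter values quoted in the notes (RAM_SEGS = 2, RAM_SEG_BE_W = 8). The helper names `clog2`, `ram_word_index` and `ram_byte_offset` are made up for illustration and are not taken from the DMA core.

```python
RAM_SEGS = 2       # number of RAM segments (value quoted in the notes)
RAM_SEG_BE_W = 8   # byte enables per segment, i.e. 64 data bits per segment


def clog2(n: int) -> int:
    """Ceiling of log2(n) for n >= 1, mirroring SystemVerilog's $clog2."""
    return (n - 1).bit_length()


# RAM_ADDR_W - RAM_SEG_ADDR_W in the notes: number of byte-offset bits per RAM word.
ADDR_SHIFT = clog2(RAM_SEGS * RAM_SEG_BE_W)


def ram_word_index(byte_addr: int) -> int:
    """RAM word (128-bit line) that a byte address falls into."""
    return byte_addr >> ADDR_SHIFT


def ram_byte_offset(byte_addr: int) -> int:
    """Byte position inside that 16-byte RAM word."""
    return byte_addr & ((1 << ADDR_SHIFT) - 1)


if __name__ == "__main__":
    for addr in (0x00, 0x10, 0x4F):
        print(hex(addr), ram_word_index(addr), ram_byte_offset(addr))
```

With these values every 16 consecutive byte addresses map to the same RAM word, which matches the "divide the address by 16" reading.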

I think the id/dest/user fields are used to handle the requests and are not necessarily part of the request itself. It also looks like a request for 0 bytes gets a response saying that 0 bytes are valid, which makes sense. Write requests look pretty similar; I believe the only difference is that the source and destination addresses are switched.

bslathi19 commented

2025-11-09 00:04:36 -08:00

So the only fields we really need to drive are the source and destination addresses and the length. We can autogenerate the tag from a counter of packets. Writing to a certain register sends out that descriptor. When we get a response, its values are latched into another set of registers, with a clear-on-read valid bit.
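
A rough behavioural model of that register scheme, as a Python sketch rather than the actual RTL. Every name here (`DmaRegs`, `write_trigger`, `on_status`, `read_status`) and the 8-bit tag width are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DmaRequest:
    src_addr: int
    dst_addr: int
    length: int
    tag: int


class DmaRegs:
    """Hypothetical register block for one direction (read or write)."""

    TAG_W = 8  # assumed tag width, not taken from the design

    def __init__(self):
        self.src_addr = 0
        self.dst_addr = 0
        self.length = 0
        self.tag_counter = 0      # autogenerated tag: a plain packet counter
        self.status_tag = 0
        self.status_valid = False

    def write_trigger(self) -> DmaRequest:
        """Writing the trigger register sends out the current descriptor."""
        req = DmaRequest(self.src_addr, self.dst_addr, self.length, self.tag_counter)
        self.tag_counter = (self.tag_counter + 1) % (1 << self.TAG_W)
        return req

    def on_status(self, tag: int) -> None:
        """Latch a status response into the status registers."""
        self.status_tag = tag
        self.status_valid = True

    def read_status(self):
        """Return (valid, tag); the valid bit clears on read."""
        result = (self.status_valid, self.status_tag)
        self.status_valid = False
        return result
```

In this model software sets the three fields, writes the trigger, then polls `read_status()` until valid comes back true. A second poll returns valid as false until another response arrives.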

bslathi19 commented

2025-11-09 00:12:41 -08:00

For simplicity, read and write will have their own sections.

bslathi19 commented

2025-11-09 18:34:30 -08:00

We added this and it seems to be working, except that if we run it multiple times in a row, the data does not get overwritten. We may need to test this in sim and/or get a trace on it with an ILA.

bslathi19 commented

2025-11-09 22:17:44 -08:00

Here are the original read request and write request, right after a reboot. You can see that the read request is ack'd and then its status goes valid. The same happens for the write request.

[Attachments: screenshot_09-Nov-2025_22-15-42.png, screenshot_09-Nov-2025_22-15-53.png]

bslathi19 commented

2025-11-09 22:19:04 -08:00

However, when we try the second read, we do not see status valid for the read.

[Attachments: screenshot_09-Nov-2025_22-18-18.png, screenshot_09-Nov-2025_22-18-27.png]

bslathi19 commented

2025-11-09 22:21:38 -08:00

Maybe we need to look at the RQ and RC interfaces? It might be that we are not getting a response from the CPU the second time? Maybe the CPU needs to see a different tag or something?

bslathi19 commented

2025-11-09 22:34:49 -08:00

Hmm, supposedly the core is still incrementing the tag when we send the request. I think we will need to look at the actual AXI streams. They are 256 bits wide though, so it will be a pretty big ILA.

bslathi19 commented

2025-11-10 22:39:05 -08:00

The RQ and RC streams are both 256 bits wide with only 8 bits of keep. RQ has 62 bits of user and RC has 75 bits of user.

So we can size tuser at 75 bits:

data: 256
keep: 8
user: 75
last: 1
valid: 1
ready: 1
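
For sizing the ILA, the field widths above can be added up with a short Python sketch. Packing the fields LSB-first into one probe vector is only an assumption here, as is the helper name `probe_layout`. The real ILA could just as well use one probe per signal.

```python
# (name, width) pairs from the list above, ordered LSB-first in this sketch.
FIELDS = [
    ("data", 256),
    ("keep", 8),
    ("user", 75),   # wide enough for RC (75 bits); RQ's 62 bits fit inside it
    ("last", 1),
    ("valid", 1),
    ("ready", 1),
]


def probe_layout(fields):
    """Return ({name: (lsb, msb)}, total_width) for fields packed LSB-first."""
    layout = {}
    lsb = 0
    for name, width in fields:
        layout[name] = (lsb, lsb + width - 1)
        lsb += width
    return layout, lsb


if __name__ == "__main__":
    layout, total = probe_layout(FIELDS)
    for name, (lo, hi) in layout.items():
        print(f"{name:5s} [{hi}:{lo}]")
    print("total bits per stream:", total)
```

That comes to 342 bits per stream, so capturing both RQ and RC takes 684 probe bits.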

bslathi19 commented

2025-11-10 22:40:32 -08:00

Huh, it looks like even though we did the second read, the data we write back is still the old data. So it's coming from the FPGA, not from the PC.

bslathi19 commented

2025-11-11 21:15:54 -08:00

So the problem is that we are not writing to the memory again. We are seeing wr_cmd_valid for the first write, but not the second one. But read is still happening, so we are reading the nonsense data.

bslathi19 commented

2025-11-11 21:57:50 -08:00

Here are the RC and RQ buses for the first transfer, the one that is successful.

[Attachments: screenshot_11-Nov-2025_21-53-45.png, screenshot_11-Nov-2025_21-53-13.png]

bslathi19 commented

2025-11-11 22:18:10 -08:00

OK, the only thing of note here that I can think of is that the sequence number is the same: 0. Do we need to send different sequence numbers?

bslathi19 commented

2025-11-11 22:52:52 -08:00

Interesting, the example enables client tag. Should we try that?

bslathi19 commented

2025-11-12 22:53:41 -08:00

Yes, that fixes it.
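
The thread does not record why enabling client tag fixed the second read, so the following is general background only and not a confirmed root cause. A requester normally matches each completion to an outstanding read by its tag, so the tag it tracks and the tag carried by the completion have to agree. The toy Python model below shows what a mismatch looks like from the requester's side. The class and method names are hypothetical and this is not the DMA engine's actual logic.

```python
class ReadRequester:
    """Toy requester that accepts a completion only if its tag is outstanding."""

    def __init__(self):
        self.outstanding = {}   # tag -> description of the pending read
        self.status = []        # tags whose reads finished and produced a status

    def issue(self, tag: int, what: str) -> None:
        """Record a read request as outstanding under the given tag."""
        self.outstanding[tag] = what

    def completion(self, tag: int) -> bool:
        """Handle a completion; return True if it matched an outstanding read."""
        if tag not in self.outstanding:
            return False        # unknown tag: dropped, no status, no RAM write
        del self.outstanding[tag]
        self.status.append(tag)
        return True


if __name__ == "__main__":
    rq = ReadRequester()
    rq.issue(3, "read A")
    print(rq.completion(5))   # tag does not match anything outstanding
    print(rq.completion(3))   # matching tag completes the read
```

In this model a completion with an unexpected tag leaves the read pending forever. That would look like the missing status valid and missing wr_cmd_valid seen above, if a tag mismatch was in fact the cause.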

bslathi19 closed this issue

2025-11-12 22:53:41 -08:00

Reference: bslathi19/alibaba_pcie#1