PCIe DMA #1

Closed
opened 2025-11-09 00:03:10 -08:00 by bslathi19 · 14 comments
Owner

We need to generate DMA requests based on register reads and writes. The structure of a DMA request is as follows:

    modport req_src (
        output req_src_addr,
        output req_src_sel,
        output req_dst_addr,
        output req_dst_sel,
        output req_imm,
        output req_imm_en,
        output req_len,
        output req_tag,
        output req_id,
        output req_dest,
        output req_user,
        output req_valid,
        input  req_ready
    );

For read requests:

req_src_addr: The address that will be in the TLP. The bottom 2 bits (the Address Type field) are hardcoded to 0 for untranslated.
req_src_sel: Not sure what this does.
req_dst_addr: The address in the DMA RAM where the data is written. This address gets shifted right by RAM_ADDR_W-RAM_SEG_ADDR_W, which simplifies to $clog2(RAM_SEGS*RAM_SEG_BE_W). RAM_SEGS is 2 and RAM_SEG_BE_W is 8, so $clog2(16) = 4, and the address is shifted right by 4 (divided by 16). Really this just means the RAM has 2 segments of 64 bits each, so 128 bits (16 bytes) per RAM address, hence the divide by 16.
req_dst_sel: This seems to come out on the DMA interface as cmd_sel when it writes to the memory? It does not look like sel is used at all.
req_imm/req_imm_en: Not sure what these are used for.
req_len: Self-explanatory.
req_tag: As far as I can tell, this is just so the requester can match status responses with requests. It doesn't go into the PCIe core at all.
req_id: Does not look to be used.
req_dest: Not used in the PCIe core.
req_user: Not used in the PCIe core.
req_valid: Self-explanatory.
req_ready: Self-explanatory.

I think the id/dest/user fields are used to handle the requests and are not necessarily part of the request itself. It also looks like if we send a request for 0 bytes, the response will say that 0 bytes are valid, which I guess makes sense. Write requests are pretty similar; I believe it's just that the addresses are switched.
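The req_dst_addr shift above can be checked with a quick software model. This is only a sketch of the arithmetic, not the RTL: `clog2` and `ram_word_index` are hypothetical helper names, and the parameter values are the ones quoted above (RAM_SEGS = 2, RAM_SEG_BE_W = 8).

```python
def clog2(n: int) -> int:
    """Ceiling log2 for n >= 1, mirroring SystemVerilog's $clog2."""
    return (n - 1).bit_length()

# Values quoted in this issue.
RAM_SEGS = 2        # number of RAM segments
RAM_SEG_BE_W = 8    # byte enables per segment (64-bit segment)

# Bytes per RAM address = RAM_SEGS * RAM_SEG_BE_W = 16, so the shift is 4.
ADDR_SHIFT = clog2(RAM_SEGS * RAM_SEG_BE_W)

def ram_word_index(byte_addr: int) -> int:
    """Hypothetical helper: convert a byte address to a 128-bit RAM word index."""
    return byte_addr >> ADDR_SHIFT

if __name__ == "__main__":
    for addr in (0x00, 0x10, 0x1F, 0x40):
        print(hex(addr), "->", ram_word_index(addr))
```

Byte addresses that fall inside the same 16-byte word map to the same RAM address, which is why the low 4 bits drop out.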

Author
Owner

So the only fields we really need to drive are the source and destination addresses and the length. We can autogenerate the tag from a simple counter of packets. When we write to a certain register, it will send out that descriptor. When we get a response, it will latch those values into another set of registers, with a clear-on-read bit for valid.
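A rough behavioral model of that register scheme, just to pin down the intended semantics. Everything here is an assumption for illustration: the class and method names are made up, the 8-bit tag width is a guess (the thread does not state it), and the real thing would be RTL rather than Python.

```python
from dataclasses import dataclass

TAG_W = 8  # assumed tag width; not stated in this issue


@dataclass
class Descriptor:
    src_addr: int
    dst_addr: int
    length: int
    tag: int


class DmaRegs:
    """Hypothetical model of one direction's (read or write) register section."""

    def __init__(self) -> None:
        self.src_addr = 0
        self.dst_addr = 0
        self.length = 0
        self._next_tag = 0       # packet counter used as the tag
        self.status_tag = 0
        self.status_valid = False

    def write_trigger(self) -> Descriptor:
        """Writing the trigger register sends out the current descriptor."""
        desc = Descriptor(self.src_addr, self.dst_addr, self.length, self._next_tag)
        self._next_tag = (self._next_tag + 1) % (1 << TAG_W)
        return desc

    def on_status(self, tag: int) -> None:
        """A status response latches its tag and sets valid."""
        self.status_tag = tag
        self.status_valid = True

    def read_status(self) -> tuple:
        """Reading the status register returns (valid, tag) and clears valid."""
        result = (self.status_valid, self.status_tag)
        self.status_valid = False
        return result
```

In this model a second `read_status()` with no new response returns valid = False, which is the clear-on-read behavior described above.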

Author
Owner

For simplicity, read and write will have their own sections.

Author
Owner

We added this and it seems to be working, except that if we try to run it multiple times in a row, it doesn't end up overwriting the data. We may need to test this in sim and/or get a trace on it with an ILA.

Author
Owner

Here are the original read request and the write request, right after a reboot. You can see that the read request is ack'd, then the status is valid. So is the write request.

Author
Owner

However, when we try the second read we do not see a status valid for the read.

Author
Owner

Maybe we need to look at the RQ and RC interfaces? It might be that we are not getting a response from the CPU the second time? Maybe the CPU needs to see a different tag or something?

Author
Owner

Hmm, supposedly the core is still incrementing the tag when we send the request. I think we will need to look at the actual AXI streams. They are 256 bits wide though, so it will be a pretty big ILA I guess.

Author
Owner

Both the RQ and RC streams are 256 bits wide with only 8 bits of keep. RQ has 62 bits of user and RC has 75 bits of user.

So we can use a 75-bit tuser for both:

data: 256
keep: 8
user: 75
last: 1
valid: 1
ready: 1
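To get a feel for how big that ILA ends up, the field widths above can simply be summed. A small sketch, assuming both streams are probed with the same layout (RQ padded up to the 75-bit user width); the dictionary name is made up.

```python
# Field widths listed above, in bits.
PROBE_WIDTHS = {
    "data": 256,
    "keep": 8,
    "user": 75,   # RC width; RQ only needs 62 but shares the layout here
    "last": 1,
    "valid": 1,
    "ready": 1,
}

bits_per_stream = sum(PROBE_WIDTHS.values())
total_bits = 2 * bits_per_stream  # RQ + RC

if __name__ == "__main__":
    print("bits per stream:", bits_per_stream)
    print("total for RQ + RC:", total_bits)
```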

Author
Owner

Huh, it looks like even though we read the second data, the data that we are writing back is still the old data. So it's coming from the FPGA, not from the PC.

Author
Owner

So the problem is that we are not writing to the memory again. We are seeing wr_cmd_valid for the first write, but not the second one. But read is still happening, so we are reading the nonsense data.

Author
Owner

Here are the RC and RQ busses for the FIRST transfer, this is the one that is successful.

Author
Owner

OK, the only thing of note here that I can think of is that the sequence number is the same each time: 0. Do we need to change it so that we send different sequence numbers?

Author
Owner

Interesting, in the example they enable client tag, should we try that?

Author
Owner

Yes, that fixes it.


Reference: bslathi19/alibaba_pcie#1