32 Bit address support #1

Closed
opened 2026-04-25 22:11:06 -07:00 by bslathi19 · 34 comments
Owner

By changing the addressing modes, it is possible to convert the 6502 from using 16 bit addresses to 32 bit addresses. Any instruction which deals with 2-byte addresses is modified to work with 4-byte addresses instead.

The 6502 has 15 addressing modes. Here is how each of them will change:

| Addressing Mode | Change | Detail |
| --- | --- | --- |
| Implicit | - | No change |
| Accumulator | - | No change |
| Immediate | - | Registers are still 8 bits, so no change |
| Zero Page | - | No change |
| (Zero Page) | Read 4 bytes from zero page | Pointer is now 4 bytes instead of 2; zero page address remains 1 byte |
| Zero Page,X | - | No change |
| Zero Page,Y | - | No change |
| Relative | - | No change |
| Absolute | Instruction is 5 bytes instead of 3 | Specify the entire 32-bit address in the instruction instead of just 16 bits. |
| Absolute,X | Instruction is 5 bytes instead of 3 | Specify the entire 32-bit address in the instruction instead of just 16 bits. |
| Absolute,X Indirect | Instruction is 5 bytes instead of 3 | Specify the entire 32-bit address of the pointer in the instruction instead of just 16 bits. 4 bytes will be loaded from the pointer for the PC instead of just 2. This is only used with JMP. |
| Absolute,Y | Instruction is 5 bytes instead of 3 | Specify the entire 32-bit address in the instruction instead of just 16 bits. |
| Indirect | Instruction is 5 bytes instead of 3 | Instruction encodes a 32-bit pointer to a 32-bit jump target. |
| Indexed Indirect | Read 4 bytes from zero page | Pointer is now 4 bytes instead of 2; zero page address remains 1 byte |
| Indirect Indexed | Read 4 bytes from zero page | Pointer is now 4 bytes instead of 2; zero page address remains 1 byte |

Overall, there are only three distinct kinds of change:

  1. Reading 4 bytes from zero page instead of 2
  2. Encoding 4 bytes in the instruction instead of 2
  3. Reading 4 bytes from non-zero-page memory instead of 2

The registers all remain 8 bits, except for the program counter which obviously must be 32 bits.

The vector addresses also change since they are 32 bit now instead of 16 bit.

Backwards compatibility is NOT a requirement. There is NO need to have a 16 bit mode, or be able to run existing 6502 code in any way.
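As a quick cross-check of the table, here is a small Python sketch (not part of the Verilog) that tabulates operand sizes per addressing mode. The dictionary and the function name `instruction_length` are made up for illustration; the byte counts for the unchanged modes are the standard 6502 encodings, and the 4-byte operands come from the table.

```python
# Operand bytes per addressing mode: (original 6502, 32-bit variant).
# Hypothetical lookup table; the mode names follow the table in this issue.
OPERAND_BYTES = {
    "Implicit": (0, 0),
    "Accumulator": (0, 0),
    "Immediate": (1, 1),
    "Zero Page": (1, 1),
    "(Zero Page)": (1, 1),
    "Zero Page,X": (1, 1),
    "Zero Page,Y": (1, 1),
    "Relative": (1, 1),
    "Absolute": (2, 4),
    "Absolute,X": (2, 4),
    "Absolute,X Indirect": (2, 4),
    "Absolute,Y": (2, 4),
    "Indirect": (2, 4),
    "Indexed Indirect": (1, 1),
    "Indirect Indexed": (1, 1),
}


def instruction_length(mode, wide=True):
    """Return opcode byte plus operand bytes for the given addressing mode."""
    old, new = OPERAND_BYTES[mode]
    return 1 + (new if wide else old)


# Modes whose encoding grows in the 32-bit variant.
changed = [m for m, (old, new) in OPERAND_BYTES.items() if old != new]
```

Only the five absolute and indirect modes change length; the zero page indirect modes keep their 2-byte encoding and differ only in how many pointer bytes are read at run time.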

Author
Owner

Starting from the top, the first thing that needs to change is state IND0.

All IND0 does is go to INDX1, bypassing the INDX0 state where the zp address is added with the X register. This state therefore requires no modifications since it will be handled by the INDXn states.

IND0 : state <= INDX1;

We will revisit this one when we handle the other zero page indirect modes.

Author
Owner

The second set of states that needs to change is ABSn.

Currently there are 2 states, one for each of the 2 address bytes:

ABS0 = 6'd0, // ABS - fetch LSB
ABS1 = 6'd1, // ABS - fetch MSB

ABS0 increments the program counter:

ABS0,
JMPIX0,
JMPIX2,
ABSX0,
FETCH,
BRA0,
BRA2,
BRK3,
JMPI1,
JMP1,
RTI4,
RTS3: PC_inc = 1;

The ALU is set to add by default, AI is set to 0 by default, BI is set to DIMUX by default.

ABS1 sets the next address to the combination of DIMUX and the ALU output

ABS1: AB = { DIMUX, ADD };

So, in order to support 32-bit addresses we need 16 more bits of temporary storage. Right now the address is assembled from the ALU output register and the data input register. We can add an ALU shift register which simply stores the last 2 results of the ALU. In the final state, when we jump to the new address, it will do

AB = {DIMUX, ALU_SR[1], ALU_SR[0], ADD}

This will happen in ABS3. ABS0, ABS1, and ABS2 will all behave the same, just incrementing PC.
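To see why two extra bytes of history are enough, here is a rough Python model of the four fetch cycles. It is a sketch, not the Verilog: the function name `fetch_absolute32` is made up, and the shift direction and which slot ends up holding which byte are assumptions, so the concatenation order here should not be read as the order used in the real `AB` assignment.

```python
def fetch_absolute32(operand):
    """Model ABS0..ABS3: four operand bytes arrive LSB first on DIMUX."""
    assert len(operand) == 4
    add = 0           # ALU output register (ADD); the ALU computes 0 + DIMUX
    alu_sr = [0, 0]   # hypothetical two-entry history of earlier ALU results

    for dimux in operand[:3]:       # ABS0, ABS1, ABS2: capture a byte, bump PC
        alu_sr = [add, alu_sr[0]]   # shift the previous ALU result into the history
        add = (0 + dimux) & 0xFF    # register the newly fetched byte

    dimux = operand[3]              # ABS3: the MSB is still on the data bus
    # Assumed ordering for this model: most significant byte first.
    stored = [dimux, add, alu_sr[0], alu_sr[1]]
    return int.from_bytes(bytes(stored), "big")


address = fetch_absolute32([0x78, 0x56, 0x34, 0x12])
```

The point is only that DIMUX, ADD, and two history slots together hold all four bytes in the final state, so no further temporary register is needed.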

Author
Owner

Actually, before we do that, we need to do the vectors so that we can even reset the chip.
We start in state BRK0:

if( reset )
state <= BRK0;

This sets the address to the current stack location, which will still only be a 16-bit address. We can hardcode the upper 16 bits to 0.
JSR0,
BRK0: DO = PCH;
JSR1,
BRK1: DO = PCL;

In BRK0 and BRK1, as well as JSR0 and JSR1, we push the current PC to the stack. We need to add 2 more states so that we can write all 32 bits instead of just 16.

BRK2: DO = (IRQ | NMI_edge) ? (P & 8'b1110_1111) : P;

In BRK2 we write the processor status register, so we can just move that back a few cycles

BRK1,
JSR1,
PULL1,
RTS1,
RTS2,
RTI1,
RTI2,
RTI3,
BRK2: AB = { STACKPAGE, ADD };

BRK1 and BRK2 step the stack address; our new states will do the same.

BRK2: PC_temp = res ? 16'hfffc :
NMI_edge ? 16'hfffa : 16'hfffe;

Here is where the vectors are hardcoded. We will change these vectors to be at 0xFFFFFFF4, 0xFFFFFFF8, and 0xFFFFFFFC.
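A small Python sketch of the new vector layout follows. The three addresses come from this comment; which vector lives at which address is an assumption here (NMI, reset, IRQ/BRK in ascending order, mirroring the original 0xFFFA/0xFFFC/0xFFFE layout), and the helper names are made up.

```python
VECTOR_SIZE = 4  # each vector now holds a 32-bit address

# Assumed mapping: same ascending order as the original 16-bit vectors.
NMI_VECTOR = 0xFFFFFFF4
RESET_VECTOR = 0xFFFFFFF8
IRQ_VECTOR = 0xFFFFFFFC


def select_vector(res, nmi_edge):
    """Mirror the priority of the BRK2 expression: reset, then NMI, then IRQ/BRK."""
    if res:
        return RESET_VECTOR
    return NMI_VECTOR if nmi_edge else IRQ_VECTOR


def read_vector(mem, addr):
    """Read a 4-byte little-endian vector from a dict-based memory model."""
    return int.from_bytes(bytes(mem[addr + i] for i in range(VECTOR_SIZE)), "little")


# Example: a reset vector pointing at 0x00010000.
mem = {RESET_VECTOR + i: b for i, b in enumerate((0x00, 0x00, 0x01, 0x00))}
start = read_vector(mem, select_vector(res=True, nmi_edge=False))
```

Spacing the vectors 4 bytes apart means the last one ends exactly at 0xFFFFFFFF, so the three vectors occupy the top 12 bytes of the address space without overlapping.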

Author
Owner

The BRK changes are added in 9476c6a0dd

Now that we have those, we need to update the JMP state, since it is only waiting 1 cycle for an address (see attached image.png).

JMP does not really do anything. If we add 2 more JMP0-like states as well as the ALU shift register, this should be trivial.

Author
Owner

JMP changes are added in 019b84f41d

This is the Absolute jump

Author
Owner

Let's tackle absolute for normal instructions next.

747438a9b6

This was pretty simple: we just copy the ABS0 state two more times.

Author
Owner

abs,x next.

It looks like we can just copy this state 2 more times:

ABSX1 = 6'd3, // ABS, X - fetch MSB and send to ALU (+Carry)

Author
Owner

Added abs,x here
cb6cac1245

Author
Owner

abs,y should also be handled by the abs,x states. Add those to the test as well.

Author
Owner

Let's tackle absolute,x indirect.

These are the JMPIXn states.

Like abs,x, we can probably just copy this state twice:

JMPIX1 = 6'd52, // JMP (,X)- fetch MSB and send to ALU (+Carry)

Author
Owner

OK, that one is done in dc339cb725

Now for regular indirect. We can just copy JMPI0 twice.

Author
Owner

Added in b31d7490b2

Author
Owner

Let's do Indirect Indexed, since it is apparently the most common indirection mode.
According to the state listing, here are the steps that we do:

INDY0 = 6'd18, // (ZP),Y - fetch ZP address, and send ZP to ALU (+1)
INDY1 = 6'd19, // (ZP),Y - fetch at ZP+1, and send LSB to ALU (+Y)
INDY2 = 6'd20, // (ZP),Y - fetch data, and send MSB to ALU (+Carry)
INDY3 = 6'd21, // (ZP),Y) - fetch data (if page boundary crossed)

How should we make this work with 32 bit addresses?

The first step loads the LSB and sends the ZP index to the ALU.
The second step reads the MSB and sends the LSB to the ALU to add Y.
The third step reads the data and sends the MSB to the ALU to add the carry.

So we need to do a combination of steps 2 and 3. Instead of loading data from the calculated address, we need to read the 3rd and 4th bytes from zero page and add the carry. Only then can we read from the computed address.

Author
Owner

Hmm, that plan would not work because we need the ALU to be adding the offset, whereas this instruction also uses the ALU to generate the address.

Author
Owner

So we need to load 4 bytes from zeropage, which means we need to calculate 4 new addresses, but at the same time we also need to add the Y register to what we are reading. We might need to add another adder just for handling increasing the address.

Author
Owner

dfe27d4ec7

What I ended up doing was adding another signal which can increment the address bus register. This does mean that there are two 32-bit adders, one for the PC and one for the address bus, along with the ALU. I think that this is better than the alternative of adding extra registers and taking nearly twice as many cycles.
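The division of labour can be sketched in Python as follows. This is a behavioural model only, with a made-up function name: the address bus register is stepped by its own incrementer while the 8-bit ALU adds Y to the first pointer byte and then just the carry to the remaining three. Memory is a plain dict, and zero page wraparound of the pointer is not modelled.

```python
def indirect_indexed32(mem, zp, y):
    """Compute the effective address for (ZP),Y with a 4-byte pointer."""
    ab = zp & 0xFF        # address bus register starts at the zero page operand
    carry_in = y & 0xFF   # first ALU add uses Y, later adds use only the carry
    effective = 0
    for i in range(4):
        total = mem[ab] + carry_in          # 8-bit ALU add, result may overflow
        effective |= (total & 0xFF) << (8 * i)
        carry_in = total >> 8               # carry into the next pointer byte
        ab += 1                             # separate incrementer, ALU stays free
    return effective


# Pointer 0x2000FFF0 stored little-endian at zero page address 0x10.
mem = {0x10: 0xF0, 0x11: 0xFF, 0x12: 0x00, 0x13: 0x20}
target = indirect_indexed32(mem, 0x10, 0x20)
```

In this example the carry out of the low byte ripples through the second byte into the third, which is the case that a single ALU could not handle while also generating the next zero page address.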

Author
Owner

Last one is indexed indirect, which should be easier since we don't have to add to a 32 bit number, only an 8 bit number.

Author
Owner

OK, it still uses the ALU to increment the ZP pointer, so we can use the same extra adder that we added for the other ZP state.

Author
Owner

Done in 2a9af9e9dc

We've added all addressing modes, but not all instructions will be functional. For example, the absolute and absolute,x addressing modes are supported by a bunch of different instructions, so we need to make sure that the timings line up for the side effects.

What we probably need to do is test every single instruction in every single addressing mode that it supports.

Author
Owner

I think we should have tests for every instruction which has those addressing modes. Also, we need to handle the extra cycles if there is a page crossing. Let's start at the top with ADC.

Author
Owner

That's gonna be boring. What we definitely need to test is JSR, RTS, BRK, and RTI.
These involve pushing and pulling more bytes to/from the stack.
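For reference when writing those tests, here is a minimal Python model of pushing and pulling a 32-bit PC through an 8-bit stack pointer. The helper names are made up, and the byte order (most significant byte first, by analogy with the original PCH-then-PCL order) is an assumption. The stack stays in page 1 with the upper 16 address bits hardcoded to 0, as discussed earlier in this thread.

```python
STACKPAGE = 0x01  # stack lives at 0x00000100..0x000001FF


def push(mem, sp, value):
    """Write one byte at the stack pointer, then decrement it (8-bit wrap)."""
    mem[(STACKPAGE << 8) | sp] = value & 0xFF
    return (sp - 1) & 0xFF


def pull(mem, sp):
    """Increment the stack pointer (8-bit wrap), then read one byte."""
    sp = (sp + 1) & 0xFF
    return mem[(STACKPAGE << 8) | sp], sp


def push_pc32(mem, sp, pc):
    """Push all four PC bytes. Assumed order: MSB first."""
    for shift in (24, 16, 8, 0):
        sp = push(mem, sp, pc >> shift)
    return sp


def pull_pc32(mem, sp):
    """Pull four bytes back in the reverse order and rebuild the PC."""
    pc = 0
    for shift in (0, 8, 16, 24):
        byte, sp = pull(mem, sp)
        pc |= byte << shift
    return pc, sp


mem = {}
sp_after_push = push_pc32(mem, 0xFF, 0x12345678)
restored_pc, sp_after_pull = pull_pc32(mem, sp_after_push)
```

A round trip like this is a cheap sanity check for JSR/RTS and BRK/RTI: the stack pointer should move by 4 (5 with the status register) and come back to where it started.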

Author
Owner

Basically we just need 2 more JSR0/JSR1 states, easy as.

Author
Owner

What the 6502 does is use the stack pointer as temporary storage for the LSB of the jump target. We need to store 3 bytes instead of 1 though. We use the ALU to decrement the stack pointer while we are writing the current PC and status register to the stack. What we could do instead is read the next 4 bytes and jump to the target using the ALU output shift register, like we do for JMP.

Hmm, we can't use the ALU to both store the stack pointer and also use its shift register to store the target address.

Ah, we can swap it back in state JSR4. That will keep it in line, since JSR5 will read the second byte of the address.

Author
Owner

In terms of cycle count, we have added 4 cycles, making JSR take 10 cycles instead of 6. This is almost double the length, but hey, performance was not a goal.

Author
Owner

RTS follows a very similar pattern to JSR, except instead of pushing to the stack we pull from the stack. This should be a little bit easier too.

Author
Owner

We cannot use the ALU to change the stack pointer while also using the ALU registers to store the temporary address. What we could do is modify the shift register to take in data from DIMUX instead of just the ALU.

Author
Owner

When the original CPU pushes the address to the stack, it has already incremented PC once, so if the instruction JSR LSB MSB starts at 0x200, the PC is pointing at MSB when we start writing. This is 1 less than the address we want to return to.

When we are writing 32 bits, we are now 3 less than what we need to return to. We cannot just keep increasing PC though, because that means it will be changing as we are writing it. We could add 2 more dummy cycles at the end of RTS, or we could make it so that we can add 3 to PC instead of just 1. One requires more area, one requires more cycles. Since area is basically free, let's add the +3.
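The arithmetic behind the +3 can be checked with a few lines of Python. The function names are made up, and the sketch assumes the push still begins at the same point as in the original core, with PC pointing at the second operand byte.

```python
def pc_when_push_starts(jsr_addr):
    """PC has stepped past the opcode and first operand byte when pushing begins."""
    return jsr_addr + 2


def next_instruction(jsr_addr, address_bytes):
    """Address of the instruction following JSR: opcode plus operand bytes."""
    return jsr_addr + 1 + address_bytes


def rts_adjust(address_bytes, jsr_addr=0x200):
    """How much RTS must add to the pulled PC to land on the next instruction."""
    return next_instruction(jsr_addr, address_bytes) - pc_when_push_starts(jsr_addr)


original_adjust = rts_adjust(2)  # 16-bit target: JSR is 3 bytes long
wide_adjust = rts_adjust(4)      # 32-bit target: JSR is 5 bytes long
```

With a 2-byte target the adjustment is 1, matching the original RTS behaviour; with a 4-byte target it becomes 3.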

Author
Owner

Ah, we need to use the alu_sr select to store DIMUX into the shift register, so that the ALU stays free to update the stack pointer.

Author
Owner

Now we need to implement BRK and RTI. This should be mostly similar to JSR and RTS except that we push and pull the flags register as well.

Author
Owner

We basically did BRK earlier when we did reset. RTI now works, but we should still test external interrupts to make sure they function as expected.

Author
Owner

Oh, also the branch instructions will need to calculate a 32-bit address instead of a 16-bit one, so that's potentially 2 more cycles that each one will take.
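As a rough illustration of why more bytes can be involved, here is a Python sketch of a relative branch on a 32-bit PC. The function names are made up, and `next_pc` is assumed to be the address of the instruction following the branch. The sketch only counts which upper bytes change; it does not claim how the core maps that onto cycles.

```python
def branch_target(next_pc, offset_byte):
    """Add a signed 8-bit branch offset to a 32-bit PC, wrapping at 2**32."""
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (next_pc + offset) & 0xFFFFFFFF


def upper_bytes_changed(next_pc, target):
    """Count how many of the three upper PC bytes differ after the branch."""
    return sum(
        ((next_pc >> shift) & 0xFF) != ((target >> shift) & 0xFF)
        for shift in (8, 16, 24)
    )


forward = branch_target(0x0000FFFE, 0x05)    # carry ripples into byte 2
backward = branch_target(0x01000000, 0xFE)   # borrow ripples into byte 3
```

On the original 6502 only the second PC byte could be affected by a branch; here the carry or borrow can reach the third and fourth bytes as well.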

Author
Owner

So it kind of works, but WAI increments PC by too much: it should increment PC by 1, but it increments it by 2 instead. WAI is kind of fake, but we can add a state to decode to fix this anyway.

Author
Owner

OK, we changed how branch works, so it should mostly work now. With that, I think we have everything mostly functional. Now we can work on modifying cc65 to generate code for our new target.

Author
Owner

I think this is ready to merge; it works well enough.


Reference: bslathi19/verilog6502#1