32 Bit address support #1


bslathi19 · 2026-04-25T22:11:06-07:00

bslathi19 commented

2026-04-25 22:11:06 -07:00

By changing the addressing modes, it is possible to convert the 6502 from 16-bit addresses to 32-bit addresses. Every instruction that deals with 2-byte addresses is changed to work with 4-byte addresses instead.

The 65C02 (the variant implemented by `cpu_65c02.v`) has 15 addressing modes. Here is how each of them will change:

| Addressing Mode | Change | Detail |
| --- | --- | --- |
| Implicit | - | No change |
| Accumulator | - | No change |
| Immediate | - | Registers are still 8 bits, so no change |
| Zero Page | - | No change |
| (Zero Page) | Read 4 bytes from zero page | Pointer is now 4 bytes instead of 2; the zero-page address remains 1 byte |
| Zero Page,X | - | No change |
| Zero Page,Y | - | No change |
| Relative | - | Offset is still 1 byte, so no change to the encoding |
| Absolute | Instruction is 5 bytes instead of 3 | The instruction specifies the entire 32-bit address instead of just 16 bits |
| Absolute,X | Instruction is 5 bytes instead of 3 | The instruction specifies the entire 32-bit address instead of just 16 bits |
| Absolute,X Indirect | Instruction is 5 bytes instead of 3 | The instruction specifies the entire 32-bit address of the pointer instead of just 16 bits, and 4 bytes are loaded from the pointer into PC instead of 2. Only used by JMP |
| Absolute,Y | Instruction is 5 bytes instead of 3 | The instruction specifies the entire 32-bit address instead of just 16 bits |
| Indirect | Instruction is 5 bytes instead of 3 | The instruction encodes a 32-bit pointer to a 32-bit jump target |
| Indexed Indirect | Read 4 bytes from zero page | Pointer is now 4 bytes instead of 2; the zero-page address remains 1 byte |
| Indirect Indexed | Read 4 bytes from zero page | Pointer is now 4 bytes instead of 2; the zero-page address remains 1 byte |

Overall, only three kinds of change are needed:

1. Reading 4 bytes from zero page instead of 2
2. Encoding 4 bytes in the instruction instead of 2
3. Reading 4 bytes from outside zero page instead of 2
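To make the size change in the table concrete, here is a small Python sketch of how an assembler for this target might lay out instructions. It assumes operands stay little-endian as on the original 6502 and that opcode numbers are unchanged (0xAD is `LDA abs` and 0xB1 is `LDA (zp),Y` on the 65C02); the issue states neither, and `encode_absolute`/`encode_zero_page` are hypothetical helpers, not part of the repository.

```python
# Hypothetical assembler helpers for the 32-bit-address 6502 variant.
# Assumption: operands are little-endian, as on the original 6502.


def encode_absolute(opcode: int, address: int) -> bytes:
    """Absolute, Absolute,X/Y, (Absolute) and (Absolute,X): opcode + 4 bytes."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in one byte")
    if not 0 <= address <= 0xFFFF_FFFF:
        raise ValueError("address must fit in 32 bits")
    return bytes([opcode]) + address.to_bytes(4, "little")


def encode_zero_page(opcode: int, zp: int) -> bytes:
    """All zero-page modes, including (zp), (zp,X) and (zp),Y: still opcode + 1 byte."""
    if not 0 <= opcode <= 0xFF or not 0 <= zp <= 0xFF:
        raise ValueError("opcode and zero-page address must each fit in one byte")
    return bytes([opcode, zp])


if __name__ == "__main__":
    # LDA $12345678, assuming 0xAD is still LDA absolute: 5 bytes, LSB first.
    print(encode_absolute(0xAD, 0x1234_5678).hex(" "))  # ad 78 56 34 12
    # LDA ($10),Y, assuming 0xB1 is still LDA (zp),Y: unchanged 2-byte encoding.
    print(encode_zero_page(0xB1, 0x10).hex(" "))        # b1 10
```

Only the operand width changes; the zero-page forms keep their size, which is why the zero-page rows in the table only change how many bytes are read at run time.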

All registers remain 8 bits wide, except for the program counter, which must be 32 bits.

The vector addresses also change, since the vectors are now 32 bits instead of 16.

Backwards compatibility is NOT a requirement. There is NO need for a 16-bit mode, or to run existing 6502 code in any way.


bslathi19 commented

2026-04-25 22:18:30 -07:00

Starting from the top, the first state to look at is `IND0`.

All `IND0` does is go to `INDX1`, bypassing the `INDX0` state where the zero-page address is added to the X register. This state therefore requires no modifications, since the work will be handled by the `INDXn` states.

[verilog6502/src/cpu_65c02.v, line 1018 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L1018):

    IND0    : state <= INDX1;

We will revisit this one when we handle the other zero page indirect modes.


bslathi19 commented

2026-04-25 22:33:11 -07:00

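The second set of states that needs to change is `ABSn`. Currently there are 2 states, one for each of the 2 address bytes:

[verilog6502/src/cpu_65c02.v, lines 212 to 213 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L212-L213):

    ABS0   = 6'd0,  // ABS     - fetch LSB
    ABS1   = 6'd1,  // ABS     - fetch MSB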

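`ABS0` increments the program counter:

[verilog6502/src/cpu_65c02.v, lines 382 to 393 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L382-L393):

    ABS0,
    JMPIX0,
    JMPIX2,
    ABSX0,
    FETCH,
    BRA0,
    BRA2,
    BRK3,
    JMPI1,
    JMP1,
    RTI4,
    RTS3:           PC_inc = 1;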

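By default the ALU is set to add, with `AI` set to 0 and `BI` set to `DIMUX`.

`ABS1` sets the next address to the concatenation of `DIMUX` (the MSB being fetched now) and the ALU output `ADD` (the LSB fetched in the previous cycle):

[verilog6502/src/cpu_65c02.v, line 426 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L426):

    ABS1:           AB = { DIMUX, ADD };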

So, to support 32-bit addresses we need 16 more bits of temporary storage. Right now the address is held in the ALU output register and the input register. We can add an ALU shift register which simply stores the last 2 results of the ALU. In the final state, when we jump to the new address, it will do

`AB = {DIMUX, ALU_SR[1], ALU_SR[0], ADD}`

This is the address produced in `ABS3`. `ABS0`, `ABS1`, and `ABS2` are all the same and just increment PC.
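A rough behavioural model (Python, not Verilog) of this fetch sequence can help check the byte ordering. It assumes, as in the original core, that the ALU output `ADD` is registered, so in each cycle it holds the byte fetched in the previous cycle, and that the hypothetical `alu_sr` list shifts in the old `ADD` value every cycle. Under those assumptions the oldest byte (the LSB) has to land in bits 7:0 and `ADD` in bits 23:16; which `ALU_SR` index holds which byte in the real RTL depends on how the shift register is wired.

```python
def fetch_abs32(operand: bytes) -> int:
    """Model ABS0..ABS3 and return the address driven onto AB in ABS3.

    Hypothetical model: `add` stands in for the registered ALU output ADD,
    `alu_sr` for the proposed two-entry ALU shift register.
    """
    if len(operand) != 4:
        raise ValueError("expected a 4-byte operand, LSB first")
    add = None    # registered ALU output: the byte fetched in the previous cycle
    alu_sr = []   # older ALU results, oldest first (ends up with 2 entries)
    for dimux in operand[:3]:        # ABS0, ABS1, ABS2: fetch a byte, PC += 1
        if add is not None:
            alu_sr.append(add)       # previous ALU result shifts into ALU_SR
        add = dimux                  # ALU computes 0 + DIMUX, registered at the clock edge
    msb = operand[3]                 # ABS3: the MSB is still on DIMUX
    oldest, newer = alu_sr
    return (msb << 24) | (add << 16) | (newer << 8) | oldest


if __name__ == "__main__":
    print(hex(fetch_abs32(bytes([0x78, 0x56, 0x34, 0x12]))))  # 0x12345678
```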


bslathi19 commented

2026-04-25 22:54:43 -07:00

Actually, before we do that, we need to update the vectors so that we can even reset the chip. On reset we start in state `BRK0`:

[verilog6502/src/cpu_65c02.v, lines 952 to 953 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L952-L953):

    if( reset )
        state <= BRK0;

This sets the address to the current stack pointer, which still only forms a 16-bit address, so we can hardcode the upper 16 bits to 0.

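[verilog6502/src/cpu_65c02.v, lines 486 to 490 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L486-L490):

    JSR0,
    BRK0:    DO = PCH;
    JSR1,
    BRK1:    DO = PCL;

In `BRK0` and `BRK1`, as well as `JSR0` and `JSR1`, we push the current PC to the stack. We need to add 2 more states so that we can write all 32 bits instead of just 16.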

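[verilog6502/src/cpu_65c02.v, line 494 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L494):

    BRK2:    DO = (IRQ | NMI_edge) ? (P & 8'b1110_1111) : P;

In `BRK2` we write the processor status register, so we can just move that write back by the 2 added cycles.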

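[verilog6502/src/cpu_65c02.v, lines 441 to 449 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L441-L449):

    BRK1,
    JSR1,
    PULL1,
    RTS1,
    RTS2,
    RTI1,
    RTI2,
    RTI3,
    BRK2:           AB = { STACKPAGE, ADD };

`BRK1` and `BRK2` take the next stack address from `{ STACKPAGE, ADD }`, i.e. from the ALU output; our new states will do the same.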
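A minimal Python sketch of the resulting interrupt stack frame, for checking addresses in a testbench. It assumes the stack stays at 0x00000100 to 0x000001FF (8-bit S, upper 16 address bits hard-wired to 0) and that PC is pushed most significant byte first, extending the original `PCH`-then-`PCL` order; `push_interrupt_frame` and `stack_address` are hypothetical names.

```python
STACKPAGE = 0x01  # upper byte of the 16-bit stack address; bits 31:16 are 0


def stack_address(s: int) -> int:
    """Full 32-bit bus address for stack pointer s."""
    return (STACKPAGE << 8) | (s & 0xFF)


def push_interrupt_frame(memory: dict, s: int, pc: int, p: int) -> int:
    """Push PC (4 bytes, most significant first) and then P; return the new S."""
    for shift in (24, 16, 8, 0):
        memory[stack_address(s)] = (pc >> shift) & 0xFF
        s = (s - 1) & 0xFF            # S is still 8 bits and wraps within page 1
    memory[stack_address(s)] = p & 0xFF
    return (s - 1) & 0xFF


if __name__ == "__main__":
    mem = {}
    new_s = push_interrupt_frame(mem, 0xFF, 0x1234_5678, 0x30)
    print(hex(new_s))                                  # 0xfa
    print([hex(mem[a]) for a in range(0x1FB, 0x200)])  # ['0x30', '0x78', '0x56', '0x34', '0x12']
```

Compared with the original 3-byte frame, each interrupt now uses 5 bytes of the 256-byte stack.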

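[verilog6502/src/cpu_65c02.v, lines 366 to 367 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L366-L367):

    BRK2:           PC_temp =      res ? 16'hfffc :
                              NMI_edge ? 16'hfffa : 16'hfffe;

Here is where the vectors are hardcoded. We will change these vectors to be at 0xFFFFFFF4, 0xFFFFFFF8, and 0xFFFFFFFC.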
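The three new addresses are 4 apart, so each 32-bit vector gets exactly 4 bytes and the last one ends at 0xFFFFFFFF. The issue does not say which event uses which address; this Python sketch assumes the original ordering is kept (NMI lowest, then RESET, then IRQ/BRK, as with 0xFFFA/0xFFFC/0xFFFE). All names are hypothetical.

```python
# Assumed mapping (not stated in the issue): keep the original ordering.
NMI_VECTOR = 0xFFFF_FFF4
RESET_VECTOR = 0xFFFF_FFF8
IRQ_BRK_VECTOR = 0xFFFF_FFFC


def vector_address(res: bool, nmi_edge: bool) -> int:
    """Same priority as the original PC_temp expression: reset, then NMI, then IRQ/BRK."""
    if res:
        return RESET_VECTOR
    if nmi_edge:
        return NMI_VECTOR
    return IRQ_BRK_VECTOR


def read_vector(memory: dict, address: int) -> int:
    """A vector is now 4 little-endian bytes; unwritten bytes read as 0 here."""
    return int.from_bytes(bytes(memory.get(address + i, 0) for i in range(4)), "little")


if __name__ == "__main__":
    rom = {RESET_VECTOR + i: b for i, b in enumerate((0x00, 0x80, 0x00, 0x00))}
    print(hex(read_vector(rom, vector_address(res=True, nmi_edge=False))))  # 0x8000
```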


bslathi19 commented

2026-04-26 08:55:39 -07:00

The BRK changes are added in 9476c6a0dd

Now that we have those, we need to update the JMP state, since it only waits 1 cycle for the rest of the address.

![image.png](/attachments/67546c15-05a8-4f59-9800-87c1c87c5695)

JMP does not really do anything beyond loading the target into PC. If we add 2 more `JMP0`-like states, as well as the ALU shift register, this should be trivial.

bslathi19 commented

2026-04-26 19:29:57 -07:00

JMP changes are added in 019b84f41d

This covers the absolute jump.


bslathi19 commented

2026-04-26 20:34:46 -07:00

Let's tackle absolute addressing for normal instructions next.

747438a9b6

This was pretty simple: we just copy the `ABS0` state two more times.


bslathi19 commented

2026-04-26 20:37:22 -07:00

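Next up is abs,x. It looks like we can just copy this state 2 more times:

[verilog6502/src/cpu_65c02.v, line 215 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L215):

    ABSX1  = 6'd3,  // ABS, X  - fetch MSB and send to ALU (+Carry)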


bslathi19 commented

2026-04-26 21:04:11 -07:00

Added abs,x here
cb6cac1245


bslathi19 commented

2026-04-26 21:06:33 -07:00

abs,y should also be handled by the abs,x states; add those to the test as well.
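The copied `ABSX1`-style states add X (or Y) to the low byte and then only the carry to each higher byte. A minimal Python sketch of that byte-serial calculation, which also reports whether a carry left the low byte (the page-crossing case that, on the 65C02, costs an extra cycle for reads). `abs_indexed` is a hypothetical helper, and a carry out of bit 31 is simply dropped.

```python
def abs_indexed(operand: bytes, index: int) -> tuple[int, bool]:
    """Return (effective address, page_crossed) for a 4-byte base plus an 8-bit index."""
    if len(operand) != 4 or not 0 <= index <= 0xFF:
        raise ValueError("expected a 4-byte base (LSB first) and an 8-bit index")
    result = bytearray(4)
    carry = 0
    addend = index                      # only the low byte gets X or Y
    for i, byte in enumerate(operand):  # one ALU add per fetched byte
        total = byte + addend + carry
        result[i] = total & 0xFF
        carry = total >> 8
        addend = 0                      # higher bytes only add the carry (+Carry)
    page_crossed = operand[0] + index > 0xFF   # carry out of the low byte
    return int.from_bytes(result, "little"), page_crossed


if __name__ == "__main__":
    ea, crossed = abs_indexed((0x1234_00FF).to_bytes(4, "little"), 1)
    print(hex(ea), crossed)  # 0x12340100 True
```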

bslathi19 commented

2026-04-26 21:12:59 -07:00

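Let's tackle absolute,x indirect, which is the `JMPIXn` states. Like abs,x, we can probably just copy this state twice:

[verilog6502/src/cpu_65c02.v, line 264 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L264):

    JMPIX1 = 6'd52, // JMP (,X)- fetch MSB and send to ALU (+Carry)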


bslathi19 commented

2026-04-26 22:02:38 -07:00

OK, that is done in dc339cb725

Now for regular indirect. We can just copy JMPI0 twice.
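Both indirect jumps now read a 4-byte target. A short Python sketch of the resulting PC values, assuming little-endian pointers and that the 4 target bytes come from consecutive addresses (the 65C02 already fixed the NMOS `JMP ($xxFF)` page-wrap bug, so there is no wrap to preserve). The helpers are hypothetical.

```python
def read32(memory: dict, address: int) -> int:
    """Read 4 consecutive little-endian bytes; unwritten bytes read as 0 here."""
    return int.from_bytes(
        bytes(memory.get((address + i) & 0xFFFF_FFFF, 0) for i in range(4)), "little"
    )


def jmp_indirect(memory: dict, pointer: int) -> int:
    """JMP (abs): the 4-byte operand is a pointer to a 4-byte jump target."""
    return read32(memory, pointer)


def jmp_indexed_indirect(memory: dict, pointer: int, x: int) -> int:
    """JMP (abs,X): add X to the 32-bit pointer, then read 4 bytes into PC."""
    return read32(memory, (pointer + x) & 0xFFFF_FFFF)


if __name__ == "__main__":
    table = {0x1000 + i: b for i, b in enumerate([0x78, 0x56, 0x34, 0x12, 0x00, 0x20, 0x00, 0x00])}
    print(hex(jmp_indirect(table, 0x1000)))             # 0x12345678
    print(hex(jmp_indexed_indirect(table, 0x1000, 4)))  # 0x2000
```

With 4-byte entries, a `JMP (abs,X)` jump table needs X to step by 4 per entry instead of 2.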


bslathi19 commented

2026-04-26 22:17:33 -07:00

Added in b31d7490b2


bslathi19 commented

2026-04-26 22:33:55 -07:00

Let's do indirect indexed next, since it is apparently the most common indirection mode. According to the state listing, these are the steps that we do:

[verilog6502/src/cpu_65c02.v, lines 230 to 233 in 06f933fa56](https://git.byronlathi.com/bslathi19/verilog6502/src/commit/06f933fa56fb4a83ef4580c3b1febf11fc9c6c59/src/cpu_65c02.v#L230-L233):

    INDY0  = 6'd18, // (ZP),Y  - fetch ZP address, and send ZP to ALU (+1)
    INDY1  = 6'd19, // (ZP),Y  - fetch at ZP+1, and send LSB to ALU (+Y)
    INDY2  = 6'd20, // (ZP),Y  - fetch data, and send MSB to ALU (+Carry)
    INDY3  = 6'd21, // (ZP),Y) - fetch data (if page boundary crossed)

How should we make this work with 32 bit addresses?

1. The first step loads the LSB and sends the ZP index to the ALU (+1).
2. The second step reads the MSB and sends the LSB to the ALU to add Y.
3. The third step reads the data and sends the MSB to the ALU to add the carry.

So we need to do a combination of steps 2 and 3. Instead of loading data from the calculated address, we need to read the 3rd and 4th bytes from zero page and add the carry. Only then can we read from the computed address.
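A minimal Python sketch of the 32-bit `(zp),Y` calculation shows why every one of those cycles needs two things at once: the bus must step to the next zero-page address, and the pointer byte read there must go through an add (Y for the first byte, only the carry afterwards). `indirect_indexed` is hypothetical, and it rejects pointers that would run past $FF because the issue does not say whether a 4-byte pointer wraps within zero page.

```python
def indirect_indexed(memory: dict, zp: int, y: int) -> int:
    """Effective address for (zp),Y with a 4-byte pointer: pointer + Y, mod 2**32."""
    if not 0 <= zp <= 0xFC:
        raise ValueError("pointer would run past the end of zero page")
    if not 0 <= y <= 0xFF:
        raise ValueError("Y is an 8-bit register")
    ea, carry, addend = 0, 0, y
    for i in range(4):
        # Cycle i: the bus reads zero-page address zp + i (an address increment)
        # while the pointer byte read there goes through the ALU (an add).
        total = memory.get(zp + i, 0) + addend + carry
        ea |= (total & 0xFF) << (8 * i)
        carry = total >> 8
        addend = 0   # only the carry propagates into the higher bytes
    return ea


if __name__ == "__main__":
    zero_page = {0x10: 0xF0, 0x11: 0x00, 0x12: 0x00, 0x13: 0x80}
    print(hex(indirect_indexed(zero_page, 0x10, 0x20)))  # 0x80000110
```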


bslathi19 commented

2026-04-27 07:08:01 -07:00

Hmm, that plan would not work, because we need the ALU to add the offset, whereas this instruction also uses the ALU to generate the next zero-page address.

bslathi19 commented

2026-04-27 20:53:29 -07:00

So we need to load 4 bytes from zero page, which means calculating 4 successive zero-page addresses, but at the same time we also need to add the Y register to the bytes we are reading. We might need another adder just for incrementing the address.

bslathi19 commented

2026-04-27 22:03:05 -07:00

dfe27d4ec7

What I ended up doing was adding another signal which can increment the address bus register. This does mean that there are two 32-bit adders (one for PC and one for the address bus) in addition to the ALU. I think this is better than the alternative of adding extra registers and taking nearly twice as many cycles.


bslathi19 commented

2026-04-27 22:03:55 -07:00

The last one is indexed indirect, which should be easier, since X is added to an 8-bit zero-page address rather than to a 32-bit number.

bslathi19 commented

2026-04-27 22:34:55 -07:00

OK, it still uses the ALU to increment the ZP pointer, so we can use the same extra adder that we added for the other ZP states.
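Here only the 8-bit zero-page address goes through the ALU; after that the 4 pointer bytes are read unmodified, so stepping through them is a job for the address-bus incrementer. A minimal Python sketch, assuming `zp + X` wraps within zero page as on the 65C02 (that add is the existing 8-bit `INDX0` step) and rejecting pointers that would run past $FF, since the issue does not say what happens there. `indexed_indirect` is hypothetical.

```python
def indexed_indirect(memory: dict, zp: int, x: int) -> int:
    """Effective address for (zp,X) with a 4-byte pointer."""
    pointer_zp = (zp + x) & 0xFF      # 8-bit ALU add, stays in zero page
    if pointer_zp > 0xFC:
        raise ValueError("pointer would run past the end of zero page")
    # The 4 pointer bytes come from consecutive zero-page addresses, which an
    # address-bus incrementer can step through without using the ALU.
    return int.from_bytes(
        bytes(memory.get(pointer_zp + i, 0) for i in range(4)), "little"
    )


if __name__ == "__main__":
    zero_page = {0x04: 0x78, 0x05: 0x56, 0x06: 0x34, 0x07: 0x12}
    print(hex(indexed_indirect(zero_page, 0xF0, 0x14)))  # 0x12345678 (0xF0 + 0x14 wraps to 0x04)
```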

bslathi19 commented

2026-04-27 23:06:25 -07:00

Done in 2a9af9e9dc

We've added all the addressing modes, but not all instructions will be functional yet. For example, the absolute and absolute,x addressing modes are used by many different instructions, and we need to make sure that the timings line up for their side effects.

What we probably need to do is test every single instruction in every single addressing mode that it supports.


bslathi19 commented

2026-04-28 21:45:52 -07:00

I think we should have tests for every instruction that uses those addressing modes. We also need to handle the extra cycles when there is a page crossing. Let's start at the top with ADC.

bslathi19 commented

2026-04-28 22:18:31 -07:00

That's gonna be boring. One thing that we definitely need to test is JSR, RTS, BRK, and RTI, since these push and pull more bytes to and from the stack.


bslathi19 commented

2026-04-28 22:23:12 -07:00

Basically we just need 2 more `JSR0`/`JSR1` states, easy as.

bslathi19 commented

2026-04-28 22:59:18 -07:00

What the 6502 does is use the stack pointer as temporary storage for the LSB of the jump target. We need to store 3 bytes instead of 1, though. We use the ALU to decrement the stack pointer while we are writing the current PC to the stack. What we could do instead is read the next 4 bytes and jump to the target using the ALU output shift register, like we do for JMP.

Hmm, we can't use the ALU both to hold the stack pointer and to store the target address in its shift register.

Ah, we can swap it back in state `JSR4`; that keeps things in line, since `JSR5` will read the second byte of the address.


bslathi19 commented

2026-04-28 23:10:32 -07:00

In terms of cycle count, we have added 4 cycles, so JSR now takes 10 cycles instead of 6. This is almost double the length, but hey, performance was not a goal.

bslathi19 commented

2026-04-28 23:12:25 -07:00

RTS follows a very similar pattern to JSR, except that instead of pushing to the stack we read from it. This should be a little bit easier too.

bslathi19 commented

2026-04-29 09:39:54 -07:00

We cannot use the ALU to decrement the stack pointer while also using the ALU registers to store the temporary address. What we could do is modify the shift register to take in data from DIMUX instead of only from the ALU.

bslathi19 commented

2026-04-29 09:55:33 -07:00

When the original CPU pushes the address to the stack, it has already incremented PC once. So if the instruction `JSR LSB MSB` starts at 0x200, PC is pointing at the MSB (0x202) when we start writing. This is 1 less than the address we want to return to (0x203). When we are writing 32 bits, the pushed value is now 3 less than the address we need to return to. We cannot just keep incrementing PC, though, because it would then be changing while we are writing it. We could add 2 more dummy cycles at the end of RTS, or we could make it possible to add 3 to PC instead of just 1. One costs cycles, the other costs area. Since area is basically free, let's add the +3.
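The return-address arithmetic can be checked with a small Python model. It follows the description above (the pushed value is 3 less than the return address, and RTS adds 3) and assumes a 5-byte `JSR` (opcode plus 4-byte target) with the return address pushed most significant byte first, like `PCH` before `PCL` on the original 6502. All names are hypothetical.

```python
JSR_LENGTH = 5        # opcode + 4-byte target address
STACK_BASE = 0x0100   # stack page; upper 16 address bits are 0


def jsr_pushed_value(jsr_address: int) -> int:
    """Value JSR pushes: 3 less than the address of the next instruction."""
    return (jsr_address + JSR_LENGTH - 3) & 0xFFFF_FFFF


def rts_target(pulled: int) -> int:
    """RTS adds 3 (instead of the original 1) to the value it pulls."""
    return (pulled + 3) & 0xFFFF_FFFF


def push32(memory: dict, s: int, value: int) -> int:
    for shift in (24, 16, 8, 0):              # most significant byte first
        memory[STACK_BASE | s] = (value >> shift) & 0xFF
        s = (s - 1) & 0xFF
    return s


def pull32(memory: dict, s: int) -> tuple[int, int]:
    value = 0
    for shift in (0, 8, 16, 24):              # least significant byte first
        s = (s + 1) & 0xFF
        value |= memory[STACK_BASE | s] << shift
    return value, s


if __name__ == "__main__":
    mem = {}
    s = push32(mem, 0xFF, jsr_pushed_value(0x200))     # JSR at 0x200 pushes 0x202
    value, s = pull32(mem, s)
    print(hex(value), hex(rts_target(value)), hex(s))  # 0x202 0x205 0xff
```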


bslathi19 commented

2026-04-30 20:50:27 -07:00

Ah, we need to use the alu_sr select signal to store DIMUX into the shift register, so that we can use the ALU to decrement the stack pointer.

bslathi19 commented

2026-04-30 21:31:32 -07:00

Now we need to implement BRK and RTI. This should be mostly similar to JSR and RTS except that we push and pull the flags register as well.

bslathi19 commented

2026-04-30 22:27:34 -07:00

We basically did BRK earlier when we did reset. RTI now works, but we should test external interrupts just to make sure they function as expected.

bslathi19 commented

2026-04-30 22:29:35 -07:00

Oh, also the branch instructions will need to calculate a 32-bit target address instead of a 16-bit one, so that's potentially 2 more cycles for each of them.
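Relative branches keep their 1-byte signed offset, but the target is now a 32-bit sum, so a carry or borrow out of the low byte can ripple through all four PC bytes. A minimal Python sketch of the target arithmetic only (hypothetical `branch_target`); it does not model cycle counts, since the thread does not spell out how the reworked branch states schedule the add.

```python
def branch_target(next_pc: int, offset: int) -> int:
    """Target of a taken branch.

    next_pc: address of the instruction following the 2-byte branch.
    offset:  the raw offset byte, interpreted as signed (-128..127).
    """
    if not 0 <= offset <= 0xFF:
        raise ValueError("offset is a single byte")
    signed = offset - 0x100 if offset & 0x80 else offset
    return (next_pc + signed) & 0xFFFF_FFFF


if __name__ == "__main__":
    print(hex(branch_target(0x0001_0002, 0xFE)))  # 0x10000 (borrow through byte 1)
    print(hex(branch_target(0x0000_FFF0, 0x20)))  # 0x10010 (carry into byte 2)
```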

bslathi19 commented

2026-04-30 22:52:25 -07:00

So it kind of works, but WAI increments PC by too much: it should increment PC by 1, but it increments it by 2 instead. WAI is kind of fake, but we can add a state to decode to fix this anyway.

bslathi19 commented

2026-05-04 23:02:18 -07:00

Ok we changed how branch works, so it should mostly work now. With that, I think we have everything mostly functional. Now we can work on modifying cc65 to generate code for our new target.

bslathi19 commented

2026-05-09 15:29:25 -07:00

I think this is ready to merge; it works well enough.

bslathi19 closed this issue

2026-05-09 15:34:23 -07:00


Reference: bslathi19/verilog6502#1
