Hookup emu-russia/dmgcpu #1

Draft
wants to merge 7 commits into
base: master

Rodrigodd
Contributor

Hello! For some weeks now, I have been investigating whether I could use the Verilog simulation in this repo to better understand the inner workings of the Game Boy and, in some way, improve the accuracy of my Game Boy emulator.

The main thing I wanted to investigate was the inner workings of the PPU and the precise read/write timing between the CPU and the PPU. But, as far as I know, you are using only an equivalent CPU implementation, not a reverse-engineered CPU from the Game Boy. This made me a little hesitant to learn its timing, knowing it may still be off.

But I discovered that there is a reverse-engineered HDL of the CPU core at https://github.com/emu-russia/dmgcpu, and I decided to try hooking it up with the simulation in this repo.

Connecting the two together was almost painless. But I could not get it to work; the CPU never starts running after the circuit RESET.

I tried investigating where things were going wrong, but I still could not figure it out (my investigation is summarized in notes.txt, copied in the section below), and I am not experienced in debugging Verilog. I could investigate a little more, but I presume there will be many more problems after fixing this one, so I became a little demotivated.

Regardless, I decided to publish my work here so that it does not all go to waste.


Investigation

The address bus of the CPU, when running in dmg-sim, never increments, whereas it does in the dmgcpu simulation.

The signal d of the Sequencer never changes from 0x00..04000 to 0x00402..00, as it does in the dmgcpu simulation.

The following inputs to the CPU differ from the simulation in dmgcpu:

  • The CLKs' phases differ slightly during the reset; nothing major.
  • OSC_STABLE is always low, instead of high before the reset.
  • NMI is unconnected, instead of always 0.
  • IPL_REQ is always high, instead of always 0.
  • SYNC_RESET is always high, instead of going low after the reset.

OSC_STABLE goes high when UPOF (the last bit of DIV, 16 Hz) is high and TUBO, a latch for the CLK_ENA signal from the CPU, is low. In dmgcpu, UPOF is hardcoded to be high.

OSC_STABLE = T1&~T2 | ~T1&T2 | UPOF&~TUBO
TUBO = nor_latch(.set(CLK_ENA), .reset(RESET | ~OSC_ENA))
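
To make the two equations above concrete, here is a minimal Python sketch of them. The helper names (`nor_latch_next`, `osc_stable`) and the choice that reset wins when both latch inputs are high are assumptions of this sketch, not something read from the netlist.

```python
def nor_latch_next(q: bool, set_: bool, reset: bool) -> bool:
    """Next state of a set/reset latch; reset wins if both are high (assumption)."""
    if reset:
        return False
    if set_:
        return True
    return q


def osc_stable(t1: bool, t2: bool, upof: bool, tubo: bool) -> bool:
    """OSC_STABLE = T1&~T2 | ~T1&T2 | UPOF&~TUBO, as written in the notes."""
    return (t1 and not t2) or (not t1 and t2) or (upof and not tubo)


# Normal mode (T1 = T2 = 0), RESET released, oscillator enabled.
tubo = False
# If the CPU raises CLK_ENA before UPOF is high, TUBO gets set ...
tubo = nor_latch_next(tubo, set_=True, reset=False)
# ... and OSC_STABLE stays low whatever UPOF does afterwards.
stuck = [osc_stable(False, False, upof, tubo) for upof in (False, True)]
print(stuck)  # [False, False]
```

Under these assumptions, an early CLK_ENA blocks the only path to OSC_STABLE in normal mode, which matches the symptom described here.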

OSC_STABLE feeds the SYNC_RESET (CPU T12), which becomes locked low in ASOL latch. This made me think that CLK_ENA should only go high after SYNC_RESET is low.

Looking at Seq.v (see logic.py), CLK_ENA is low when RESET is high or when the CPU is halted:

RESET  SYNC_RESET  CLK_ENA
0      0           ~(d[100] | IR[4] & d[101]) | SeqControl_1
0      1           ~(d[100] | IR[4] & d[101])
1      x           0

From _GekkioNames.v:

  • d[100]: s1_op_halt_s0xx
  • d[101]: s1_op_nop_stop_s0xx

So CLK_ENA is almost unaffected by SYNC_RESET.
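
The truth table above can be written as a small Python function to check how much SYNC_RESET really matters. The function name `clk_ena` and the flat boolean arguments are choices made for this sketch; only the logic comes from the table.

```python
from itertools import product


def clk_ena(reset, sync_reset, d100, d101, ir4, seq_control_1):
    """CLK_ENA according to the table derived from Seq.v."""
    if reset:
        return False
    base = not (d100 or (ir4 and d101))
    if sync_reset:
        return base
    return base or seq_control_1


# With RESET low, list the input combinations where SYNC_RESET changes CLK_ENA.
differs = [
    combo
    for combo in product([False, True], repeat=4)
    if clk_ena(False, False, *combo) != clk_ena(False, True, *combo)
]
print(len(differs))  # 5
```

SYNC_RESET changes the result in only 5 of the 16 combinations, all of which have SeqControl_1 high while a halt or stop term is active.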

Another way to make OSC_STABLE go high is to make sure the reset only happens when UPOF is high. But UPOF does not run until RESET is low! And the CPU RESET is directly connected to the global simulation RESET!

Maybe UPOF should be reset to high? Probably not: it may mess up the DIV timing, and the simulation is too slow for trial-and-error changes.

UPOF is reset by RESET_DIV:

  • ~RESET_DIV = ~(RESET | ~OSC_STABLE | (FF04_FF07 & CPU_WR & ~A1 & ~A2)).
  • RESET_DIV = RESET | ~OSC_STABLE | <write to DIV>

ASOL is reset by RESET.

@msinger
Owner

msinger commented Mar 17, 2024

Hi Rodrigo,

thank you very much for sharing your work. I want to look into this, and I just realized that I had fixed two small bugs in my working directory that I hadn't committed yet. I just committed them now, so maybe you want to rebase your stuff to the new changes on master. At first glance, I'm very happy that you found a stupid mistake I made with the SERY inverter feeding itself. I still had graphical glitches when simulating Zelda DX. I will run a new simulation with this fix, maybe it helps. If you wouldn't mind, could you make a separate pull request for that SERY fix? I will merge that immediately.

I will comment a bit more on this when I have more time and after I have run a few tests myself.

Thanks,
Michael

was set to `sery = !sery` before.
Was needed when trying to hook up emu-russia/dmgcpu.
@msinger
Owner

msinger commented Mar 17, 2024

Okay, it looks to me like the CPU is misbehaving. It raises the cpu_clk_ena signal too early. I documented that signal a few years ago here: http://iceboy.a-singer.de/doc/dmg_cpu_connections.html
You can read it in the row T11 of the table on that page. The second paragraph of the description is the important one. It says that the CLK_ENA (T11) signal of the CPU must be initialized with 0 and raised by the CPU when TABA/OSC_STABLE (T15) gets high. If the CPU raises CLK_ENA earlier than this, then SYNC_RESET (T12) will never be released by the circuit behind AFER.
I documented this before we had detailed die shots of the CPU core. So I didn't have insight and details could be wrong. Maybe @ogamespec can help us, when he sees that his name got referenced here. :)

What we can see is that SYNC_RESET never gets released:
[image: waveform screenshot]
It should get released after around 32 ms.
When we zoom in, we see that cpu_clk_ena gets raised way too early:
[image: waveform screenshot]
Because of that, TABA never gets pulsed and SYNC_RESET never gets released.
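
The "around 32 ms" figure can be sanity-checked with a little arithmetic. The sketch below assumes that the DIV counter is a 16-bit counter clocked at 2^20 Hz and that UPOF is its top bit; both are assumptions that fit the "16Hz" note earlier in the thread, not facts taken from the netlist.

```python
DIV_CLOCK_HZ = 2**20   # assumed clock of the DIV counter (1 MiHz)
UPOF_BIT = 15          # assumed position of UPOF in a 16-bit counter

# The top bit completes one full period every 2**16 counter clocks.
upof_toggle_hz = DIV_CLOCK_HZ / 2 ** (UPOF_BIT + 1)
# Starting from zero, it first goes high after 2**15 counter clocks.
first_high_ms = 2**UPOF_BIT / DIV_CLOCK_HZ * 1000

print(upof_toggle_hz)  # 16.0
print(first_high_ms)   # 31.25
```

Under these assumptions UPOF first goes high 31.25 ms after the counter starts, which is consistent with the release time mentioned above.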

I removed this port from the CPU instantiation:

//.CLK_ENA(cpu_clk_ena),   // out T11

Then I added these two lines somewhere above or below the CPU instantiation:

initial cpu_clk_ena = 0;
always_ff @(negedge cpu_in_t12) cpu_clk_ena <= 1;

In a hackish way, this ensures that cpu_clk_ena is raised at the correct time.
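
For readers less familiar with Verilog, the behaviour of those two lines can be mimicked in Python. This is only a behavioural sketch: the class name `ClkEnaHack` is made up, and it treats a falling edge as a plain 1 to 0 transition, ignoring the x and z states that a Verilog negedge also covers.

```python
class ClkEnaHack:
    """Starts at 0 and latches to 1 on the first falling edge of cpu_in_t12."""

    def __init__(self):
        self.cpu_clk_ena = 0      # initial cpu_clk_ena = 0;
        self._prev_t12 = None     # no previous sample yet

    def step(self, cpu_in_t12: int) -> int:
        # always_ff @(negedge cpu_in_t12) cpu_clk_ena <= 1;
        if self._prev_t12 == 1 and cpu_in_t12 == 0:
            self.cpu_clk_ena = 1
        self._prev_t12 = cpu_in_t12
        return self.cpu_clk_ena


hack = ClkEnaHack()
trace = [hack.step(t12) for t12 in (1, 1, 1, 0, 0, 1, 0)]
print(trace)  # [0, 0, 0, 1, 1, 1, 1]
```

Once set, the signal never returns to 0, so this stands in for the CPU's own CLK_ENA only during start-up.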
Now it looks like the CPU starts running and the instruction register, data and address lines are changing:
[image: waveform screenshot]
The simulation runs very slowly though; I haven't seen any changes to the PPU registers yet. I think I'm running the original bootrom, so this may still take a while until it gets there.

I noticed that you added the timescale directive everywhere. This isn't necessary in Icarus. I configured the timescale in the timescale.f file. It is given to Icarus at command line. But maybe you need this for Verilator, idk. I don't even know which one Icarus uses if both are given.

I hope this hack helps you to continue with your research. I don't know where exactly the issue is. I'm planning to do my own implementation of the CPU, but first I want to finish the CPU layout that I'm recreating in Electric VLSI. I have been working on that for a while now and it is very frustrating, because it seems that I can't get the proportions right while simultaneously obeying the spacing rules of the MOSIS fabrication technology that I've selected for the project. It's good to know that someone finds this stuff useful. I think @ogamespec will also be happy that you are using his CPU. If he's not reacting here, then maybe you could also write him an issue on his project, just to let him know at least.

@msinger
Owner

msinger commented Mar 17, 2024

I just noticed, something else is not working correctly. The conditional jump instruction at address 0x000a at the beginning of the bootrom is failing somehow and the CPU restarts executing at address 0 over and over again.

	LD SP,$fffe		; $0000  Setup Stack

	XOR A			; $0003  Zero the memory from $8000-$9FFF (VRAM)
	LD HL,$9fff		; $0004
Addr_0007:
	LD (HL-),A		; $0007
	BIT 7,H		; $0008
	JR NZ, Addr_0007	; $000a     <-- This fails and it jumps to 0
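
As a reference for what this loop should do when the conditional jump works, here is a small Python model of it. The function name `clear_vram_loop` and the dictionary used to record writes are illustrative; the register behaviour follows the instruction comments in the listing.

```python
def clear_vram_loop():
    """Model of the loop at $0007-$000a: zero memory downwards from $9FFF."""
    a = 0                  # XOR A leaves A = 0
    hl = 0x9FFF            # LD HL,$9fff
    writes = {}
    while True:
        writes[hl] = a                 # LD (HL-),A: store A, then decrement HL
        hl = (hl - 1) & 0xFFFF
        z = ((hl >> 8) & 0x80) == 0    # BIT 7,H sets Z when bit 7 of H is 0
        if z:                          # JR NZ is not taken once Z is set
            break
    return hl, writes


hl, writes = clear_vram_loop()
print(hex(hl), len(writes), hex(min(writes)), hex(max(writes)))
# 0x7fff 8192 0x8000 0x9fff
```

A correct CPU therefore takes the jump back to $0007 for every address down to $8000 and falls through once HL reaches $7FFF.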

@Rodrigodd
Contributor Author

Thank you! So it is exactly what I had first suspected. I tried looking at the CPU sequencer netlist trace to see if there was anything obviously wrong, but I had no confidence I would figure out something. I will start using your hack.

> I noticed that you added the timescale directive everywhere.

dmgcpu has timescale directives in every file, and when connecting the two projects together, Icarus was emitting a warning or error about having timescales in just some files. I didn't research what the most sensible fix would be; this was just a hack for now.

> I think @ogamespec will also be happy that you are using his CPU. If he's not reacting here, then maybe you could also write him an issue on his project, just to let him know at least.

I will make an issue there, at least to make things more cross-referenced.

> I just noticed something else is not working correctly. The conditional jump instruction at address 0x000a at the beginning of the bootrom is failing somehow, and the CPU restarts executing at address 0 over and over again.

I will take a look at that. I had my emulator emitting VCD traces, so it is easier to spot where an error is first introduced.
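
One way to use two traces for this is to walk them side by side and stop at the first sample that differs. The sketch below is hypothetical: `first_divergence` is a made-up helper, and the program-counter lists are illustrative values based on the failure described above, not samples from a real VCD file.

```python
from itertools import zip_longest


def first_divergence(reference, candidate):
    """Return (index, ref_sample, cand_sample) of the first mismatch, or None."""
    for index, (ref, cand) in enumerate(zip_longest(reference, candidate)):
        if ref != cand:
            return index, ref, cand
    return None


# Illustrative program-counter samples only, not taken from a real trace.
emulator_pc = [0x0007, 0x0008, 0x000A, 0x0007]
simulation_pc = [0x0007, 0x0008, 0x000A, 0x0000]
print(first_divergence(emulator_pc, simulation_pc))  # (3, 7, 0)
```

With real traces, the samples would first have to be extracted from both VCD files and aligned on a common clock.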

Again, thank you for the help!

@msinger
Owner

msinger commented Mar 17, 2024

Glad I could help. Let me know if there is any progress.

@ogamespec

Hi, I've read what's been written here, but for the most part it's all outside the SM83 Core, and that's not my "expertise" there (it's already been studied by others and I don't go there).
Regarding OSC_STABLE and CLK_ENA, I replied in the issue (emu-russia/dmgcpu#219).
As for timescale and CLK, I do it this way:

`timescale 1ns/1ns
always #25 CLK = ~CLK;

That is, 1 simulation step is equal to 1 ns, but I made the CLK toggle only every 25 steps, so that the circuits have time to "settle".
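
The resulting clock can be worked out directly from those two lines. In this sketch the constant names are made up; the values come from the `timescale` directive and the `#25` delay.

```python
TIMESCALE_NS = 1         # `timescale 1ns/1ns: one delay unit is 1 ns
HALF_PERIOD_UNITS = 25   # always #25 CLK = ~CLK;

# CLK toggles every 25 ns, so a full high/low cycle takes twice that.
period_ns = 2 * HALF_PERIOD_UNITS * TIMESCALE_NS
frequency_mhz = 1000 / period_ns

print(period_ns, frequency_mhz)  # 50 20.0
```

So the testbench clock runs at 20 MHz in simulated time, and the combinational logic gets 25 simulation steps to settle between edges.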
Also keep in mind that HDL in emu-russia/dmgcpu is in the "NOP Engine" stage :D That is, I made sure that something is moving there and somehow lost interest.
Good luck!

@Rodrigodd
Contributor Author

Updated the dmgcpu to my submodule.

With emu-russia/dmgcpu#219 fixed, the CPU now starts running instructions. I am testing the execution in quickboot.bin. It had a problem where all registers were inverted, which I fixed with a hack (see emu-russia/dmgcpu#240); now it is derailing on a RET after a CALL, probably because writing to memory doesn't work (see emu-russia/dmgcpu#239).

Fix check of conditional flags. See emu-russia/dmgcpu#266.
The delays in CLK6 are not necessary anymore. A transparent DLatch
was added to the conditional branch logic (as the original hardware has),
avoiding the need for the delays.
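
For reference, a transparent D latch passes its input through while enabled and holds the last value otherwise. The class below is a generic Python sketch of that behaviour with made-up names, not a model of the actual latch added in dmgcpu.

```python
class DLatch:
    """Transparent latch: q follows d while enable is high, otherwise holds."""

    def __init__(self, q=0):
        self.q = q

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


latch = DLatch()
samples = [(1, 1), (0, 1), (1, 0), (1, 1), (0, 0)]   # (d, enable) pairs
outputs = [latch.update(d, en) for d, en in samples]
print(outputs)  # [1, 0, 0, 1, 1]
```

Holding the value while the enable is low is what lets the branch logic see a stable condition without extra delays on the clock.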