Adventures in GameBoy Emulation: Part 2 (The CPU)


Time for part 2 in the series. It seemed like the right time to do it, as the CPU is the big focus of DMGe at the minute. The goal is to get the 256-byte on-chip BIOS ROM to run from start to finish. It’s a fairly simple program with the purpose of initialising the hardware and checking that the Nintendo logo on the cartridge matches the logo stored in the BIOS ROM.

It does this by comparing each logo byte-for-byte.


The BIOS is really the simplest program for the GameBoy that we could possibly have. It essentially does the following when it runs:

  • Set VRAM to zero.
  • Initialise the stack at 0xFFFE.
  • Initialise audio.
  • Set the background palette.
  • Decompress and store cartridge logo data in VRAM.
  • Set the background tilemap.
  • Display the logo and scroll it to the centre of the screen.
  • Play the “po-ling” sound.
  • Compare the logo data on the cartridge to the logo data in the BIOS ROM. If it’s a match, finish running and transfer control to 0x100 (mapped to the cartridge). If it doesn’t match, jump into an infinite loop and hang. Note that the BIOS also unmaps itself from memory by writing 0x01 to 0xFF50.
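The unmapping step in the last bullet can be sketched in C++ roughly as follows. This is only an illustration and not DMGe's actual MMU: the "BootMappedMemory" type and its members are hypothetical names, and the sketch assumes nothing beyond what the list states, namely that the 256-byte BIOS sits at the bottom of the address space until 0x01 is written to 0xFF50.

```cpp
#include <array>
#include <cstdint>

// Hypothetical memory model: names are illustrative, not taken from DMGe.
struct BootMappedMemory
{
    std::array<uint8_t, 0x100> bios{};     // 256-byte on-chip BIOS ROM
    std::array<uint8_t, 0x10000> memory{}; // Flat 64 KiB space (simplified)
    bool biosMapped = true;                // BIOS overlays 0x0000-0x00FF at power-on

    uint8_t ReadByte(uint16_t address) const
    {
        // While mapped, the BIOS hides the first 256 bytes of the cartridge.
        if (biosMapped && address < 0x0100)
            return bios[address];
        return memory[address];
    }

    void WriteByte(uint16_t address, uint8_t value)
    {
        // Writing 0x01 to 0xFF50 switches the BIOS out of the address space.
        if (address == 0xFF50 && value == 0x01)
            biosMapped = false;
        // Simplification: a real MMU would not let writes modify ROM.
        memory[address] = value;
    }
};
```

Once the flag is cleared, a read from 0x0000 returns cartridge data instead of BIOS data, which is why the jump to 0x100 lands in the game's own code.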

So there you have it. Probably as simple as you can get on the GameBoy. At the minute, we’re stepping through each opcode and implementing it as needed, as that will allow us to start work on the GPU and get something running properly before we move on to emulating games.

Many emulators would just ignore the BIOS and start right at 0x100, but the goal with DMGe is to be more true to the original experience of playing the GameBoy. Therefore, the most important part of the BIOS is to ensure that the logo comparison succeeds (or else we won’t be playing many games :( ).

The magic happens in the following:

          LD HL,$0104
          LD DE,$00a8
     Addr_00E6:
          LD A,(DE)
          INC DE
          CP (HL)
          JR NZ,$fe
          INC HL
          LD A,L
          CP $34
          JR NZ, Addr_00E6

Just a brief explanation of this code. It’s fairly simple. It just does the following:

Create two pointers. One to the logo data in the cartridge (0x104) and one to the logo data in the BIOS ROM (0xA8):

LD HL, $0104
LD DE, $00a8

Load a byte from the logo data in the BIOS ROM (pointed to by DE), increment DE, and compare the byte with the corresponding byte from the cartridge (pointed to by HL). If they match, increment HL to check the next byte. If at any point the bytes don’t match, the “CP (HL)” instruction won’t set the zero flag, and “JR NZ,$fe” will jump back onto itself, forming an infinite loop that stops anything else from happening. The final piece of the code checks whether L has reached 0x34 (that is, whether HL has moved past the last logo byte at 0x0133) and repeats the loop if it hasn’t.

     Addr_00E6:
          LD A,(DE)
          INC DE
          CP (HL)
          JR NZ,$fe
          INC HL
          LD A,L
          CP $34
          JR NZ, Addr_00E6
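For readers more comfortable with C++ than assembly, the same loop might look something like the sketch below. "LogoMatches" is a hypothetical helper written for this post, not part of DMGe, and it returns false where the real BIOS would hang. Both pointers are assumed to point at address 0 of their respective images.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the BIOS loop. HL walks the cartridge logo
// (0x0104-0x0133) while DE walks the BIOS copy (starting at 0x00A8).
bool LogoMatches(const uint8_t* cartridge, const uint8_t* bios)
{
    uint16_t hl = 0x0104; // LD HL,$0104
    uint16_t de = 0x00A8; // LD DE,$00a8

    do
    {
        const uint8_t a = bios[de]; // LD A,(DE)
        ++de;                       // INC DE
        if (a != cartridge[hl])     // CP (HL) / JR NZ,$fe
            return false;           // The real BIOS hangs here instead.
        ++hl;                       // INC HL
    } while ((hl & 0xFF) != 0x34);  // LD A,L / CP $34 / JR NZ, Addr_00E6

    return true;
}
```

The loop compares 48 bytes in total, so the cartridge buffer needs to cover at least 0x0134 bytes and the BIOS buffer at least 0xD8.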

That’s essentially the purpose of the GameBoy BIOS in a nutshell. Let’s move on to how we’ll actually run it.

Emulating the CPU

The CPU chip from the Nintendo GameBoy.

As stated in the previous installment to this series, the GameBoy CPU (pictured above) is an 8-bit Sharp LR35902. Many people refer to it as a Z80, but this isn’t strictly true. While it contains several Z80 specific instruction set enhancements over the Intel 8080, it doesn’t retain any of the registers that were introduced in the Z80. I suppose you could call it an amalgamation of the Intel 8080 and the Z80.

The way that we emulate the CPU in DMGe is with an interpreter. We could use alternative, faster ways of doing things such as dynamic recompilation, but I feel that it’s pretty unnecessary given the speed of most modern computers. It would only serve to make things more complicated than they have to be.

The emulation process is basically as follows:

  • Fetch opcode.
  • Look up opcode in opcode table.
  • Call relevant function which emulates the opcode.
  • Store the cycles taken.
  • Increment the program counter.
  • Add cycles taken to total cycles.
  • Repeat.

You’ll notice that we deal with cycles. Don’t worry too much about this at the minute. It will be used later for timing, specifically for keeping the GPU states in step with the CPU.
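To make the steps above a bit more concrete, here is a minimal sketch of such a loop. It is not DMGe's code: "TinyCPU", "Run" and the other names are hypothetical, and only two opcodes (NOP and LD A, d8) are implemented. As in the steps listed, each handler records its own cycle count and moves the program counter, and the loop adds the cycles to a running total.

```cpp
#include <array>
#include <cstdint>

// Hypothetical, stripped-down CPU used only to show the shape of the loop.
struct TinyCPU
{
    using Handler = void (TinyCPU::*)();

    std::array<uint8_t, 0x10000> memory{}; // Zero-filled, so it reads as NOPs
    std::array<Handler, 0x100> opcodes{};  // Null entry = not implemented yet
    uint16_t pc = 0;
    uint8_t a = 0;
    uint32_t lastCycles = 0;  // Cycles taken by the most recent instruction
    uint64_t totalCycles = 0; // Running total, later used for timing

    TinyCPU()
    {
        opcodes[0x00] = &TinyCPU::Nop;
        opcodes[0x3E] = &TinyCPU::LoadAImmediate;
    }

    // NOP: one byte, 4 cycles.
    void Nop()
    {
        lastCycles = 4;
        pc += 1;
    }

    // LD A, d8: two bytes, 8 cycles.
    void LoadAImmediate()
    {
        a = memory[static_cast<uint16_t>(pc + 1)];
        lastCycles = 8;
        pc += 2;
    }

    // Runs until at least cycleBudget cycles have elapsed.
    // Returns false early if it meets an opcode with no handler.
    bool Run(uint64_t cycleBudget)
    {
        const uint64_t target = totalCycles + cycleBudget;
        while (totalCycles < target)
        {
            const uint8_t opcode = memory[pc];       // Fetch opcode
            const Handler handler = opcodes[opcode]; // Look up in table
            if (handler == nullptr)
                return false;
            (this->*handler)();        // Emulate it (sets cycles, moves PC)
            totalCycles += lastCycles; // Add cycles taken to the total
        }
        return true;
    }
};
```

Running to a cycle budget rather than an instruction count is one possible design choice here; it makes it easier to hand control to other hardware (such as the GPU) at predictable points.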


The GameBoy CPU has several registers which it uses for storing small amounts of data for fast access during execution. We first define a structure as a template for each register:

 typedef union
 {
      struct { uint8_t lo, hi; }; //Low byte first (little-endian host assumed)
      uint16_t word;
 } Register16_t;

This allows us to access each register pair as a word (for example AF), or we can access the pair as two 8-bit registers.
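Here is a small sketch of what that access looks like in practice. It assumes the two byte fields are wrapped in an anonymous struct so that they sit side by side and together overlap the 16-bit word. Anonymous structs inside unions are a widely supported compiler extension rather than standard C++, and the lo/hi order assumes a little-endian host. "MakeRegister" is a hypothetical helper added for illustration.

```cpp
#include <cstdint>

// Register pair: two 8-bit halves overlapping one 16-bit word.
typedef union
{
    struct { uint8_t lo, hi; }; // Low byte first (little-endian host assumed)
    uint16_t word;
} Register16_t;

// Hypothetical helper: packs two 8-bit halves into a register pair.
inline Register16_t MakeRegister(uint8_t hi, uint8_t lo)
{
    Register16_t reg;
    reg.hi = hi;
    reg.lo = lo;
    return reg;
}
```

With this layout, building HL from H = 0x01 and L = 0x04 gives a word of 0x0104 on a little-endian machine, and writing 0xFFFE to the word makes the high half read 0xFF and the low half 0xFE.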

We then use this structure to create another structure that actually holds our register definitions:

 //Our CPU registers
 typedef struct
 {
      //Our basic registers, AF, BC, DE, HL
      Register16_t AF, BC, DE, HL;

      //Special Registers
      Register16_t SP, PC;
 } Registers_t; //Type name assumed here; the instance used below is "Registers"

That’s it really for the registers. Now that this has been covered we can move on to the opcodes.


This is by far the biggest task in developing an emulator. CPUs have A LOT of instructions. In order for most things to run, we need to implement all of them, and it isn’t an easy task. It’s very time consuming and finding a mistake/bug can be very difficult, but this is the part where we can start to watch GameBoy code run, and in my opinion, it’s worth it.

We need to start off by defining two opcode tables. One for standard opcodes and one for special opcodes (bit operations):

 //Opcode tables (256 entries each, one per possible opcode byte 0x00-0xFF)
 void (c_DMGCPU::*OPCodes[0x100])(void);
 void (c_DMGCPU::*OPCodesCB[0x100])(void);

You’ll notice that one array is called “OPCodesCB”. This is because bit operation opcodes tend to be prefixed with 0xCB on the GameBoy. I’ll not go into detail on how we differentiate between standard opcodes and prefixed ones, but it’s essentially just a switch that detects when 0xCB is present at the program counter and calls an opcode from “OPCodesCB” instead of from the standard table.

We populate the opcode table line by line with each function. An example would be:

OPCodes[0x3E] = &c_DMGCPU::OPCode0x3E;

Really all we’re doing here is storing the address of a class member function in the array using an address-of operator (&). This allows us to call the correct function when we receive an opcode. This is also pretty simple:

(this->*OPCodes[MMU->ReadByte(Registers.PC.word)])();

Here, we’re using a dereference operator (*) to call the correct function. Basically, in English this would be “call the function whose address is stored at index ReadByte(PC) in OPCodes”.
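The sketch below puts the two tables and the 0xCB check together in one small, self-contained class. It is a hypothetical miniature ("MiniCPU", "Dispatch" and the plain memory array are made up for this example) rather than DMGe's real c_DMGCPU, but the table declarations and the member-function-pointer call follow the same pattern as described above.

```cpp
#include <cstdint>

// Hypothetical miniature of the two-table scheme; not DMGe's actual class.
class MiniCPU
{
public:
    uint8_t memory[0x10000] = {};
    uint16_t pc = 0;
    uint8_t a = 0, h = 0;
    bool zeroFlag = false;

    MiniCPU()
    {
        OPCodes[0x3E] = &MiniCPU::OPCode0x3E;
        OPCodesCB[0x7C] = &MiniCPU::OPCodeCB0x7C;
    }

    // Runs the instruction at PC; returns false for unimplemented opcodes.
    bool Dispatch()
    {
        const uint8_t opcode = memory[pc];
        void (MiniCPU::*handler)() = nullptr;

        if (opcode == 0xCB)
            // Prefixed: the real opcode is the byte after 0xCB.
            handler = OPCodesCB[memory[static_cast<uint16_t>(pc + 1)]];
        else
            handler = OPCodes[opcode];

        if (handler == nullptr)
            return false;
        (this->*handler)();
        return true;
    }

private:
    // One slot per possible byte value. Unset entries stay null.
    void (MiniCPU::*OPCodes[0x100])() = {};
    void (MiniCPU::*OPCodesCB[0x100])() = {};

    // LD A, d8
    void OPCode0x3E()
    {
        a = memory[static_cast<uint16_t>(pc + 1)];
        pc += 2;
    }

    // BIT 7, H: sets the zero flag when bit 7 of H is clear.
    void OPCodeCB0x7C()
    {
        zeroFlag = (h & 0x80) == 0;
        pc += 2;
    }
};
```

Leaving unimplemented entries as null pointers and checking before the call is a deliberate choice in this sketch: it turns a missing opcode into a clean failure instead of a crash, which is handy when implementing opcodes one at a time.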

So let’s look at the actual implementation of an opcode. Before we do this, we need to know what opcode we’re going to implement, and exactly what it does. There are several references available for this and I’ll link them in at the bottom of the post.

For this we’ll be implementing “LD A, d8”. This is quite a simple instruction that just loads an immediate 8-bit value stored in memory at PC + 1 into register A. The code is as follows:

//Load immediate 8-bit value into A.
void c_DMGCPU::OPCode0x3E()
{
     DbgOut(DBG_CPU, VERBOSE_2, "LD A, d8");
     Registers.AF.hi = MMU->ReadByte(Registers.PC.word + 1);
     Clock.m = 2;
     Clock.t = 8;
     Registers.PC.word += 2;
}

All we’re doing is sending some debug text to stdio to tell us what the CPU is doing, and then doing the operation. We then set how many cycles the operation took and increment the program counter by two bytes, as the instruction is two bytes long (“3E dd”).

That is really all there is to say about implementing the opcodes. It’s just a case of implementing each opcode and ensuring that your code accurately simulates each processor instruction. It’s also worthwhile taking a look at the GameBoy CPU Manual to ensure that you’re updating the flags correctly for arithmetic instructions.
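As an example of the kind of flag handling involved, here is a sketch of an 8-bit addition such as "ADD A, n". The flags live in the upper four bits of the F register (the low half of AF). "Add8" and the constant names are hypothetical and chosen for this example; they are not DMGe's.

```cpp
#include <cstdint>

// Flag bit positions in the F register (low byte of AF).
const uint8_t FLAG_Z = 0x80; // Zero
const uint8_t FLAG_N = 0x40; // Subtract
const uint8_t FLAG_H = 0x20; // Half carry (carry out of bit 3)
const uint8_t FLAG_C = 0x10; // Carry (carry out of bit 7)

// Hypothetical helper for "ADD A, n": returns the new A and writes the new F.
inline uint8_t Add8(uint8_t a, uint8_t n, uint8_t& flags)
{
    const uint16_t result = static_cast<uint16_t>(a + n);
    flags = 0; // ADD always clears N; the other flags are set below.

    if ((result & 0xFF) == 0)
        flags |= FLAG_Z;
    if (((a & 0x0F) + (n & 0x0F)) > 0x0F)
        flags |= FLAG_H;
    if (result > 0xFF)
        flags |= FLAG_C;

    return static_cast<uint8_t>(result);
}
```

The half carry flag is the one that tends to get forgotten. It only looks at the low nibbles, so 0x0F + 0x01 sets it even though the full result (0x10) doesn't overflow.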

The Stack

I feel that this is a topic which should also be covered. The purpose of this is not to be a tutorial on how to build an emulator, but more as an introduction to emulation. However, I think that this is an important point for would-be emulator programmers.

In almost any system that you come up against in emulation, there will be some sort of stack to deal with. If you have any processor which is capable of calling functions, a stack will be necessary. Let’s take a look at some code from the BIOS:

LD SP, $fffe

SP stands for “Stack Pointer”. It tells the CPU where to put the stack in memory and it is down to the program which is running to decide this. In this case, the BIOS sets the stack to 0xFFFE in memory. You might think that this would be a bad idea, as it is so close to the top of the addressable memory (remember we only have a 16-bit address bus), but in the GameBoy the stack grows downwards.

As an example of how the stack is used, we’ll take a look at calling and returning from a function/subroutine. Any time a program on the GameBoy wants to use a function it’ll use the “CALL a16” instruction (or maybe a conditional call, but we won’t go into that here). The CPU then needs to store the address of the instruction after the “CALL” onto the stack and jump to that function. This is how it is implemented in DMGe:

void c_DMGCPU::OPCode0xCD()
{
     DbgOut(DBG_CPU, VERBOSE_2, "CALL a16");

     //Make room for two bytes first. (Stack grows downwards).
     Registers.SP.word -= 2;

     //Write address of next instruction to the stack.
     MMU->WriteWord(Registers.SP.word, Registers.PC.word + 3);

     //Set PC to address of function.
     Registers.PC.word = MMU->ReadWord(Registers.PC.word + 1);

     //Set how many cycles it took us to complete this opcode.
     Clock.m = 3;
     Clock.t = 24;
}

As you can see, it’s quite simple. We decrement the stack pointer by 2 to make room for 2 bytes, write the address of the next instruction onto the stack and jump to the function/subroutine. The order matters: the real CPU decrements SP before it writes, so with SP at 0xFFFE the return address lands at 0xFFFC and 0xFFFD rather than spilling over into 0xFFFF.

We then also have the problem of how we get back to where we should be when the function is finished. This is done in most CPUs with a return operation. In GameBoy assembler it’s usually written as “RET”. Here’s the implementation from DMGe:

void c_DMGCPU::OPCode0xC9()
{
     DbgOut(DBG_CPU, VERBOSE_2, "RET");

     //Read the return address from the top of the stack and jump there.
     Registers.PC.word = MMU->ReadWord(Registers.SP.word);

     //We read two bytes, so increment SP accordingly.
     Registers.SP.word += 2;

     //Set how many cycles we took to complete this operation.
     Clock.m = 1;
     Clock.t = 16;
}

This is more or less the opposite of the “CALL” opcode. We read the return address from the top of the stack into the program counter, increment the stack pointer past it and we’re done!

The stack can also be used to temporarily store data by pushing and popping registers, but this works essentially the same way that calling functions does, and so I won’t cover it.
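For anyone who would still like to see it, a push and pop pair could be sketched like this. It follows the hardware convention, where SP is decremented before the write and incremented after the read, with the low byte stored at the lower address. "StackDemo" and its flat memory array are hypothetical stand-ins, not DMGe's MMU.

```cpp
#include <cstdint>

// Hypothetical stack helpers over a flat 64 KiB array; not DMGe's MMU.
struct StackDemo
{
    uint8_t memory[0x10000] = {};
    uint16_t sp = 0xFFFE; // Same starting point the BIOS chooses

    // PUSH: move SP down two bytes, then store the word (low byte at SP).
    void Push(uint16_t value)
    {
        sp -= 2;
        memory[sp] = static_cast<uint8_t>(value & 0xFF);
        memory[static_cast<uint16_t>(sp + 1)] = static_cast<uint8_t>(value >> 8);
    }

    // POP: read the word at SP, then move SP back up two bytes.
    uint16_t Pop()
    {
        const uint16_t value = static_cast<uint16_t>(
            memory[sp] | (memory[static_cast<uint16_t>(sp + 1)] << 8));
        sp += 2;
        return value;
    }
};
```

Values come back in the reverse order they went in, so a routine that pushes BC and then DE has to pop DE first and BC second.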

Conclusion and Useful Links

And that’s part 2 of the series. I hope you’re enjoying it so far. Next, I’ll be covering the MMU and GPU.

If you’re working on your own emulator (doesn’t necessarily need to be for the GameBoy) and you’re finding this helpful, or need some help, don’t hesitate to get in touch by leaving a reply to this post or using the contact form :).

Here are some of the references I’ve been using. They might be useful to anyone else working on a GameBoy emulator:

GameBoy CPU Manual

GameBoy Opcode Table

GameBoy Wikipedia Article

Until next time :)