Adventures in GameBoy Emulation: Part 2 (The CPU)

Introduction

Time for part 2 in the series. It seemed like the right time to do it, as the CPU is the big focus of DMGe at the minute. The goal is to get the 256-byte on-chip BIOS ROM to run from start to finish. It’s a fairly simple program with the purpose of initialising the hardware and checking that the Nintendo logo on the cartridge matches the logo stored in the BIOS ROM.

It does this by comparing each logo byte-for-byte.

The BIOS

The BIOS is really the simplest program for the GameBoy that we could possibly have. It essentially does the following when it runs:

  • Initialise the stack at 0xFFFE.
  • Set VRAM to zero.
  • Initialise audio.
  • Set the background palette.
  • Decompress and store cartridge logo data in VRAM.
  • Set the background tilemap.
  • Display the logo and scroll it to the centre of the screen.
  • Play the “po-ling” sound.
  • Compare the logo data on the cartridge to the logo data in the BIOS ROM. If it’s a match, finish running and transfer control to 0x100 (mapped to the cartridge); if it doesn’t match, jump into an infinite loop and hang. Just before handing over, the BIOS also unmaps itself from memory by writing 0x01 to 0xFF50.

So there you have it. Probably as simple as you can get on the GameBoy. At the minute, we’re stepping through each opcode and implementing it as needed, as that will allow us to start work on the GPU and get something running properly before we move on to emulating games.

Many emulators would just ignore the BIOS and start right at 0x100, but the goal with DMGe is to be more true to the original experience of playing the GameBoy. Therefore, the most important part of the BIOS is to ensure that the logo comparison succeeds (or else we won’t be playing many games :( ).

The magic happens in the following:

Addr_00E0:
          LD HL,$0104 
          LD DE,$00a8

Addr_00E6:
          LD A,(DE)
          INC DE
          CP (HL)
          JR NZ,$fe
          INC HL
          LD A,L
          CP $34
          JR NZ, Addr_00E6

Just a brief explanation of this code. It’s fairly simple. It just does the following:

Create two pointers. One to the logo data in the cartridge (0x104) and one to the logo data in the BIOS ROM (0xA8):

LD HL, $0104
LD DE, $00a8

Load a byte from the logo data in the BIOS ROM (pointed to by DE), increment DE by one byte and compare the loaded byte with the corresponding byte from the cartridge (pointed to by HL). If they match, increment HL to check the next byte. If at any point the bytes don’t match, the “CP (HL)” instruction won’t set the zero flag, and “JR NZ,$fe” will jump to itself forever (an infinite loop), stopping anything else from happening. The final piece of the code checks whether L has reached 0x34, i.e. whether all 0x30 logo bytes (0x104 to 0x133) have been compared, and loops again if not.

Addr_00E6:
          LD A,(DE)
          INC DE 
          CP (HL) 
          JR NZ,$fe
          INC HL
          LD A,L 
          CP $34 
          JR NZ, Addr_00E6
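For anyone who finds the assembly hard to follow, here is roughly the same loop as a C++ sketch. The function name and the two raw pointers are made up for illustration (they are not part of DMGe), and it assumes the BIOS image is 256 bytes and the cartridge image is at least 0x134 bytes long. Instead of hanging on a mismatch like the real BIOS, it simply returns false.

```cpp
#include <cstdint>

// Hypothetical helper (not DMGe code): mirrors the BIOS logo check.
// `bios` points at the 256-byte BIOS ROM, `cart` at the cartridge ROM.
bool LogoMatches(const uint8_t* bios, const uint8_t* cart)
{
    uint16_t hl = 0x0104; // LD HL,$0104 (logo in the cartridge header)
    uint16_t de = 0x00A8; // LD DE,$00a8 (logo in the BIOS ROM)

    do {
        const uint8_t a = bios[de]; // LD A,(DE)
        ++de;                       // INC DE
        if (a != cart[hl]) {        // CP (HL)
            return false;           // JR NZ,$fe (the real BIOS hangs here)
        }
        ++hl;                       // INC HL
    } while ((hl & 0xFF) != 0x34);  // LD A,L / CP $34 / JR NZ,Addr_00E6

    return true;
}
```

The loop ends when the low byte of HL reaches 0x34, which is after 0x30 bytes have been compared.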

That’s essentially the purpose of the GameBoy BIOS in a nutshell. Let’s move on to how we’ll actually run it.

Emulating the CPU

The CPU chip from the Nintendo GameBoy.

As stated in the previous installment to this series, the GameBoy CPU (pictured above) is an 8-bit Sharp LR35902. Many people refer to it as a Z80, but this isn’t strictly true. While it contains several Z80 specific instruction set enhancements over the Intel 8080, it doesn’t retain any of the registers that were introduced in the Z80. I suppose you could call it an amalgamation of the Intel 8080 and the Z80.

DMGe emulates the CPU with an interpreter. We could use alternative, faster ways of doing things, such as dynamic recompilation, but I feel that it’s pretty unnecessary given the speed of most modern computers. It would only serve to make things more complicated than they have to be.

The emulation process is basically as follows:

  • Fetch opcode.
  • Look up opcode in opcode table.
  • Call relevant function which emulates the opcode.
  • Store the cycles taken.
  • Increment the program counter.
  • Add cycles taken to total cycles.
  • Repeat.

You’ll notice that we deal with cycles. Don’t worry too much about this at the minute. It will be used later for timing. Specifically timing the GPU states to the CPU.
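The loop above can be sketched in a few lines of C++. This is only a minimal illustration under some assumptions: MiniCpu, Handler, MakeTable and Step are invented names, memory is a flat 64 KiB array with no MMU, and only two opcodes are filled in. As in DMGe, each handler advances the program counter itself and records how many cycles it took.

```cpp
#include <array>
#include <cstdint>

// Illustrative types only; none of these names come from DMGe.
struct MiniCpu {
    std::array<uint8_t, 0x10000> memory{}; // flat 64 KiB, no MMU
    uint16_t pc = 0;
    uint8_t a = 0;
    uint32_t last_cycles = 0;  // cycles taken by the most recent opcode
    uint64_t total_cycles = 0; // running total, used later for timing
};

using Handler = void (*)(MiniCpu&);

// 0x00: NOP (1 byte, 4 cycles)
void OpNop(MiniCpu& cpu)
{
    cpu.last_cycles = 4;
    cpu.pc += 1;
}

// 0x3E: LD A, d8 (2 bytes, 8 cycles)
void OpLdAImm(MiniCpu& cpu)
{
    cpu.a = cpu.memory[static_cast<uint16_t>(cpu.pc + 1)];
    cpu.last_cycles = 8;
    cpu.pc += 2;
}

std::array<Handler, 256> MakeTable()
{
    std::array<Handler, 256> table{}; // every entry starts as nullptr
    table[0x00] = OpNop;
    table[0x3E] = OpLdAImm;
    return table;
}

// Runs one instruction. Returns false if the opcode has no handler yet.
bool Step(MiniCpu& cpu, const std::array<Handler, 256>& table)
{
    const uint8_t opcode = cpu.memory[cpu.pc]; // fetch
    const Handler handler = table[opcode];     // look up
    if (handler == nullptr) {
        return false;
    }
    handler(cpu);                              // execute (sets cycles, moves PC)
    cpu.total_cycles += cpu.last_cycles;       // accumulate for timing
    return true;
}
```

Calling Step in a loop until it returns false is a handy way to find the next opcode that still needs implementing.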

Registers

The GameBoy CPU has several registers which it uses for storing small amounts of data for fast access during execution. We first define a structure as a template for each register:

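 //A 16-bit register pair that can also be accessed as two 8-bit halves.
 //Note: "lo" comes first so that it overlays the low byte of "word" on a
 //little-endian host (e.g. x86).
 typedef union
 {
      struct
      {
           uint8_t lo;
           uint8_t hi;
      };
      uint16_t word;
 }Register16_t;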

This allows us to access each register pair as a word (for example AF), or we can access the pair as two 8-bit registers (for AF, “hi” is A and “lo” is the flags register F).
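One thing to be aware of: the union trick relies on an anonymous struct (a compiler extension in C++) and on the host being little-endian. If that ever becomes a problem, something like the following would behave the same on any host. This is just a sketch of an alternative, and RegisterPair is a made-up name, not what DMGe uses.

```cpp
#include <cstdint>

// Hypothetical endian-independent alternative to Register16_t.
// The two halves are stored explicitly and the 16-bit view is computed.
struct RegisterPair {
    uint8_t lo = 0;
    uint8_t hi = 0;

    // Combine the halves into a 16-bit value.
    uint16_t word() const
    {
        return static_cast<uint16_t>((hi << 8) | lo);
    }

    // Split a 16-bit value into the two halves.
    void set_word(uint16_t value)
    {
        lo = static_cast<uint8_t>(value & 0xFF);
        hi = static_cast<uint8_t>(value >> 8);
    }
};
```

The trade-off is a little more typing at each use (word() and set_word() instead of a plain member) in exchange for not depending on how the compiler lays out the union.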

We then use this structure to create another structure that actually holds our register definitions:

 //Our CPU registers
 typedef struct
 {
      //Our basic registers, AF, BC, DE, HL
      Register16_t AF, BC, DE, HL;

      //Special Registers
      Register16_t SP, PC;
 }regs_t;

That’s it really for the registers. Now that this has been covered we can move on to the opcodes.

Opcodes

This is by far the biggest task in developing an emulator. CPUs have A LOT of instructions. In order for most things to run, we need to implement all of them, and it isn’t an easy task. It’s very time consuming and finding a mistake/bug can be very difficult, but this is the part where we can start to watch GameBoy code run, and in my opinion, it’s worth it.

We need to start off by defining two opcode tables. One for standard opcodes and one for special opcodes (bit operations):

 //Opcode tables (256 entries each, one per possible opcode byte 0x00-0xFF)
 void (c_DMGCPU::*OPCodes[0x100])(void);
 void (c_DMGCPU::*OPCodesCB[0x100])(void);

You’ll notice that one array is called “OPCodesCB”. This is because bit operation opcodes tend to be prefixed with 0xCB on the GameBoy. I’ll not go into detail on how we differentiate between standard opcodes and prefixed ones, but it’s essentially just a switch that detects when 0xCB is present at the program counter and calls an opcode from “OPCodesCB” instead of from the standard table.

We populate the opcode table line by line with each function. An example would be:

OPCodes[0x3E] = &c_DMGCPU::OPCode0x3E;

Really all we’re doing here is storing the address of a class member function in the array using an address-of operator (&). This allows us to call the correct function when we receive an opcode. This is also pretty simple:

(this->*OPCodes[MMU->ReadByte(Registers.PC.word)])();

Here, we’re using the pointer-to-member operator (->*) to call the correct function on the current object. Basically, in English this would be “call the member function whose pointer is stored at index ReadByte(PC) in OPCodes”.
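To tie the two tables together, here is a small self-contained sketch of the idea. It is not DMGe’s code: TinyCpu and its members are invented for illustration, memory is a flat array rather than an MMU, and the 0xCB prefix is handled as an ordinary table entry instead of a switch. It also points every slot at a fallback handler first, so an opcode that hasn’t been implemented yet never turns into a call through a null pointer.

```cpp
#include <array>
#include <cstdint>

// Illustrative class only; this is not DMGe's c_DMGCPU.
class TinyCpu {
public:
    TinyCpu()
    {
        // Fill both tables with a fallback before populating real handlers.
        ops_.fill(&TinyCpu::Unimplemented);
        cb_ops_.fill(&TinyCpu::Unimplemented);

        ops_[0x00] = &TinyCpu::Nop;
        ops_[0xCB] = &TinyCpu::PrefixCB;
        cb_ops_[0x37] = &TinyCpu::SwapA;
    }

    // Fetch the opcode at PC and call its handler via pointer-to-member.
    void Step() { (this->*ops_[memory[pc]])(); }

    std::array<uint8_t, 0x10000> memory{};
    uint16_t pc = 0;
    uint8_t a = 0;
    bool hit_unimplemented = false;

private:
    using Handler = void (TinyCpu::*)();

    void Unimplemented() { hit_unimplemented = true; }

    // 0x00: NOP
    void Nop() { pc += 1; }

    // 0xCB: the byte after the prefix selects an entry in the second table.
    void PrefixCB()
    {
        const uint8_t sub = memory[static_cast<uint16_t>(pc + 1)];
        (this->*cb_ops_[sub])();
    }

    // CB 37: SWAP A, exchange the high and low nibbles of A
    // (flag updates are left out to keep the sketch short).
    void SwapA()
    {
        a = static_cast<uint8_t>((a << 4) | (a >> 4));
        pc += 2; // prefix byte + opcode byte
    }

    std::array<Handler, 256> ops_{};
    std::array<Handler, 256> cb_ops_{};
};
```

Treating 0xCB as just another handler keeps the main dispatch line identical for prefixed and non-prefixed opcodes; a switch on the prefix, as DMGe does, works just as well.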

So let’s look at the actual implementation of an opcode. Before we do this, we need to know what opcode we’re going to implement, and exactly what it does. There are several references available for this and I’ll link them in at the bottom of the post.

For this we’ll be implementing “LD A, d8”. This is quite a simple instruction that just loads an immediate 8-bit value stored in memory at PC + 1 into register A. The code is as follows:

//Load immediate 8-bit value into A.
void c_DMGCPU::OPCode0x3E()
{
     DbgOut(DBG_CPU, VERBOSE_2, "LD A, d8");
     Registers.AF.hi = MMU->ReadByte(Registers.PC.word + 1);
     Clock.m = 2;
     Clock.t = 8;
     Registers.PC.word += 2;
}

All we’re doing is sending some debug text to the debug output to tell us what the CPU is doing, and then doing the operation. We then set how many cycles the operation took and increment the program counter by two bytes, as the instruction is two bytes long (“3E dd”, where dd is the immediate value).

That is really all there is to say about implementing the opcodes. It’s just a case of implementing each opcode and ensuring that your code accurately simulates each processor instruction. It’s also worthwhile taking a look at the GameBoy CPU Manual to ensure that you’re updating the flags correctly for arithmetic instructions.
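Since flags are where most of the subtle bugs hide, here is a sketch of the flag logic for an 8-bit ADD. The names (Add8, AluResult, the kFlag constants) are invented for this example and are not from DMGe. The flag positions are the standard ones for the F register: Z is bit 7, N is bit 6, H is bit 5 and C is bit 4.

```cpp
#include <cstdint>

// Bit positions of the flags in the F register.
constexpr uint8_t kFlagZ = 0x80; // zero
constexpr uint8_t kFlagN = 0x40; // subtract
constexpr uint8_t kFlagH = 0x20; // half carry (carry out of bit 3)
constexpr uint8_t kFlagC = 0x10; // carry (carry out of bit 7)

// Hypothetical helper: result and flags of "ADD A, operand".
struct AluResult {
    uint8_t value;
    uint8_t flags;
};

AluResult Add8(uint8_t a, uint8_t operand)
{
    const unsigned sum = static_cast<unsigned>(a) + operand;

    uint8_t flags = 0; // N is always cleared by ADD
    if ((sum & 0xFF) == 0) {
        flags |= kFlagZ;
    }
    if (((a & 0x0F) + (operand & 0x0F)) > 0x0F) {
        flags |= kFlagH;
    }
    if (sum > 0xFF) {
        flags |= kFlagC;
    }
    return {static_cast<uint8_t>(sum), flags};
}
```

For example, adding 0xC6 to 0x3A wraps around to 0x00 and sets Z, H and C at the same time. The half carry is the one that is easiest to forget.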

The Stack

I feel that this is a topic which should also be covered. This series isn’t meant to be a tutorial on how to build an emulator, more an introduction to emulation. However, I think that this is an important point for would-be emulator programmers.

In almost any system that you come up against in emulation, there will be some sort of stack to deal with. If you have any processor which is capable of calling functions, a stack will be necessary. Let’s take a look at some code from the BIOS:

LD SP, $fffe

SP stands for “Stack Pointer”. It tells the CPU where to put the stack in memory and it is down to the program which is running to decide this. In this case, the BIOS sets the stack to 0xFFFE in memory. You might think that this would be a bad idea, as it is so close to the top of the addressable memory (remember we only have a 16-bit address bus), but in the GameBoy the stack grows downwards.

As an example of how the stack is used, we’ll take a look at calling and returning from a function/subroutine. Any time a program on the GameBoy wants to use a function it’ll use the “CALL a16” instruction (or maybe a conditional call, but we won’t go into that here). The CPU then needs to store the address of the instruction after the “CALL” onto the stack and jump to that function. This is how it is implemented in DMGe:

void c_DMGCPU::OPCode0xCD()
{
     DbgOut(DBG_CPU, VERBOSE_2, "CALL a16");

     //The stack grows downwards, so make room for two bytes first.
     Registers.SP.word -= 2;

     //Write the address of the next instruction (CALL is 3 bytes long) to the stack.
     MMU->WriteWord(Registers.SP.word, Registers.PC.word + 3);

     //Set PC to address of function.
     Registers.PC.word = MMU->ReadWord(Registers.PC.word + 1);

     //Set how many cycles it took us to complete this opcode.
     Clock.m = 6;
     Clock.t = 24;
}

As you can see, it’s quite simple. We decrement the stack pointer by 2 to make room for two bytes, write the address of the next instruction onto the stack and jump to the function/subroutine. The order matters: the hardware decrements SP before it writes, so with SP at 0xFFFE the return address lands at 0xFFFC-0xFFFD. Writing first would put a byte at 0xFFFF, which is the interrupt enable register.

We then also have the problem of how we get back to where we should be when the function is finished. This is done in most CPUs with a return operation. In GameBoy assembler it’s usually written as “RET”. Here’s the implementation from DMGe:

void c_DMGCPU::OPCode0xC9()
{
     DbgOut(DBG_CPU, VERBOSE_2, "RET");

     //Read the return address from the top of the stack and jump there.
     Registers.PC.word = MMU->ReadWord(Registers.SP.word);

     //We read two bytes, so increment SP accordingly.
     Registers.SP.word += 2;

     //Set how many cycles we took to complete this operation.
     Clock.m = 4;
     Clock.t = 16;
}

This is more or less the opposite of the “CALL” opcode. We read the return address at the stack pointer into the program counter, increment the stack pointer by 2 and we’re done!

The stack can also be used to temporarily store data by pushing and popping registers, but this works essentially the same way that calling functions does, and so I won’t cover it.
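For completeness, here is a tiny sketch of what pushing and popping boils down to. StackDemo is an invented type with a flat memory array, not DMGe’s MMU. On the hardware, a push decrements SP before each byte is written (high byte first) and a pop increments SP after each byte is read.

```cpp
#include <array>
#include <cstdint>

// Illustrative only: a flat 64 KiB memory and a stack pointer.
struct StackDemo {
    std::array<uint8_t, 0x10000> memory{};
    uint16_t sp = 0xFFFE; // same starting point the BIOS uses

    // Push a 16-bit value: decrement first, then write (high byte, low byte).
    void Push(uint16_t value)
    {
        memory[--sp] = static_cast<uint8_t>(value >> 8);
        memory[--sp] = static_cast<uint8_t>(value & 0xFF);
    }

    // Pop a 16-bit value: read first, then increment (low byte, high byte).
    uint16_t Pop()
    {
        const uint8_t lo = memory[sp++];
        const uint8_t hi = memory[sp++];
        return static_cast<uint16_t>((hi << 8) | lo);
    }
};
```

Values come back in the reverse order they were pushed, which is exactly why nested calls return to the right place.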

Conclusion and Useful Links

And that’s part 2 of the series. I hope you’re enjoying it so far. Next, I’ll be covering the MMU and GPU.

If you’re working on your own emulator (doesn’t necessarily need to be for the GameBoy) and you’re finding this helpful, or need some help, don’t hesitate to get in touch by leaving a reply to this post or using the contact form :).

Here are some of the references I’ve been using. They might be useful to anyone else working on a GameBoy emulator:

GameBoy CPU Manual

GameBoy Opcode Table

GameBoy Wikipedia Article

Until next time :)

SRAM P-Mod for Digilent Basys 2 FPGA Board: Part 1

Introduction

In this series, I’ll be covering the design and building of an SPI SRAM PMOD for the Digilent Basys2 FPGA Board.

This project came about more out of necessity than anything else. I want to design an 8-bit CPU on my FPGA board, but it has no RAM (other than the block RAM in the FPGA, which is very limited). Having searched and searched, I was unable to find a premade PMOD (the name that Digilent gives to expansion boards) that contained RAM, so I thought, “Hey! Why not make my own?”.

Choosing Parts

The first job was to source cheap, high-quality parts before I even attempted to design a PCB. I was lucky to find everything that I needed at Farnell. Let’s have a look at our bill of materials:

As you can see, this isn’t exactly a very big project, which is good, because it accomplishes the task at hand without costing a fortune. The bill of materials cost is extremely low, working out at roughly £12.26, which already includes the cost of the PCB. I’ve found an excellent PCB fabricator at OSH Park. They do large runs with multiple boards in order to keep costs down and end up costing far less than most other fabricators.

The Design

The design itself was not complicated. It was more a matter of joining the right leads on the IC up to the right pins on the PMOD header. No huge amounts of electronic engineering involved. About the only thing that required a little thought was the choice of filter capacitor. Here’s the finished design:

My SRAM PMOD design in Eagle.
My SRAM PMOD design rendered by OSH Park.

You might notice in my design that the “Chip Select” pin (pin 1) is tied low at all times. This is because I don’t plan on disabling the chip at any time, and also because my Basys2 PMOD header doesn’t have enough I/O space to connect it anywhere else.

What Next?

This is as far as I’ve got so far. After I receive my boards and components I’ll write another article showing it being assembled and tested. I’ll also include the Eagle schematic and board files for download so you can build one yourself!

Until next time.

COEGen v0.01 – Generate .coe files from binary files for Xilinx FPGA block RAM.

What it Does

This is just a simple utility for creating .coe files to initialise Xilinx FPGA block RAM. It takes an input file, typically binary data of some sort, and outputs ‘inputfilename’.coe.

You can also set the width of each memory word and the depth (number of words) of the memory.

How to Use

Using the program is pretty simple. Here’s an example:

 $ COEGen --file binary.bin --width 8 --depth 256 

This will output ‘binary.bin.coe’, which initialises 256 words of 8-bit memory (256 bytes) from the file binary.bin.

Note that if the specified memory size is larger than the input file, the rest of the block RAM will be filled with zeros.
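If you’re curious what ends up in the file, here is a rough sketch of generating one in C++. To be clear, this is not COEGen’s actual source: MakeCoe8 is a made-up helper that only handles 8-bit wide memory, and it writes the values in hex using the two standard .coe keywords (memory_initialization_radix and memory_initialization_vector).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not COEGen's code): builds the text of a .coe file
// for an 8-bit wide memory with `depth` words. Words beyond the end of
// `data` are written as zero.
std::string MakeCoe8(const std::vector<uint8_t>& data, std::size_t depth)
{
    assert(depth > 0); // an empty vector would not be a valid .coe file

    std::ostringstream out;
    out << "memory_initialization_radix=16;\n";
    out << "memory_initialization_vector=\n";
    out << std::hex << std::setfill('0');

    for (std::size_t i = 0; i < depth; ++i) {
        const unsigned value = (i < data.size()) ? data[i] : 0u; // zero padding
        // Entries are comma separated; the last one ends with a semicolon.
        out << std::setw(2) << value << (i + 1 < depth ? ",\n" : ";\n");
    }
    return out.str();
}
```

Calling MakeCoe8({0xDE, 0xAD}, 4) gives the two header lines followed by de, ad, 00 and 00, one value per line.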

Download

The easiest way to get COEGen is to install a binary package. Here’s the link:

coegen-0.01_i386.deb

Update:

I’ve uploaded a build of COEGen for Windows.

COEGen-0.01.zip

If you’d rather build it from source, you’ll need to use Code::Blocks. The only dependencies are the boost libraries. Here’s how to get the source:

 $ git clone https://github.com/wornwinter/COEGen.git 

If you have any issues, feel free to contact me by commenting below. Enjoy!

Adventures in GameBoy Emulation: Part 1

A Little Background

This is a short series I’ll be writing to document a GameBoy emulator that I’m working on (DMGe).

The project was created out of a merger between two projects.

WWGB: My own project.

ZGB: A project that I collaborated on via GitHub.

Now, I know that it’s been done time and time again. A new GameBoy emulator is nothing to get too excited about, but that’s not the point of this project. It’s a learning experience. Primarily, I want to learn more about embedded architecture, and you might learn something too by reading this series.

Why I Chose the GameBoy

Originally, I wrote a Chip8 emulator. This served as a great learning experience, but was by no means a good example of embedded system emulation. The Chip8 interpreter lacked a lot of things that are typical of most bespoke computer systems. Namely the following:

  • No real MMU or memory mapping.
  • No clock cycles/machine cycles or timing to consider (apart from 60Hz timers).
  • No interrupts.
  • Very limited opcode set.
  • No variable length instructions.

The GameBoy, on the other hand, has all of these things and a simple enough opcode set too. Another key area that interested me was the memory bank switching circuits for accessing all of the cartridge data within a 16-bit address space.

Research

Implementing an emulator is never an easy task. You’re essentially rebuilding a system from the ground up in most cases. The first place to start then is research. You have to know an awful lot about the system you’re planning to emulate before you even start to code.

I started out by finding what technical documentation I could on the Internet. Being such an old console, there is no lack of it. There has been plenty of time for people to reverse engineer the console and document it since its release. Links are included to my reference materials at the bottom of this post.

Key Points

These are the core parts of the GameBoy that I would have to consider in order to get started.

  • CPU – Custom Sharp core (LR35902). Essentially an Intel 8080 variant with some of the improvements introduced in the Z80. It has two sets of opcodes: those that are non-prefixed, and those that are prefixed with 0xCB. The 0xCB opcodes deal primarily with bit manipulation, while the others deal with typical processor operations such as load immediate, jump etc.
  • MMU – The GameBoy has a 16-bit address space, allowing for a total of 65,536 bytes to be addressed, although in reality the console does not have this much memory to address. For example, reading from 0xE000-0xFDFF will return the same values as reading from 0xC000-0xDDFF, as the internal 8kB of RAM is mirrored at the higher location. The address space also has specific locations for switchable ROM and RAM banks, VRAM, sprite attribute memory, interrupts and general I/O.
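The RAM mirroring is easy to handle in an MMU by folding the mirrored addresses back onto the real RAM before doing the actual read or write. A minimal sketch, with NormalizeAddress being a name invented for this example:

```cpp
#include <cstdint>

// Hypothetical helper: maps the mirrored ("echo") region 0xE000-0xFDFF
// back onto internal RAM at 0xC000-0xDDFF. Other addresses pass through.
constexpr uint16_t NormalizeAddress(uint16_t address)
{
    return (address >= 0xE000 && address <= 0xFDFF)
               ? static_cast<uint16_t>(address - 0x2000)
               : address;
}
```

A read from 0xE000 then becomes a read from 0xC000, while something like 0xFE00 (the start of sprite attribute memory) is left untouched.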

Things like graphics and sound I decided to leave for a later date. They’re nice, but I wanted to get the CPU core up and running as soon as possible. That’ll be continued in the next part.

References

GameBoy CPU Manual

Wikipedia

GameBoy Opcodes