Time for part 2 in the series. It seemed like the right time to do it, as the CPU is the big focus of DMGe at the minute. The goal is to get the 256-byte on-chip BIOS ROM to run from start to finish. It’s a fairly simple program whose purpose is to initialise the hardware and check that the Nintendo logo on the cartridge matches the logo stored in the BIOS ROM.
It does this by comparing each logo byte-for-byte.
The BIOS is really the simplest program for the GameBoy that we could possibly have. It essentially does the following when it runs:
Set VRAM to zero.
Initialise the stack at 0xFFFE.
Set the background palette.
Decompress and store cartridge logo data in VRAM.
Set the background tilemap.
Display the logo and scroll it to the centre of the screen.
Play the “po-ling” sound.
Compare the logo data on the cartridge to the logo data in the BIOS ROM. If it’s a match, finish running and transfer control to 0x100 (mapped to the cartridge). If it doesn’t match, jump into an infinite loop and hang. Note that just before handing over control, the BIOS also unmaps itself from memory by writing 0x01 to 0xFF50.
So there you have it. Probably as simple as you can get on the GameBoy. At the minute, we’re stepping through each opcode and implementing it as needed, as that will allow us to start work on the GPU and get something running properly before we move on to emulating games.
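The unmapping step at the end of that list is easy to overlook, so here is a minimal sketch of it. `BootOverlay` and its members are hypothetical names for illustration, not DMGe's actual MMU. It only models the BIOS sitting on top of the first 256 bytes of the cartridge until something is written to 0xFF50.

```cpp
#include <array>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical minimal memory model: just the BIOS overlay and the cartridge ROM.
class BootOverlay {
public:
    BootOverlay(const std::array<uint8_t, 0x100>& bios, std::vector<uint8_t> cart)
        : bios_(bios), cart_(std::move(cart)) {}

    uint8_t ReadByte(uint16_t addr) const {
        // While the BIOS is mapped, it shadows the first 256 bytes of the cartridge.
        if (biosMapped_ && addr < 0x0100) return bios_[addr];
        // Reads beyond the cartridge image return 0xFF in this sketch.
        return addr < cart_.size() ? cart_[addr] : 0xFF;
    }

    void WriteByte(uint16_t addr, uint8_t value) {
        // The BIOS ends by writing 0x01 to 0xFF50, which removes it from the map.
        if (addr == 0xFF50 && value != 0) biosMapped_ = false;
    }

    bool BiosMapped() const { return biosMapped_; }

private:
    std::array<uint8_t, 0x100> bios_;
    std::vector<uint8_t> cart_;
    bool biosMapped_ = true;
};
```

After the write to 0xFF50, a read from 0x0000 returns cartridge data instead of BIOS data, which is why execution can simply fall through to 0x100.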
Many emulators would just ignore the BIOS and start right at 0x100, but the goal with DMGe is to be more true to the original experience of playing the GameBoy. Therefore, the most important part of the BIOS is to ensure that the logo comparison succeeds (or else we won’t be playing many games 😦 ).
The magic happens in the following:
JR NZ, Addr_00E6
Just a brief explanation of this code. It’s fairly simple. It just does the following:
Create two pointers. One to the logo data in the cartridge (0x104) and one to the logo data in the BIOS ROM (0xA8):
LD HL, $0104
LD DE, $00a8
Load a byte from the logo data in the BIOS ROM (pointed to by DE), increment DE by one byte, compare it with the same byte from the cartridge (pointed to by HL) and if they match, increment HL to check the next byte. If at any point the bytes don’t match, the “CP (HL)” instruction won’t set the zero flag, and “JR NZ, $fe” (a relative jump of -2, so it jumps back to itself) will spin in an infinite loop and stop anything else from happening. The final piece of the code simply checks if we’ve compared every byte yet or whether we should repeat another loop.
JR NZ, Addr_00E6
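For readers who find the assembly hard to follow, the same loop can be written out in C++. This is only a sketch of what the BIOS routine does, not code from DMGe. `LogoMatches` is a made-up name, and returning `false` stands in for the hang. The end condition mirrors the BIOS check on the low byte of HL, which stops after the 48 logo bytes at 0x104 to 0x133.

```cpp
#include <cstdint>

// Sketch of the BIOS logo check. `bios` must point at the 256 bytes of BIOS ROM
// and `cart` at the start of the cartridge ROM (at least 0x134 bytes).
bool LogoMatches(const uint8_t* bios, const uint8_t* cart) {
    uint16_t hl = 0x0104;  // HL: logo data in the cartridge header
    uint16_t de = 0x00A8;  // DE: logo data in the BIOS ROM
    do {
        uint8_t a = bios[de++];           // LD A,(DE) then INC DE
        if (a != cart[hl]) return false;  // CP (HL): on real hardware we hang here
        ++hl;                             // INC HL
    } while ((hl & 0xFF) != 0x34);        // compare L with 0x34, loop if not done
    return true;
}
```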
That’s essentially the purpose of the GameBoy BIOS in a nutshell. Let’s move on to how we’ll actually run it.
Emulating the CPU
As stated in the previous installment to this series, the GameBoy CPU (pictured above) is an 8-bit Sharp LR35902. Many people refer to it as a Z80, but this isn’t strictly true. While it contains several Z80-specific instruction set enhancements over the Intel 8080, it doesn’t retain any of the registers that were introduced in the Z80. I suppose you could call it an amalgamation of the Intel 8080 and the Z80.
DMGe emulates the CPU with an interpreter. We could use alternative, faster approaches such as dynamic recompilation, but I feel that it’s pretty unnecessary given the speed of most modern computers. It would only serve to make things more complicated than they have to be.
The emulation process is basically as follows:
Look up opcode in opcode table.
Call relevant function which emulates the opcode.
Store the cycles taken.
Increment the program counter.
Add cycles taken to total cycles.
You’ll notice that we deal with cycles. Don’t worry too much about this at the minute. It will be used later for timing, specifically for keeping the GPU states in step with the CPU.
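The steps above can be sketched as a tiny interpreter loop. This is a simplified illustration rather than DMGe's code: `ToyCpu`, `OpResult`, `OpHandler`, `BuildTable` and `Step` are all hypothetical names, and the handlers return their length and cycle count instead of updating the CPU state themselves. Only NOP (0x00) and LD A, d8 (0x3E) are filled in.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical toy CPU state for illustration only.
struct ToyCpu {
    uint8_t a = 0;
    uint16_t pc = 0;
    uint64_t totalCycles = 0;
    std::vector<uint8_t> mem;
};

// What a handler reports back: instruction length in bytes and cycles taken.
struct OpResult {
    uint8_t length;
    uint8_t cycles;
};
using OpHandler = OpResult (*)(ToyCpu&);

OpResult OpNop(ToyCpu&) { return {1, 4}; }
OpResult OpLdAd8(ToyCpu& cpu) {
    cpu.a = cpu.mem[cpu.pc + 1];  // immediate operand follows the opcode
    return {2, 8};
}

std::array<OpHandler, 256> BuildTable() {
    std::array<OpHandler, 256> table{};  // unimplemented entries stay nullptr
    table[0x00] = &OpNop;
    table[0x3E] = &OpLdAd8;
    return table;
}

// Executes one instruction. Returns false if the opcode is not implemented yet.
bool Step(ToyCpu& cpu, const std::array<OpHandler, 256>& table) {
    OpHandler handler = table[cpu.mem[cpu.pc]];  // look up opcode in the table
    if (handler == nullptr) return false;
    OpResult result = handler(cpu);              // call it and store the cycles taken
    cpu.pc += result.length;                     // increment the program counter
    cpu.totalCycles += result.cycles;            // add cycles to the running total
    return true;
}
```

Returning `false` for a missing handler fits the "implement opcodes as we hit them" approach: the first unimplemented opcode stops the run and tells you what to write next.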
The GameBoy CPU has several registers which it uses for storing small amounts of data for fast access during execution. We first define a structure as a template for each register:
This allows us to access each register pair as a word (for example AF), or we can access the pair as two 8-bit registers.
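A structure with that behaviour could look something like the sketch below. The member names `word` and `hi` match the way the registers are used in the opcode code later in this post, while `lo` is my assumption for the other half. Note the caveats: the layout assumes a little-endian host, the anonymous struct is a compiler extension in C++, and reading a different union member than the one last written is something GCC, Clang and MSVC support but the C++ standard does not guarantee.

```cpp
#include <cstdint>

// Sketch of a 16-bit register pair that can also be used as two 8-bit registers.
// Assumes a little-endian host, so the low byte comes first in memory.
union Register16_t {
    uint16_t word;   // the whole pair, for example AF
    struct {
        uint8_t lo;  // low byte, for example F (name is an assumption)
        uint8_t hi;  // high byte, for example A
    };
};

static_assert(sizeof(Register16_t) == 2, "register pair must be exactly 16 bits");
```

With this layout, writing `AF.word = 0x01B0` makes `AF.hi` read 0x01 and `AF.lo` read 0xB0, and changing either half is immediately visible through `word`.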
We then use this structure to create another structure that actually holds our register definitions:
//Our CPU registers
//Our basic registers, AF, BC, DE, HL
Register16_t AF, BC, DE, HL;
Register16_t SP, PC;
That’s it really for the registers. Now that this has been covered we can move on to the opcodes.
This is by far the biggest task in developing an emulator. CPUs have A LOT of instructions. In order for most things to run, we need to implement all of them, and it isn’t an easy task. It’s very time consuming and finding a mistake/bug can be very difficult, but this is the part where we can start to watch GameBoy code run, and in my opinion, it’s worth it.
We need to start off by defining two opcode tables. One for standard opcodes and one for special opcodes (bit operations):
You’ll notice that one array is called “OPCodesCB”. This is because bit operation opcodes tend to be prefixed with 0xCB on the GameBoy. I’ll not go into detail on how we differentiate between standard opcodes and prefixed ones, but it’s essentially just a switch that detects when 0xCB is present at the program counter and calls an opcode from “OPCodesCB” instead of from the standard table.
We populate the opcode table line by line with each function. An example would be:
OPCodes[0x3E] = &c_DMGCPU::OPCode0x3E;
Really all we’re doing here is storing the address of a class member function in the array using an address-of operator (&). This allows us to call the correct function when we receive an opcode. This is also pretty simple:
Here, we’re using a dereference operator (*) to call the correct function. Basically, in English this would be “call the function whose address is stored at index ReadByte(PC) in OPCodes”.
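Putting the two tables, the population step and the call together, a stripped-down version might look like this. It is a sketch under assumptions: `SketchCPU`, `OPCodeUnknown`, `OPCodeCB0x37` and `halted` are names I made up, and memory is a plain vector rather than an MMU. The table names and `OPCode0x3E` follow the ones mentioned above. Calling through a pointer to a member function in C++ uses the `->*` operator on `this`.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

class SketchCPU {
public:
    explicit SketchCPU(std::vector<uint8_t> program) : mem_(std::move(program)) {
        // Point every entry at a fallback first so no slot is left uninitialised.
        for (auto& op : OPCodes) op = &SketchCPU::OPCodeUnknown;
        for (auto& op : OPCodesCB) op = &SketchCPU::OPCodeUnknown;
        // Then populate the tables one opcode at a time.
        OPCodes[0x3E] = &SketchCPU::OPCode0x3E;
        OPCodesCB[0x37] = &SketchCPU::OPCodeCB0x37;
    }

    void Step() {
        uint8_t opcode = mem_[PC];
        if (opcode == 0xCB) {
            // Prefixed opcode: skip 0xCB and use the next byte to index the CB table.
            ++PC;
            (this->*OPCodesCB[mem_[PC]])();
        } else {
            (this->*OPCodes[opcode])();
        }
    }

    uint8_t A = 0;
    uint16_t PC = 0;
    bool halted = false;  // set when an unimplemented opcode is reached

private:
    using OpFn = void (SketchCPU::*)();

    // LD A, d8: load the byte after the opcode into A.
    void OPCode0x3E() { A = mem_[PC + 1]; PC += 2; }
    // SWAP A: exchange the two nibbles of A (flag updates omitted in this sketch).
    void OPCodeCB0x37() { A = static_cast<uint8_t>((A << 4) | (A >> 4)); PC += 1; }
    void OPCodeUnknown() { halted = true; }

    OpFn OPCodes[256];
    OpFn OPCodesCB[256];
    std::vector<uint8_t> mem_;
};
```

Running the bytes `3E A5 CB 37` through `Step()` twice loads 0xA5 into A and then swaps its nibbles to 0x5A.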
So let’s look at the actual implementation of an opcode. Before we do this, we need to know what opcode we’re going to implement, and exactly what it does. There are several references available for this and I’ll link them in at the bottom of the post.
For this we’ll be implementing “LD A, d8”. This is quite a simple instruction that just loads an immediate 8-bit value stored in memory at PC + 1 into register A. The code is as follows:
//Load immediate 8-bit value into A.
DbgOut(DBG_CPU, VERBOSE_2, "LD A, d8");
Registers.AF.hi = MMU->ReadByte(Registers.PC.word + 1);
Clock.m = 2;
Clock.t = 8;
Registers.PC.word += 2;
All we’re doing is sending some debug text to standard output to tell us what the CPU is doing, and then doing the operation. We then set how many cycles the operation took and increment the program counter by two bytes, as the instruction is two bytes long (“3E dd”).
That is really all there is to say about implementing the opcodes. It’s just a case of implementing each opcode and ensuring that your code accurately simulates each processor instruction. It’s also worthwhile taking a look at the GameBoy CPU Manual to ensure that you’re updating the flags correctly for arithmetic instructions.
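As an illustration of what “updating the flags correctly” involves, here is a sketch of the flag logic for an 8-bit add such as “ADD A, d8”. The flag bit positions (Z, N, H and C in the top four bits of F) are the GameBoy's. `AddResult` and `Add8` are hypothetical helper names and not part of DMGe.

```cpp
#include <cstdint>

// Flag bits in the F register.
constexpr uint8_t FLAG_Z = 0x80;  // zero
constexpr uint8_t FLAG_N = 0x40;  // subtract
constexpr uint8_t FLAG_H = 0x20;  // half carry (carry out of bit 3)
constexpr uint8_t FLAG_C = 0x10;  // carry (carry out of bit 7)

struct AddResult {
    uint8_t value;
    uint8_t flags;
};

// Computes a + b the way an 8-bit ADD does, along with the resulting flags.
AddResult Add8(uint8_t a, uint8_t b) {
    unsigned sum = static_cast<unsigned>(a) + b;
    uint8_t flags = 0;  // N is always cleared by an addition
    if ((sum & 0xFF) == 0) flags |= FLAG_Z;
    if (((a & 0x0F) + (b & 0x0F)) > 0x0F) flags |= FLAG_H;
    if (sum > 0xFF) flags |= FLAG_C;
    return {static_cast<uint8_t>(sum), flags};
}
```

The half carry flag is the one that most often goes wrong, because it depends only on the low nibbles of the operands and not on the final result.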
I feel that the stack is a topic which should also be covered. The purpose of this series is not to be a tutorial on how to build an emulator, but more an introduction to emulation. However, I think that this is an important point for would-be emulator programmers.
In almost any system that you come up against in emulation, there will be some sort of stack to deal with. If you have any processor which is capable of calling functions, a stack will be necessary. Let’s take a look at some code from the BIOS:
LD SP, $fffe
SP stands for “Stack Pointer”. It tells the CPU where to put the stack in memory and it is down to the program which is running to decide this. In this case, the BIOS sets the stack to 0xFFFE in memory. You might think that this would be a bad idea, as it is so close to the top of the addressable memory (remember we only have a 16-bit address bus), but in the GameBoy the stack grows downwards.
As an example of how the stack is used, we’ll take a look at calling and returning from a function/subroutine. Any time a program on the GameBoy wants to use a function it’ll use the “CALL a16” instruction (or maybe a conditional call, but we won’t go into that here). The CPU then needs to store the address of the instruction after the “CALL” onto the stack and jump to that function. This is how it is implemented in DMGe:
DbgOut(DBG_CPU, VERBOSE_2, "CALL a16");
//Decrement SP first to make room for two bytes. (Stack grows downwards).
Registers.SP.word -= 2;
//Write address of next instruction to the stack.
MMU->WriteWord(Registers.SP.word, Registers.PC.word + 3);
//Set PC to address of function.
Registers.PC.word = MMU->ReadWord(Registers.PC.word + 1);
//Set how many cycles it took us to complete this opcode.
Clock.m = 6;
Clock.t = 24;
As you can see, it’s quite simple. We decrement the stack pointer by 2 to make room for the 2 bytes, write the address of the next instruction onto the stack and jump to the function/subroutine. The order matters: with SP at 0xFFFE, writing the word before decrementing would spill into 0xFFFF, which is the interrupt enable register rather than stack memory.
We then also have the problem of how we get back to where we should be when the function is finished. This is done in most CPUs with a return operation. In GameBoy assembler it’s usually written as “RET”. Here’s the implementation from DMGe:
DbgOut(DBG_CPU, VERBOSE_2, "RET");
//Read the return address from the top of the stack and jump there.
Registers.PC.word = MMU->ReadWord(Registers.SP.word);
//We read two bytes, so increment SP accordingly.
Registers.SP.word += 2;
//Set how many cycles we took to complete this operation.
Clock.m = 4;
Clock.t = 16;
This is more or less the opposite of the “CALL” opcode. We read our return address from the top of the stack into the program counter, increment the stack pointer past it and we’re done!
The stack can also be used to temporarily store data by pushing and popping registers, but this works essentially the same way that calling functions does, and so I won’t cover it.
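To show how close the two really are, here is a small self-contained sketch in which CALL and RET are built on the same push and pop helpers a “PUSH”/“POP” implementation would use. It is not DMGe code. `ToyMachine` and its members are hypothetical, memory is a flat 64 KB array, and the helpers use the hardware ordering: decrement before writing, read before incrementing.

```cpp
#include <array>
#include <cstdint>

// Hypothetical flat-memory machine used only to demonstrate stack behaviour.
struct ToyMachine {
    std::array<uint8_t, 0x10000> mem{};
    uint16_t sp = 0xFFFE;  // the BIOS initialises the stack pointer here
    uint16_t pc = 0;

    // Push a 16-bit value: the stack grows downwards, high byte first.
    void Push16(uint16_t value) {
        mem[--sp] = static_cast<uint8_t>(value >> 8);
        mem[--sp] = static_cast<uint8_t>(value & 0xFF);
    }

    // Pop a 16-bit value: low byte first, then the high byte.
    uint16_t Pop16() {
        uint16_t lo = mem[sp++];
        uint16_t hi = mem[sp++];
        return static_cast<uint16_t>((hi << 8) | lo);
    }

    // CALL a16: push the address of the next instruction, then jump.
    void Call() {
        uint16_t lo = mem[static_cast<uint16_t>(pc + 1)];
        uint16_t hi = mem[static_cast<uint16_t>(pc + 2)];
        Push16(static_cast<uint16_t>(pc + 3));
        pc = static_cast<uint16_t>((hi << 8) | lo);
    }

    // RET: pop the return address back into the program counter.
    void Ret() { pc = Pop16(); }
};
```

A CALL at 0x0150 to 0x2000 leaves SP at 0xFFFC with 0x0153 stored on the stack, and the matching RET restores both PC and SP.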
Conclusion and Useful Links
And that’s part 2 of the series. I hope you’re enjoying it so far. Next, I’ll be covering the MMU and GPU.
If you’re working on your own emulator (doesn’t necessarily need to be for the GameBoy) and you’re finding this helpful, or need some help, don’t hesitate to get in touch by leaving a reply to this post or using the contact form :).
Here are some of the references I’ve been using. They might be useful to anyone else working on a GameBoy emulator:
In this series, I’ll be covering the design and building of an SPI SRAM PMOD for the Digilent Basys2 FPGA Board.
This project came about more out of necessity than anything else. I want to design an 8-bit CPU on my FPGA board, but it has no RAM (other than the block ram in the FPGA, but it is very limited). Having searched and searched, I was unable to find a premade PMOD (the name that Digilent gives to expansion boards) that contained RAM, so I thought, “Hey! Why not make my own?”.
The first job was to source cheap, high-quality parts before I even attempted to design a PCB. I was lucky to find everything that I needed at Farnell. Let’s have a look at our bill of materials:
As you can see, this isn’t exactly a very big project, which is good, because it accomplishes the task at hand without costing a fortune. The bill of materials cost is extremely low, working out at roughly £12.26, which already includes the cost of the PCB. I’ve found an excellent PCB fabricator at OSH Park. They do large runs with multiple boards in order to keep costs down and end up costing far less than most other fabricators.
The design itself was not complicated. It was more a matter of joining the right leads on the IC up to the right pins on the PMOD header. No huge amounts of electronic engineering involved. About the only thing that required a little thought was the choice of filter capacitor. Here’s the finished design:
You might notice in my design that the “Chip Select” pin (pin 1) is tied low at all times. This is because I don’t plan on disabling the chip at any time, and also because my Basys2 PMOD header doesn’t have enough I/O space to connect it anywhere else.
This is as far as I am currently. After I receive my boards and components I’ll write another article showing it being assembled and tested. I’ll also include the eagle schematic and board files for download so you can build one yourself!
This is a short series I’ll be writing to document a GameBoy emulator that I’m working on (DMGe).
The project was created out of a merger between two projects.
WWGB: My own project.
ZGB: A project that I collaborated on via GitHub.
Now, I know that it’s been done time and time again. A new GameBoy emulator is nothing to get too excited about, but that’s not the point of this project. It’s a learning experience. Primarily, I want to learn more about embedded architecture, and you might learn something too by reading this series.
Why I Chose the GameBoy
Originally, I wrote a Chip8 emulator. This served as a great learning experience, but was by no means a good example of embedded system emulation. The Chip8 interpreter lacked a lot of things that are typical of most bespoke computer systems. Namely the following:
No real MMU or memory mapping.
No clock cycles/machine cycles or timing to consider (apart from 60Hz timers).
Very limited opcode set.
No variable length instructions.
The GameBoy, on the other hand, has all of these things and a simple enough opcode set too. Another key area that interested me was the memory bank switching circuits for accessing all of the cartridge data within a 16-bit address space.
Implementing an emulator is never an easy task. You’re essentially rebuilding a system from the ground up in most cases. The first place to start then is research. You have to know an awful lot about the system you’re planning to emulate before you even start to code.
I started out by finding what technical documentation I could on the Internet. Being such an old console, there is no lack of it. There has been plenty of time for people to reverse engineer the console and document it since its release. Links are included to my reference materials at the bottom of this post.
These are the core parts of the GameBoy that I would have to consider in order to get started.
CPU – Custom Sharp Core (LR35902). Essentially an Intel 8080 variant that has some of the improvements introduced in the Z80 present. Two sets of opcodes. Those that are non-prefixed, and those that are prefixed with 0xCB. The 0xCB opcodes deal primarily with bit manipulation, while the others deal with typical processor operations such as load immediate, jump etc.
MMU – The GameBoy has a 16-bit address space, allowing for a total of 65536 bytes (0x0000 to 0xFFFF) to be addressed, although in reality the console does not have this much memory to address. For example, reading from 0xE000 -> 0xFDFF will return the same values as reading from 0xC000 -> 0xDDFF, as the internal 8kB of RAM is mirrored (almost in full) in the higher range. It also possesses specific locations for switchable ROM and RAM banks, VRAM, sprite attribute memory, interrupts and general I/O.
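The memory map described here can be sketched as a simple address classifier. The region boundaries are the standard GameBoy ones, but the `Region` enum and the two function names are hypothetical and only there for illustration. The ranges are written as inclusive bounds.

```cpp
#include <cstdint>

// Hypothetical names for the regions of the 16-bit address space.
enum class Region {
    RomBank0, RomBankN, VideoRam, ExternalRam, WorkRam,
    EchoRam, Oam, Unused, Io, HighRam, InterruptEnable
};

Region Classify(uint16_t addr) {
    if (addr <= 0x3FFF) return Region::RomBank0;     // fixed cartridge ROM bank
    if (addr <= 0x7FFF) return Region::RomBankN;     // switchable cartridge ROM bank
    if (addr <= 0x9FFF) return Region::VideoRam;     // VRAM
    if (addr <= 0xBFFF) return Region::ExternalRam;  // switchable cartridge RAM bank
    if (addr <= 0xDFFF) return Region::WorkRam;      // internal 8kB RAM
    if (addr <= 0xFDFF) return Region::EchoRam;      // mirror of internal RAM
    if (addr <= 0xFE9F) return Region::Oam;          // sprite attribute memory
    if (addr <= 0xFEFF) return Region::Unused;
    if (addr <= 0xFF7F) return Region::Io;           // general I/O registers
    if (addr <= 0xFFFE) return Region::HighRam;
    return Region::InterruptEnable;                  // 0xFFFF
}

// Folds an echo address back onto the internal RAM it mirrors.
uint16_t NormalizeEcho(uint16_t addr) {
    if (Classify(addr) == Region::EchoRam) return static_cast<uint16_t>(addr - 0x2000);
    return addr;
}
```

With this in place, a read from 0xE123 can be served from the same storage as a read from 0xC123.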
Things like graphics and sound I decided to leave for a later date. They’re nice, but I wanted to get the CPU core up and running as soon as possible. That’ll be continued in the next part.
As of Linux Kernel 3.19, Takashi’s work (GitHub) has been merged in. This means that on any recent Linux installation, installing the drivers shouldn’t be necessary. You can skip right to installing FFADO from source and setting up a jack sink in pulseaudio.
This is something that I wanted to do for quite a while. I’ve never really used Linux for music production because I prefer Windows and OSX software such as Pro Tools, but it’s nice to be able to listen to music while I code. And Linux is so much nicer to develop code for than Windows. This tutorial will focus on setting the desk up on a Linux Mint/Ubuntu environment.
So first off, what we are going to do is install a new kernel module which allows our Linux system to communicate with the ProjectMix (thanks to Takashi Sakamoto for that), then create a Jack sink into which we feed all of our PulseAudio sound. We’ll also be using FFADO (http://www.ffado.org/) to control the internal mixer in the ProjectMix.
We’re going to need several things installed before we even begin to set this up. First off, you’ll need the latest version of “Git”. Open up a terminal and install it with the following command:
sudo apt-get install git
We’ll also need to set up some build dependencies for FFADO. This is going to involve adding some source repositories. Open up /etc/apt/sources.list as root with the following:
sudo gedit /etc/apt/sources.list
Add the following lines to it. On Ubuntu, replace “saucy” with whatever version you are using. On Mint, I’ve found that it really doesn’t matter as far as FFADO build dependencies go.
We’ll start by grabbing the code for the ProjectMix driver module from GitHub and the FFADO source code from ffado.org. I’d also recommend that you create a “ProjectMix” folder to keep everything tidy.
git clone https://github.com/takaswie/snd-firewire-improve.git
tar -xvzf libffado-2.2.1.tgz
Okay, now that we have the source code and it’s extracted, we have to build and install it. We’ll start with the driver module:
That’s the main step complete. ALSA can now communicate with the ProjectMix. You can actually test this by rebooting your machine, and you will hear “pops” coming from the speakers as the driver sets the sample rate of the sound card.
This would now be a good opportunity to install qjackctl, as you will need it to configure the sample rate and routing of the sound card in future.
sudo apt-get install qjackctl
And with that, it’s time to install FFADO. It will have nothing to do with streaming audio to the device, but it is necessary for controlling the ProjectMix internal mixer. You can build and install it with the following commands.
sudo scons install
At this point it would be a good idea to check that you can stream audio to your device. Reboot and start up qjackctl. Go into settings and select ProjectMix as your interface under ALSA, then press the start button. Open up ffado-mixer, either in a terminal window or just click the icon in your menu, then set your output to “Aux 1/2”. Make sure your speakers are plugged into outputs 1 and 2 on the device and put the volume fader in ffado-mixer to a reasonable level. You can test the audio output by opening up any audio software that supports Jack and playing some audio through it.
Hear some sound? Great, let’s continue and set up your Jack sink for PulseAudio (this will allow desktop applications to send sound to the card). Don’t hear anything or are you having any other issues? Leave a comment below and I’ll do my best to help.
PulseAudio and Jack
Admittedly this isn’t the most ideal way to do things, but until Linux has something like CoreAudio for OSX, it will have to do. We’ll start by killing the PulseAudio service until we finish setting everything up or else it will just get in the way. This is quite simple to do.
Start by opening your PulseAudio client configuration as root:
sudo gedit /etc/pulse/client.conf
Add the line “autospawn = no”. Save it, but don’t close it yet.
Open another terminal window/tab and enter the following command:
This will stop the service while we install the jack-sink module. Install it with the following command:
sudo apt-get install pulseaudio-module-jack
You’ll now need to edit your PulseAudio configuration to load the new module when it starts. Open up your configuration file:
sudo gedit /etc/pulse/default.pa
Add the following line:
Save it and close. Finally, go back to your “client.conf” and erase the “autospawn = no” line that we added earlier, save it and then close.
The Moment of Truth
That’s the setup completed. Close everything and reboot your computer.
Don’t panic if you don’t hear any sound immediately. Make sure you go into your sound settings and set the output to the jack sink that we installed. It should appear just like any other soundcard that you have. Also, make sure that you open ffado-mixer and set the output to “Aux 1/2”.
Test the sound output by loading up any application that uses PulseAudio. Even playing a YouTube video will do. Hopefully this will be all that you need to do. It’s also worth noting that if you do wish to use your ProjectMix for audio recording/mixing on Linux, applications like Ardour will work just fine alongside your PulseAudio sink.
If you have any issues or concerns, comment below and I’ll do my best to help.