Scanline Effects

Pokemon Emerald features a wide variety of visual effects that stretch the limits of what a Game Boy Advance's hardware can achieve. Here are a few examples:

The player stands in a dark cave after having used Flash. A vignette effect blacks out the edges of the screen, with the player able to see the map only through a circular hole in the darkness.

The player begins a battle against Sidney, of the Elite Four. As part of the battle intro, the screen fades to white — not all at once, but rather with the fade spreading outward from the center of the screen.

The player gets into a wild battle while surfing through the seas. The screen is distorted with a wavy effect, as if the player is suddenly seeing the environment through turbulent water.

The Game Boy Advance is capable of brightening or darkening rectangular areas of the screen by a flat brightness adjustment. How is the first screenshot darkening a non-rectangular area of the screen to a pitch-black shade? How is the second screenshot lightening parts of the screen based on how far they are from the center? And what is even happening in that third screenshot?

The short answer is that with some clever programming, it's possible to interrupt the Game Boy Advance while it's physically controlling the colors of pixels on the screen. Each row of pixels, or scanline, gives you a chance to run a small, fast piece of code. This code can change the screen configuration out from under the Game Boy Advance, allowing you to desync the configurations for two different rows of pixels.

What is a scanline effect?

It's possible to configure the Game Boy Advance's visual output using I/O registers, as documented on GBATEK. Normally, the effects of these registers apply to the entire screen.

It's also possible to configure the Game Boy Advance to run particular pieces of code in response to important hardware events. If the Game Boy Advance happens to be running any other code, that code will be completely paused — interrupted by the event handlers you've set. For that reason, these events are called interrupts. The two interrupts of note right now are "v-blank" and "h-blank."

The v-blank handler runs when the Game Boy Advance has finished drawing one frame and is about to start drawing the next: that is, when it's about to start changing the color of each physical pixel on the screen, based on what's in VRAM and the I/O registers. The GBA changes pixels in row-major order: row by row from top to bottom, and left to right within each row. The h-blank handler runs in the short gap after each row is drawn, just before the GBA begins changing the pixels for the next row.
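
To make that timing concrete, here is a minimal host-side model in C; it is not GBA code, and `fake_io_reg`, `draw_frame`, and `set_next_row` are hypothetical names. It assumes that the h-blank handler runs in the gap after each row, so a value it writes is picked up by the following row, while row 0 only sees whatever was set before the frame started (i.e. during v-blank).

```c
#include <stddef.h>
#include <stdint.h>

#define SCREEN_HEIGHT 160  /* visible rows on the GBA */

/* Hypothetical stand-in for an I/O register the hardware reads while drawing. */
static uint16_t fake_io_reg;

/* Simulated h-blank handler: called after `row` is drawn, before `row + 1`. */
typedef void (*HBlankHandler)(int row);

/* Draw one frame top to bottom, recording the register value each row used. */
static void draw_frame(uint16_t used[SCREEN_HEIGHT], HBlankHandler on_hblank)
{
    for (int row = 0; row < SCREEN_HEIGHT; row++) {
        used[row] = fake_io_reg;   /* this row's pixels use the current value */
        if (on_hblank != NULL)
            on_hblank(row);        /* runs in the gap before the next row */
    }
}

/* Example handler: give each row a value equal to its row number. */
static void set_next_row(int row)
{
    fake_io_reg = (uint16_t)(row + 1);
}
```

Setting `fake_io_reg = 0` (the "v-blank" step) and then calling `draw_frame(used, set_next_row)` gives every row a different value, even though there is only one register.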

You can combine these two features, the I/O registers and the interrupts, to take configuration values that normally apply to the entire screen, and instead have them apply only to specific rows of pixels on-screen. If you change the I/O registers during h-blank — if you do so quickly enough, before the GBA actually gets around to setting pixel colors — then you can produce a wide variety of effects. This explains the three screenshots above:

The first screenshot, showing the cave darkness/flash effect, takes advantage of the hardware-level ability to lighten or darken rectangular areas of the screen via I/O registers. Every scanline, the WIN0H register is updated; this register controls the left and right edges of one of these rectangles. The overworld code pre-computes the values to write for the entire screen, and specifies the I/O register to modify by configuring the scanline effect system.
- The GBA hardware can define four areas to lighten or darken: "rectangular window 0," "rectangular window 1," "sprite-shaped window," and "everything not in a window." The cave darkness effect sets the GBA up to darken everything not in a window, and then uses the scanline effect to resize rectangular window 0, turning it into a circular hole carved out of the darkness.
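
The per-scanline WIN0H values for a circular hole can be sketched as below. This is a hypothetical illustration, not the overworld's actual code: it assumes GBATEK's WIN0H layout (left edge in bits 8-15, right edge plus one in bits 0-7) and a 240x160 screen, and the helper names are invented. Rows outside the circle get an empty span (left equal to right), so the whole row falls into the darkened "outside" region.

```c
#include <stdint.h>

#define SCREEN_WIDTH  240
#define SCREEN_HEIGHT 160

/* Pack a horizontal extent in the WIN0H layout described by GBATEK:
   left edge in bits 8-15, right edge + 1 in bits 0-7. */
static uint16_t pack_win_h(int left, int right_exclusive)
{
    return (uint16_t)((left << 8) | right_exclusive);
}

/* Floor of the square root, integer-only (the GBA has no FPU). */
static int isqrt(int n)
{
    int r = 0;
    while ((r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* One WIN0H value per scanline so window 0 traces a circle of `radius`
   centered on (cx, cy). Rows outside the circle get an (assumed) empty span. */
static void build_circle_win0h(uint16_t out[SCREEN_HEIGHT], int cx, int cy, int radius)
{
    for (int y = 0; y < SCREEN_HEIGHT; y++) {
        int dy = y - cy;
        if (dy * dy > radius * radius) {
            out[y] = pack_win_h(0, 0);
            continue;
        }
        int half = isqrt(radius * radius - dy * dy);
        int left = cx - half;
        int right = cx + half + 1;          /* WIN0H's right edge is exclusive */
        if (left < 0) left = 0;
        if (right > SCREEN_WIDTH) right = SCREEN_WIDTH;
        out[y] = pack_win_h(left, right);
    }
}
```

For example, `build_circle_win0h(table, 120, 80, 40)` produces a 40-pixel-radius hole centered on the screen; a table like this is what would be handed to the scanline effect system with WIN0H as the destination.
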
The second screenshot shows the battle transition for battling Sidney of the Elite Four. It also takes advantage of the ability to lighten areas of the screen, but this time it only changes the brightness, via the BLDY I/O register. The gradient for each frame is computed ahead of time, while the previous frame is on-screen, so that the brightness can be updated row by row as that frame is drawn.
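
A brightness gradient that spreads outward from the vertical center could be precomputed like this. The formula is a hypothetical stand-in, not the transition's actual math; it only assumes that BLDY's coefficient ranges from 0 (no change) to 16 (full white), per GBATEK. Each frame, the caller would increase `progress` and rebuild the table for the next frame.

```c
#include <stdint.h>
#include <stdlib.h>   /* abs */

#define SCREEN_HEIGHT 160
#define BLDY_MAX 16   /* coefficients above 16 behave like 16 (GBATEK) */

/* Hypothetical fade: rows near the vertical center brighten first, and the
   bright band spreads outward as `progress` grows. Every `falloff` rows of
   distance from the center delays the fade by one step (falloff must be > 0). */
static void build_fade_bldy(uint16_t out[SCREEN_HEIGHT], int progress, int falloff)
{
    int center = SCREEN_HEIGHT / 2;
    for (int y = 0; y < SCREEN_HEIGHT; y++) {
        int evy = progress - abs(y - center) / falloff;
        if (evy < 0) evy = 0;
        if (evy > BLDY_MAX) evy = BLDY_MAX;
        out[y] = (uint16_t)evy;
    }
}
```
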
The third screenshot shows the battle transition for some wild encounters while surfing. In this case, the transition's h-blank interrupt updates the vertical position of the three background layers used for the overworld, by writing to REG_BG1VOFS, REG_BG2VOFS, and REG_BG3VOFS. This has the effect of stretching and shrinking parts of the screen. Think about how the GBA draws background layers: if the GBA hardware draws row 0 of the background at row 0 on-screen, and then we shift the background down by one pixel before the GBA gets to the next row on-screen, then it'll draw row 0 of the background again at row 1 on-screen. When the background is moved down by one pixel during a scanline, the image stretches; when the background is moved up, a row of background pixels is skipped, shrinking that part of the image.
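
The stretching-and-skipping behavior can be checked with a small model. It assumes a 256-pixel-tall text background, so row lookups wrap modulo 256, and it treats "moving the background down by one pixel" as lowering the vertical offset by one (a 16-bit value of 0xFFFF acts as -1). The function names are hypothetical.

```c
#include <stdint.h>

#define SCREEN_HEIGHT 160

/* Background row fetched for screen row `y`, given the vertical offset in
   effect while that row is drawn (assumes a 256-pixel-tall background). */
static int bg_row_for_screen_row(int y, uint16_t vofs)
{
    return (y + vofs) & 0xFF;
}

/* Record which background row ends up on each screen row. */
static void map_rows(const uint16_t vofs[SCREEN_HEIGHT], int bg_rows[SCREEN_HEIGHT])
{
    for (int y = 0; y < SCREEN_HEIGHT; y++)
        bg_rows[y] = bg_row_for_screen_row(y, vofs[y]);
}
```

With offsets {0, 0xFFFF, 0, ...}, screen rows 0 through 2 show background rows 0, 0, and 2: background row 0 is drawn twice (a stretch) and background row 1 is skipped (a shrink).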

You can do all of this manually. However, it's relatively difficult. The v-blank and h-blank interrupts aren't given a lot of time to run, so you have to pre-compute all of the data you want to adjust per-scanline, and then use small chunks of code to copy that data into the right I/O registers per-scanline. Additionally, you actually need two chunks of pre-computed data: you need to double-buffer the pre-computed data so you can update it without the risk of screen tearing. Pokemon Emerald contains a scanline effect system used to coordinate the whole process.

DMA

Before I get to explaining how the scanline effect system is built, there's one other piece of GBA hardware technology we need to know about: DMA transfers. DMA stands for direct memory access. The basic idea is that the CPU can queue up to four high-speed data transfers to run in the background. From the CPU's perspective, these transfers happen concurrently with each other and with code execution. What's actually happening is that a higher-priority DMA transfer pauses all lower-priority DMA transfers, and all DMA transfers pause the CPU.

When queuing a DMA transfer, the CPU can set the amount of data to transfer, the source and destination locations, and the timing of the transfer. For example, you can queue a DMA transfer that copies X bytes from one location to another immediately, pausing the CPU until the whole transfer finishes. You can also queue a repeating DMA transfer: "I want you to copy data from one location to another, but I want you to copy it 2 bytes at a time, and I only want you to copy a pair of bytes during each h-blank. Keep doing that until I tell you to stop." These repeating transfers can be queued to happen every h-blank or every v-blank; the CPU will be paused while bytes are being copied, but will unpause outside of those little slices of time (which is why the CPU is able to tell DMA to stop). This can be used to transfer data without the need for an h-blank interrupt, if you only need to move a couple of bytes per h-blank.
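
The behavior of a repeating h-blank transfer, with an incrementing source and a fixed destination, can be modelled on a host machine like this. `HBlankDma`, `dma_on_hblank`, and `dma_stop` are hypothetical names for illustration; real transfers are configured through the DMA I/O registers, not a struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one DMA channel set up for repeating h-blank transfers. */
struct HBlankDma {
    const uint16_t *src;        /* advances after every transfer */
    volatile uint16_t *dest;    /* fixed: always the same I/O register */
    bool enabled;
};

/* Called once per simulated h-blank: copy one halfword (2 bytes). */
static void dma_on_hblank(struct HBlankDma *ch)
{
    if (!ch->enabled)
        return;
    *ch->dest = *ch->src;
    ch->src++;
}

/* The CPU gets control back between h-blanks, so it can stop the channel. */
static void dma_stop(struct HBlankDma *ch)
{
    ch->enabled = false;
}
```
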

As mentioned, up to four DMA transfers can be queued at a time, with different priority levels. Specifically, there are four DMA "channels," numbered DMA0 through DMA3, with DMA0 being the highest-priority and DMA3 being the lowest-priority. Some channels have limitations on what they can be used for; for example, only DMA1 and DMA2 can be used to feed the console's sound FIFOs, and DMA0, DMA1, and DMA2 can't write to flash memory.
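
The priority rule can be expressed as a tiny host-side helper: when several channels have a transfer waiting, the lowest-numbered one runs first. `next_dma_channel` is a hypothetical name; on real hardware, this arbitration happens automatically.

```c
#include <stdbool.h>

#define DMA_CHANNEL_COUNT 4

/* Return the channel that runs first among those with pending transfers:
   DMA0 has the highest priority, DMA3 the lowest. Returns -1 if none. */
static int next_dma_channel(const bool pending[DMA_CHANNEL_COUNT])
{
    for (int ch = 0; ch < DMA_CHANNEL_COUNT; ch++) {
        if (pending[ch])
            return ch;
    }
    return -1;
}
```
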

The bare minimum approaches

There are two bare-minimum approaches for implementing a scanline effect, if you have to code one completely from scratch. Pokemon Emerald has a scanline effect system ready-made for you to use, and this system uses the first approach. However, that approach has some limitations that make it inadequate for a lot of visual effects, so some parts of the game, like battle transitions, use the second approach instead. It's worth knowing about both approaches: if you want something more complex than the scanline effect system can manage, then you'll have to implement the second approach yourself.

Without an h-blank interrupt
- Steps
  - Create two buffers for per-scanline data: a staging buffer, where you'll build your data; and a live buffer, to hold data once it's ready.
  - During normal CPU execution, update your per-scanline data in the staging buffer.
  - During v-blank...
    - Switch which buffer is considered the staging buffer, and which buffer is considered the live buffer. This means that the data that was being staged is now live.
    - Queue a repeating DMA transfer to copy data from the live buffer to the I/O registers at a rate of one unit per h-blank. This must be a high-priority transfer: DMA0 is your best option.
    - Copy the first scanline's data manually. Repeating h-blank DMA transfers don't run during v-blank, so the first one only fires after row 0 has already been drawn; write row 0's value to the I/O register yourself, and point the DMA transfer at row 1's entry.
- Benefits
  - You don't need an h-blank interrupt.
  - You can abstract the data transfers more easily. When the outside world wants a scanline effect, all it has to do is pre-compute that data, specify the destination I/O register, and then update that data per-frame. The outside world doesn't have to be involved in how precisely that data makes its way into the I/O registers per scanline.
- Costs
  - You can't use DMA0 for anything else while this effect is running.
  - You can basically only update one I/O register per scanline.
  - If you're building a general system for scanline effects and you choose to make it work this way, then you need to remember to deal with that "first h-blank" edge-case described above when you're writing the system's internal code. Things using the system don't have to know about that whole thing, but you as the hypothetical creator of the system do have to be careful about it.
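
The steps above can be simulated on a host machine as follows. Everything here is a hypothetical model: `buffers`, `staging`, `on_vblank`, `on_hblank`, and `fake_reg` are invented names, and plain variables stand in for the DMA channel and the I/O register. Each buffer has one spare entry because the model assumes the h-blank after the last visible row still triggers one more transfer.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCREEN_HEIGHT 160

/* Two per-scanline buffers. The extra entry absorbs the transfer that (in this
   model) fires in the h-blank after the last visible row. */
static uint16_t buffers[2][SCREEN_HEIGHT + 1];
static int staging = 0;              /* buffer the game is currently filling */

static volatile uint16_t fake_reg;   /* stand-in for the target I/O register */

/* Stand-in for a repeating h-blank DMA transfer (incrementing source). */
static const uint16_t *dma_src;
static bool dma_enabled;

static void on_vblank(void)
{
    int live = staging;              /* the staged data goes live... */
    staging ^= 1;                    /* ...and the other buffer becomes staging */

    dma_enabled = false;             /* stop last frame's transfer */
    fake_reg = buffers[live][0];     /* row 0: copied by hand, DMA can't reach it */
    dma_src = &buffers[live][1];     /* DMA starts with row 1's value */
    dma_enabled = true;
}

static void on_hblank(void)
{
    if (dma_enabled)
        fake_reg = *dma_src++;
}

/* Draw one frame, recording the register value each row was drawn with. */
static void draw_frame(uint16_t seen[SCREEN_HEIGHT])
{
    for (int row = 0; row < SCREEN_HEIGHT; row++) {
        seen[row] = fake_reg;
        on_hblank();
    }
}
```

In a real frame loop, the game would write only into `buffers[staging]` between v-blanks, so the buffer being read by the transfer is never modified mid-frame.
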
With an h-blank interrupt
- Steps
  - Create two buffers for per-scanline data: a staging buffer, where you'll build your data; and a live buffer, to hold data once it's ready.
  - During normal CPU execution, update your per-scanline data in the staging buffer.
  - During v-blank, use a one-time DMA transfer to copy from the staging buffer to the live buffer. This can be a low-priority transfer, e.g. via DMA3.
  - During h-blank, load data for the VCOUNT-th scanline from the live buffer, and write it to I/O registers.
- Benefits
  - You can compute as much data per scanline as you want. You're only limited by how much RAM you have to store the data in, and how quickly you can copy that data into the I/O registers during h-blank.
- Costs
  - The DMA channel that you use for this approach can't be used for anything else while the scanline effect is active. (You might be able to skip the need for DMA entirely if, instead of copying between buffers, you just switch which buffer is which, like the non-h-blank-interrupt approach does.)
  - Different effects will want to update different I/O registers — possibly multiple I/O registers, and in different ways. For optimal performance, each effect will need to define its own h-blank handler to perform the specific data copies that it wants.
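
A host-side sketch of the h-blank-interrupt approach follows. `RowRegs`, `staging_rows`, `live_rows`, and the `fake_*` registers are hypothetical, and `memcpy` stands in for the one-shot DMA3 copy. As in the steps above, the handler indexes the live buffer by VCOUNT. Because VCOUNT still holds the row that was just drawn during its h-blank, the value lands on the following row, so entry n effectively applies to row n + 1; offset the index if that one-row shift matters. The guard is there because, unlike h-blank DMA, the h-blank interrupt also fires on v-blank lines.

```c
#include <stdint.h>
#include <string.h>

#define SCREEN_HEIGHT 160

/* Hypothetical per-row record: this approach can update several registers. */
struct RowRegs {
    uint16_t bg1vofs;
    uint16_t bg2vofs;
};

static struct RowRegs staging_rows[SCREEN_HEIGHT];  /* built during normal code */
static struct RowRegs live_rows[SCREEN_HEIGHT];     /* read by the h-blank handler */

/* Stand-ins for the hardware registers being updated. */
static volatile uint16_t fake_bg1vofs, fake_bg2vofs;

/* v-blank: publish the finished frame (memcpy stands in for a DMA3 copy). */
static void on_vblank(void)
{
    memcpy(live_rows, staging_rows, sizeof live_rows);
}

/* h-blank interrupt: `vcount` is the value read from VCOUNT. */
static void on_hblank(unsigned vcount)
{
    if (vcount >= SCREEN_HEIGHT)
        return;                          /* also fires on v-blank lines; ignore those */
    fake_bg1vofs = live_rows[vcount].bg1vofs;
    fake_bg2vofs = live_rows[vcount].bg2vofs;
}
```
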

Below, we'll cover the scanline effect system — Game Freak's implementation of the first approach — in more detail, and describe how to control it.

The scanline effect system

The scanline effect system is defined in scanline_effect.h and scanline_effect.c. It contains storage for double-buffered pre-computed data to copy per-scanline, and it contains code to automate a lot of the bookkeeping and data copying needed to perform scanline effects. The system can update a single u16 per scanline (i.e. a single I/O register), or a single u32 per scanline. The per-scanline data must be set up by whatever is asking the scanline effect system to run.

The scanline effect system alternates between using buffer 0 and buffer 1 as the data source for each frame. The system performs per-scanline data updates using DMA0, with the DMA transfer configured to run once per h-blank and to increment the source address for each transfer. This means that the scanline effect system doesn't use an h-blank interrupt; however, it limits the system in two ways. The system configures the DMA transfer to move only a single u16 or u32 at a time, so it can only update one I/O register (or, in 32-bit mode, one pair of adjacent 16-bit registers). Additionally, even if the system chose a different configuration, it would still only be able to update multiple I/O registers if those registers were located directly next to each other in memory.
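
To see why a 32-bit transfer can cover two neighboring 16-bit registers, consider BG0HOFS (0x04000010) and BG0VOFS (0x04000012), which sit next to each other in the I/O map. On the little-endian ARM7TDMI, the low half of a 32-bit write lands at the lower address. The helpers below are hypothetical and only compute the values involved; they don't touch hardware.

```c
#include <stdint.h>

/* Combine two adjacent 16-bit register values into the u32 that one 32-bit
   transfer would write: `low_reg` goes to the lower address. */
static uint32_t pack_reg_pair(uint16_t low_reg, uint16_t high_reg)
{
    return (uint32_t)low_reg | ((uint32_t)high_reg << 16);
}

/* Recover the two halves, e.g. to check what each register received. */
static void unpack_reg_pair(uint32_t value, uint16_t *low_reg, uint16_t *high_reg)
{
    *low_reg = (uint16_t)(value & 0xFFFFu);
    *high_reg = (uint16_t)(value >> 16);
}
```

A per-scanline table of such values would scroll one background horizontally and vertically at once; registers that aren't adjacent can't be paired this way.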

The first screenshot above relies entirely on the scanline effect system to actually draw the visual effect. The second and third screenshots work a little differently: battle transitions borrow the storage used by the scanline effect system, but when it comes to actually running interrupts, updating I/O registers, and so on, battle transitions prefer to do that work themselves. The usual pattern within battle transitions is to always use scanline buffer 0 as a staging buffer for data that is "under construction"; once all data for the screen has been computed, DMA is used during v-blank to copy the pre-computed data from scanline buffer 0 to scanline buffer 1 en masse at high speed; and then an h-blank interrupt is used to actually update I/O registers per-scanline, with scanline buffer 1 being the "live buffer" from which the data is copied.

(Because battle transitions use their own h-blank interrupt, they can read the VCOUNT I/O register from inside that interrupt to know what scanline (what pixel Y-coordinate) is currently being drawn. This means that instead of using a "blind" repeating DMA transfer, they can just read gScanlineEffectRegBuffers[1][VCOUNT] and then write that value to the destination I/O registers.)

Controlling the effect

A scanline effect has to prep the data for a given frame of animation in advance. This data is double-buffered, with gScanlineEffectRegBuffers holding the two buffers used. Each buffer holds 960 u16 values (1,920 bytes), or six u16 values per scanline. (The scanline effect system can only update a single u16 (two bytes) or a single u32 (four bytes) per scanline, but as mentioned above, some other code uses custom scanline logic while still storing data in gScanlineEffectRegBuffers; this other code can update multiple values.)

gScanlineEffect stores the configuration and state of the current scanline effect:

Field	Description
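`dmaSrcBuffers`	The source to copy from per scanline. Scanline effects are double-buffered, so this field is two pointers: one source per buffer. Each pointer starts at the second scanline's entry, because the first scanline is handled separately (see `setFirstScanlineReg`).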
`dmaDest`	The destination to copy to per scanline.
`dmaControl`	A value to write to DMA0's control registers: the transfer count (`DMA0CNT_L`) in the low 16 bits and the control flags (`DMA0CNT_H`) in the high 16 bits. The scanline effects header provides constants `SCANLINE_EFFECT_DMACNT_16BIT` and `SCANLINE_EFFECT_DMACNT_32BIT` for copying a 16-bit value or a 32-bit value per scanline.
`setFirstScanlineReg`	Internal. A function pointer used to update the first scanline. The scanline effect system queues its repeating DMA transfer from v-blank, and h-blank DMA transfers don't run during v-blank, so the first transfer only happens after the first scanline has already been drawn. The first scanline's data must therefore be written manually during v-blank. This function pointer carries out that write, using different functions for a 16-bit value versus a 32-bit value.
`srcBuffer`	Scanline effects are double-buffered. This value indicates which buffer will be used for the next frame.
`state`	The current state of the scanline effect. 0 means the effect is disabled. 1 means the effect should start up. 3 means the effect is queued to stop.
`waveTaskId`	Internal. The scanline effect system offers a helper function, `ScanlineEffect_InitWave`, for sine-wave effects. This function is fire-and-forget, configuring the scanline effect and creating a task that belongs to the scanline effect system. The task ID is stored here.
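
The parameters of `ScanlineEffect_InitWave` aren't described here, so the sketch below is not its implementation; it's a hypothetical example of the kind of per-scanline table a sine-wave effect might produce. To stay integer-only (the GBA has no FPU), it swaps in Bhaskara I's rational approximation of sine, which is exact at 0, 30, 90, and 180 degrees. Advancing `phase_deg` each frame would make the wave scroll.

```c
#include <stdint.h>

#define SCREEN_HEIGHT 160

/* amplitude * sin(deg) via Bhaskara I's approximation, integer-only. */
static int scaled_sin(int deg, int amplitude)
{
    deg %= 360;
    if (deg < 0)
        deg += 360;
    int sign = 1;
    if (deg >= 180) {            /* sin(x) = -sin(x - 180) */
        deg -= 180;
        sign = -1;
    }
    int p = deg * (180 - deg);   /* 0..8100, so the denominator is never 0 */
    return sign * (amplitude * 4 * p) / (40500 - p);
}

/* Fill one offset per scanline; `deg_per_row` sets the wavelength. */
static void build_wave(int16_t out[SCREEN_HEIGHT], int phase_deg, int deg_per_row, int amplitude)
{
    for (int y = 0; y < SCREEN_HEIGHT; y++)
        out[y] = (int16_t)scaled_sin(phase_deg + y * deg_per_row, amplitude);
}
```

Before being written to a 16-bit scroll register, a negative offset is cast to u16; the register then wraps it around the background.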

You can set dmaDest, dmaControl, and state by calling ScanlineEffect_SetParams; it also fills in dmaSrcBuffers and setFirstScanlineReg to match the chosen transfer width.

You'll want to write your precomputed data directly into gScanlineEffectRegBuffers[gScanlineEffect.srcBuffer].

The ScanlineEffect_InitHBlankDmaTransfer function should be called during a v-blank interrupt handler. No other DMA0 transfers should be run during the current frame. This function queues the repeating DMA0 transfer mentioned above, and also handles switching between the two pre-computed data buffers (recall that the data is double-buffered; the scanline effect system alternates between buffers on each frame).
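
The v-blank bookkeeping described in this section can be summarized with a simplified model. It's a hypothetical sketch based on the table above (states 0, 1, and 3, and `srcBuffer` alternating every frame), not the contents of scanline_effect.c; `ScanlineModel` and `model_vblank` are invented names, and the DMA transfer is reduced to a stored source pointer.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCREEN_HEIGHT 160

/* Hypothetical, simplified stand-in for gScanlineEffect plus its buffers. */
struct ScanlineModel {
    uint16_t buffers[2][SCREEN_HEIGHT];
    uint8_t srcBuffer;           /* buffer handed to DMA at the next v-blank */
    uint8_t state;               /* per the table: 0 = disabled, 1 = start up, 3 = stop queued */
    bool dmaRunning;
    const uint16_t *dmaSrc;      /* where the repeating transfer would read from */
    uint16_t *dest;              /* stand-in for the target I/O register */
};

/* Roughly what happens once per v-blank in this model. */
static void model_vblank(struct ScanlineModel *s)
{
    if (s->state == 0)
        return;                                  /* nothing to do */
    if (s->state == 3) {
        s->state = 0;                            /* honor the stop request */
        s->dmaRunning = false;
        return;
    }
    s->dmaSrc = &s->buffers[s->srcBuffer][1];    /* DMA covers rows 1 and up */
    s->dmaRunning = true;
    *s->dest = s->buffers[s->srcBuffer][0];      /* first scanline by hand */
    s->srcBuffer ^= 1;                           /* game now fills the other buffer */
}
```

After each call, `srcBuffer` names the buffer that isn't being read, which is why new data goes into `buffers[srcBuffer]`.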

You can use ScanlineEffect_Stop to stop an ongoing scanline effect. You can use ScanlineEffect_Clear to reset all internal scanline state; however, you should only use this if you're sure an effect isn't currently running, and if you're sure a wave task doesn't exist, as this function just resets the state (it doesn't stop an ongoing effect; it just forgets everything about the last effect).
