Static background in mode 2

John Tsiombikas (Nuclear / Mindlapse)
7 July 2020

Continuing from the first hack, where I painted the screen red, the next step is to display a static image on screen. The SNES PPU (Picture Processing Unit) supports 8 different video modes, each with its own mix of capabilities and tradeoffs.

The "Retro Game Mechanics Explained" channel on YouTube has a very nice overview of all the SNES video modes and their tradeoffs, split across two videos.

In fact the whole SNES hardware playlist is highly recommended.

I decided to start with mode 2, which is very much a middle-of-the-road mode. It provides a cross-section of SNES graphics features: 2 background layers with the very typical 16-color (4bpp) pixel format, and the ability to define per-column scrolling offsets, which I'd like to try out next.

In order to show a picture on screen in mode 2, I had to do the following:

1. Write a tool that breaks an image into 8x8 tiles, ideally coalescing duplicates, converts the tiles into the native pixel format expected by the SNES PPU, and generates a tilemap which reconstructs the image by referencing the correct tile for each screen cell. The tool outputs the tiles and the tilemap, along with the palette colors, as an assembly source file with a bunch of data declarations.
2. Establish a video RAM layout for the tiles and the tilemap, and let the PPU know where to find them by setting the corresponding registers.
3. Populate the palette with the 16 image colors.
4. Copy the tile data and the tilemap to the correct place in video RAM.
5. Route the correct background layer (BG1) to the main screen. There is a main screen and a secondary ("sub") screen, and we can specify which graphics elements end up in each one, so that they can be combined with various operations.

Image converter tool: img2snes

img2snes reads a 4bpp (16-color) PNG image (later I'll add support for 2bpp and 8bpp conversions as well). It outputs tile data, a tilemap, and a colormap in the format required by the SNES PPU, as ca65-compatible assembly data declarations.

img2snes processes the image in 8x8 pixel blocks, emitting unique tiles and tilemap entries as it goes along. If a block is identical to an existing tile, the new tilemap entry refers to that existing tile. Finally, the tile pixels are converted to the bitplane format expected by the SNES PPU.
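Below is a minimal C sketch of that deduplication pass. It assumes the PNG has already been decoded into an indexed buffer with one palette index per byte, and that the image dimensions are multiples of 8. The names (struct tileset, add_tile, build_tilemap) are hypothetical and not taken from img2snes. The linear search with exact matching (no detection of flipped duplicates) is just the most straightforward approach.

```c
#include <stdint.h>
#include <string.h>

#define TILE_PIXELS 64   /* 8x8 pixels, one palette index (0-15) per byte */
#define MAX_TILES   1024 /* a tilemap entry can address at most 1024 tiles */

/* Hypothetical tile store: unique tiles kept unpacked, one byte per pixel. */
struct tileset {
    uint8_t tiles[MAX_TILES][TILE_PIXELS];
    int count;
};

/* Copy the 8x8 block at tile cell (tx, ty) out of an indexed image. */
static void get_block(const uint8_t *img, int width, int tx, int ty, uint8_t *out)
{
    for (int y = 0; y < 8; y++)
        memcpy(out + y * 8, img + (ty * 8 + y) * width + tx * 8, 8);
}

/* Return the index of an identical existing tile, or append the block as a
 * new tile and return its index. Returns -1 if the tile store is full. */
static int add_tile(struct tileset *ts, const uint8_t *block)
{
    for (int i = 0; i < ts->count; i++)
        if (memcmp(ts->tiles[i], block, TILE_PIXELS) == 0)
            return i;
    if (ts->count >= MAX_TILES)
        return -1;
    memcpy(ts->tiles[ts->count], block, TILE_PIXELS);
    return ts->count++;
}

/* Hypothetical helper: fill tilemap (one tile index per cell, row-major).
 * Returns 0 on success, -1 if the image has too many unique tiles. */
static int build_tilemap(const uint8_t *img, int width, int height,
                         struct tileset *ts, int *tilemap)
{
    uint8_t block[TILE_PIXELS];
    int cols = width / 8, rows = height / 8;

    ts->count = 0;
    for (int ty = 0; ty < rows; ty++) {
        for (int tx = 0; tx < cols; tx++) {
            get_block(img, width, tx, ty, block);
            int idx = add_tile(ts, block);
            if (idx < 0)
                return -1;
            tilemap[ty * cols + tx] = idx;
        }
    }
    return 0;
}
```

Take a 16x16 image where three tiles are blank and the bottom-right one differs. It yields 2 unique tiles and the tilemap {0, 0, 0, 1}.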

4bpp tile data layout

In 4bpp modes, each tile is laid out in memory as pairs of bitplanes, one pair after the other, and within each pair the two bitplanes are interleaved row by row. For instance, suppose a tile is placed in video RAM starting at address 0. Bit 0 of all 8 pixels of its first row forms the byte at address 0, followed by a byte with bit 1 of the same 8 pixels (remember: video RAM is word-addressed, so address 0 contains two bytes). At address 1 comes a byte with bit 0 of the second row, followed by a byte with bit 1 of the second row. This continues until all 8 rows (16 bytes) of the first bitplane pair are in place. Then at address 8 (16 bytes from the start), the second pair of bitplanes is laid out in exactly the same way, but with bits 2 and 3 instead of bits 0 and 1.

+----------+-----------------+-----------------+
|word addr | word first byte | word second byte|
+----------+-----------------+-----------------+
|    0     |  1st row bit 0  |  1st row bit 1  | \
|    1     |  2nd row bit 0  |  2nd row bit 1  | | bitplane pair [0,1]
|   ...    |      ...        |       ...       | |     (16 bytes)
|    7     |  8th row bit 0  |  8th row bit 1  | /
|    8     |  1st row bit 2  |  1st row bit 3  | \
|    9     |  2nd row bit 2  |  2nd row bit 3  | | bitplane pair [2,3]
|   ...    |      ...        |       ...       | |     (16 bytes)
|    15    |  8th row bit 2  |  8th row bit 3  | /
+----------+-----------------+-----------------+
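The layout in the table maps directly onto a small conversion function. This is a sketch, not img2snes's actual wrtiles code. It assumes the tile arrives as 64 palette indices (one per byte, row-major). It follows the SNES convention that the leftmost pixel of a row lives in the most significant bit of each bitplane byte. The function name tile_to_4bpp is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: convert one 8x8 tile (64 palette indices 0-15,
 * row-major, one per byte) into the 32-byte SNES 4bpp layout.
 * Bytes 0-15 hold bitplanes 0 and 1 interleaved per row (word addresses 0-7),
 * bytes 16-31 hold bitplanes 2 and 3 the same way (word addresses 8-15).
 * The leftmost pixel of each row goes into the most significant bit. */
static void tile_to_4bpp(const uint8_t *pix, uint8_t *out)
{
    for (int row = 0; row < 8; row++) {
        uint8_t bp[4] = {0, 0, 0, 0};

        /* gather bit b of every pixel in this row into bitplane byte bp[b] */
        for (int x = 0; x < 8; x++) {
            uint8_t c = pix[row * 8 + x];
            for (int b = 0; b < 4; b++)
                if (c & (1 << b))
                    bp[b] |= (uint8_t)(0x80 >> x);
        }

        out[row * 2]          = bp[0]; /* word `row`, first byte: bit 0 */
        out[row * 2 + 1]      = bp[1]; /* word `row`, second byte: bit 1 */
        out[16 + row * 2]     = bp[2]; /* word `8 + row`, first byte: bit 2 */
        out[16 + row * 2 + 1] = bp[3]; /* word `8 + row`, second byte: bit 3 */
    }
}
```

For example, a row whose leftmost pixel is color 15 and rightmost pixel is color 1 produces $81 in bitplane 0 and $80 in bitplanes 1, 2 and 3.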

Video RAM layout

I decided on the following video RAM layout:

word addr | data         | required size
----------+--------------+-------------------------------------------------
   0000h  | BG1 tilemap  | 2KB (32x32 tile refs, 2 bytes each)
   1000h  | tile data    | ~15KB (477 unique 8x8 4bpp tiles, 32 bytes each)
----------+--------------+-------------------------------------------------

Background data (tilemaps) can be placed with 1k-word granularity, by setting bits 2-7 of the corresponding BGxSC register. Bits 0-1 of the same register define the tilemap size:

00: 32x32 tiles
01: 64x32 tiles
10: 32x64 tiles
11: 64x64 tiles

Tile data can be placed with 4k-word granularity. You write the top 4 bits of its 16-bit video RAM word address into the appropriate nibble of the BG12NBA or BG34NBA register (BG1 tiles go in the low nibble of BG12NBA). Tile data areas may of course overlap, and tiles can be shared by multiple backgrounds as needed.
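A quick sanity check of the layout above: the 32x32 tilemap takes 32 × 32 × 2 = 2048 bytes, i.e. 1024 words. It therefore occupies words 0000h-03FFh and ends well before the tile data at 1000h. The register values follow from shifting the word addresses, as in this sketch (the helper names are hypothetical). For this layout it gives BG1SC = $00 and BG12NBA = $01.

```c
#include <stdint.h>

/* Tilemap sizes for bits 0-1 of BGxSC, as listed above. */
enum { SC_32X32 = 0, SC_64X32 = 1, SC_32X64 = 2, SC_64X64 = 3 };

/* Hypothetical helper: BGxSC value. The tilemap word address, in 1k-word
 * units, goes into bits 2-7; map_waddr must be a multiple of 0x400. */
static uint8_t bgsc_value(uint16_t map_waddr, int size)
{
    return (uint8_t)(((map_waddr >> 10) << 2) | (size & 3));
}

/* Hypothetical helper: BG12NBA value. BG1 tile base goes in the low nibble,
 * BG2 in the high nibble, each as the top 4 bits of the 16-bit word address
 * (i.e. in 4k-word units). */
static uint8_t bg12nba_value(uint16_t bg1_tiles_waddr, uint16_t bg2_tiles_waddr)
{
    return (uint8_t)(((bg2_tiles_waddr >> 12) << 4) | (bg1_tiles_waddr >> 12));
}
```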

Copying data to the video memory

Video memory is not mapped into the CPU address space. To write data into video memory, we first write the destination (word) address to VMADDL/VMADDH, and then write the data for that address to VMDATAL/VMDATAH. After that (assuming VMAINC is set up correctly), the address is incremented automatically, and the next write to VMDATAL/VMDATAH transfers a word to the next address. This makes transferring blocks of data easy and efficient, without having to set the address explicitly for each word.
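To make the port behaviour concrete, here is a tiny software model of the write side, in C. It is only an illustration, not an accurate emulation of the PPU. It models nothing but VMAINC = $80 (advance the word address by one after each high-byte write), and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the VRAM write port: 32k 16-bit words plus the
 * current word address (VMADDH:VMADDL). Only VMAINC = $80 is modelled. */
struct vram_port {
    uint16_t mem[0x8000];
    uint16_t addr;
};

static void write_vmaddl(struct vram_port *p, uint8_t v)
{
    p->addr = (uint16_t)((p->addr & 0xff00) | v);
}

static void write_vmaddh(struct vram_port *p, uint8_t v)
{
    p->addr = (uint16_t)((p->addr & 0x00ff) | (v << 8));
}

/* Low byte of the current word; the address does not change. */
static void write_vmdatal(struct vram_port *p, uint8_t v)
{
    uint16_t *w = &p->mem[p->addr & 0x7fff];
    *w = (uint16_t)((*w & 0xff00) | v);
}

/* High byte of the current word, then increment (VMAINC bit 7 set). */
static void write_vmdatah(struct vram_port *p, uint8_t v)
{
    uint16_t *w = &p->mem[p->addr & 0x7fff];
    *w = (uint16_t)((*w & 0x00ff) | (v << 8));
    p->addr++;
}
```

Set the address to 1000h, then write $34, $12, $78, $56 through the data registers. This leaves $1234 at word 1000h and $5678 at word 1001h, with the address pointing at 1002h.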

Writing to video memory can only be done during video blanking periods (horizontal blanking, vertical blanking, or forced blanking).

If data need to be updated continuously at runtime, it's best to use DMA to transfer everything as quickly as possible during vblank. But for this write-once-at-initialization scenario, there's no reason to muck with all that. While keeping forced blanking on, I just run a copy loop on the CPU, transferring the data a word at a time. Specifically, I wrote a generic copy_vmem routine, which expects the following arguments on the stack:

- destination vmem address (in words)
- source address
- number of bytes to copy (must be even, since it always copies whole words to video RAM)

Abusing the 65816 pea (push effective address) instruction to push arbitrary words onto the stack makes calling it as easy as:

        pea logo_tiles_width * logo_tiles_height / 2  ; bytes to transfer
        pea logo_tiles                  ; source tile data pointer
        pea vmem_tiles_offs             ; vmem dest address: 4096 (in words)
        jsr copy_vmem
        rep #$20                        ; 16bit accumulator: each pla pops a word
        .a16
        pla                             ; discard the 3 arguments
        pla
        pla
        sep #$20                        ; back to 8bit accumulator
        .a8

Aside: efficient stack frame access

The copy_vmem routine uses a neat trick I read about here: http://6502org.wikidot.com/software-65816-parameters-on-stack

The 65816 allows the "zero page" (called the "direct page" on the 65816) to be moved anywhere within bank 0, by setting the D register. This makes it very convenient to use D as a stack frame pointer. Arguments on the stack and local variables can then be accessed with plain direct page (and direct page indexed/indirect) addressing modes.

Of course the D register must be saved first, and restored before returning from the function:

    ; function entry (assumes 16bit accumulator mode)
    phd         ; save D
    tsc         ; C (16bit accum) <- S (stack pointer)
    tcd         ; D <- C
    ...
    ; function exit
    pld         ; restore original D
    rts

Having done that, the stack frame (assuming no space for local variables) becomes:

zero page |
address   |               item
----------+--------------------------------------
    00    | empty space for the next push
    01    | saved previous contents of D register
    03    | return address
    05    | \ arg1: destination vmem address
    06    | /
    07    | \ arg2: source address
    08    | /
    09    | \ arg3: size in bytes
    0a    | /
----------+--------------------------------------
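The offsets in the table can be double-checked by replaying the pushes on a model stack. The sketch below assumes the 65816 convention that a 16-bit push stores the high byte first and decrements S after each byte. Every word therefore ends up little-endian in memory, with S pointing at the next free byte. The initial stack pointer and all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: 64KB of bank 0 and the stack pointer S. */
static uint8_t mem[0x10000];
static uint16_t S = 0x1fff;  /* hypothetical initial stack pointer */

/* 16-bit push: high byte at S, then low byte at S-1; S ends on a free byte. */
static void push16(uint16_t v)
{
    mem[S--] = (uint8_t)(v >> 8);
    mem[S--] = (uint8_t)(v & 0xff);
}

/* Read the little-endian word at direct page offset `off` from D. */
static uint16_t dp_word(uint16_t d, uint8_t off)
{
    return (uint16_t)(mem[(uint16_t)(d + off)] |
                      (mem[(uint16_t)(d + off + 1)] << 8));
}

/* Replay the caller (3x pea) and the callee prologue (jsr, phd, tsc/tcd),
 * returning the value that ends up in D. */
static uint16_t setup_frame(uint16_t size, uint16_t src, uint16_t dest,
                            uint16_t ret, uint16_t old_d)
{
    push16(size);   /* pea size: pushed first, so it ends up deepest */
    push16(src);    /* pea src */
    push16(dest);   /* pea dest: pushed last, closest to the top */
    push16(ret);    /* jsr pushes a 16-bit return address */
    push16(old_d);  /* phd */
    return S;       /* tsc ; tcd  ->  D = S */
}
```

Call setup_frame(0x3a20, 0x8000, 0x1000, 0x8123, 0x0000), then read the words at D+1, D+3, D+5, D+7 and D+9. In that order, they yield the saved D, the return address, the destination, the source and the size, matching the table.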

So the complete video memory copy routine is the following:

        ; copy_vmem(vmem_offset, src, num_bytes)
        ; src is a 16bit pointer into the current data bank (DBR)
copy_vmem:
        rep #$30        ; 16bit accumulator and index registers
        .a16
        .i16
        phd             ; save d
        tsc             ; and make it point to the stack
        tcd
        sep #$20        ; back to 8bit accumulator
        .a8

        lda #$80        ; auto increment after writing high byte
        sta REG_VMAINC
        lda $5          ; dest addr low byte
        sta REG_VMADDL
        lda $6          ; dest addr high byte
        sta REG_VMADDH
        ldy #0
@loop:
        lda ($7),y      ; A <- *(srcptr + Y)  (low byte)
        sta REG_VMDATAL
        iny
        lda ($7),y      ; A <- *(srcptr + Y)  (high byte)
        sta REG_VMDATAH ; writing the high byte advances the vmem address
        iny
        cpy $9          ; compare Y with arg3 (16bit compare)
        bne @loop

        pld             ; restore d
        sep #$10        ; back to 8bit index registers
        .i8
        rts

Wrapping it up

There's nothing more to it, really. First copy the tilemap and tile data to video RAM, then set up the palette. I won't go into the palette because it's trivial: write the starting color index to REG_CGADD, then write each color to REG_CGDATA as two bytes, low byte first (see cmap_loop in test.asm). After that, the only thing left to do is configure the PPU to display the image.

        setreg REG_BGMODE, $02      ; mode 2, 8x8 tiles
        stz REG_BG1SC               ; BG1 tilemap at 0, 32x32 tiles
        setreg REG_BG12NBA, $1      ; BG1 tiles at 8KB (word addr 4096)
        setreg REG_TM, $1           ; main screen: BG1

Setting bit 0 of REG_TM (the "Through Main" register) routes the graphics from BG1 to the main screen.
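Each palette color uses the SNES 15-bit BGR format (0bbbbbgg gggrrrrr) and is written to REG_CGDATA as two bytes, low byte first. Here is a hedged C sketch of that conversion. It simply truncates each 8-bit channel to 5 bits (rounding would be slightly more accurate). The function names are hypothetical and are not img2snes's wrpalette.

```c
#include <stdint.h>

/* Hypothetical helper: 8-bit-per-channel RGB to SNES 0bbbbbgg gggrrrrr,
 * keeping the top 5 bits of each channel. */
static uint16_t rgb_to_snes(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)((r >> 3) | ((g >> 3) << 5) | ((b >> 3) << 10));
}

/* Hypothetical helper: the byte sequence written to REG_CGDATA for a
 * palette, after REG_CGADD has been set to the first color index.
 * Returns the number of bytes stored in out (2 per color). */
static int palette_bytes(const uint8_t (*rgb)[3], int ncolors, uint8_t *out)
{
    int n = 0;
    for (int i = 0; i < ncolors; i++) {
        uint16_t c = rgb_to_snes(rgb[i][0], rgb[i][1], rgb[i][2]);
        out[n++] = (uint8_t)(c & 0xff);  /* first write: low byte */
        out[n++] = (uint8_t)(c >> 8);    /* second write: high byte */
    }
    return n;
}
```

For example, pure red becomes $001F (written as $1F, $00) and pure blue becomes $7C00 (written as $00, $7C).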

Really, the most interesting part of all this is converting the data to the format the SNES PPU expects. Feel free to go through the img2snes source code for that, referring back to the diagrams above. Focus especially on the functions wrtiles, wrtilemap and wrpalette in img2snes/src/main.c.
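For reference when reading wrtilemap, each BG tilemap entry is a 16-bit word with the following fields:

- a 10-bit tile number, relative to the tile base set in BG12NBA
- a 3-bit palette number
- a priority bit
- horizontal and vertical flip bits

The packing function below is a hypothetical illustration, not the img2snes code. If the image uses palette 0 and no flipping, every entry reduces to the plain tile index. That comfortably covers 477 tiles.

```c
#include <stdint.h>

/* Hypothetical helper: pack one BG tilemap entry.
 *   bits 0-9   tile number (relative to the BG's tile base)
 *   bits 10-12 palette number (0-7)
 *   bit 13     priority
 *   bit 14     horizontal flip
 *   bit 15     vertical flip */
static uint16_t tilemap_entry(unsigned tile, unsigned pal,
                              int prio, int hflip, int vflip)
{
    return (uint16_t)((tile & 0x3ffu) |
                      ((pal & 7u) << 10) |
                      ((prio ? 1u : 0u) << 13) |
                      ((hflip ? 1u : 0u) << 14) |
                      ((vflip ? 1u : 0u) << 15));
}
```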

