Because of a yen to interface directly with a machine, I’ve been learning to program 6502 assembly language. When I told my colleague (who bears a strong resemblance to Keanu Reeves, apropos of nothing), he asked why I didn’t learn something I’m more likely to use, like x86 assembly.
I think the argument actually goes the opposite way: there is almost no reason for a developer to write x86 assembly these days - it’s practically a target created for relatively smart compilers. The 6502 and related processors, on the other hand, still have a lot of life in the homebrew scene (in particular, the Atari 2600 and the NES can hardly be said to be totally exhausted as creative platforms). So, I’m much more likely to write meaningful assembly language programs for those platforms than I am to write a meaningful program for an x86.
The essence of this argument, I suppose, is that to choose to write assembly in 2019 is as much an aesthetic choice as a technical one, and the aesthetics of the 6502 are more compelling by far than those of the x86.
I’m using Hugg’s “Making Games for the Atari 2600” (for the most part). His book expects you to use 8BitWorkshop, which I have to admit is a pretty slick thing: an integrated development environment and emulator.
However, I don’t like web apps, so I’m using dasm and Stella (both of which can be installed via Homebrew on the Mac). My shitty makefile looks like this:
So building a specific file, for instance, rainbows.asm goes like this:
make rainbows.run
Which both assembles the file and launches the result in the Stella emulator (this is a janky Make target, since it doesn’t produce an artifact, but I wanted something that I could run that would always execute the emulator).
Here is the listing for rainbows.asm if you want to try this at home:
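    processor 6502
    include "vcs.h"
    include "macro.h"
    org $f000
BGColor equ $81         ; zero-page address holding the starting color
Start
    CLEAN_START         ; macro.h: clear RAM and the TIA registers
NextFrame
    lda #2
    sta VBLANK          ; blank the video output
    sta VSYNC           ; start vertical sync
    sta WSYNC           ; hold VSYNC for three scanlines
    sta WSYNC
    sta WSYNC
    lda #0
    sta VSYNC           ; end vertical sync
    ldx #37             ; 37 scanlines of vertical blank
LVBlank
    sta WSYNC
    dex
    bne LVBlank
    lda #0
    sta VBLANK          ; turn the video output back on
    ldx #192            ; 192 visible scanlines
    ldy BGColor
LVScan
    sty COLUBK          ; background color for this scanline
    sta WSYNC
    iny                 ; the next scanline gets the next color
    dex
    bne LVScan
    lda #2
    sta VBLANK          ; blank the output again for the overscan
    ldx #30             ; 30 scanlines of overscan
LVOver
    sta WSYNC
    dex
    bne LVOver
    dec BGColor         ; shift the starting color every frame
    jmp NextFrame
    org $fffc
    .word Start         ; reset vector
    .word Start         ; BRK/IRQ vector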
The 2600 is a strange thing. It only supports 2 sprites which the programmer must orchestrate line by line. In fact, positioning of the sprite in the horizontal direction depends on strobing the sprite register at just the right time as the beam scrolls across the screen. You literally wait for the start of a line, calculate the x position of the sprite, and then, when the sprite needs to show up, you write to the appropriate place in memory.
Sprites are, in a real sense, totally abstracted from this deeply hardware-dependent implementation. The ordinary thing is to read sprite data line by line from some memory location, but nothing really demands you do this.
Indeed, the famous “force field” effect in Yars’ Revenge:
Is now sort of obvious: just pick a random position in the ROM (all your code and data just live in one big pile of memory, undifferentiated except by custom) and then, as you scan down the screen, copy bytes into the sprite and sprite color registers.
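To make that concrete, here is a rough Python sketch of the idea (not how Yars’ Revenge is actually written). The `rom` bytes are what the first four instructions of NextFrame in rainbows.asm should assemble to (`lda #2`, `sta VBLANK`, `sta VSYNC`, `sta WSYNC`), assuming the standard vcs.h register addresses; `sprite_rows` is a made-up helper name.

```python
# Code bytes standing in for graphics data. These are the opcodes and
# operands for: lda #2 / sta VBLANK / sta VSYNC / sta WSYNC
# (assumes VSYNC=$00, VBLANK=$01, WSYNC=$02 as in the standard vcs.h).
rom = bytes([0xA9, 0x02, 0x85, 0x01, 0x85, 0x00, 0x85, 0x02])


def sprite_rows(rom, start, height):
    """Treat `height` consecutive ROM bytes as 8-pixel sprite rows."""
    rows = []
    for line in range(height):
        # One byte per scanline, wrapping around at the end of the ROM.
        byte = rom[(start + line) % len(rom)]
        bits = format(byte, "08b")
        rows.append(bits.replace("0", ".").replace("1", "#"))
    return rows


for row in sprite_rows(rom, 0, len(rom)):
    print(row)
```

Change `start` and you get a different chunk of "noise" for free, which is the whole trick.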
This makes me have the following flight of fancy: what if we imagine a sort of Atari 26000 which in many respects operates identically to a 2600 but which supports 20 such sprites on screen at the same time?
Are there ways to imagine a platform which embodies the aesthetics of the 2600 but liberated from the non-aesthetic limitations imposed by the hardware? It’s a fun thought experiment.
You could imagine a 26000 which had 20 sprites, twice as many x pixels, a faster processor, and more ram, but the same fundamental design.
This book is actually pretty bad, which is good, because it’s forcing me to write a lot of assembly just to figure out what the hell he is talking about.
I’m debugging fine sprite positioning now. When you position a sprite by banging the right bit at the right moment, you’re limited by the fact that the TIA’s clock runs 3 times faster than the 6502’s, so for each cycle of the 6502 the TIA draws 3 pixels on the current line.
You have to loop to burn time till your desired horizontal sprite position, but the tightest loop is 5 cycles. So that means you can only coarse position a sprite at a 15 pixel resolution.
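Here is the arithmetic behind that 15, as a small Python sketch. The cycle counts are the standard 6502 timings for `dex` and `bne` (assuming the branch doesn’t cross a page boundary); the function names are made up.

```python
# Numbers from the text; the names are made up for this sketch.
PIXELS_PER_CPU_CYCLE = 3   # the TIA draws 3 pixels per 6502 cycle
DEX_CYCLES = 2             # dex
BNE_TAKEN_CYCLES = 3       # bne when it branches back (no page crossing)
BNE_NOT_TAKEN_CYCLES = 2   # bne when it falls through on the last pass


def delay_loop_cycles(iterations):
    """CPU cycles spent in a dex/bne loop that runs `iterations` (>= 1) times."""
    taken = (iterations - 1) * (DEX_CYCLES + BNE_TAKEN_CYCLES)
    last = DEX_CYCLES + BNE_NOT_TAKEN_CYCLES
    return taken + last


def delay_loop_pixels(iterations):
    """How far the beam moves, in pixels, while that loop runs."""
    return delay_loop_cycles(iterations) * PIXELS_PER_CPU_CYCLE


for n in (1, 2, 3):
    print(n, delay_loop_cycles(n), delay_loop_pixels(n))
```

Each extra pass through the loop adds 5 cycles, so the beam moves another 15 pixels before you get a chance to strobe the register.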
The 2600 has a sprite fine positioning capability. The idea is you set the coarse position by waiting the appropriate amount of time, then you calculate the fine adjustment and set that.
When do you do all this? One possibility is that you calculate your coarse and fine position “off screen” (I’m presently doing it in the Overscan) and store the coarse and fine adjustment in memory.
Then, during the Vertical Blank (on the second to last scanline) I loop the coarse correction and then set the fine adjustment.
The fine adjustment involves a division by 15. The 6502 has neither native divide nor multiply. To simulate them you need to loop and perform an add or a subtract. Seems like you can combine these two things so that your div loop also gives you the appropriate delay (it makes sense if you think about it for a bit).
Anyway, for didactic purposes, I’m calculating them elsewhere and setting them during the VBlank. But it’s not working correctly - I’m not seeing any fine adjustment:
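For anyone who wants to check the numbers by hand, here is a Python model of the divide-by-15 loop as it appears in the listing further down (`sbc #15` / `inx` / `bcs`, then `dex`, `eor #7` and four `asl`s). It is only a sketch: `coarse_and_fine` is a made-up name, and the default of 68 is the constant the listing adds before dividing.

```python
def coarse_and_fine(x_pos, offset=68):
    """Model the 6502 divide-by-15 loop: return (coarse count, fine byte)."""
    a = x_pos + offset
    x = 0
    while True:
        a -= 15           # sbc #15
        x += 1            # inx
        if a < 0:         # borrow: carry is clear, so bcs falls through
            break
    x -= 1                # dex: the loop ran one time too many
    a &= 0xFF             # the accumulator is only 8 bits wide
    fine = ((a ^ 7) << 4) & 0xFF   # eor #7, then asl four times
    return x, fine


for pos in (0, 6, 7):
    coarse, fine = coarse_and_fine(pos)
    print(pos, coarse, hex(fine))
```

The quotient becomes the delay loop count and the leftover (always in the range -15 to -1 when the loop exits) becomes the value stored for the fine adjustment.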
The emulator I’m using actually has a great debugger, though:
The issue was that I was forgetting to strobe the HMOVE register, which tells the 2600 to respect the fine adjustment value I was apparently correctly calculating and applying.
There seem to be a few little things going on with the HBlank, though. Maybe I need to split up the code calculating the delay and the fine adjust onto two different scanlines? It seems like some code is leaking into a scanline up there. My intuitions are still underdeveloped here.
I was very fascinated with Atari 2600 coding in the early 2000s, though I was very bad at it at the time. But I was lurking on the mailing lists where research was actively happening and tricks were being discovered. AMA! Listen to my podcast!
The black line in the corner of your screenshot there is a basically-unavoidable artifact of how HMOVE works and is called the “HMOVE comb”. If you look at Atari screenshots and pay attention you’ll see them everywhere. Activision had a policy of hiding the effect by ensuring that the first 8 pixels of every scanline were always black by strobing HMOVE on every scanline.
It’s possible through serious deep black magick (as opposed to the normal everyday black magick required to program the 2600 at all) to perform an HMOVE in such a way that the comb does not appear, but not only is it difficult to pull off, I think it is potentially not portable to different revisions of the chip. If you want waaaaaay too much detail about why this happens, I believe “TIA Hardware Notes (A Small Opus on the TIA)” by Andrew Towers is the thing to read - he reverse-engineered the schematics to really dig into how everything worked and explained a bunch of weird edge case behaviours that were not well understood at the time.
Woah! Thanks for the info! I’m also noticing that the top of my screen is bouncing up and down by one line. My meager understanding is that this is because I’m miscounting my VBLANKS but no count seems to work right.
LOL. Tried to send you some token money but PayPal wasn’t having it.
Anyway, never seen a JoustPong variant w/ a gravity effect for the ball - interesting! (also it doesn’t feel like the ball is affected by the speed of the bat at collision, so the bat is effectively just a wall that is either there or not there?)
Thanks for the attempt, but I don’t really need money anyway!
The bats do transfer their lateral momentum to the ball, but it’s a small effect. I mostly created Devil’s Pong so that I could open source it, since it uses the same basic implementation strategy as Corpse Wizard (which I can’t really open source without dealing with the assets).
Rather than messing up the VBlank I was messing up the Overscan.
Here is the code listing:
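    processor 6502
    include "vcs.h"
    include "macro.h"
    org $f000
BGColor equ $81         ; starting background color for the frame
TColor  equ $82         ; background color for the current scanline
SH      equ 9           ; sprite height in scanlines
YPos    equ $83         ; sprite vertical position
XPos    equ $84         ; sprite horizontal position
XPosC   equ $85         ; coarse position: delay loop count
XPosF   equ $86         ; fine position: value for HMP0
Frame0
    .byte #%00000000;$40
    .byte #%01001000;$40
    .byte #%00100000;$40
    .byte #%00000000;$94
    .byte #%00110000;$94
    .byte #%00110000;$94
    .byte #%00000100;$40
    .byte #%00010010;$40
    .byte #%00000000;$40
Start
    CLEAN_START
    lda #10
    sta YPos
    lda #0
    sta XPos
NextFrame
    lda #2
    sta VBLANK
    sta VSYNC
    sta WSYNC
    sta WSYNC
    sta WSYNC
    lda #0
    sta VSYNC
    ldx #35             ; skip all but the last two vblank lines
LVBlank
    sta WSYNC
    dex
    bne LVBlank
    ldx XPosC
    sta WSYNC           ; start of the positioning scanline
P1P
    dex                 ; 5 cycles (15 pixels) per pass
    bne P1P
    sta RESP0           ; coarse position: reset player 0 here
    lda XPosF
    sta HMP0            ; fine adjustment for player 0
    sta WSYNC
    sta HMOVE           ; apply the fine adjustment
    lda #0
    sta VBLANK
    ldy BGColor
    sty TColor          ; initialize color for scans
    ldx #192
LVScan
    sta WSYNC
    ldy TColor
    sty COLUBK
    iny
    sty TColor          ; store color for next scanline
    txa
    sec
    sbc YPos
    cmp #SH             ; is this scanline inside the sprite?
    bcc InSprite
    lda #0
    sta GRP0            ; no: draw nothing
Rt
    dex
    bne LVScan
    sta WSYNC
    lda #2
    sta VBLANK
    ldx #28             ; 28 lines here, two more WSYNCs in the calculation below
LVOver
    sta WSYNC
    dex
    bne LVOver
    inc XPos
    lda XPos
    tax
    cmp #159
    bne Hop
    lda #0
    sta XPos            ; wrap around at the right edge
Hop
    txa
    sta WSYNC
    clc                 ; clear carry before the add, not after it
    adc #68
    ldx #0
    sec
PDiv sbc #15            ; divide by 15 by repeated subtraction
    inx
    bcs PDiv
    dex                 ; X = quotient
    eor #7              ; turn the remainder into an HMP0 value
    asl
    asl
    asl
    asl
    sta XPosF
    txa
    sta XPosC
    sta WSYNC
    inc YPos
    dec BGColor
    jmp NextFrame
InSprite
    tay
    lda Frame0,Y
    sta GRP0
    jmp Rt
    org $fffc
    .word Start
    .word Start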
For those less familiar with the details of 2600 hacking:
From the above diagram, you can see that one frame consists of 262 scanlines, of which only 192 are visible. Thirty-seven (plus three) occur, in some sense, before the screen is drawn and 30 occur after. The programmer is responsible for making sure they know what scanline they are on so that they can do things like adjust the background color, playfield graphics and sprites appropriately.
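The bookkeeping is easy to check in a few lines of Python. The counts are the ones from the paragraph above; `FRAME_REGIONS`, `total_scanlines` and `region_of` are made-up names for this sketch.

```python
# Scanline counts from the paragraph above; names are made up.
FRAME_REGIONS = [
    ("vsync", 3),
    ("vblank", 37),
    ("visible", 192),
    ("overscan", 30),
]


def total_scanlines(regions=FRAME_REGIONS):
    """Add up the scanlines in every region of the frame."""
    return sum(count for _, count in regions)


def region_of(scanline, regions=FRAME_REGIONS):
    """Name of the region containing a 0-based scanline number."""
    start = 0
    for name, count in regions:
        if scanline < start + count:
            return name
        start += count
    raise ValueError("scanline is past the end of the frame")


print(total_scanlines())
print(region_of(40))
```

These are the same 37, 192 and 30 that show up as loop counters in rainbows.asm.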
A scanline takes 76 CPU cycles to draw. In order to simultaneously track the current scanline and do potentially involved computations, the 2600 expects you to do your work in chunks of fewer than 76 CPU cycles and then sta WSYNC (write the A register to the memory location WSYNC).
sta WSYNC causes the processor to halt until the start of the next scanline. You have to be a little conservative, because some instructions take a varying number of cycles: a branch costs an extra cycle when it is taken, and some addressing modes cost one more when they cross a page boundary. I suppose a very clever programmer could strobe WSYNC less than once per line, carefully tracking their cycles so as to absolutely maximize their use of the processor, but that would be tricky.
The problem I had was that I was calculating too much on one of the scanlines in the OVERSCAN area and so my tracking of the number of scanlines was off, sometimes. The solution was to remove a few WSYNCs from my overscan loop and put them into that big calculation so that I never went over 76 cycles for a given chunk of calculation.
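A toy Python model of that bug, under simplifying assumptions: each "chunk" is the number of CPU cycles spent up to and including one `sta WSYNC`, and what happens right at the line boundary is ignored. The cycle counts in the example are invented for illustration, not measured from my code, and `scanlines_used` is a made-up name.

```python
import math

CYCLES_PER_SCANLINE = 76


def scanlines_used(chunks):
    """Scanlines consumed by a list of chunks, each ending in `sta WSYNC`."""
    total = 0
    for cycles in chunks:
        # WSYNC halts the CPU until the next line starts, so every chunk
        # eats a whole number of scanlines, and never fewer than one.
        total += max(1, math.ceil(cycles / CYCLES_PER_SCANLINE))
    return total


# 29 short loop passes plus one chunk that overruns the line: 31 lines.
too_long = [10] * 29 + [100]
# 28 loop passes, with the big calculation split across two WSYNCs: 30.
split_up = [10] * 28 + [50, 50]

print(scanlines_used(too_long), scanlines_used(split_up))
```

The first layout silently costs an extra scanline, which is exactly the kind of miscount that makes the picture bounce.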