On the Atari 2600

#1

Because of a yen to interface directly with a machine, I’ve been learning to program 6502 assembly language. When I told my colleague (who bears a strong resemblance to Keanu Reeves, apropos of nothing) he asked why not learn something I’m more likely to use, like x86 assembly.

I think the argument actually goes the opposite way: there is almost no reason for a developer to write x86 assembly these days - it’s practically a target created for relatively smart compilers. The 6502 and related processors, on the other hand, still have a lot of life in the homebrew scene (in particular, the Atari 2600 and the NES can hardly be said to be exhausted as creative platforms). So, I’m much more likely to write meaningful assembly language programs for those platforms than I am to write a meaningful program for an x86.

The essence of this argument, I suppose, is that to choose to write assembly in 2019 is as much an aesthetic choice as a technical one, and the aesthetics of the 6502 are more compelling by far than those of the x86.

#2

I’m using Hugg’s “Making Games for the Atari 2600” (for the most part). His book expects you to use 8BitWorkshop, which I have to admit is a pretty slick thing: an integrated development environment and emulator.

However, I don’t like web apps, so I’m using dasm and Stella (both of which can be installed via Homebrew on the Mac). My shitty makefile looks like this:

.PHONY: clean

%.run: %.bin
	Stella -format ntsc $*.bin

clean:
	rm -f *.listing *.bin

%.bin: %.asm
	dasm $*.asm -l$*.listing -f3 -v5 -o$*.bin

So building and running a specific file, for instance rainbows.asm, goes like this:

make rainbows.run

Which both assembles the program and launches it in the Stella emulator (this is a janky Make target, since it doesn’t produce an artifact, but I wanted something that I could run that would always execute the emulator).

Here is the listing for rainbows.asm if you want to try this at home:

        processor 6502

        include "vcs.h"
        include "macro.h"
        org $f000

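BGColor equ $81                 ; RAM address of the starting background color

Start
        CLEAN_START             ; macro.h: clear RAM and TIA registers

NextFrame
        lda #2
        sta VBLANK              ; blank the display
        sta VSYNC               ; start vertical sync

        sta WSYNC               ; hold VSYNC for 3 scanlines
        sta WSYNC
        sta WSYNC

        lda #0
        sta VSYNC               ; end vertical sync

        ldx #37                 ; 37 scanlines of vertical blank
LVBlank
        sta WSYNC
        dex
        bne LVBlank

        lda #0
        sta VBLANK              ; turn the display back on

        ldx #192                ; 192 visible scanlines
        ldy BGColor
LVScan
        sty COLUBK              ; set this line's background color
        sta WSYNC

        iny                     ; next line gets the next color
        dex
        bne LVScan

        lda #2
        sta VBLANK              ; blank the display again
        ldx #30                 ; 30 scanlines of overscan

LVOver
        sta WSYNC
        dex
        bne LVOver

        dec BGColor             ; shift the whole rainbow each frame
        jmp NextFrame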


        org $fffc
        .word Start
        .word Start

#3

The 2600 is a strange thing. It only supports 2 player sprites, which the programmer must orchestrate line by line. In fact, positioning a sprite horizontally depends on strobing its position-reset register (RESP0 or RESP1) at just the right time as the beam sweeps across the screen. You literally wait for the start of a line, calculate the x position of the sprite, and then, when the beam reaches the point where the sprite needs to show up, you write to the appropriate place in memory.

Sprites are, in a real sense, totally abstracted from this deeply hardware-dependent implementation. The ordinary thing is to read sprite data line by line from some memory location, but nothing really demands you do this.

Indeed, the famous “force field” effect in Yars’ Revenge

[screenshot: the Yars’ Revenge force field]

is now sort of obvious: just pick a random position in the ROM (all your code and data just live in one big pile of memory, undifferentiated except by custom) and then, as you scan down the screen, copy bytes into the sprite graphics and sprite color registers.
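To make the idea concrete, here is a rough Python sketch (not 2600 code, and not a claim about how Yars’ Revenge actually does it). It treats a block of arbitrary bytes as the "ROM" and renders a run of them as 8-pixel sprite rows, the way bytes written to the player graphics register would show up, with bit 7 as the leftmost pixel. The function name `force_field_rows` and the `fake_rom` data are made up for illustration.

```python
import random


def force_field_rows(rom: bytes, height: int, rng: random.Random) -> list[str]:
    """Render `height` consecutive ROM bytes, starting at a random offset, as 8-pixel rows."""
    start = rng.randrange(len(rom) - height + 1)
    rows = []
    for byte in rom[start:start + height]:
        # Bit 7 is the leftmost pixel, matching an unreflected player sprite.
        rows.append("".join("#" if byte & (0x80 >> bit) else "." for bit in range(8)))
    return rows


# Stand-in for a cartridge image: 4 KB of arbitrary bytes (hypothetical data).
fake_rom = bytes((i * 37 + 11) & 0xFF for i in range(4096))

for row in force_field_rows(fake_rom, 8, random.Random(0)):
    print(row)
```

Calling it again with a different offset each frame gives the shimmering noise: the "graphics" are just whatever code or data happens to sit at that address.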

#4

This makes me have the following flight of fancy: what if we imagine a sort of Atari 26000 which in many respects operates identically to a 2600 but which supports 20 such sprites on screen at the same time?

Are there ways to imagine a platform which embodies the aesthetics of the 2600 but liberated from the non-aesthetic limitations imposed by the hardware? It’s a fun thought experiment.

You could imagine a 26000 which had 20 sprites, twice as many x pixels, a faster processor, and more RAM, but the same fundamental design.

#5

Read Racing The Beam if you haven’t already. It’s great.

#6

I’ve read it and it’s good, though it’s been a few years. The NES book (I am Error) is good too, but not as good.

#7

I have this book on 6502 programming for the Apple II:

6502 is pretty cool! I started writing a 6502 assembler in Racket but abandoned it. I may return to it some day.

One “practical” use for x86 assembly today is to use it to write DOS games. That’s about it.

#8

This book is actually pretty bad, which is good, because it’s forcing me to write a lot of assembly just to figure out what the hell he is talking about.

#9

I’ve always liked how they had to stop the music for the boss fight in this Mega Man demo.

More details and a download link here:

#10

I’m debugging fine sprite positioning now. When you position a sprite by banging the right bit at the right moment, you’re limited by the fact that the TIA is clocked 3 times faster than the 6502, so for each cycle of the 6502 the TIA draws 3 pixels on one line.

You have to loop to burn time till your desired horizontal sprite position, but the tightest loop is 5 cycles. So that means you can only coarse position a sprite at a 15 pixel resolution.
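The 15-pixel figure can be checked with a little arithmetic. This is a Python sketch of the timing, not 2600 code. It assumes the delay loop is a plain `dex` / `bne` pair (2 cycles plus 3 for a taken branch), that the branch does not cross a page boundary, and that the count is between 1 and 255. The function names are made up for illustration.

```python
CPU_CYCLES_PER_PASS = 5          # dex (2 cycles) + taken bne (3 cycles)
COLOR_CLOCKS_PER_CPU_CYCLE = 3   # the TIA draws 3 pixels per CPU cycle


def delay_loop_cycles(iterations: int) -> int:
    """CPU cycles spent in a dex/bne loop that runs `iterations` times."""
    if not 1 <= iterations <= 255:
        raise ValueError("iterations must be between 1 and 255")
    # The final bne falls through, which costs 2 cycles instead of 3.
    return CPU_CYCLES_PER_PASS * iterations - 1


def delay_loop_color_clocks(iterations: int) -> int:
    """The same delay measured in TIA color clocks (pixels)."""
    return delay_loop_cycles(iterations) * COLOR_CLOCKS_PER_CPU_CYCLE


for n in range(1, 6):
    print(n, delay_loop_cycles(n), delay_loop_color_clocks(n))
```

Each extra pass through the loop pushes the strobe 15 pixels further right, so the loop alone can only land on every fifteenth pixel. The fine adjustment has to cover the gaps.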

The 2600 has a sprite fine positioning capability. The idea is you set the coarse position by waiting the appropriate amount of time, then you calculate the fine adjustment and set that.

When do you do all this? One possibility is to calculate your coarse and fine positions “off screen” (I’m presently doing it in the Overscan) and store the coarse and fine adjustments in memory.

Then, during the Vertical Blank (on the second to last scanline) I loop the coarse correction and then set the fine adjustment.

The fine adjustment involves a division by 15. The 6502 has neither native divide nor multiply. To simulate them you need to loop and perform an add or a subtract. Seems like you can combine these two things so that your div loop also gives you the appropriate delay (it makes sense if you think about it for a bit).
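Here is the divide-by-15 worked through in Python, as a model of the arithmetic only (it says nothing about cycle timing). It mirrors a repeated-subtraction loop of the kind in the listing later in this thread: add an offset (68, matching the `adc #68` there), subtract 15 until the accumulator goes negative, then turn the leftover into a value for the fine-motion register with `eor #7` and four left shifts. The names `split_position` and `fine_motion_pixels` are made up for illustration.

```python
def split_position(x: int, offset: int = 68) -> tuple[int, int]:
    """Return (coarse loop count, fine-motion byte) for horizontal position x."""
    a = (x + offset) & 0xFF
    coarse = 0
    while True:
        a -= 15            # sbc #15 with carry set
        coarse += 1        # inx
        if a < 0:          # borrow happened: carry clear, loop exits
            break
    coarse -= 1            # dex: undo the final, overshooting pass
    a &= 0xFF              # the accumulator is a byte, so it wraps
    fine = ((a ^ 7) << 4) & 0xFF   # eor #7, then asl four times
    return coarse, fine


def fine_motion_pixels(fine: int) -> int:
    """Decode the high nibble as the TIA does: positive moves left, negative right."""
    nibble = fine >> 4
    return nibble - 16 if nibble >= 8 else nibble


for x in (0, 7, 8, 15):
    coarse, fine = split_position(x)
    print(x, coarse, hex(fine), fine_motion_pixels(fine))
```

The remainder left over from the subtraction loop can take 15 different values, and the `eor #7` trick maps them onto fine-motion values from +6 (left) to -8 (right), exactly enough to fill the 15-pixel gap between coarse positions.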

Anyway, for didactic purposes, I’m calculating them elsewhere and setting them during the VBlank. But it’s not working correctly - I’m not seeing any fine adjustment:

[screenshot]

The emulator I’m using actually has a great debugger, though:

Have to do real life now, though. I’ll keep you posted on this debugging!

#11

Things seem to more or less be working now:

The issue was that I was forgetting to strobe the HMOVE register, which tells the 2600 to respect the fine adjustment value I was apparently correctly calculating and applying.

There seem to be a few little things going on with the HBlank, though. Maybe I need to split up the code calculating the delay and the fine adjust onto two different scanlines? It seems like some code is leaking into a scanline up there. My intuitions are still underdeveloped here.

#12

My code is here, if anyone wants to see the raw assembly files:

#14

I was very fascinated with Atari 2600 coding in the early 2000s, though I was very bad at it at the time. But I was lurking on the mailing lists where research was actively happening and tricks were being discovered. AMA! Listen to my podcast!

The black line in the corner of your screenshot there is a basically-unavoidable artifact of how HMOVE works and is called the “HMOVE comb”. If you look at Atari screenshots and pay attention you’ll see them everywhere. Activision had a policy of hiding the effect by ensuring that the first 8 pixels of every scanline were always black by strobing HMOVE on every scanline.

It’s possible through serious deep black magick (as opposed to the normal everyday black magick required to program the 2600 at all) to perform an HMOVE in such a way that the comb does not appear, but not only is it difficult to pull off, I think it is potentially not portable to different revisions of the chip. If you want waaaaaay too much detail about why this happens, I believe “TIA Hardware Notes (A Small Opus on the TIA)” by Andrew Towers is the thing to read - he reverse-engineered the schematics to really dig into how everything worked and explained a bunch of weird edge case behaviours that were not well understood at the time.

#15

Woah! Thanks for the info! I’m also noticing that the top of my screen is bouncing up and down by one line. My meager understanding is that this is because I’m miscounting my VBLANKS but no count seems to work right.

#16

Also, I accidentally cloned JoustPong in Scheme:

#17

LOL. Tried to send you some token money but PayPal wasn’t having it.

Anyway, never seen a JoustPong variant w/ a gravity effect for the ball - interesting! (also it doesn’t feel like the ball is affected by the speed of the bat at collision, so the bat is effectively just a wall that is either there or not there?)

#18

Thanks for the attempt, but I don’t really need money anyway!

The bats do transfer their lateral momentum to the ball, but it’s a small effect. I mostly created Devil’s Pong so that I could open source it, since it uses the same basic implementation strategy as Corpse Wizard (which I can’t really open source without dealing with the assets).

#19

well, glad the JoustPong concept lives on in people’s hearts at any rate :smiley:

#20

Ok, I figured out what the bounciness was.

Rather than messing up the VBlank I was messing up the Overscan.

Here is the code listing:

        processor 6502

        include "vcs.h"
        include "macro.h"
        org $f000

BGColor equ $81
TColor  equ $82

SH      equ 9
YPos    equ $83
XPos    equ $84
XPosC   equ $85        
XPosF   equ $86       
        

Frame0
        .byte #%00000000;$40
        .byte #%01001000;$40
        .byte #%00100000;$40
        .byte #%00000000;$94
        .byte #%00110000;$94
        .byte #%00110000;$94
        .byte #%00000100;$40
        .byte #%00010010;$40
        .byte #%00000000;$40        
        
Start
        CLEAN_START
        lda #10
        sta YPos
        lda #0
        sta XPos

NextFrame
        lda #2
        sta VBLANK
        sta VSYNC

        sta WSYNC
        sta WSYNC
        sta WSYNC

        lda #0
        sta VSYNC

        ldx #35                 ; skip all but the last two v blank lines
LVBlank
        sta WSYNC
        dex
        bne LVBlank
        
        ldx XPosC
        sta WSYNC
P1P
        dex
        bne P1P
        sta RESP0
        lda XPosF        
        sta HMP0
        sta WSYNC
        sta HMOVE

        lda #0
        sta VBLANK

        ldy BGColor             
        sty TColor              ; initialize color for scans
       
        ldx #192
LVScan
        sta WSYNC
        ldy TColor
        sty COLUBK
        iny
        sty TColor              ; store color for next scanline

        txa
        sec
        sbc YPos
        cmp #SH
        bcc InSprite
        lda #0
        sta GRP0

Rt
        dex        
        bne LVScan

        sta WSYNC
        lda #2
        sta VBLANK
        ldx #28

LVOver
        sta WSYNC
        dex
        bne LVOver

        inc XPos
        lda XPos
        tax
        cmp #159
        bne Hop
        lda #0
        sta XPos
Hop
        txa
        sta WSYNC
        clc                     ; clear carry before the add
        adc #68
        ldx #0
        sec
PDiv    sbc #15
        inx
        bcs PDiv
        dex
        eor #7
        asl
        asl
        asl
        asl
        sta XPosF
        txa
        sta XPosC
        sta WSYNC
        inc YPos
        dec BGColor
        jmp NextFrame

InSprite
        tay
        lda Frame0,Y
        sta GRP0
        jmp Rt
        

        org $fffc
        .word Start
        .word Start

#21

For those less familiar with the details of 2600 hacking:

From the above diagram, you can see that one frame consists of 262 scanlines, of which only 192 are visible. Thirty-seven (plus three) occur, in some sense, before the screen is drawn and 30 occur after. The programmer is responsible for making sure they know what scanline they are on so that they can do things like adjust the background color, playfield graphics and sprites appropriately.

A scanline takes 76 CPU cycles to draw. In order to simultaneously track the current scanline and do potentially involved computations, the 2600 expects you to do your work in chunks of fewer than 76 CPU cycles and then sta WSYNC (write the A register to the memory location WSYNC).
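The numbers above add up as follows. This is just the bookkeeping in Python, using the line counts and the 76-cycle figure from this post. The constant and function names are made up for illustration.

```python
VSYNC_LINES = 3       # the "plus three"
VBLANK_LINES = 37
VISIBLE_LINES = 192
OVERSCAN_LINES = 30
CYCLES_PER_LINE = 76


def lines_per_frame() -> int:
    """Total scanlines the program must account for in one frame."""
    return VSYNC_LINES + VBLANK_LINES + VISIBLE_LINES + OVERSCAN_LINES


def cycles_per_frame() -> int:
    """Total CPU cycles available in one frame."""
    return lines_per_frame() * CYCLES_PER_LINE


print(lines_per_frame(), cycles_per_frame())
```

So the whole program, game logic included, gets a little under 20,000 CPU cycles per frame, and most of them fall on visible lines where the code is busy feeding the display.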

sta WSYNC causes the processor to halt until the start of the next scanline. You have to be a little conservative, because some instructions take a varying number of cycles (a branch costs more when it is taken, and an indexed load costs more when it crosses a page boundary). I suppose a very clever programmer could strobe WSYNC less often than once per line, carefully tracking their cycles so as to absolutely maximize their use of the processor, but that would be tricky.

The problem I had was that I was calculating too much on one of the scanlines in the OVERSCAN area and so my tracking of the number of scanlines was off, sometimes. The solution was to remove a few WSYNCs from my overscan loop and put them into that big calculation so that I never went over 76 cycles for a given chunk of calculation.
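A simplified Python model of that bug, assuming each chunk of work ends with a 3-cycle `sta WSYNC` and that the processor then sleeps until the next line starts. It ignores the exact cycle on which the write lands, so treat it as an illustration rather than an emulator. The function name `scanlines_used` and the cycle counts in the example are made up.

```python
import math

CYCLES_PER_LINE = 76
WSYNC_STORE_CYCLES = 3   # sta WSYNC is a 3-cycle zero-page store


def scanlines_used(work_cycles: list[int]) -> int:
    """Scanlines consumed by chunks of work that each end with sta WSYNC."""
    total = 0
    for work in work_cycles:
        # A chunk that fits in one line costs one line; an overrun spills
        # into the next line, and WSYNC then waits out the rest of that one.
        total += math.ceil((work + WSYNC_STORE_CYCLES) / CYCLES_PER_LINE)
    return total


# 30 overscan chunks that each fit in a line: exactly 30 scanlines.
print(scanlines_used([20] * 30))

# The same 30 chunks, but one of them does a 90-cycle calculation.
print(scanlines_used([20] * 29 + [90]))
```

One oversized chunk silently costs an extra scanline, so the frame comes out a line too long whenever the calculation happens to run over, which is the kind of thing that makes the picture bounce.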