Sunday, July 27, 2014

Calculating primes numbers on a C64

I’ve been (very) slowly working my way through various project Euler problems as a side project. For some reason, calculating primes became a fascination for me. I eventually wrote a fast, one line python script that uses a sieve. Not long after that, I decided to revive my C64 hobby. 

I’ve had a Commodore since I was 10 or 11.. Started writing assembly when I was 14 or so, and soon discovered the world of demos in the late 80’s. I’ve always wanted to do demos, and even coded a few simple ones as a teenager. 

So coming back to the C64 after 25 years, refreshing my 6502 limited coding skills, I set to work. I set a challenge for myself. Could I get this C64 to calculate the prime numbers up to 1,000,000. At first glance, this seems impossible. Then on a lark, I decided to add another challenge: Could I do it in less than one minute.

Firstly, the C64 only has 65536 bytes of memory total. There are 74,000+ primes less than 1,000,000. How would I store them? Even the amount of processing seems to be too much. Having to do multi-byte math on an 8bit processor. Looping over the data hundreds of times. I wasn’t too concerned by the time. I knew if I could get everything to fit in memory, given enough time, it could calculate it.

The first challenge was storing the data. Going to the disk drive for swap is notoriously slow. First thing is first, can I eliminate some of the problem set? Why yes. I can cut the problem set in half, by only needed to look at the odd numbers. So now we’re down to 500,000 numbers. Next on the elimination list is anything divisible by 5. We already eliminated anything that ends in zero, so now we eliminate anything ending in 5. This is another 100,000 numbers. So we are down to 400,000. Still way exceeding the limits of the C64.

My break through came when I figured out that I do not have to store the actual numbers. I just need something to represent each number. In this case, I need to represent if a number is prime or not. A boolean variable. A bit. Which means I can store 8 numbers in a single bit. There’s 400,000 numbers I need to store, and at 8 numbers per byte, I need 50,000 bytes. Yes folks, I can fit the data into the C64’s memory!

To access a particular bit, I used two different numbers. One is an offset, and another is the index into the byte. Each byte contains 8 numbers, but in theory actually represents 20 numbers, but with the unneeded ones removed. So the index is always under 20.

To calculate the primes, I’m using a sieve. You can read about it here: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes

In order to save time doing loop addition in order to multiply (remember, the 6502 doesn’t have multiply or divide instructions), I decided to pre calculate some stuff.  What I noticed is that there’s a pattern to the multiplication. Starting with the prime we’re working with, we actually add 2 times the prime (2 X prime) to get the next number, which will not be prime. This skips the numbers that will result in even numbers. The resulting number is really (prime X 3). If we added the (2 X prime) again, we get the result of (5 X prime). We don’t really need this, since we already eliminated all stuff time 5 in our storage. So we add the (2 X prime) again, which really means that we’re adding (4 X prime). The result is (7 X prime). Again, we add (2 X prime) to get (9 X prime). To loop back around, we go one more time, (2 X prime) to give us (11 X prime). If you continue this pattern, you’ll notice that you’re adding like this: 2X, 4X, 2X, 2X.. Here’s a couple examples:

prime = 3
3 + (2 x prime) = 9
9 + (4 x prime) = 21
21 + (2 x prime) = 27
27 + (2 x prime) = 33 (loop around to beginning)
33 + (2 x prime) = 39
39 + (4 x prime) = 51
etc...
prime = 7
7 + (2 x prime) = 21
21 + (4 x prime) = 49
49 + (2 x prime) = 63
63 + (2 x prime) = 77 (loop around top beginning)
77 + (2 x prime) = 91
91 + (4 x prime) = 119
etc...

And so on. So what I did at the beginning of each prime number, was to pre-calculate (2 x prime) and (4 x prime). Store that into a table that looks like this:

(2 x prime), (4 x prime), (2 x prime), (2 x prime)

Due to the way I’m storing the numbers, I’m actually using 3 bytes. 2 bytes for the offset, and a single byte for the index. So when I do my addition, I add the table’s index to the current value’s index, check to see if it carries, if so, increase the offset by one, then add the table’s offset. That gives me the next value I need to turn off.

I start with a prime of 3, pre-calculate my adding table described above, and then start adding. On each add, I turn off a bit. I keep adding until I equal or exceed a value of 1,000,000. In this case, that is when the offset is greater than 50,000.

Once I do that, I search the data for the next prime. In this case it’s 7. Then when it searches after 7, it see that the 9 is not prime, and continues to 11. And so on and so on until we hit the square root of 1,000,000, which is 1,000.

So thats the general algorithm that I used.

Some implementation details. First, I didn’t write the  division routine. I used one that I found online, and then modified it slightly. I didn’t need the full 32 bits that it was calculating, and I also didn’t need the error check. The routine did exactly what I wanted though, it allowed me to divide by 20.

Why 20? Because thats the number of ‘numbers’ stored in a byte. This means that the quotient is my offset. The remained tell me which bit I need to turn off, but how do I go from 20 to 8 bits? With a table:

.byte $00,          // 0
      %11111110,    // 1
      $00,          // 2
      %11111101,    // 3
      $00,          // 4
      $00,          // 5
      $00,          // 6
      %11111011,    // 7
      $00,          // 8
      %11110111,    // 9
      $00,          // 10
      %11101111,    // 11
      $00,          // 12
      %11011111,    // 13
      $00,          // 14
      $00,          // 15
      $00,          // 16
      %10111111,    // 17
      $00,          // 18
      %01111111     // 19

The reminder from the division is the index to this table. This tells me exactly which bit I need to work with. I look at the data in the table via the offset, and AND it with the value from this table. Then store it back in the data at the offset. 

To get the speed, I played the ‘follow the carry bit’ game. I looked through my code for places where I could remove a CLC(CLear Carry) or SEC (SEt Carry) before an ADC (ADd with Carry) or SBC(SuBtract with Carry). Sometimes I reordered something so I didn’t have to do one of those. Every one I removed was 2 cycles per loop iteration.

At one point in time, I had some self modifying code. I eventually got rid of it, because it wasn’t giving me enough, and I needed to the space.

Speaking of space, this whole implementation takes 1K. Thats 1024 bytes. That also includes a summing routine that adds all the prime numbers together after the sieve is complete. I needed to keep it under 1K, because of the memory layout. If it was any larger, I would have to play games moving pieces of the app around in memory, turning off the kernel and I/O pages. It was bad enough i turned off BASIC.

All in all, the sieve run in 55 seconds. Not too shabby for a 30 year old computer running at 1Mhz. I’m sure some of the demo scene people could get it faster, probably a lot faster. This happens to be my first 6502 assembly program in nearly 25 years.

And now the source. For my development environment, I use my MacBookPro with OSX, Kick Assembler, and VICE. I’ll also show my makefile, so others can build this easily. After I got it working, I transfer it to my real, stock, c64, and ran it there as a verification. See my other blog post on how I transferred it. Also, here’s a video of it running on my machine: https://www.youtube.com/watch?v=bl8DUJAPyFU

Makefile:

debug : disk
        # generate a monitor file from the symbol file
        awk -F' |=\\$$' '{print "al " $$3 " ." $$2}' primes.sym > primes.mon

        # Read the source file, and add any 'vars' to the monitor file
        grep "^.var.*\\$$" primes.s | awk -F' | = \\$$' '{print "al " $$3 " ." $$2} ' >> primes.mon

        echo "break .break" >> primes.mon
        echo "break .break2" >> primes.mon
        x64.app/Contents/MacOS/x64 -ntsc -warp -moncommands primes.mon primes.d64

disk : primes
        x64.app/Contents/MacOS/c1541 -format blah,00 d64 primes.d64 -attach primes.d64 -write primes.prg primes

primes : primes.s
        java -jar kick/KickAss.jar primes.s

Source (Kick Assmebler code):

// Prime number finder. First version will seive the primes up to 100.

// A basic loader. Generates:
// 10 SYS (2064)

.pc = $0801 "Basic upstart"
:BasicUpstart(start)

// A few vars
.var bits = $0c00
.var end_bits = bits + $C350 // 50000 bytes
.var curr = $19 // some random place in the zero page that seems safe
.var curr_index = $1e
.var curr_mask_index = $21

.var adder_index = $02
.var bit_mask_index = $41
.var bit_index = $43

.var index_adder_1 = $57
.var index_adder_2 = $59
.var index_adder_3 = $5b
.var index_adder_4 = $5d

.var mask_adder_1 = $5f
.var mask_adder_2 = $60
.var mask_adder_3 = $61
.var mask_adder_4 = $62

.var BIT7 = $fb
.var BIT3 = $fc

.var calculating_location = $07b0

//.var bit_mask = $87 // 20 bytes!

.var divide_amount = $14

// Stuff for divide routine
.var divisor = $d8 // 2 bytes
.var dividend = divisor + 2 // 4 bytes $da
.var div_scratch = dividend + 4 // $de
.var div_carry = div_scratch + 1  // $df
.var quotient = dividend
.var remainder = dividend + 2

.var display_msg_work = $e0
.var display_msg_ptr = $e3

.var index_and_mask_to_hex_work_index = $e4
.var index_and_mask_to_hex_work_hex = $e7
.var index_and_mask_to_hex_work_mask = $eb
.var hex_lo = $ec
.var hex_hi = $ed

.var bit_mask = $87

.pc = $0810 // 2064, Enough room over the BasicStart

start:
    // Prep some vars
    lda #$80
    sta BIT7
    lda #$08
    sta BIT3

    // Turn off the basic rom 
    lda #$36
    sta $01

    // Back up the zero page, so I can reload basic afterwards
    // This is going into the IO area, but all writes go to RAM, no
    // matter how things area configured
    ldx #$ff
copy_zp:
    lda $00,x
    sta $e000,x
    dex
    bne copy_zp

    // Black out the screen
    lda #$00
    sta $d021
    sta $d020

    // clear the screen
    ldx #$00
    lda #$20
clear_screen:
    dex
    sta $0400,X
    sta $0500,X
    sta $0600,X
    sta $0700,X
    bne clear_screen

    // Display the message
    lda #$17 // Y
    pha
    lda #$0a // X
    pha
    lda #[clearing_memory_msg_end - clearing_memory_msg] // length
    pha
    lda #<clearing_memory_msg
    pha
    lda #>clearing_memory_msg
    pha
    
    jsr display_msg

    // Clear some memory
    lda #$FF
    ldx #>end_bits
outer_clear_mem:
    stx clear_mem + 3 // self modding code.
    ldy #$00
clear_mem:
    dey
    sta bits,y // Self modding code.
    bne clear_mem
    
    dex
    cpx #>bits - 1
    bne outer_clear_mem

    // Copy 20 bytes for the bit mask.
    ldx #$00
copy_table:
    lda bit_mask_table, x
    sta bit_mask, x
    inx
    cpx #$14 // compare to 20
    bne copy_table

    lda #divide_amount
    sta divisor
    lda #$00
    sta divisor + 1

    // We know that 1 is not prime, so lets turn that bit off now.
    lda #$FE
    sta bits

    // Some initialization
    lda #$00
    sta bit_index

    lda #$03
    sta bit_mask_index

    lda #>bits
    sta bit_index + 1

    // Start with 3 as the first number to seive
    lda #$03
    sta curr
    lda #$00
    sta curr + 1
    sta curr + 2
    sta curr + 3
    sta curr + 4

    // Display the message
    lda #$17 // Y
    pha
    lda #$0a // X
    pha
    lda #[calculating_msg_end - calculating_msg] // length
    pha
    lda #<calculating_msg
    pha
    lda #>calculating_msg
    pha
    
    jsr display_msg

next_number:
    // Copy the number to the accum.
    
    // Add the 'curr' to the accum num to get a 2x of the curr in 
    // accum. Store that in adder_1, adder_3 and adder_4
    lda curr
    sta dividend

    lda curr + 1
    sta dividend + 1

    lda curr + 2
    sta dividend + 2

    // Save these so that they are zero
    lda #$00
    sta dividend + 3

    jsr divide

    // Save the quotient to the curr_index, for later use
    lda quotient
    sta curr_index
    lda quotient + 1
    sta curr_index + 1

    // Save the remainder to the curr_mask_index, for later use
    lda remainder
    sta curr_mask_index

    // Now multiply curr_index and curr_mask_index by 2 to get the adders
    asl // remainder
    cmp #$14
    bcc mask_adder_skip_carry_1
    sbc #$14
    sec  // Make sure the carry flag is set for the rol.
mask_adder_skip_carry_1:
    sta mask_adder_1
    sta mask_adder_3
    sta mask_adder_4
    lda quotient
    rol
    sta index_adder_1
    sta index_adder_3
    sta index_adder_4
    lda quotient + 1
    rol
    sta index_adder_1 + 1
    sta index_adder_3 + 1
    sta index_adder_4 + 1
    
    lda mask_adder_1
    asl // remainder
    cmp #$14
    bcc mask_adder_skip_carry_2
    sbc #$14
    sec  // Make sure the carry flag is set for the rol.
mask_adder_skip_carry_2:
    sta mask_adder_2
    lda index_adder_1
    rol
    sta index_adder_2
    lda index_adder_1 + 1
    rol
    sta index_adder_2 + 1
    
    // Now we have our adders. Lets start adding.
    lda #$FF
    sta adder_index

add_loop:
    // Determine which adder we're using.
    inc adder_index

    // Do the bin version first
    lda adder_index
    and #$03

    // Store this, its for the index_adder
    tax
    asl // multiple by 2
    tay // store it temporarily

    // Figure out the mask_index
    // No need for clc here.
    lda mask_adder_1, X
    adc bit_mask_index

    // Are we >= 20
    cmp #$14
    bcc no_mask_carry

    // Yes, subtract 20 from this, and carry to the bit_index
    // Note that carry flag is set here. 
    sbc #$14           // Let the carry flag fall through to the adc bit_index after the no_mask_carry.

no_mask_carry:
    sta bit_mask_index

    // Now figure out the bit_index
    tya
    tax

    // We don't want to clc here, since it may fall from the sbc above.
    lda index_adder_1, X
    adc bit_index
    sta bit_index
    lda index_adder_1 + 1, X
    adc bit_index + 1
    sta bit_index + 1

    // Are we done? We already have the byte we want in there, no need to load
    cmp #>[end_bits - 1]
    bcc r_we_done_2     // <, if its less than, skip the second test

r_we_done:
    lda bit_index
    cmp #<[end_bits - 1]
    bcc r_we_done_2  // < Both of these mean we continue
    beq r_we_done_2  // = Both of these mean we continue

    jmp find_next_num

r_we_done_2:

    // We now have the index we need, and an offset to our bit mask. Turn that 
    // bit off.
    ldy #$00
    ldx bit_mask_index
    lda bit_mask,X
    and (bit_index),y
    sta (bit_index),y

    jmp add_loop

find_next_num:
    // Find the bit/mask index for the curr num
    clc
    lda #>bits
    adc curr_index + 1
    sta curr_index + 1

    // curr_mask_index has the mask index
    ldx curr_mask_index

next_num_loop:
    inx
    inx
    cpx #$14
    bcc no_mask_carry_2

    // No clc needed here
    txa
    sbc #$14
    tax

    lda curr_index
    adc #$00
    sta curr_index

    lda curr_index + 1
    adc #$00
    sta curr_index + 1

no_mask_carry_2:
    lda bit_mask, X
    beq next_num_loop // skip 0's

    // Now check the bit in that location
    ldy #$00
    ora (curr_index), Y

    cmp #$FF
    bne next_num_loop

    // Store Y in the remainer
    stx curr_mask_index
    stx bit_mask_index

    lda curr_index
    sta bit_index
    lda curr_index + 1
    sta bit_index   + 1

    // Calc the number from the index and mask.
    // First sub the offset from the curr_index + 1
    lda #>bits
    sec
    sbc curr_index + 1
    sta curr_index + 1

    lda curr_index
    sta curr
    lda curr_index + 1
    sta curr + 1
    lda #$00
    sta curr + 2
    sta curr + 3

    // Multiply by 16
    ldx #$00
multi_loop:
    asl curr
    rol curr + 1
    rol curr + 2
    rol curr + 3
    inx
    cpx #$04
    bne multi_loop

    // Times 4
    lda #$00
    sta curr_index + 2
    asl curr_index
    rol curr_index + 1
    rol curr_index + 2

    asl curr_index
    rol curr_index + 1
    rol curr_index + 2

    // Add in the mask index
    clc
    lda curr_mask_index
    sta bit_mask_index
    adc curr_index
    sta curr_index
    lda #$00
    adc curr_index + 1
    sta curr_index + 1
    lda #$00
    adc curr_index + 2
    sta curr_index + 2


    // Now add'em
    clc
    lda curr
    adc curr_index
    sta curr

    lda curr + 1
    adc curr_index + 1
    sta curr + 1

    lda curr_index + 2
    adc curr + 2
    sta curr + 2

    lda #$00     // continue in case we carry?
    adc curr + 3
    sta curr + 3

    // Display something
    jsr display_curr

    // Have we hit the square root?
    lda curr +1
    cmp #$03
    bcc b_we_done_2   // <, if its less than, skip the second test
    beq b_we_done     // =, need to do the second test
    jmp summit

b_we_done:
    lda curr
    cmp #$e8
    bcc b_we_done_2  // < Both of these mean we continue
    beq b_we_done_2  // = Both of these mean we continue
    jmp summit
        
b_we_done_2:
    jmp next_number

summit:
    // Display the message
    lda #$17 // Y
    pha
    lda #$0a // X
    pha
    lda #[summing_msg_end - summing_msg] // length
    pha
    lda #<summing_msg
    pha
    lda #>summing_msg
    pha
   
    jsr display_msg
   
    // Sum things together.
    lda #$07 // 2 and 5 are prime.
    sta curr
    lda #$00
    sta curr + 1
    sta curr + 2
    sta curr + 3
    sta curr + 4

    lda #<bits
    sta curr_index
    lda #>bits
    sta curr_index + 1

    ldx #$01 // Start on the first bit, next it'll advance to the 3rd bit
next_num_loop_2:
    ldy #$00
    inx
    inx
    
    cpx #$14
    bcc no_mask_carry_3

    // No clc needed here
    txa
    sbc #$14
    tax
    lda curr_index
    adc #$00
    sta curr_index
    bcc no_mask_carry_3
    inc curr_index + 1

no_mask_carry_3:
    // Are we done?
    lda curr_index + 1
    cmp #>[end_bits - 1]
    bcc r_we_done_4     // <, if its less than, skip the second test

r_we_done_3:
    lda curr_index
    cmp #<[end_bits - 1]
    bcc r_we_done_4  // < Both of these mean we continue
    beq r_we_done_4  // = Both of these mean we continue

    jmp end_prg

r_we_done_4:

    lda bit_mask, X
    beq next_num_loop_2 // skip 0's

    // Now check the bit in that location
    ora (curr_index), Y

    cmp #$FF
    bne next_num_loop_2

    stx curr_mask_index

    // We now have the index and mask, convert to a hex value.
    txa

    sta index_and_mask_to_hex_work_mask
    lda curr_index
    sta index_and_mask_to_hex_work_index
    lda curr_index + 1
    sec
    sbc #>bits
    sta index_and_mask_to_hex_work_index + 1

    jsr index_and_mask_to_hex

    clc
    lda index_and_mask_to_hex_work_hex
    adc curr
    sta curr

    lda index_and_mask_to_hex_work_hex + 1
    adc curr + 1
    sta curr + 1

    lda index_and_mask_to_hex_work_hex + 2
    adc curr + 2
    sta curr + 2

    lda index_and_mask_to_hex_work_hex + 3
    adc curr + 3
    sta curr + 3

    lda #$00
    adc curr + 4
    sta curr + 4
    
    // Display something
    jsr display_curr

    ldx curr_mask_index
    jmp next_num_loop_2

end_prg:

    // copy the zp back
    sei

    // Let us see the io page.
    lda #$34
    sta $01

    // Copy the data back to the zp.
    ldx #$02
    copy_zp_3: lda $e000,x
    sta $00,x
    inx
    bne copy_zp_3

    // Renable the io page and basic.
    lda #$37
    sta $01
    cli
break:
    rts

display_curr:
    ldx #$05
    ldy #$FE
display_loop:
    dex
    iny
    iny
    txa
    pha
    tya
    pha
    lda curr, x
    jsr get_hex
    pla
    tay
    pla
    tax
    lda hex_hi
    sta calculating_location, y
    lda hex_lo
    sta calculating_location + 1, y
    cpx #$00
    bne display_loop
    rts

get_hex:
    // Take a byte in A, and stores the chars to display in x and Y
    tay
    and #$f0
    lsr
    lsr
    lsr
    lsr
    tax
    lda print_table,x
    sta hex_hi
    tya
    and #$0f
    tax
    lda print_table,x
    sta hex_lo
    rts

divide:
        ldx     #$11    // over at the same time as shifting the answer in, the
                        // operation must start AND finish with a shift of the
                        // low cell of the dividend (which ends up holding the
                        // quotient), so we start with 17 (11H) in X.
        clc
div_loop:
        lda     #$00
        sta     div_carry       // Zero old bits of CARRY so subtraction works right.

        rol     divisor+2       // -JMS- Move low cell of dividend left one bit, also shifting
        rol     divisor+3       // -JMS- answer in. The 1st rotation brings in a 0, which later

                                // gets pushed off the other end in the last rotation.
        dex                     //
        beq     div_end         // Branch to the end if finished.
                                //
        rol     divisor+4       // -JMS- Shift high cell of dividend left one bit, also
        rol     div_carry       // Store old high bit of dividend in CARRY. (For STZ
                                // one line up, NMOS 6502 will need LDA #0, STA CARRY.)
            
        sec                     // See if divisor will fit into high 17 bits of dividend
        lda     divisor+4       // -JMS- by subtracting and then looking at carry flag.
        sbc     divisor         // First do low byte.
        sta     divisor+6       // Save difference low byte until we know if we need it.
        lda     div_carry       // Bit 0 of CARRY serves as 17th bit.
        sbc     #$0             // Complete the subtraction by doing the 17th bit before
        bcc     div_loop        // determining if the divisor fit into the high 17 bits
                                // of the dividend. If so, the carry flag remains set.
        lda     divisor+6       // If divisor fit into dividend high 17 bits, update
        sta     divisor+4       // -JMS- dividend high cell to what it would be after
        bcs     div_loop
oflo:
        lda     #$FF            // If overflow occurred, put FF
        sta     divisor+2       // in remainder low byte
        sta     divisor+3       // and high byte,
        sta     divisor+4       // and in quotient low byte
        sta     divisor+5       // and high byte.

div_end:rts

display_ret:
.byte 0, 0

display_msg:
    pla
    sta display_ret
    pla
    sta display_ret + 1
    pla
    sta display_msg_work + 1
    pla
    sta display_msg_work
    pla
    sta display_msg_work + 2 // Length
    pla
    tax
    pla
    tay

    lda #$04
    sta display_msg_ptr + 1
    lda #$00
    sta display_msg_ptr

add_y:
    cpy #$00
    beq done_y
    
    clc
    lda #40
    adc display_msg_ptr
    sta display_msg_ptr
    lda #$00
    adc display_msg_ptr + 1
    sta display_msg_ptr + 1
    dey
    jmp add_y

done_y:
    txa
    clc
    adc display_msg_ptr
    sta display_msg_ptr
    lda #$00
    adc display_msg_ptr + 1
    sta display_msg_ptr + 1
    
    ldy #$00
write_to_screen:
    lda (display_msg_work),Y
    sta (display_msg_ptr),Y
    iny
    cpy display_msg_work + 2
    bne write_to_screen

    lda display_ret + 1
    pha
    lda display_ret
    pha
    rts

index_and_mask_to_hex:
    lda index_and_mask_to_hex_work_index
    sta index_and_mask_to_hex_work_hex
    lda index_and_mask_to_hex_work_index + 1
    sta index_and_mask_to_hex_work_hex + 1

    lda #$00
    sta index_and_mask_to_hex_work_hex + 2
    sta index_and_mask_to_hex_work_hex + 3

    // Multiply by 16
    ldx #$04
multi_loop_2:
    asl index_and_mask_to_hex_work_hex
    rol index_and_mask_to_hex_work_hex + 1
    rol index_and_mask_to_hex_work_hex + 2
    rol index_and_mask_to_hex_work_hex + 3
    dex
    bne multi_loop_2

    // Times 4
    lda #$00
    sta index_and_mask_to_hex_work_index + 2

    asl index_and_mask_to_hex_work_index
    rol index_and_mask_to_hex_work_index + 1
    rol index_and_mask_to_hex_work_index + 2
    
    asl index_and_mask_to_hex_work_index
    rol index_and_mask_to_hex_work_index + 1
    rol index_and_mask_to_hex_work_index + 2

    // Add in the mask index
    clc
    lda index_and_mask_to_hex_work_mask
    adc index_and_mask_to_hex_work_index
    sta index_and_mask_to_hex_work_index
    lda #$00
    adc index_and_mask_to_hex_work_index + 1
    sta index_and_mask_to_hex_work_index + 1
    lda #$00
    adc index_and_mask_to_hex_work_index + 2
    sta index_and_mask_to_hex_work_index + 2

    // Now add'em
    clc
    lda index_and_mask_to_hex_work_hex
    adc index_and_mask_to_hex_work_index
    sta index_and_mask_to_hex_work_hex

    lda index_and_mask_to_hex_work_hex + 1
    adc index_and_mask_to_hex_work_index + 1
    sta index_and_mask_to_hex_work_hex + 1

    lda index_and_mask_to_hex_work_hex + 2
    adc index_and_mask_to_hex_work_index + 2
    sta index_and_mask_to_hex_work_hex + 2

    lda index_and_mask_to_hex_work_hex + 3     // continue in case we carry?
    adc #$00
    sta index_and_mask_to_hex_work_hex + 3
    rts




clearing_memory_msg:
.text "clearmem   "
clearing_memory_msg_end:
calculating_msg:
.text "calulating:"
calculating_msg_end:
summing_msg:
.text "summing   :"
summing_msg_end:




// I should see if I find space in the zero page for this. 20 bytes
bit_mask_table:
.byte $00,          // 0
      %11111110,    // 1
      $00,          // 2
      %11111101,    // 3
      $00,          // 4
      $00,          // 5
      $00,          // 6
      %11111011,    // 7
      $00,          // 8
      %11110111,    // 9
      $00,          // 10
      %11101111,    // 11
      $00,          // 12
      %11011111,    // 13
      $00,          // 14
      $00,          // 15
      $00,          // 16
      %10111111,    // 17
      $00,          // 18
      %01111111     // 19
                       
print_table:
.byte $30,
      $31,
      $32,
      $33,
      $34,
      $35,
      $36,
      $37,
      $38,
      $39,
      $01,
      $02,
      $03,
      $04,
      $05,
      $06,
      $07

 

How to boot strap your C64 cheap!

So you decided to go retro and buy yourself a Commodore 64 and a disk drive. Welcome to the world of retro computing. But now what?

You start by searching the net for how to get software onto the c64. You quickly find out there’s an entire realm of ways to hook up the 1541 to your computer through a variety of cables. Some cables are newer and faster than others, and overall is a confusing mess of documentation. 

The other route is to buy an extra piece of hardware that lets you use memory cards, or USB, or something else to replace the disk drive, This will work, but it costs money, and well, it removes the disk drive from the picture! You want to go full retro!

I decided to go a different route. I built a ‘modern’ serial port for the commodore, following the instructions http://biosrhythm.com/?p=1136#comment-35325. Its mentioned there that you don’t need to hook up the VCC line, but in my case, I needed to. Without that VCC line, I was not able to transmit.

Once I had that, I spent a better portion of a day writing a BASIC application to handle a simple file transfer. I probably made this overly complicated, but it does have simplistic error checking (not very good, but better than nothing), and its pretty slow. But it does fit onto a single 64 screen! 

A few notes. First, the baud rate is set at 600. Not 300, not 1200. 600. This was the fastest baud I could go at, and reliably transfer files. If you want to try 1200, change the 7 in the first line to an 8. For 300 baud, change it to a 6.

To change the filename, edit line 1000.

In order to send the file from my desktop (A MacBookPro), I used a python application that looks like this:

#!/usr/bin/env python
import serial

s = serial.Serial("/dev/tty.usbserial-A100TLCU", baudrate=600)

# send the command to open the file.
s.write("\x01")

byte=1
with open("Handyterm.prg", "r") as f:
    while f:
        ret = -1
        data = f.read(1)
        if len(data) == 0:
            break
        s.write("\x02") # send data command
        while ret != data:
            print "sending byte: ", byte
            s.write(data)
            print "sedning: ", ord(data)
            ret = s.read(1)
            print "rec'ed: ", ord(ret)
            if ret != data:
                s.write("\x03")
            else:
                byte+=1
                break
    s.write("\x04")

To change the serial port, edit the Serial() line. You’ll also need to change the open() to serve the file you want.

Start the c64 side first, then start the desktop side. The file will be written to disk on the c64 side. If it completes successfully, both sides will exit cleanly. When i was attempting things at 1200 baud, it seems the c64 side would miss a character and ‘lock up’.. At that point, I would have to stop both sides, and do the following on the C64 side:

close1
open1,8,15,”s:zterm”;close1

That will close the file that was being written to, and then delete the incomplete file. You then need to start over.

I would transfer a decent terminal program over first. I tried CCGMS, but that seems to have a buggy Xmodem transfer. Novaterm is too many files to transfer to get working. So I settled on HandyTerm from http://www.zimmers.net/anonftp/pub/cbm/c64/comm/. HandyTerm’s Xmodem seems to work, at least at 300 baud for me (no choice for 600). 1200 failed. I suspect my serial cable is flakey.

Note that you need to transfer .prg files. Compressed files (.arc, zip) and .d64 won’t work.