Two Stop Bits
Optimizing a 6502 image decoder, from 70 minutes to 1 minute (6502 appleii) (colino.net)
7 points by starac 51 days ago | 9 comments
  • seclorum 49 days ago
    This article is an absolute treasure - a work of art, in itself.

    I never dived into the C64 video architecture much back in the day when C64 was fresh - I instead had an Oric-1/Atmos and thus had other thorns to deal with - but I have always respected the levels to which the C64 has been pushed.

    In the Oric-1/Atmos world, we too have strange attributes to exploit and derive new tricks, never once considered feasible, for the platform.

    I have often thought about what it'd take to add a camera to the Oric, and considered just glomming on an ESP32 running a realtime libpipi [1] or Pictconv [2] implementation, generating LDA/STA instructions for the Oric's very humble HIRES mode straight into its DATA lines.

    There is a great deal of satisfaction in seeing such insane optimisations being developed with 21st century optimism for the 'retro' computers. Making them do insane new things makes them new again.

    [1] http://caca.zoy.org/wiki/libpipi/oric
    [2] https://www.osdk.org/index.php?page=documentation&subpag...

  • bmonkey325 51 days ago
    Love how this chronicles the instruction count, starting at 301 million, and then how each optimization and compromise cuts xx million instructions from the runtime.

    I think the final 6502 version would need to be run in an emulator to get the retired-instruction count. On 586+ CPUs, such a counter is baked into the hardware.

    • colinlm 50 days ago
      Thanks :) I do time the 6502 code using the MAME emulator, its debugger's trace feature, and a profiler I made. It's far from perfect (gets very confused by tricks like pha/pha/rts in the IIe ROM) but works under IIc emulation and allows me to precisely count cycles in my code: https://www.colino.net/wordpress/en/a2trace-debug-and-profil.../
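For readers who haven't met the trick: on the 6502, RTS pulls a 16-bit address off the stack and resumes at that address plus one, so code can push (target - 1) with two PHAs and "return" into a routine it never called. The Python sketch below models only that stack behaviour, plus a hypothetical `NaiveProfiler` (not the author's a2trace tool) that assumes every RTS matches an earlier JSR, to illustrate one plausible way such a tool loses track.

```python
# Sketch of the 6502 stack behaviour behind the pha/pha/rts dispatch trick.
# Only the stack page ($0100-$01FF) is modelled; this is not an emulator.

class Stack6502:
    def __init__(self):
        self.mem = bytearray(256)  # stands in for $0100-$01FF
        self.sp = 0xFF             # stack pointer, as commonly set by LDX #$FF / TXS

    def push(self, value):
        # PHA (and JSR) write to $0100+SP, then decrement SP.
        self.mem[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        # PLA (and RTS) increment SP, then read from $0100+SP.
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[self.sp]


def jsr(stack, jsr_addr, target):
    """JSR at jsr_addr pushes jsr_addr + 2 (its own last byte), high byte first."""
    ret = (jsr_addr + 2) & 0xFFFF
    stack.push(ret >> 8)
    stack.push(ret & 0xFF)
    return target


def rts(stack):
    """RTS pulls the low byte, then the high byte, and resumes at that address + 1."""
    lo = stack.pull()
    hi = stack.pull()
    return (((hi << 8) | lo) + 1) & 0xFFFF


def pha_pha_rts(stack, target):
    """Jump to target by pushing target - 1 (high byte first) and executing RTS."""
    addr = (target - 1) & 0xFFFF
    stack.push(addr >> 8)    # LDA #>(target-1) / PHA
    stack.push(addr & 0xFF)  # LDA #<(target-1) / PHA
    return rts(stack)


class NaiveProfiler:
    """Hypothetical profiler that pairs every RTS with the most recent JSR."""

    def __init__(self):
        self.frames = []

    def on_jsr(self, target):
        self.frames.append(target)

    def on_rts(self):
        # Returns the routine it believes just returned, or None if it has lost track.
        return self.frames.pop() if self.frames else None


s = Stack6502()
prof = NaiveProfiler()

pc = jsr(s, 0x0800, 0x0900)          # code at $0800 calls a dispatcher at $0900
prof.on_jsr(pc)
pc = pha_pha_rts(s, 0x0A00)          # dispatcher "returns" into $0A00
print(hex(pc), hex(prof.on_rts()))   # 0xa00 0x900  (profiler thinks $0900 finished)
pc = rts(s)                          # the routine at $0A00 returns to the original caller
print(hex(pc), prof.on_rts())        # 0x803 None   (profiler has no frame left)
```

The CPU's stack stays perfectly balanced; only the profiler's JSR/RTS bookkeeping goes wrong, which is why this kind of code is hard to attribute correctly.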
    • Trixter 49 days ago
      RDPMC was introduced with the Pentium; was there some other instruction or method you were thinking of for 386 or 486 CPUs?
      • bmonkey325 48 days ago
        Good catch. Those were only in the P5 and P6+. I was watching baseball in the easy chair when I wrote that. Fixed.
    • ddingus 51 days ago
      Yeah, it was a good optimization effort.

      My preference was to work in cycles. Many systems have a timer one can use to get cycle counts. There isn't one on a stock Apple 2. Many cards carry a 6522 VIA chip, which does have two timers, though they are only 16-bit.

      Or, a quick hand timing gets fairly close. The only real difficulty there is finding a task that runs long enough for our slow human reaction time not to matter.
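The "only 16-bit" limitation matters because a counter ticking at roughly 1 MHz rolls over about every 64 ms. A minimal Python sketch of turning two timer reads into elapsed cycles, assuming (not taken from the thread) a down-counter that decrements once per CPU cycle, wraps from $0000 to $FFFF, and has its rollovers counted separately, e.g. by an interrupt handler. `APPLE_II_CLOCK_HZ` is an approximate value.

```python
# Sketch: elapsed CPU cycles from two reads of a 16-bit down-counter.
# Assumptions: one decrement per CPU cycle, wrap from $0000 to $FFFF,
# and an external count of how many times it wrapped in between.
APPLE_II_CLOCK_HZ = 1_023_000  # approximate Apple II CPU clock (~1.023 MHz)


def elapsed_cycles(start, end, wraps=0):
    """Cycles between reading `start` and later reading `end`.

    `wraps` is the number of $0000 -> $FFFF rollovers seen in between.
    """
    if not (0 <= start <= 0xFFFF and 0 <= end <= 0xFFFF):
        raise ValueError("timer values are 16-bit")
    return wraps * 0x10000 + start - end


def cycles_to_seconds(cycles, clock_hz=APPLE_II_CLOCK_HZ):
    return cycles / clock_hz


print(elapsed_cycles(0xFFFF, 0xF000))                 # 4095
print(elapsed_cycles(0x0010, 0xFFF0, wraps=1))        # 32
print(round(cycles_to_seconds(0x10000) * 1000, 1))    # 64.1 (ms per rollover)
```

Anything longer than one rollover needs the wrap count, which is why timing a multi-minute decode this way means handling the timer's interrupt rather than just reading the counter twice.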

  • bmonkey325 48 days ago
    Today I was thinking about this more. Yes, the optimization is cool. But the original 70-minute runtime: 70 minutes to know if your code works. On average that is about 73 pixels a second. Like watching paint dry.

    No chance this ran correctly the first time. How many times did it run before it was like, oh crap, try again? That’s some serious dedication.
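The "about 73 pixels a second" figure checks out if the decoded image is 640×480, which is an assumption here since the thread doesn't state the resolution: 307,200 pixels over 4,200 seconds. A quick Python check, which also gives the rate at the optimized 1-minute runtime:

```python
# Back-of-the-envelope pixel rate. Assumption: a 640x480 image (not stated in the thread).
WIDTH, HEIGHT = 640, 480


def pixels_per_second(runtime_minutes, width=WIDTH, height=HEIGHT):
    return width * height / (runtime_minutes * 60)


print(round(pixels_per_second(70)))  # 73   (original 70-minute decode)
print(round(pixels_per_second(1)))   # 5120 (optimized 1-minute decode)
```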

    • colinlm 47 days ago
      Yes, it was very painful at first. The thing that offset the pain was that, at the start, the code was strictly the same (C code) for x86-64 and 6502, so I could iterate bugs out with the x86-64 decoder in a few milliseconds before trying the 6502 code. Later, once I started moving the 6502 code to assembly, I iterated on it using MAME's Apple IIgs emulation with a 16MHz ZipChip, which cut the wait to ~5 minutes instead of the initial 70, and less and less as I progressed.
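The ~5 minute figure is roughly what plain clock scaling predicts. A Python sketch, assuming runtime scales linearly with clock speed from a ~1.023 MHz stock Apple II to the 16 MHz accelerator; this ignores memory wait states, accelerator cache behaviour, and emulation details, so treat it as an estimate only.

```python
# Rough estimate of runtime on an accelerated machine, assuming linear scaling with clock speed.
BASE_CLOCK_MHZ = 1.023   # approximate stock Apple II / IIc clock
ACCEL_CLOCK_MHZ = 16.0   # the 16MHz ZipChip mentioned in the comment


def scaled_runtime_minutes(base_minutes, base_mhz=BASE_CLOCK_MHZ, fast_mhz=ACCEL_CLOCK_MHZ):
    return base_minutes * base_mhz / fast_mhz


print(round(scaled_runtime_minutes(70), 1))  # 4.5
```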
      • bmonkey325 47 days ago
        What an amazing time to be alive, when MAME is a development environment.
Two Stop Bits is a discussion web site about retro computing and gaming.