Yeah, it was a good optimization effort.

My preference was to work in cycles. Many systems have a timer you can read to get cycle counts, but a stock Apple II doesn't have one. Many add-on cards carry the VIA chip, the 6522, which does have two timers, though they are only 16-bit.
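With a 16-bit counter, the usual trick is to read it before and after the code under test and take the difference modulo 65,536. Here is a rough sketch of just that arithmetic in Python. It assumes an idealized timer that counts down once per CPU cycle and wraps cleanly from $0000 to $FFFF, and that the two readings are less than 65,536 cycles apart. A real 6522 has reload and read-overhead details this ignores, and the function name and sample values are made up for illustration.

```python
def elapsed_cycles(start, end, bits=16):
    """Cycles between two readings of a down-counting timer.

    Assumes an idealized counter that wraps modulo 2**bits and that
    fewer than 2**bits cycles passed between the two readings.
    """
    mask = (1 << bits) - 1
    # The timer counts down, so start - end is the elapsed count.
    # Masking handles the case where the counter wrapped past zero.
    return (start - end) & mask


if __name__ == "__main__":
    print(elapsed_cycles(0x1000, 0x0F00))  # no wrap between readings
    print(elapsed_cycles(0x0010, 0xFFF0))  # counter wrapped past zero
```

Anything longer than one full wrap needs either an interrupt that counts the wraps or a slower second timer, which is the practical cost of the 16-bit limit.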

Or, a quick hand timing gets fairly close. The only real difficulty there is finding a task that can be scaled up to run long enough for our slow human reactions to time it.
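For hand timing, the idea is to repeat the routine enough times that the run lasts several seconds, then convert the stopwatch reading back to cycles. A minimal sketch of that conversion in Python follows. The clock rate of about 1.02 MHz, the 0.2 second reaction-time error, and the example numbers are all assumptions for illustration, not measurements. The result also includes whatever loop overhead wraps the routine.

```python
# Assumed effective CPU clock for an NTSC Apple II, in Hz (approximate).
APPLE_II_CLOCK_HZ = 1_020_484


def cycles_per_iteration(seconds, iterations,
                         clock_hz=APPLE_II_CLOCK_HZ,
                         reaction_error_s=0.2):
    """Estimate cycles per iteration from a stopwatch reading.

    Returns (estimate, uncertainty), both in cycles per iteration.
    reaction_error_s is an assumed start/stop error for a human timer.
    """
    estimate = seconds * clock_hz / iterations
    # The human error is spread across every iteration, so more
    # iterations means a tighter per-iteration figure.
    uncertainty = reaction_error_s * clock_hz / iterations
    return estimate, uncertainty


if __name__ == "__main__":
    # Hypothetical run: 10,000 iterations timed by hand at 12.3 seconds.
    est, err = cycles_per_iteration(12.3, 10_000)
    print(f"about {est:.0f} cycles per iteration, give or take {err:.0f}")
```

The uncertainty shrinks in proportion to the iteration count, which is why a task that scales up cleanly matters more than a fast thumb on the stopwatch.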