Yes, it was very painful at first. What offset the pain was that, at the start, the code was exactly the same C code for x86-64 and 6502. That meant I could shake bugs out with the x86-64 build of the decoder in a few milliseconds before trying the 6502 build. Later, once I started rewriting parts of the 6502 code in assembly, I iterated on it using MAME's Apple IIgs emulation with a 16 MHz ZipChip. That brought the wait per run down to ~5 minutes instead of the 70 it took at the start, and it kept shrinking as I progressed.