I used to write software 3D renderers in the 90s, and I think of all the time I spent optimizing the assembler. But now I look at stuff that we thought was optimized at the time, and people are discovering all sorts of new speed-ups. There are a bunch of videos on YouTube about optimizing N64 games where they found tons of stuff the developers missed, e.g. https://www.youtube.com/watch?v=t_rzYnXEQlE