“Hacks” 4-10 could easily be replaced with “use numpy.” Performance gains from doing math better in pure Python are minimal compared with numpy. It’s not unusual for the numpy version of something to end up taking 0.01x as long to run.
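A toy comparison (made-up workload, but representative of the pattern; exact numbers depend on what you're doing):

    import timeit
    import numpy as np

    data = list(range(1_000_000))
    arr = np.arange(1_000_000, dtype=np.float64)

    # Pure Python: one interpreted iteration per element.
    def py_sum_squares():
        return sum(x * x for x in data)

    # numpy: the same reduction done in a single vectorized C call.
    def np_sum_squares():
        return float(np.dot(arr, arr))

    print("pure python:", timeit.timeit(py_sum_squares, number=10))
    print("numpy:      ", timeit.timeit(np_sum_squares, number=10))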
A lot of interesting math can't be done in numpy, sadly. At that point you might be better off writing the initial version in Python and translating it to something else.
A friend of mine asked me to translate some (well-written) number theory code a while back; I got about a 250x speedup just doing a line-by-line translation from Python to Julia. But the problem was embarrassingly parallel, so I was able to slap on an extra 40x by tossing it on a big machine for a few hours, for a total of around 10,000x. My friend was very surprised – he was expecting around a 10x improvement.
I 'wrote' (adapted from the Rich project's example code) a simple concurrent file downloader in Python: run 'download <any number of URLs>' and it goes and downloads each one, assuming that the URL has what looks like a filename at the end or the server responds with a Content-Disposition header that contains a filename. It was very simple: spawn a thread for each file we're downloading, show a progress bar for each file, and update the progress bar as we download.
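The shape of it was roughly this (a from-memory sketch using requests and rich.progress, not the actual code - the filename handling, chunk size, and worker count are made up):

    import os
    import sys
    from concurrent.futures import ThreadPoolExecutor
    from urllib.parse import urlparse

    import requests
    from rich.progress import Progress

    def filename_for(url, response):
        # Prefer a filename from Content-Disposition, else the last path segment.
        cd = response.headers.get("Content-Disposition", "")
        if "filename=" in cd:
            return cd.split("filename=")[-1].strip('" ')
        return os.path.basename(urlparse(url).path) or "download.bin"

    def download(progress, url):
        with requests.get(url, stream=True, timeout=30) as r:
            r.raise_for_status()
            total = int(r.headers.get("Content-Length", 0)) or None
            task = progress.add_task(url, total=total)
            with open(filename_for(url, r), "wb") as f:
                for chunk in r.iter_content(chunk_size=32768):
                    f.write(chunk)
                    progress.update(task, advance=len(chunk))

    if __name__ == "__main__":
        with Progress() as progress, ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(download, progress, url) for url in sys.argv[1:]]
            for f in futures:
                f.result()  # surface any download errors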
I ended up rewriting the whole thing in Rust (my first Rust project) solely because I noticed that just that simple process - "get some bytes from the network, write them to this file descriptor, update the progress bar's value" - was churning my CPU, because of how expensive it was for the progress bar to update as often as it did - which wasn't often.
Because of how ridiculous that was I opted to rewrite it in another language; I considered Go, but all of the progress bar libraries I found for it were mediocre at best, and I liked the idea of learning more Rust. Surprise surprise, it's faster and more efficient; it even downloads faster, which is kind of ridiculous.
An even crazier example: a coworker was once trying to parse some giant logfile and we ended up nerd-sniping ourselves into finding ways to speed it up (even though it finished while we were doing so). After profiling this very simple code, we found that 99% of the time spent processing each line was simply parsing the date, and 99% of that was because Python's strptime goes to great lengths to be able to parse timezones even if the input you're giving it doesn't include one. We played around with things like storing a hash map of "string date to Python datetime", since there were a lot of duplicates, but the fastest method was to write an awful Python extension that basically just exposed glibc's strptime so you could bypass Python's (understandably) complex tz parsing. For the version of Python we were using, it made parsing hundreds of thousands of dates 47x faster, though on Python 3 these days it's only about 17x faster? Maybe less.
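The hash-map version is basically a one-liner, for what it's worth (the format string here is just an example, not the one we used):

    from datetime import datetime
    from functools import lru_cache

    # Log timestamps repeat constantly, so caching the parse skips
    # strptime's slow machinery for every duplicate string.
    @lru_cache(maxsize=None)
    def parse_ts(s: str) -> datetime:
        return datetime.strptime(s, "%d/%b/%Y:%H:%M:%S")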
I still use Python all the time, because the time I save writing the code quickly usually more than outweighs the time I lose to it running slower; still, if your code is going to live a while, maybe try running it through a profiler and see what surprises you can find.
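In Python that's cheap to do with cProfile; something like this (process_lines here is just a stand-in for whatever your hot path actually is):

    # whole program:
    #   python -m cProfile -s cumtime yourscript.py
    # or just the suspect function:
    import cProfile
    cProfile.run("process_lines('access.log')", sort="cumtime")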
Yes, I didn't realize anyone actually did anything numerical without numpy. I don't think I've ever imported Python's math module once. Who in their right mind is making a 1e6-long Python list?