>The reason that lazy evaluation leads to performance problems is that programmers don't know how to use it correctly
It's because it's objectively harder to reason about performance with lazy evaluation, as performance becomes non-local. You can no longer reason about a function's performance just by looking at its body; instead you have to know the nature of the inputs passed to it (whether they're thunks or materialised values).
Each single use of lazy evaluation also imposes a latency overhead, as the thunk/value must sit behind a pointer. Yes, the compiler can remove unnecessary thunks, but that's "unnecessary" in the sense of "unnecessary given the semantics of the program", not "unnecessary given the problem the program's trying to solve". I.e. lazy-by-default will generally result in someone writing code that's lazier than necessary unless they're very careful about annotating to the compiler everywhere laziness isn't needed.
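A minimal sketch of the kind of annotation meant here (the function names are illustrative). Without optimisation, or in contexts GHC's strictness analysis can't see through, the lazy accumulator builds a chain of unevaluated `(+)` thunks; the bang pattern forces it at each step:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Lazy accumulator: 'acc' can accumulate a chain of (+) thunks,
-- only collapsed when the final result is demanded.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where go acc (x:xs) = go (acc + x) xs
        go acc []     = acc

-- Strict accumulator: '!acc' forces each intermediate sum,
-- keeping space usage constant regardless of optimisation level.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where go !acc (x:xs) = go (acc + x) xs
        go !acc []     = acc

main :: IO ()
main = print (sumStrict [1 .. 1000000])
```

With `-O`, GHC's strictness analysis usually makes the two equivalent; the point is that the programmer has to know when that analysis won't fire.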
> It's because it's objectively harder to reason about performance with lazy evaluation as performance becomes non-local.
This is oft-repeated, but has two interpretations. If you mean performance as in the number of steps needed to reduce an expression, lazy evaluation is superior to strict evaluation. If you mean the indirection needed to access a thunk, well, someone -- either the producer (strict) or the consumer (lazy) -- had to do that anyway. If GHC can determine that the demand for a value is strict, it won't generate the thunk in the first place. Instead it will provide a specialized calling convention via the worker/wrapper transform, which is standard. In the un-optimized case we still have dynamic pointer tagging (which is a transgression against the T of the STG machine), which serves as a tag to check quickly whether we are dealing with a value or a thunk. So if by non-local performance you mean the indirection, modern GHC shows that is not true.
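A conceptual sketch of the worker/wrapper transform mentioned above, written by hand with `seq` standing in for the unboxing to raw `Int#` that GHC actually performs (the `factWrapper`/`factWorker` names are illustrative, not what GHC emits):

```haskell
-- Source-level function: demand analysis finds 'n' is used strictly.
fact :: Int -> Int
fact n = if n <= 1 then 1 else n * fact (n - 1)

-- Roughly what GHC generates. The wrapper forces the argument once,
-- then hands the evaluated value to the worker under a calling
-- convention that never needs to check for a thunk.
factWrapper :: Int -> Int
factWrapper n = n `seq` factWorker n

factWorker :: Int -> Int
factWorker n = if n <= 1 then 1 else n * factWrapper (n - 1)

main :: IO ()
main = print (factWrapper 10)
```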
If you mean that space leaks are a non-local property and they affect performance, well, you are sort of right. But as with all programming, we have defensive patterns against that.
There are two types of space leaks: liveness leaks and strictness leaks. Only the first is a non-local property, i.e. it appears as a consequence of the composition of different pieces of code. But given the default purity of Haskell, those can only appear in long-lived data references, which are syntactically marked by:
- IORef, MVars, TVars
- get/put pairs over a state environment
So what you do is specify in the types that values stored in those environments are to be evaluated before being stored, so references don't leak. I speak about this on my blog.
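A small sketch of that defensive pattern for the `IORef` case: the primed `modifyIORef'` from `Data.IORef` forces the new value before storing it, so the long-lived reference never holds onto a growing chain of thunks.

```haskell
import Data.IORef

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  -- modifyIORef' (note the prime) evaluates the result of (+ 1)
  -- before writing it back. The lazy modifyIORef would instead
  -- store a chain of a million suspended (+ 1) applications --
  -- a liveness leak hiding behind an innocent-looking update.
  mapM_ (\_ -> modifyIORef' counter (+ 1)) [1 .. 1000000 :: Int]
  readIORef counter >>= print
```

The same idea applies to `MVar`/`TVar` contents and to strict `State` (`Control.Monad.State.Strict` plus forcing the value in `put`).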
This reply neatly captures that functional programming tends to consider performance/efficiency in terms of number of reductions. Sadly other programmers tend to view it as wall clock time and the two don't correlate very well.
If a client comes to me and says one of the parts of their application was slow, and I come back to them saying I lowered the number of reductions with zero change in the wall clock time, they'll fire me, and rightly so.
To be nitpicky, wall clock time isn't the same as CPU time or power draw. You can increase efficiency at the same time as you increase wall clock time because they aren't necessarily the same metric.
I think you're overcomplicating this. In a language with lazy evaluation, you can't know what gets evaluated without looking at the whole program (in the worst case). It's in that sense that performance becomes non-local.
Here's a simple specific example. Take the Haskell expression [1..n], for some big n. There is no general answer to the question "what would be the performance implications of replacing [] with [1..n]" – it depends on the context of [].
In a strict language, you can say (roughly at least) that replacing [] with [1..n] will add at minimum a certain number of extra CPU cycles and a certain amount of additional allocation. This kind of reasoning is pretty rough and ready, but it is accurate enough to be useful for reasoning about performance in practice.
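The context-dependence can be made concrete. Under lazy evaluation the same expression `[1..n]` costs almost nothing in one context and the full O(n) in another, whereas a strict language would pay O(n) in both (`cheap`/`expensive` are illustrative names):

```haskell
n :: Int
n = 1000000

cheap :: Int
cheap = head [1 .. n]      -- forces only the first cons cell: O(1)

expensive :: Int
expensive = last [1 .. n]  -- forces every cell of the list: O(n)

main :: IO ()
main = print (cheap, expensive)
```

So "what does replacing `[]` with `[1..n]` cost?" has no answer until you know which of these consumers the list flows into.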
I note that Simon Peyton Jones has said these exact words in a public talk:
"Laziness makes it much, much harder to reason about performance."
I think it's extremely unlikely that he's mistaken or confused on this point.
> In a strict language, you can say (roughly at least) that replacing [] with [1..n] will add at minimum a certain number of extra CPU cycles and a certain amount of additional allocation.
It's not "at minimum", it's always the worst-case cost of n in strict languages, whereas the lazy setting provides you with amortized complexity.
I think you might be misunderstanding what I meant by "at minimum". I'm talking about the case of replacing a [] constant in some arbitrary piece of code with something like [1..10000]. Given strict semantics you'll of course incur at least the time and space costs associated with constructing the list (that's the "at minimum" bit), but you might also incur additional costs depending on what the rest of the code does with the list. For example, it might execute some arbitrarily expensive computation if the list has more than 20 elements, or whatever.
I think you might have thought I was saying that given a strict semantics it was somehow still possible that you wouldn't necessarily incur the full time and space penalty for constructing n elements (which of course is not the case).
Interesting comment, but many parts of it flew over my head. Would you by chance have some suggestions on where could I better my knowledge of Haskell internals/FP runtimes? Of course your blog has been added to my to-read list :)
I used to know Haskell to an “intermediate level”, but haven’t used it in years, but I am more familiar with “traditional” runtimes/compilers, like the JVM as a reference.