HumanEval needs careful consideration, though.

In the GPT-4 technical report, OpenAI reported that some HumanEval data was present in the training data.

They did also report a score on the "non-contaminated" subset of the benchmark, but I have no idea whether that can still be trusted.