
In all my GPT-4 API (Python) experiments, it takes 15-20 seconds to get a full response from the server, which basically kills every idea I've tried hacking up because it runs so slowly.

Has anyone fared better? I might be doing something wrong but I can't see what that could possibly be.



Streaming. If you're expecting structured data as a response, request YAML or JSONL so you can parse it progressively. Time to first byte can be milliseconds instead of 15-20s. Obviously, this technique only works for certain things, but I found it was possible for everything I tried.
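
To make that concrete, here's a minimal sketch of the progressive-parsing half. It assumes the model was asked to reply in JSONL (one JSON object per line); the `chunks` list stands in for the text deltas a streaming client yields (e.g. `chunk.choices[0].delta.content` with the OpenAI Python library), so each record becomes usable the moment its line completes rather than after the full 15-20s response:

```python
import json

def parse_jsonl_stream(chunks):
    """Yield each parsed record as soon as its JSONL line completes.

    `chunks` is any iterable of text fragments; in production you'd
    feed it the content deltas from a stream=True API call.
    """
    buf = ""
    for chunk in chunks:
        buf += chunk
        # A chunk can complete zero, one, or several lines.
        while "\n" in buf:
            line, buf = buf.split("\n", 1)
            if line.strip():
                yield json.loads(line)
    if buf.strip():  # final record may arrive without a trailing newline
        yield json.loads(buf)

# Simulated deltas: records split mid-string, as real streams deliver them.
chunks = ['{"q": 1, "a": "y', 'es"}\n{"q": 2,', ' "a": "no"}']
records = list(parse_jsonl_stream(chunks))
```

The same idea works for YAML if you split on document separators, but JSONL is easier because line boundaries are the record boundaries.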


Run it in the background.

We use it to generate automatic insights from survey data at a weekly cadence for Zigpoll (https://www.zigpoll.com). This makes an instant response unnecessary but still provides a lot of value to our customers.
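
The pattern is just a job queue: the caller enqueues work and returns immediately, and a worker absorbs the 15-20s calls off the request path. A minimal sketch with the stdlib (the `llm_summarize` function is a hypothetical stand-in for the slow GPT-4 call; a real deployment would use a persistent queue like Celery or a cron job, not an in-process thread):

```python
import queue
import threading

results = {}

def llm_summarize(survey_id):
    # Stand-in for the slow GPT-4 request (15-20s in production).
    return f"insights for {survey_id}"

def worker(jobs):
    # Drain the queue in the background; None is a shutdown sentinel.
    while True:
        survey_id = jobs.get()
        if survey_id is None:
            break
        results[survey_id] = llm_summarize(survey_id)
        jobs.task_done()

jobs = queue.Queue()
threading.Thread(target=worker, args=(jobs,), daemon=True).start()

# Enqueueing returns instantly; the caller never waits on the model.
for sid in ["survey-1", "survey-2"]:
    jobs.put(sid)

jobs.join()  # here only so the example can show the finished results
```

On a weekly cadence the "queue" can simply be a scheduled batch job, which is even simpler.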


Claude Instant (Anthropic's fast model) is the best LLM if you're looking for speed.



