Given the token count limits on input and output, I've been wondering how folks work on these long-running projects? Is it just not a problem, or are there some tactics? I find that when building up lengthy outputs, it starts truncating or leaving out things from way before.
I think this is a good question. Screenshots are hard on HN, but I could write up a blog post about it all once I have a demo of the game.
I've been using ChatGPT and GPT-4 since they came out, so I'm pretty aware of their limitations. You can't build an entire project in a single conversation; GPT loses the context at some point.
With that in mind, break down the project into chunks and make each one a conversation. For me, that was:
1. Initial Setup
2. Add a randomly generated tilemap
3. Character Creation and Placement
4. Movement
5. Turn-based execution
6. Fog of War
7. Enemy Monsters
Each one of these is a separate thread. They start with:
> You are an expert game developer using Phaser 3 and TypeScript. You are going to help me create a game similar to Pixel Dungeon using those technologies.
I then copy in relevant code and design decisions we already made so it has the context. If it gets the answer totally wrong, I'll stop the generation and rephrase the question to keep things clean.
Lastly, I'll frequently ask it to talk about design first, saying something like:
> We are going to start with the world map and dungeon generation. We'll add a character sprite and movement next. What's the best way to do tilemap generation and store the data? Don't write any code, just give me high-level suggestions and your recommendation.
This lets me explore the problem space first, decide on an approach, and then execute it. I'll also ask for library suggestions to help with approaches, and I'm generally surprised by the suggestions. Like all software, once you know the design, the pattern, the goal, and have good tools to help you achieve it... the coding isn't all that hard.
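To make the tilemap question above concrete, here's a minimal sketch of one answer it might land on: storing the map as a plain 2D array of tile indices, which Phaser 3 can consume directly via `this.make.tilemap({ data, tileWidth, tileHeight })`. The names `TileType` and `generateMap` and the generation logic are my own illustration, not anything from the actual conversations.

```typescript
// Hypothetical sketch: represent the dungeon as a 2D array of tile
// indices. Phaser 3 accepts this shape of data when creating a tilemap.
enum TileType {
  Wall = 0,
  Floor = 1,
}

// Generate a width x height map: solid border walls, with interior
// tiles randomly set to wall at the given probability.
function generateMap(
  width: number,
  height: number,
  wallChance = 0.3
): TileType[][] {
  const map: TileType[][] = [];
  for (let y = 0; y < height; y++) {
    const row: TileType[] = [];
    for (let x = 0; x < width; x++) {
      const isBorder =
        x === 0 || y === 0 || x === width - 1 || y === height - 1;
      row.push(
        isBorder || Math.random() < wallChance
          ? TileType.Wall
          : TileType.Floor
      );
    }
    map.push(row);
  }
  return map;
}

const dungeon = generateMap(40, 30);
console.log(dungeon.length, dungeon[0].length); // 30 40
```

The nice part of the design-first question is that the data representation falls out before any Phaser-specific code gets written, so the rendering layer stays a thin wrapper over the array.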
I use the API and have a generic prompt prefix for each programming language and type of task, trying to get as much bang for my buck with as small a token count as possible; then a single query with a bit of project-specific code fleshes out the rest of what I'm asking. Prompt tokens are usually around 300-1000, leaving over 3000 for the response (which it rarely uses fully).
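A rough sketch of what that prefix-plus-query assembly might look like. The prefix wording, the `buildMessages` helper, and the ~4-characters-per-token estimate are all my own assumptions, not the commenter's actual setup; the message array matches the shape the chat completions API expects.

```typescript
// Hypothetical: one reusable prefix per language/task, stored once.
const PREFIXES: Record<string, string> = {
  "typescript-gamedev":
    "You are an expert game developer using Phaser 3 and TypeScript. " +
    "Answer concisely with idiomatic TypeScript.",
};

// Assemble the final request: generic prefix as the system message,
// then one user message with project-specific code plus the question.
function buildMessages(prefixKey: string, code: string, question: string) {
  return [
    { role: "system", content: PREFIXES[prefixKey] },
    { role: "user", content: `${code}\n\n${question}` },
  ];
}

// Very rough budget check (~4 chars per token for English/code).
function estimateTokens(messages: { content: string }[]): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return Math.ceil(chars / 4);
}

const messages = buildMessages(
  "typescript-gamedev",
  "class Player { /* ... */ }",
  "Add a method that moves the player one tile in a given direction."
);
console.log(estimateTokens(messages)); // well under a 1000-token prompt budget
```

Keeping the prefixes in a table like this is what holds the per-request prompt down in that 300-1000 token range while still giving the model the stack and the local code it needs.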
It's garbage at pulling in things from all over the codebase, so as a sort of co-evolution I write code that mostly only needs local context and only ask it questions that only need local context. That works great at home but doesn't suffice on the existing code at $WORK.
It's still helpful here and there at work (e.g., given this example string, please output the appropriate Python datetime formatting codes), but only because it's a wee bit faster than synthesizing the relevant docs and not because it's able to consistently do anything super meaningful.
You have to input just the pieces you want help with. It can't do a whole project, but you can tell it the tech stack, and maybe give it a few of your classes, and it can then write a new class for you.