It’s basically called “reinforced learning” and it’s a common technique for mach...

suddenlybananas · 2025-11-15T09:09:45 1763197785

How do you define the goal? This kind of de novo neural program synthesis is a very hard problem.

hnlmorg · 2025-11-15T09:40:46 1763199646

Defining the goal is the easy part: as I said in my OP, the goal is unit tests passing.

It’s the other weights that are harder. You might want execution speed to be one metric. But how do you add weights to prevent cheating (eg hardcoding the results)? Or use of anti-patterns like global variables? (For example. Though one could argue that scoped variables aren’t something an AI-first language would need)

This is where the human feedback part comes into play.

It’s definitely not an easy problem. But it’s still more pragmatic than having a human curate the corpus. Particularly considering the end goal (no pun intended) is having an AI-first programming language.

I should close off by saying that I’m very skeptical that there’s any real value in an AI-first PL. so all of this is just a thought experiment rather than something I’d advocate.

macleginn · 2025-11-15T10:46:29 1763203589

With such learning your model needs to be able to provide some kind of solution or at least approximate it right off the bat. Otherwise it will keep producing random sequences of tokens and will not learn anything ever because there will be nothing in its output to reward, so no guidance.

hnlmorg · 2025-11-15T11:27:53 1763206073

I don’t agree it needs to provide a solution off the bat. But I do agree there is some initial weights you need to define.

With a AI-first language, I suspect the primitives to be more similar to assembly or WASM rather than something human readable like Rust or Python. So the amount of pre-training preparation would’ve a little easier since syntax errors due to parser constraints.

I’m not suggesting this would be easy though haha. I think it’s a solvable problem but that doesn’t mean it’s easy.

nrhrjrjrjtntbt · 2025-11-15T09:26:16 1763198776

1. Choose set of code challenges (generate them, leetcode, AOC etc.)

2. LLM generates python solution and seperate python test (as in python test calls code as black box process so it can test non python code)

3. Agent using skills etc. tries to write new language let's call it Shark.

4. Run Shark code against test. If fails use agentic flows to correct until test passes.

5. Now have list of challenges, working code (maybe not beautiful) for training.

A bit of human spot checking may not go amiss!