
Why would the fact that it failed to follow one instruction increase the likelihood that it failed to follow others within the same response?




Because the LLM is not a cognitive entity with a will; it is a plausibility engine trained on human-authored text and interactions.

So when you tell it that it made a mistake, or that it is stupid, those statements become part of the prompt and push it toward more of the same.

And only slightly more obliquely: if part of the context includes the LLM making mistakes, expect similar activations.

Best results come if you throw away such prompts and start again. That is, iterate outside the function, not inside it.
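To make "iterate outside the function" concrete, here is a minimal Python sketch. The names generate, call_llm and is_acceptable are placeholders for whatever client and acceptance check you actually use, not any particular API; the point is that each retry rebuilds the prompt and starts from an empty transcript instead of appending corrections to a context that already contains a failure.

    def generate(call_llm, task, is_acceptable, max_tries=3):
        """Retry by revising the prompt, never by arguing inside the old transcript."""
        prompt = task
        for _ in range(max_tries):
            # Fresh context on every attempt: no record of earlier mistakes to imitate.
            reply = call_llm([{"role": "user", "content": prompt}])
            if is_acceptable(reply):
                return reply
            # Tighten the prompt itself rather than telling the model it was wrong.
            prompt = task + "\n\nFollow every instruction exactly and keep the answer short."
        return None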


It has a fixed capacity for how many different things it can pay close attention to. If it fails on a seemingly less important but easy-to-follow instruction, that is an indicator it has reached capacity. If an instruction seems irrelevant, it is probably prioritized for discarding, hence a canary that the capacity has been reached.
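One cheap way to exploit that canary effect, sketched below in Python with made-up names: plant one trivial, mechanically checkable instruction in the prompt, and if the reply drops it, treat the whole response as suspect and regenerate rather than trusting the rest of it.

    CANARY_INSTRUCTION = "End your reply with the single word DONE."

    def canary_dropped(reply: str) -> bool:
        # If the model ignored the easiest, lowest-stakes instruction, it likely
        # also ignored higher-stakes ones that are harder to check mechanically.
        return not reply.rstrip().rstrip(".").endswith("DONE")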

> It has a fixed capacity of how many different things it can pay close attention to

Source? All the way down to the claim that it can "pay close attention to" things in the first place.


I suggest you take a look at Bayes's theorem in probability.
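Spelled out, with invented numbers purely for illustration: let F be "the model is dropping instructions in this response" and E be "it visibly failed one easy instruction". Bayes's theorem gives

    P(F \mid E) = \frac{P(E \mid F)\,P(F)}{P(E \mid F)\,P(F) + P(E \mid \neg F)\,P(\neg F)}

If, say, P(F) = 0.2, P(E | F) = 0.6 and P(E | not F) = 0.05, then P(F | E) = 0.12 / 0.16 = 0.75: one observed slip lifts the probability that the rest of the response is also cutting corners from 20% to 75%. The specific numbers are made up; the direction of the update is the point, and it answers the question at the top of the thread.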


