Parser generators generally suffer from at least speed, recovery, error handling...

Zababa · on Aug 21, 2021

I think you have to see them a bit like regexes. It may be a bit annoying to learn the grammar notation at first, but that knowledge carries across languages, as you can use parser generator libraries in any language.

chrisseaton · on Aug 21, 2021

But it's still one extra language to learn, for no benefit in practice.

Zababa · on Aug 21, 2021

Copy/pasting the grammar of a language and using a library will be faster to do than any hand-written implementation, so I don't agree that there are no benefits.

chrisseaton · on Aug 21, 2021

But you can't do that, because in practice these grammars have lots of imperative action code inserted all over the place. The parser generator's model fundamentally doesn't match the class of language you're parsing, so it has to be worked around.

For example I work with a parser definition that's supposedly shared between C and Java, but it's a massive nightmare because it's 50% imperative actions.
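
The comment doesn't say which constructs force the action code in that C/Java grammar. A classic C case (often called the "lexer hack") is `a * b;`, which declares `b` as a pointer if `a` is a typedef name and multiplies two values otherwise. The toy below is a hypothetical sketch, not taken from any real parser: `parse_program` and `parse_statement` are made-up names, and it only shows the kind of symbol-table state that such actions thread through a parse.

```python
# Hypothetical toy, not taken from any real parser: statements are
# pre-tokenized lists of strings, and only two shapes are handled:
#   typedef <type> <name> ;
#   <A> * <B> ;

def parse_statement(tokens, typedef_names):
    """Classify 'A * B ;' using state that a context-free grammar can't see."""
    first, star, second, semi = tokens
    if star != "*" or semi != ";":
        raise SyntaxError("only 'A * B ;' is handled in this sketch")
    if first in typedef_names:
        # A is a type name, so this declares B as a pointer to A.
        return ("declare_pointer", second, first)
    # Otherwise it is an expression statement multiplying A by B.
    return ("multiply", first, second)


def parse_program(statements):
    typedef_names = set()  # symbol-table state threaded through the parse
    results = []
    for tokens in statements:
        if tokens[0] == "typedef":
            # The imperative "action": record the new type name so later
            # statements are classified differently.
            typedef_names.add(tokens[2])
            results.append(("typedef", tokens[2]))
        else:
            results.append(parse_statement(tokens, typedef_names))
    return results


program = [
    ["a", "*", "b", ";"],          # 'a' is not a type yet: multiplication
    ["typedef", "int", "a", ";"],  # now 'a' names a type
    ["a", "*", "b", ";"],          # same tokens, now a declaration
]
print(parse_program(program))
# [('multiply', 'a', 'b'), ('typedef', 'a'), ('declare_pointer', 'b', 'a')]
```

The same token sequence gets two different parses depending on what came earlier, which is exactly the sort of decision a pure grammar file can't express without embedded code.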

Zababa · on Aug 21, 2021

That depends on the language you're trying to parse. If it has an LL or LR grammar, it'll work well. If it's something that can't be described with a grammar like that, or with something less powerful, you're going to have a bad time.

chrisseaton · on Aug 21, 2021

> That depends on the language you're trying to parse.

Yes... but most practical languages do not have simple LL or LR grammars. Hence this blog post and discussion and these problems.

Zababa · on Aug 21, 2021

That's not true. You can have an LL or LR grammar and still use a hand-written parser for better speed, better error reporting, or any number of other reasons. Java is LL(1), but javac seems to use a hand-written recursive descent parser. Now that I think about it, it's weird that Java was omitted.
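
To make "hand-written recursive descent" concrete: for an LL(1) grammar, each nonterminal becomes one function, and a single token of lookahead (`peek` below) is enough to pick which branch to take. This is a minimal sketch over a toy arithmetic grammar, assumed purely for illustration; it is not Java's grammar and not how javac is structured.

```python
import re


class Parser:
    """Recursive-descent parser for a toy LL(1) grammar:

        expr   -> term (('+' | '-') term)*
        term   -> factor ('*' factor)*
        factor -> NUMBER | '(' expr ')'
    """

    def __init__(self, text):
        # Numbers, the operators/parentheses, or any other single
        # non-space character (which the parser will then reject).
        self.tokens = re.findall(r"\d+|[-+*()]|\S", text)
        self.pos = 0

    def peek(self):
        # The single token of lookahead that LL(1) allows.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.advance()

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value *= self.factor()
        return value

    def factor(self):
        tok = self.peek()
        if tok == "(":
            self.advance()
            value = self.expr()
            self.expect(")")
            return value
        if tok is not None and tok.isdecimal():
            self.advance()
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("2 * (3 + 4) - 5").parse())  # 9
print(Parser("1 - 2 - 3").parse())        # -4 (left-associative)
```

Because every decision point is an ordinary `if` in ordinary code, it's straightforward to attach custom error messages or recovery at exactly the place a token was unexpected, which is one of the reasons people give for hand-writing parsers even when a grammar is LL or LR.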

UncleEntity · on Aug 22, 2021

> Java is LL(1)...

Java has never been parseable with one token of lookahead. Earlier versions of the specification had a section on how to modify the grammar to make it LALR(1)[0], but it was dropped in later versions, probably because added features such as generic classes made it infeasible.

It's actually quite a good resource, since it explains why the parser needs more than one token of lookahead to figure out what's going on.

[0] http://titanium.cs.berkeley.edu/doc/java-langspec-1.0/19.doc...
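
The linked chapter isn't reproduced here, but one illustrative case (a simplified model, not the spec's own example) is a statement that starts with a qualified name: `a.b.C x;` is a local variable declaration while `a.b.c = 1;` is an expression statement, and the token that decides between them only appears after the whole name. The hypothetical `classify_statement` below returns how many tokens it had to examine; that count grows with the length of the name, so with a grammar shaped like this no fixed LL(k) lookahead covers every case.

```python
# Hypothetical, heavily simplified model of one Java ambiguity. Statements
# are pre-tokenized; generics, arrays, keywords and most of the real
# grammar are ignored.

def classify_statement(tokens):
    """Classify a statement that starts with a (possibly qualified) name.

    Returns (kind, tokens_examined) so the lookahead cost is visible.
    """
    i = 0
    # Consume Name ('.' Name)*; there is no bound on how long this is.
    while True:
        if not tokens[i].isidentifier():
            raise SyntaxError(f"expected a name, got {tokens[i]!r}")
        i += 1
        if tokens[i] != ".":
            break
        i += 1
    # Only now can we choose: 'a.b.C x;' declares x, 'a.b.c = 1;' assigns.
    kind = "declaration" if tokens[i].isidentifier() else "expression"
    return kind, i + 1


print(classify_statement(["C", "x", ";"]))                      # ('declaration', 2)
print(classify_statement(["x", "=", "1", ";"]))                 # ('expression', 2)
print(classify_statement(["a", ".", "b", ".", "C", "x", ";"]))  # ('declaration', 6)
```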

chrisseaton · on Aug 21, 2021

What’s not true?

I said ‘most’ not ‘all’, and nobody here said the only reason to hand-write a parser was context sensitivity.