I took the compilers class at Stanford and never really understood the algorithm...

userbinator · on Aug 22, 2021

I recommend you also teach "precedence climbing", which is basically a slight refactoring of RD to parameterise the recursive functions by level so that grammars with many levels don't require proportionally deep call stacks and it also massively reduces the duplication of code at each level:

https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing

dataflow · on Aug 21, 2021

I still recommend spending the time to (re-)learn bottom-up parsing though, even if you never use it again. Specifically, LR and then GLR (no need to bother with LALR/SLR/all those variants). Maybe even recursive ascent too, or recursive ascent-descent if you find those interesting. It's painful and confusing and takes a long time to pick them up, but it's quite gratifying once you do learn it. The algorithms are super clever and cool, and help you gain a lot of appreciation and intuition for what is/isn't possible to do (whether efficiently or inefficiently).

Using recursive descent for everything is like using Python (or Go or pick your favorite easy language to learn) for everything. Yeah you can totally do it, and a heck of a lot of people do precisely because of how nice and convenient it is, but it's good to have a sense what else is out there and how you can think about a problem differently IMO, even if you rarely use other languages in practice.

recursivedoubts · on Aug 21, 2021

i agree, but for an undergraduate understanding of parsing and grammars, recursive descent is perfect

i think the more complex parsing algorithms were good for producing papers, which explains their success in academia but relative lack of success in industry

samus · on Aug 22, 2021

Recursive parsing is a completely valid concept, but it's unnatural at first to hand-write parsers with it. I have tried it - even wrote a prototype DSL in Haskell (the CPS monad is very useful for that), but what I end up doing is using the DSL to write down the LR state machine, bottom up. This is eerily familiar to embedded and GUI programming where internal state matters and the program reacts to input. It might even be a better way to incorporate ad-hoc code that has side effects. However, most people think about programming languages top-down and use grammars to describe them.

recursivedoubts · on Aug 21, 2021

as an aside, the parser for https://hyperscript.org is recursive descent, and our motto is: "remember what our parser professors taught us, then do the opposite of that."

There's some ugly stuff in there, but it's fun, all because we wrote it in recursive descent and can do whatever we darn well please.

Taniwha · on Aug 22, 2021

I think the best way to understand it is that you're creating an AST (a tree describing the program), for a bottom up parser you are recognising and making little treelets and stitching them together into larger treelets until you reach the top/root, while for recursive descent you're building them from the top down from the root.

The main difference is that LR (bottom up) family grammars can describe more languages than LL (top down) ones can largely because bottom up has slightly more contextual information available when it puts stuff together (often small treelets rather than just the next token)

exdsq · on Aug 21, 2021

I took a compilers class at an ex-Polytechnic in the UK that taught recursive descent - I now wonder how old the textbook they were using was!

deterministic · on Aug 23, 2021

Yep that’s the way to do it. Teaching LL/LR and it’s variations is a complete waste of time.

Koshkin · on Aug 21, 2021

I hope you are not making the same mistakes as your teacher at Stanford did (and leave students clueless).

recursivedoubts · on Aug 21, 2021

i hope so too

atarian · on Aug 21, 2021

Any resources you recommend for someone who is interested in learning about recursive descent?

recursivedoubts · on Aug 21, 2021

Yes, the second part of crafting interpreters has a great introduction to the topic and is available online for free:

https://craftinginterpreters.com/contents.html

atarian · on Aug 21, 2021

Thanks!

mhh__ · on Aug 22, 2021

Get a lexer (like, just steal one if you don't think you'll be able to write one without getting bored, they're usually fairly self contained) and just starting writin' code. They're basically that simple.

The lovely thing about a nicely written recursive descent parser is that the code pretty much is the grammar (topologically), so any grammar you can actually imagine that can be parsed by recursive descent doesn't take much thought to make work.

twic · on Aug 22, 2021

IMVLE, a lexer is just a big regular expression, with a branch at the top level for each type of token. Are there cases where this doesn't work?

samus · on Aug 22, 2021

Python and Haskell are interesting because the indentation is crucial to figure out the correct nesting. Haskell allows to use curly braces and semicolons to completely remove indentation, so the lexer only has to add virtual tokens into the token stream. For Python, I don't know how it deals with it.

C/C++ have plenty of difficult syntax edge cases. For example, >> can represent a single operator in an expression or two successive > in a template declaration.

twic · on Aug 22, 2021

Python has some preprocssing to inject INDENT and DEDENT pseudo-tokens into the lexing stream:

https://docs.python.org/3/reference/lexical_analysis.html#in...

rhdunn · on Aug 22, 2021

Some things require (or are more easily implemented) using state-based lexing, such as identifying escaped characters in JSON strings or XML attributes, or documentation comments. These can be thought of as a Finite State Machine (for the state) driving the different sub-lexer regex token matching.

tiborsaas · on Aug 21, 2021

I know it's not Reddit but username checks out :)