Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I took the compilers class at Stanford and never really understood the algorithms of bottom up parsing, or even really how grammars worked. I just made the tool work.

I then went to work at a private company, and an older guy who had gone to a state school that taught recursive descent (his was the last class to teach it) taught me how to do it. In a month or so I had learned more about how grammars actually work, what ambiguity is, and so forth, than in my whole class at Stanford.

I now teach compilers at a university, and I teach recursive descent.



I recommend you also teach "precedence climbing", which is basically a slight refactoring of RD to parameterise the recursive functions by level so that grammars with many levels don't require proportionally deep call stacks and it also massively reduces the duplication of code at each level:

https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing


I still recommend spending the time to (re-)learn bottom-up parsing though, even if you never use it again. Specifically, LR and then GLR (no need to bother with LALR/SLR/all those variants). Maybe even recursive ascent too, or recursive ascent-descent if you find those interesting. It's painful and confusing and takes a long time to pick them up, but it's quite gratifying once you do learn it. The algorithms are super clever and cool, and help you gain a lot of appreciation and intuition for what is/isn't possible to do (whether efficiently or inefficiently).

Using recursive descent for everything is like using Python (or Go or pick your favorite easy language to learn) for everything. Yeah you can totally do it, and a heck of a lot of people do precisely because of how nice and convenient it is, but it's good to have a sense what else is out there and how you can think about a problem differently IMO, even if you rarely use other languages in practice.


i agree, but for an undergraduate understanding of parsing and grammars, recursive descent is perfect

i think the more complex parsing algorithms were good for producing papers, which explains their success in academia but relative lack of success in industry


Recursive parsing is a completely valid concept, but it's unnatural at first to hand-write parsers with it. I have tried it - even wrote a prototype DSL in Haskell (the CPS monad is very useful for that), but what I end up doing is using the DSL to write down the LR state machine, bottom up. This is eerily familiar to embedded and GUI programming where internal state matters and the program reacts to input. It might even be a better way to incorporate ad-hoc code that has side effects. However, most people think about programming languages top-down and use grammars to describe them.


as an aside, the parser for https://hyperscript.org is recursive descent, and our motto is: "remember what our parser professors taught us, then do the opposite of that."

There's some ugly stuff in there, but it's fun, all because we wrote it in recursive descent and can do whatever we darn well please.


I think the best way to understand it is that you're creating an AST (a tree describing the program), for a bottom up parser you are recognising and making little treelets and stitching them together into larger treelets until you reach the top/root, while for recursive descent you're building them from the top down from the root.

The main difference is that LR (bottom up) family grammars can describe more languages than LL (top down) ones can largely because bottom up has slightly more contextual information available when it puts stuff together (often small treelets rather than just the next token)


I took a compilers class at an ex-Polytechnic in the UK that taught recursive descent - I now wonder how old the textbook they were using was!


Yep that’s the way to do it. Teaching LL/LR and it’s variations is a complete waste of time.


I hope you are not making the same mistakes as your teacher at Stanford did (and leave students clueless).


i hope so too


Any resources you recommend for someone who is interested in learning about recursive descent?


Yes, the second part of crafting interpreters has a great introduction to the topic and is available online for free:

https://craftinginterpreters.com/contents.html


Thanks!


Get a lexer (like, just steal one if you don't think you'll be able to write one without getting bored, they're usually fairly self contained) and just starting writin' code. They're basically that simple.

The lovely thing about a nicely written recursive descent parser is that the code pretty much is the grammar (topologically), so any grammar you can actually imagine that can be parsed by recursive descent doesn't take much thought to make work.


IMVLE, a lexer is just a big regular expression, with a branch at the top level for each type of token. Are there cases where this doesn't work?


Python and Haskell are interesting because the indentation is crucial to figure out the correct nesting. Haskell allows to use curly braces and semicolons to completely remove indentation, so the lexer only has to add virtual tokens into the token stream. For Python, I don't know how it deals with it.

C/C++ have plenty of difficult syntax edge cases. For example, >> can represent a single operator in an expression or two successive > in a template declaration.


Python has some preprocssing to inject INDENT and DEDENT pseudo-tokens into the lexing stream:

https://docs.python.org/3/reference/lexical_analysis.html#in...


Some things require (or are more easily implemented) using state-based lexing, such as identifying escaped characters in JSON strings or XML attributes, or documentation comments. These can be thought of as a Finite State Machine (for the state) driving the different sub-lexer regex token matching.


I know it's not Reddit but username checks out :)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: