Hacker News | FjordWarden's comments

This is like the difference between an orange and fruit juice. You can squeeze an orange to extract its juices, but that is not the only thing you can do with it, nor is it the only way to make fruit juice.

I use tree-sitter for developing a custom programming language. You still need an extra step to get from the CST to an AST, but the overall DevEx is much quicker than hand-rolling a parser.


Every time I get to sing Tree-sitter's praises, I take the opportunity to. I love it so much. I've tried a bunch of parser generators, and the TS approach is so simple and so good that I'll probably never use anything else. The iteration speed lets me get into a zen-like state where I just think about syntax design, and I don't sweat the technical bits.

> extra step to get from CST to AST

Could you elaborate on what this involves? I'm also looking at using tree-sitter as a parser for a new language, possibly to support multiple syntaxes. I'm thinking of converting its parse trees to a common schema that would serve as the target language.

I guess I don't quite get the difference between a concrete and abstract syntax tree. Is it just that the former includes information that's irrelevant to the semantics of the language, like whitespace?


TS returns a tree of nodes, and you walk the nodes with a visitor pattern. I've experimented with using tree-sitter queries for this, but so far I haven't found that to be easier. Every syntax will have its own CST, but they can all target a general AST, if you will. In the end both can be represented as s-expressions, but you need rules to go from one flavour of syntax tree to the other.
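To give a feel for that walk, here is a minimal sketch using the tree-sitter Python bindings; the node type names (`number`, `binary_expression`) and the AST classes are placeholders for whatever your grammar actually defines, not anything standard.

```python
from dataclasses import dataclass

# Tiny placeholder AST; a real language would have many more node kinds.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

def lower(node):
    """Recursively lower a tree-sitter CST node into the AST, dropping trivia."""
    if node.type == "number":
        return Num(int(node.text.decode()))
    if node.type == "binary_expression":
        left, op, right = node.children  # children includes the anonymous operator node
        return BinOp(op.text.decode(), lower(left), lower(right))
    named = node.named_children
    if len(named) == 1:  # e.g. a parenthesized expression: just keep the child
        return lower(named[0])
    raise NotImplementedError(f"no lowering rule for {node.type}")
```

You would then call `lower(tree.root_node)` on the tree you get back from `parser.parse(source_bytes)`.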

AST is just CST minus range info and simplified/generalised lexical info (in most cases).


In this context you could say that CST -> AST is a normalization process. A CST might contain whitespace and comments, an AST almost certainly won't.

An example: in a CST `1 + 0x1` might be represented differently than `1 + 1`, but they could be equivalent in the AST. The same could be true for syntax sugar: `let [x,y] = arr;` and `let x = arr[0]; let y = arr[1];` could be the same after AST normalization.

You can see why having just the AST might not be enough for syntax highlighting.

As a side project I've been working on a simple programming language, where I use tree-sitter for the CST, but first normalize it to an AST before I do semantic analysis such as verifying references.
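To make the desugaring example concrete, here is a toy sketch of one such normalization pass; the AST classes are invented for illustration, not taken from any real compiler. A destructuring let gets rewritten into plain lets plus index expressions, so later passes only ever see one binding form.

```python
from dataclasses import dataclass

@dataclass
class Let:             # let name = value
    name: str
    value: object

@dataclass
class Index:           # target[index]
    target: object
    index: int

@dataclass
class DestructureLet:  # let [a, b, ...] = value
    names: list
    value: object

def desugar(stmt):
    """Lower destructuring lets into plain lets; everything else passes through."""
    if isinstance(stmt, DestructureLet):
        return [Let(name, Index(stmt.value, i)) for i, name in enumerate(stmt.names)]
    return [stmt]
```

After this pass, a step like reference checking only needs to know about `Let`.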


That's correct.

Yeah, you can even use tree-sitter to implement a language server; I've done this for a custom scripting language we use at work.

I've been using it for semantic chunking in RAG pipelines. Naive splitting is pretty rough for code, but tree-sitter lets you grab full functions or classes. It seems to give much better context quality and keeps token costs down since you aren't retrieving broken fragments.
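The chunker ends up looking roughly like the sketch below, using the tree-sitter Python bindings with a Python grammar already loaded into `parser`; the node type names are grammar-specific, so treat them as assumptions.

```python
def chunk_functions(source: bytes, parser) -> list[str]:
    """Split source into one chunk per top-level definition instead of
    cutting at arbitrary line or token counts."""
    tree = parser.parse(source)
    chunks = []
    for node in tree.root_node.named_children:
        if node.type in ("function_definition", "class_definition", "decorated_definition"):
            chunks.append(source[node.start_byte:node.end_byte].decode())
    return chunks
```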

N00b question: Language parsers give me concrete information, like “com.foo.bar.Baz is defined here”. Does tree-sitter do that, or does it say “this file has a symbol declaration for Baz” and elsewhere for that file “there is a package statement for ‘com.foo.bar’”, and then I have to figure that out?

You have to figure this out for yourself in most cases. Tree-sitter does have a query language based on s-expressions, but it is more for questions like "give me all the nodes that are literals", and then you can, for example, render those in a single draw call. Tree-sitter has incremental parsing, and queries can be restricted to a certain byte range.
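As a rough illustration of that query style (the exact Python entry points have shifted between py-tree-sitter versions, and the node names depend on the grammar, so this is a sketch rather than copy-paste code):

```python
LITERALS_QUERY = """
(number) @literal
(string) @literal
"""

# Older-style py-tree-sitter API; newer releases construct Query objects directly.
query = language.query(LITERALS_QUERY)
for node, capture_name in query.captures(tree.root_node):
    print(capture_name, node.start_byte, node.end_byte)
```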

Ah, I think I found the reason why WebAssembly (in a browser or some other sandboxed environment) is not a suitable substrate for near-native performance. It is a very ironic reason: you can't implement a JIT compiler that targets WebAssembly in a sandbox running in WebAssembly. Sounds like an incredibly contrived thing to do, but once speed is the goal, a copy-and-patch compiler is a valid strategy for implementing an interpreter or a modern graphics pipeline.

This is true. A multi-tier JIT compiler requires writable executable memory and the ability to flush the icache. Loading segments dynamically is nice and covers a lot of ground, but it won't be a magic solution for dynamic languages like JavaScript. Modern WASM runtimes already implement a full compiler, linker and JIT compiler in one, almost starting to look like V8. I'm not sure adding in-guest JIT support is going in the right direction.

> you can't implement a JIT compiler that targets WebAssembly in a sandbox running in WebAssembly

That's not completely true. With dynamic linking (now supported in WASIX), you can generate and link Wasm modules at runtime easily.


The CMU DB group's YouTube channel is a good place to learn about both the basics and advanced topics: https://www.youtube.com/@CMUDatabaseGroup


Challenge accepted, but no way I can finish this in 7 days even with a head start of a few months.


So you are not accepting the challenge then. Sounds like you are over thinking or over scoping the problem.


or just keenly aware of their own strengths, weaknesses, and time budget for this? the leap to “over scoping” is wild.


I just meant that if you think you can't do it in the timeframe, then you are making it too big for yourself. The rules are so loose that you could literally make a programming language that has a single command `run_my_awesome_game()` and fully implement the logic etc. in your language and library of choice. Obviously a trivial/useless example, but take it up a few notches and you could have something interesting. A DSL inside JSON can be very powerful.
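As a toy illustration of the JSON-DSL idea (entirely made up, just to show the shape): programs are nested `["op", ...]` arrays, and the "language" is a dozen lines of interpreter.

```python
import json

def eval_expr(expr, env):
    """Evaluate a JSON-encoded expression: a number, a variable name, or ["op", args...]."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    vals = [eval_expr(a, env) for a in args]
    if op == "+":
        return sum(vals)
    if op == "*":
        result = 1
        for v in vals:
            result *= v
        return result
    raise ValueError(f"unknown op: {op}")

program = json.loads('["+", 1, ["*", "x", 3]]')
print(eval_expr(program, {"x": 2}))  # prints 7
```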


Polders have personhood in some jurisdictions. The government reclaimed the land from the sea, sold it to multiple people, levies taxes on them and now the dykes need to be maintained.

This is just a legal fiction, a technology developed and applied across industries.

The mere concept of water rights implies that obligations must lie somewhere. All this talk about reified gods obscures just how mundane the concept is.


Imagine if you spent those years building something else.


Yes, like renewable energy infrastructure (which China is building, and which would be highly useful anyway in case generative AI does live up to its promise).

Even if generative AI lives up to its hype, with the current US administration there's no way America is going to lead the race for long. There's just not enough energy available when those in power oppose developing many of the energy projects that make the most economic sense.


After coming down from the shock of learning there are people like you, I was even more amazed that one of the founding engineers of Pixar, and a giant in computer graphics, also has this condition. He even did a survey that found his artists were more likely to be on the aphantasia spectrum than his managers. Dunno, maybe some people are so driven to create what they cannot think or see.


I’ve heard about that! My partner and I have both been learning to draw this year. I’m pretty decent at drawing observationally / from reference, but I haven’t tried much from memory. I imagine she’d be much better at that side of things. I’ve also noticed I’m not great at coming up with initial ideas or visual concepts, but once I have a topic or direction, I can absolutely run with it.

I also think it makes sense why a lot of software engineers (myself included) have aphantasia. Being “rational” is arguably easier when you’re not influenced by the emotional weight of images. Maybe we’re even less predisposed to PTSD, since we can’t visually relive things in the same way. My mind still races at night like anyone else’s, but it’s all non-visual. Just endless inner monologue instead of a reel of images. Couldn't count sheep if I tried!


From the paper:

Structured State Space Models and Mamba. Models like Mamba [Gu and Dao, 2023] can be interpreted within GWO as employing a sophisticated Path, Shape, and Weight. The Path is defined by a structured state-space recurrence, enabling it to model long-range dependencies efficiently. The Shape is causal (1D), processing information sequentially. Critically, the Weight function is highly dynamic and input-dependent, realized through selective state parameters that allow the model to focus on or forget information based on the context, creating an effective content-aware bottleneck for sequences.


> The irony, of course, is that if you've read this far, it may mean you’ve already mastered a rare skill: sustained attention in a world of distraction.

No, sorry, I read the first and last sentence. This is why I like the short format more than the long form: it often boils down to the same clever narrative trickery without wasting 3 hours of your life.


So you didn’t read that far then. You intentionally skipped it because you assumed you knew the value. However, by skipping the article you didn’t gain any value, hence why you’re in the comments section trying to “gotcha” the author of the article. You missed the point entirely and are not as clever as you think.

It did not take me 3 hours to read that article.


Wow there is so much spacing after the "of" that I read it as "U.S. Department of space war"

