Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Arena Allocation in SBCL (github.com/sbcl)
215 points by tmtvl on Oct 28, 2023 | hide | past | favorite | 32 comments


I've been meaning to look into it, so these are mostly notes, that others might find useful,

arenas are x86_64 only, not sure how involved are ports, but there's at least a VOP, that as of right now exists only on x86_64. (the build obviously fails on other systems)

I'm trying it on m1, so I can build with

    arch -arch x86_64 sh make.sh --with-system-tlabs
and then run with

    arch -arch x86_64 sh run-sbcl.sh
it's pretty raw, so for example allocating object larger than arena results in ldb. there's a lot of sample code in tests/arena.impure.lisp

and the simplest possible test code just to see what's up,

    (gc :full t)
    (room)
    ;;;    7,084,208 bytes for  36,307 simple-vector objects
    (progn (make-array 10000000) (values))
    (room)
    ;;    86,991,136 bytes for  36,080 simple-vector objects
    (gc :full t)  
    ;;     6,847,824 bytes for  35,865 simple-vector objects
    (use-package :sb-vm)
    (defvar a (new-arena 100000000))
    (with-arena (a) (make-array 10000000) (values))
    (room)
    ;;     7,050,080 bytes for  36,233 simple-vector objects
well, so at least we know the array is put somewhere. finally one calls (destroy-arena a), but you can still access the data, so presumably there's no checking involved, and being undisciplined about retaining arena pointers will create all kinds of interesting bugs.

neat!


This is always the tradeoff with arenas. Either the runtime keeps it safe and postpones actually destroying the arena if there are references leading to it. Or it trusts the programmer, which will eventually result in fun late-night debugging sessions. TA indicates that SBCL is rather of the former variety since it describes debugging tools to find references leading from the heap to the arena.

I guess it is best practice to bury arena management inside some sort of framework* and don't fiddle with them in business code.

*: Edit: more likely, a macro


Oh man, we’ve needed a rock-solid arena model in C++ forever. There’s stuff but it really needs compiler support to be clean.

Jai just seems to be going from strength to strength with arena-by-default.

The key insight is that a modern malloc isn’t terribly different from a modern GC: people seem to vastly overstate the difference IMHO.

Arenas are like a bumping nursery but better. I hope we keep seeing more of this.


I don't see why a language would need special support for arenas, unless it is GCed. It's a simpler alloction model than malloc, and C++ doesn't really have builtin support for malloc either, it just calls free() in a bunch of destructors in the STL.


In fairness std::allocator has always been a wreck, and in fairness RAII let’s you do a fairly low friction arena: but you run to the same set of problems that the C folks have with things as common as doubly linked lists. There are plenty of good ones, but every project has a different one.

Jai just throws you a bump-allocated heap arena with a (tunable) default bigger than 8MB that you can hit with idiomatic syntax in every basic block.

None of this is impossible in C or C++ or Rust or Java: but idiomatic defaults and notation matter.

Most per-method allocations should be pointer bumped in an arena (we’re already paying for page tables and a TLB, might as well get our money’s worth), and the languages just don’t encourage it.


> In fairness std::allocator has always been a wreck

std::allocator is just fine - it's no more and no less than a wrapper around malloc/free and that's what it really only needs to be. If you need a different allocation scheme that your workload is going to benefit from: (1) you first need to come up with it by finding out what scheme is going to benefit you, (2) you then implement it through a custom allocator.

> Most per-method allocations should be pointer bumped in an arena ... and the languages just don’t encourage it.

System malloc implementation will do that by themselves whenever they see fit. And this is not the programming language problem to begin with at all.


Pascal in its various variants had support for arenas already into the language.

C++ new isn't required to call malloc()/free() per ISO C++, underlying OS APIs can be called instead, there are the whole allocator infrastrcture, and placement new as well.


I'm guessing this is something they're finding useful at Google? Since Dougk is at Google.


Yeah, it's even an option in the proto IDL to generate C++ code that uses arena allocation. So this is probably for protos, in particular.

I believe ITA Software's QPX (which now forms a major part of Google Flights' backend) uses CL.


The heavy usage of arenas for protobuf RPC allocations at Google is also I think the reason why they were experimented with in Go


Does Google use SBCL or lisp in general? News to me. I thought maybe they were just porting the arena paradigm from c++ to sbcl


Google Flights came about, in particular, from the acquisition of ITA Software which was a Common Lisp shop. Much of that system is, reportedly, still in Common Lisp.


Dougk is one of the SBCL maintainers. I think they inherited its use from ITA Software.


Are arenas internal to sbcl only or do user programs have access? The current sbcl manual (v2.3.9) doesn't mention arenas.


Based on the commit message [0], and the references to "user code" in this document, my guess is that user programs have or will have access, but it's not finalized enough to be documented.

That being said, I suppose if you're developing an internal API for a compiler/interpreter, your "users" could be other parts of the project rather than language users.

https://github.com/sbcl/sbcl/commit/7f65522a16d857e41aa61cd0...


An interesting idea for sure. I spent too much time worrying about fragmentation on Linux and ended up with jemalloc (based on Firefox not the default).

I also had a local thread based pool I hand wrote for fixed sized objects that were generated en masse.

I did look at the boost library pools too.


There's not much context here that explains why this is noteworthy. Arena allocation by itself isn't anything to write home about- is anyone able to explain why this is special?


The standard that you're describing is not really the standard that most HN'ers hold submissions to. The guidelines[1] suggest the following:

> On-Topic: Anything that good hackers would find interesting. That includes more than hacking and startups. If you had to reduce it to a sentence, the answer might be: anything that gratifies one's intellectual curiosity.

1 = https://news.ycombinator.com/newsguidelines.html


someone might find a boots-on-the-ground, in-context treatment of arena allocation particularly interesting, moreso than just reading about it in the abstract on wikipedia

someone might have never heard of anything of the sort

someone might be interested to see lispers present a use case where they care about reaching down to the finer points of memory allocation in a language that typically benefits from being above that; see also the wealth of info on the garbage collector in the ocaml documentation

the number of users on this website who might get some sort of kick out of this link is nonzero which is the generally agreed-upon criterion for being worth posting


Based on the number of posts I’ve seen about arena allocators on here the last month or so, I’d say the audience is much higher than nonzero.


I’m curious, what’s higher than nonzero? Since any positive, finite number is nonzero, are we looking at an ℵ₀ audience?


I knew there'd be at least one pedant! And I almost altered what I wrote to pre-satisfy you!

Begone pedant, and feast thine eyes on the surreal numbers [0]!

> In mathematics, the surreal number system is a totally ordered proper class containing not only the real numbers but also infinite and infinitesimal numbers, respectively larger or smaller in absolute value than any positive real number.

(Also, go look at the `fil` unit in TeX for a good practical application of this ;).)

[0] https://en.wikipedia.org/wiki/Surreal_number


Region-based allocation by itself is definitely something to write about.

https://en.m.wikipedia.org/wiki/Region-based_memory_manageme...

The SBCL implementation is a very self-documented example, which is helpful if you wish to make arena-based allocation available in your language.


That Wikipedia article is confusing to me, because I’m really used to “region” being used to refer to type-system-level allocation tracking, Tofte&Talpin style (even if the T&T hope of inferring everything didn’t really work out, and the Rust tradition decided to rename the concept to “lifetimes”). “Arena” seems to be a more common term for a purely runtime construct.


i'm not sure how it's going to work in CL application code, but if you can re-use fixed memory and avoid runtime allocation with it like you can in C, it seems like it would be really nice for lisp game development, where avoiding GC pauses can be unintuitive or annoying


re-using memory is old trick in Common Lisp application code.

This is more "create a separate arena where we can allocate a ton of stuff and then drop it without actually going through collector".

In fact, this is very similar to features provided in at least Symbolics Genera, which provided developer-created arenas as well.


there have been other posts related to Arena based allocation on hn recently and I was grateful because it wasn’t something I was aware of. I found this interesting - I see this post is part of a trend.


Right, how is this top of HN? Something is off, as there is nothing newsworthy about this. If I submitted to HN a random documentation page from LWN or Linux kernel, it would surely not be #1.


There's nothing edifying about yucking other people's yums. If many others are upvoting something, and you don't see why it's interesting, maybe it's OK for the situation to be as simple as that: it's interesting to others even though it isn't interesting to you.

Also, there is absolutely no requirement that HN submissions all qualify as newsworthy. As others have pointed out in this thread, the gist of what's considered appropriate is quite simple and broad: "Anything that good hackers would find interesting."


> yucking other people's yums

that is gross please do not refer to sbcl memory management features as 'yums'


> as there is nothing newsworthy about this

Since when do HN submission have to be "newsworthy"? People regularly post old articles.

> If I submitted to HN a random documentation page from LWN or Linux kernel, it would surely not be #1.

If that page is particularly interesting and you're lucky, why not?


LWN write-ups on fairly niche subjects are regularly featured on HN front page.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: