PythonBPF – Writing eBPF Programs in Pure Python

alexgartrell · 2025-09-15T08:22:21 1757924541

I did something similar a long time ago https://github.com/facebookresearch/py2bpf

It was definitely a toy: I transliterated from Python bytecode (a stack-based VM) into BPF. I also wrote the full code gen stack myself (BPF was simpler back then).

But using LLVM and not marrying things to the CPython implementation makes this approach way better.
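
For readers unfamiliar with the "stack-based VM" point above, here is a minimal sketch using the standard-library `dis` module to list the CPython bytecode of a small function. The function `add` is a made-up example, not taken from py2bpf, and the exact opcode names differ between CPython versions. That version dependence is one reason tying a compiler to bytecode is fragile.

```python
import dis


def add(a, b):
    # Hypothetical example function; any small function would do.
    return a + b


# Collect the opcode names CPython generated for `add`.
# The operands are loaded onto the evaluation stack, a binary
# operation consumes them and leaves the result on the stack,
# and the return instruction hands that result back to the caller.
opnames = [ins.opname for ins in dis.get_instructions(add)]

for name in opnames:
    print(name)
```

A bytecode-to-BPF translator has to map this implicit stack onto BPF's registers, whereas an AST-based frontend never sees the stack at all.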

varunrmallya · 2025-09-15T14:59:28 1757948368

Thank you! Ours is a toy for now as well, but I think the idea is pretty good, so we'll continue to work on it. (This was actually a hackathon project, so the code is pretty messy and not something I am proud of)

farnulfo · 2025-09-15T09:34:06 1757928846

For Java, Johannes Bechberger has written a lot of articles about writing eBPF in Java: https://mostlynerdless.de/blog/2023/12/31/hello-ebpf-develop... https://mostlynerdless.de/blog/category/computer-science/ebp...

xbar · 2025-09-15T14:42:13 1757947333

I missed these last year. Finding them now is truly very useful in my work.

indigo945 · 2025-09-15T08:45:42 1757925942

The "How it works under the hood" section raises more questions than it answers. What is the difference between step 3 and step 4? As described, step 3 goes from LLVM IR to BPF (via llc), and step 4 goes from LLVM IR to eBPF bytecode? That's nonsensical.

varunrmallya · 2025-09-15T14:54:43 1757948083

I'm the co-author. The code is in a very, very bad state right now, but the architecture is pretty OK to explain. In step 3, we translate from the Python frontend to the LLVM IR. In step 4 we compile it down to an object file using the LLVM backend `llc`. This object file gets loaded into the kernel and it is what actually contains the eBPF bytecode.
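
The two steps described above can be sketched roughly as follows. This is a toy illustration and not PythonBPF's actual code: the helpers `eval_const` and `to_llvm_ir`, the file names, and the restriction to a single constant `return` are all assumptions made for the example. It parses a Python function with the standard `ast` module, emits textual LLVM IR, and builds (without running) an `llc` command line for the BPF target.

```python
import ast

# Hypothetical input: a function whose body is a single constant return.
SOURCE = "def answer():\n    return 40 + 2\n"


def eval_const(node):
    """Fold an expression built only from integer constants, + and *."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
        left, right = eval_const(node.left), eval_const(node.right)
        return left + right if isinstance(node.op, ast.Add) else left * right
    raise NotImplementedError(ast.dump(node))


def to_llvm_ir(source):
    """'Step 3' in miniature: Python AST -> textual LLVM IR."""
    func = ast.parse(source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise NotImplementedError("expected a function definition")
    ret = func.body[0]
    if not isinstance(ret, ast.Return):
        raise NotImplementedError("expected a single return statement")
    value = eval_const(ret.value)
    return f"define i64 @{func.name}() {{\n  ret i64 {value}\n}}\n"


ir_text = to_llvm_ir(SOURCE)

# 'Step 4' in miniature: the command that would turn the IR into a BPF
# object file. It is only constructed here, not executed; the file
# names are placeholders.
llc_cmd = ["llc", "-march=bpf", "-filetype=obj", "answer.ll", "-o", "answer.o"]

print(ir_text)
print(" ".join(llc_cmd))
```

The resulting object file is what a loader would hand to the kernel; a real frontend also has to emit section names, maps, and license information, none of which this sketch attempts.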

indigo945 · 2025-09-15T15:07:46 1757948866

You may want to edit the blog post, then, because that's not what it says.

bieganski · 2025-09-15T12:27:00 1757939220

That's really cool. To gain traction, I would start by reimplementing all the tools from https://github.com/iovisor/bcc/tree/master/libbpf-tools in PythonBPF.

varunrmallya · 2025-09-15T14:59:49 1757948389

This is actually what we plan to do too!

the_duke · 2025-09-15T09:02:51 1757926971

So this is an "inline" Python to eBPF transpiler/compiler.

Which is cool!

But the description could be a bit clearer.

pimterry · 2025-09-15T10:00:13 1757930413

Does anybody know if something similar exists for Node.js? I'd love to be able to integrate BPF into some of my Node projects with the same kind of approach.

lloydatkinson · 2025-09-15T10:28:21 1757932101

Please no...

ranger_danger · 2025-09-15T20:26:08 1757967968

Dear god... why

drivenextfunc · 2025-09-15T10:32:58 1757932378

Writing C for eBPF is cumbersome and you'd like to avoid it. Okay, that's reasonable. But I don't think it would be a good idea to write a compiler that emits eBPF binary from (a tiny subset of) Python. Why not just write code in pseudo-Python (or whatever language you're comfortable with) and have it translated by an LLM, and paste it in the source code? That would be much better because there would be fewer layers and a significant reduction in runtime cost.

tecleandor · 2025-09-15T10:41:46 1757932906

I don't understand...

So, instead of having a defined and documented subset of Python that compiles to eBPF in a deterministic way... use an undefined pseudo language and let the LLM have fun with it without understanding if the result C is correct?

What would be the advantage?

drivenextfunc · 2025-09-15T10:59:00 1757933940

The behavior of CPython and a few other implementations of Python (such as PyPy) is well documented and well understood. The semantics of the tiny subset of Python that this Python-to-eBPF compiler understands is not. For example, inferring from the fact that it statically compiles Python-ish AST to LLVM IR, you can have a rough idea that dynamic elements of Python semantics are unlikely to be compiled, but you cannot know exactly which elements without carefully reading the documentation or source code of the compiler. You can guess globals() or locals() won't work, maybe .__dict__ won't as well, but how about type() or isinstance()? You don't know without digging into the documentation (which may be lacking), because the subset of Python this compiler understands is rather arbitrary.

And also, having an LLM translate Python-ish pseudo code into C does not imply that you cannot examine it before putting it into a program. You can manually review it and make modifications as you want. It just reduces time spent compared with writing C code by hand.
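
On the question of which built-ins a restricted subset accepts: one way a compiler could make that explicit, instead of leaving users to guess, is to walk the AST and report every call it does not support. The sketch below is an assumption about how such a check might look, not how PythonBPF works; the `ALLOWED_CALLS` set and the `unsupported_calls` helper are invented for illustration.

```python
import ast

# Hypothetical whitelist; a real compiler would document its own.
ALLOWED_CALLS = {"len", "print"}


def unsupported_calls(source):
    """Return (line, name) for each direct call to a non-whitelisted name."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # Only plain `name(...)` calls are inspected; method calls and
        # other dynamic features would need their own checks.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in ALLOWED_CALLS:
                found.append((node.lineno, node.func.id))
    return sorted(found)


sample = "x = type(1)\ny = isinstance(x, int)\nz = len('a')\n"
report = unsupported_calls(sample)
print(report)
```

With a check like this, `type()` and `isinstance()` would be rejected at compile time with a line number, so the boundary of the subset is discoverable without reading the compiler's source.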

tecleandor · 2025-09-15T13:59:49 1757944789

But then we have to write the pseudocode anyway (that cannot be corrected by my IDE, so I don't know if I have pseudomistakes [sorry for the pun]), the LLM 'transpile' (that's not understood at all), and you have to review the C code anyway, so you have to know eBPF code really well.

Would that represent a time advantage?

Twirrim · 2025-09-15T16:58:34 1757955514

Are you seriously asking why someone might want to do something guaranteed to behave exactly as they defined it, when they could have an LLM hallucinate code that touches the core of their system, instead?

Why would anyone go with the inaccurate option?

otabdeveloper4 · 2025-09-15T13:10:12 1757941812

LLMs will never be able to write eBPF code.

eBPF is a weird, formally validated secure subset of C. No "normal" C program will ever pass the eBPF validation checks.

nickysielicki · 2025-09-15T13:20:22 1757942422

LLMs can easily already write eBPF code. Try it.

otabdeveloper4 · 2025-09-15T17:38:02 1757957882

> tell me how you never actually developed an eBPF program without telling me you never actually developed an eBPF program

nickysielicki · 2025-09-16T00:49:53 1757983793

Just try it. Here’s an example that I know it will work flawlessly for, because I used it for this: at $formerjob, all laptops come with a piece of malware called “connections”, which obnoxiously pops up at some point during the day (stealing window/mouse focus) and asks you some asinine survey question about morale on your team and/or the company values. There are a few good ways to solve this: apparmor/selinux (but this runs the risk of your config file conflicting with the rules shipped by IT), a simple bash script that runs pkill every 5 seconds (too slow and it still steals focus, too fast and your laptop fans start spinning), etc. A better way is to use a bpf hook on execve.

Ask an LLM to write a simple ebpf program which kills any program with a specific name/path. Even crappy local models can handle this with ease.

If you’re talking about more complicated map-based programs, you’re probably right that it will struggle a bit, but it will still figure it out. The eBPF API is not very different from any other C API at the end of the day. It will do fine without the standard library, if you ask it to.

otabdeveloper4 · 2025-09-16T08:36:29 1758011789

By eBPF I mean things like XDP network filters.

The issue here is the static formal validation the kernel does before loading your eBPF program.

(Even humans don't really know how it works. You need to use specific byte width types and access memory in specific patterns or the validation will fail.)

nickysielicki · 2025-09-16T14:38:58 1758033538

Respectfully, you don’t know what you’re talking about.

1. If you meant XDP, you should have said XDP, not eBPF.

2. The kernel does that validation on all ebpf code that it loads, regardless of whether XDP is involved.

3. Humans know how it works.

vrighter · 2025-09-15T13:49:18 1757944158

"translated by an llm"

smh my head

setheron · 2025-09-16T04:26:39 1757996799

I had a similar idea! Loved seeing it here. We thought about doing it with ChocoPy to make the types more consistent.

atoav · 2025-09-15T07:53:15 1757922795

Looks cool, I like the use of decorators as a means to essentially turn Python into some sort of DSL.

One nitpick: Please include a paragraph/section/infobox explaining what eBPF is and what problems should be solved using it. I am a huge fan of making our tech world more accessible and as such we should think to some degree about people who don't know every acronym.

varunrmallya · 2025-09-15T14:57:40 1757948260

To be honest, this was really a hackathon project. The code quality is very very bad right now. We will be continuing to work on this to make it much better and we'll be adding documentation as we go as well. Thanks for taking a look :)

njharman · 2025-09-15T08:19:17 1757924357

Putting the tldr; at the bottom defeats the purpose of a tldr.

Guessing this is BPF (https://en.wikipedia.org/wiki/Berkeley_Packet_Filter), but the reader shouldn't have to guess. That is the link that should be in your introduction, just after the tldr;

indigo945 · 2025-09-15T08:43:22 1757925802

Not the original BPF, but its successor in the Linux kernel called eBPF [1]. eBPF's virtual machine has additional registers, and crucially, eBPF programs can make some syscalls, which BPF programs can't.

[1]: https://lwn.net/Articles/740157/
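
To make the register point above concrete: classic BPF has an accumulator and an index register, while eBPF has eleven 64-bit registers (r0 to r10) and a fixed 8-byte instruction format. The sketch below packs a two-instruction eBPF program ("set r0 to 0, then exit") with the standard `struct` module. The `encode` helper and the constant names are made up for this example; the opcode values follow the documented eBPF instruction encoding.

```python
import struct


def encode(opcode, dst=0, src=0, off=0, imm=0):
    """Pack one 8-byte eBPF instruction in little-endian layout.

    Layout: 8-bit opcode, 4-bit dst register, 4-bit src register,
    16-bit signed offset, 32-bit signed immediate.
    """
    regs = (src << 4) | dst  # dst sits in the low nibble, src in the high
    return struct.pack("<BBhi", opcode, regs, off, imm)


# Illustrative constant names; values are BPF_ALU64|BPF_MOV|BPF_K and
# BPF_JMP|BPF_EXIT respectively.
BPF_MOV64_IMM = 0xB7
BPF_EXIT = 0x95

# r0 = 0; exit   (r0 holds the program's return value)
program = encode(BPF_MOV64_IMM, dst=0, imm=0) + encode(BPF_EXIT)

print(program.hex())
```

Bytes like these are what ends up in the code section of the object file that the kernel verifier inspects before the program is allowed to run.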

robertlagrant · 2025-09-15T16:16:00 1757952960

Step 1: import numpy

grantseltzer · 2025-09-15T11:16:23 1757934983

bcc hasn't been relevant for years.

_bobm · 2025-09-15T13:00:50 1757941250

I have been a bit out of the loop. What is relevant these days for writing eBPF code? What about eBPF code in Python?

grantseltzer · 2025-09-15T14:13:03 1757945583

Writing it in C, compiling with clang, and loading with either C (libbpf), Go (cilium/ebpf), or Rust (Aya).

You can also write BPF in Rust with Aya, but I'm not sure how feature complete it is.

For very simple use cases you can just use bpftrace.

nickysielicki · 2025-09-15T13:28:07 1757942887

bpftrace is nicer to work with and can replace bcc in most cases for debugging.