Revy – proof-of-concept time-travel debugger for the Bevy game engine

hoten · on March 5, 2024

I work on a game engine (see profile). I added an input recording/replay system that tracks a hash for each graphics frame for regression testing. We have ~17M frames of tests (80 hours of real playtime), and when something fails it generates a custom HTML report[1] where you can navigate the failing frames, plus some frames before/after for context. It's purely graphical, so if the error isn't obvious you must delve into a debugger (and maybe wait ~10 minutes for the engine to reach that point...), so I'd love to get something more powerful to assist in debugging.
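For anyone curious what the per-frame hash comparison boils down to, here is a minimal sketch. It is not the engine's actual code: `hash_frame`, `first_mismatch` and `context_window` are hypothetical names, a frame is assumed to be a raw byte buffer, and `DefaultHasher` is used only for brevity (its algorithm is not guaranteed to stay the same across Rust releases, so a stored baseline would want a fixed hash function).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::ops::Range;

/// Hash one rendered frame (assumed here to be a raw RGBA byte buffer).
fn hash_frame(pixels: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    pixels.hash(&mut hasher);
    hasher.finish()
}

/// Index of the first frame whose hash differs from the baseline, if any.
/// A length mismatch counts as a failure at the first missing frame.
fn first_mismatch(baseline: &[u64], current: &[u64]) -> Option<usize> {
    baseline
        .iter()
        .zip(current)
        .position(|(a, b)| a != b)
        .or_else(|| {
            (baseline.len() != current.len()).then(|| baseline.len().min(current.len()))
        })
}

/// Frames to show in a report: the failing frame plus `radius` frames of context
/// on each side, clamped to the recording.
fn context_window(failing: usize, radius: usize, total: usize) -> Range<usize> {
    failing.saturating_sub(radius)..(failing + radius + 1).min(total)
}
```

Storing only one `u64` per frame is what keeps millions of frames of baseline cheap; the cost is that a mismatch tells you where things diverged, not why.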

rerun looks very interesting, I'll check it out to improve our debugging, thanks for sharing!

[1] https://hoten.cc/tmp/compare-report-example/ , an artificial failure where I commented out the player drawing code

https://hoten.cc/tmp/compare-report-screen-draw-refactor/ , a report for reviewing a subtle refactor and intentional modification of the drawing order

SeanAnderson · on March 4, 2024

What's the performance of this like? It seems really appealing. I would love to be able to use it to debug https://github.com/MeoMix/symbiants because I use RNG heavily to add variance to the world and that, combined with indeterminate execution order of systems, can really leave me scratching my head sometimes.

However, I'm building using a tilemap that's 144x144. So I've got ~21000 entities to log. It seems impractical to snapshot the world every tick, but maybe if it were able to snapshot deltas or something?

teh_cmc · on March 4, 2024

Revy already works with snapshot deltas (see other comments scattered around this section for more details, but basically we only sync components that changed during the previous frame -- Rerun stitches everything back together at runtime)... but at 21k entities, I'm afraid you'll be facing much bigger issues on the Rerun-side of things :D

Rerun was originally designed for few (i.e. dozens up to hundreds) massive entities (e.g. it's common for a single entity to have a few million 3D points and color values attached to it).

While we're slowly working towards improving the many-entities use-case, the correct thing to do in this case would probably be for Revy to identify that all these entities are really just different instances of the same batch (either automatically, or by exposing a marker component or something).

So, say, you'd set a marker component on all your tiles, Revy would then snapshot them as a single batch of 144^2 instances, and then in Rerun you'd see a single entity `/tiles` which would be a batch of 144^2 instances (each with their own set of components, that's fine!). From Rerun's point-of-view, this would be similar to a point cloud, and at 21k instances you'd be easily running at your monitor refresh rate with a lot of margin.
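A rough sketch of the batching idea, assuming nothing about Revy's or Rerun's real APIs: `Tile`, `TileBatch`, `batch_tiles` and `make_grid` are made-up names, and the fields are placeholders. The point is only that 144^2 tiles become one entity with per-instance arrays instead of ~21k separate entities.

```rust
/// A per-tile record as it might exist in the game (hypothetical fields).
#[derive(Clone, Copy)]
struct Tile {
    x: u32,
    y: u32,
    color: [u8; 3],
}

/// One logged entity holding every tile as an instance (struct-of-arrays layout).
#[derive(Default)]
struct TileBatch {
    entity_path: String,
    positions: Vec<[f32; 2]>,
    colors: Vec<[u8; 3]>,
}

/// Collapse all marked tiles into a single batch under one entity path.
fn batch_tiles(entity_path: &str, tiles: &[Tile]) -> TileBatch {
    let mut batch = TileBatch {
        entity_path: entity_path.to_string(),
        ..Default::default()
    };
    for tile in tiles {
        batch.positions.push([tile.x as f32, tile.y as f32]);
        batch.colors.push(tile.color);
    }
    batch
}

/// Build a square grid of tiles, row by row, for demonstration purposes.
fn make_grid(side: u32) -> Vec<Tile> {
    (0..side * side)
        .map(|i| Tile { x: i % side, y: i / side, color: [0, 128, 0] })
        .collect()
}
```

With `batch_tiles("/tiles", &make_grid(144))` you end up with one entity carrying 20,736 instances, which is the point-cloud-like shape described above.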

But by all means, try it! Not the web version though, you're definitely going to need multithreading :D

Nice project btw; I'll keep an eye on it and probably use it as a benchmark for the many-entities use-case!

SeanAnderson · on March 4, 2024

Thanks for the response! :) Great to hear deltas work. Yeah, sounds like it's the sort of thing that would need to run natively until multithreading is supported in the web.

Veserv · on March 4, 2024

Is a generic time travel debugging solution too much overhead? A good multithreaded time travel debugger (not deterministic replay based) should only incur ~100% overhead in the memory bandwidth bound case. If you are not saturating your memory bus without instrumentation then the overhead should be proportionally less.

SeanAnderson · on March 4, 2024

Nah, that'd probably work, too. I think the key here is multithreading. I do most of my development in a WASM context where Bevy doesn't support multithreading yet. I switch to native debugging when I want breakpoints (or in this case, when I'd want multithreading).

It's not the greatest workflow to default to WASM, but it makes it easier to treat web as a first-class development target. Still not sure that's worthwhile overall, but giving it a shot for now.

Veserv · on March 4, 2024

Wait, which is the hard one you wish you had time travel debugging on, the single threaded WASM context or the multithreaded native context?

The multithreaded native context is the one that is harder in principle, but should only incur ~100% overhead for any program including ones not using Bevy. Though I do not know about the general availability of these products in your field.

A single-threaded context is vastly simpler and can be done with similar overhead without platform support or ~1-10% overhead with platform support. Though I do not know if anybody has implemented efficient WASM support or if anybody with efficient multithreading implementations has ported to WASM.

Likely the only available ones are the inefficient 1,000% overhead or the hilariously bad 100,000% overhead ones like the default gdb implementation. To be fair, these implementations are much easier to write. Even ~100% overhead in the single-threaded case is more common amongst extant solutions since getting down to ~10% requires some serious optimization. Still should be perfectly adequate for development work.

SeanAnderson · on March 4, 2024

Sounds like you know a lot more about this area than I :)

I would like an efficient way of time travelling in a single threaded context.

As you describe it, it makes sense that supporting multithreading would make the problem space much more challenging to navigate. I wasn't thinking about that, but it's clear once you point it out. I was just considering the overhead of maintaining the undo state without being able to delegate it to a separate thread.

As OP mentions, they use change detection to calculate/store deltas, but Bevy's ECS change detection isn't very performant. You still have to iterate over all components and check a component's value to learn changed state rather than being able to filter on a `Changed` archetype. It kind of makes sense, though, because adding/removing Changed components from tons of entities every tick would also be expensive. Either way, change detection feels like a sore spot when working with tons of entities in ECS. I'm not super confident there's a way around that without manually maintaining some data structures outside of the ECS paradigm, but was thinking that if I could at least run the change detection on a separate thread that it might be tolerable.
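To make the cost concrete, here is a simplified model of tick-based change detection. It is a sketch of the general idea only, not Bevy's implementation: `Column`, `changed_at` and `changed_since` are hypothetical names. Every slot remembers the tick at which it was last written, and finding the changed slots still means scanning all of them.

```rust
/// One component column with a "last changed" tick stored next to each value.
struct Column<T> {
    values: Vec<T>,
    changed_at: Vec<u32>,
}

impl<T> Column<T> {
    fn new() -> Self {
        Column { values: Vec::new(), changed_at: Vec::new() }
    }

    /// Add a new slot, marked as changed at `tick`.
    fn push(&mut self, value: T, tick: u32) {
        self.values.push(value);
        self.changed_at.push(tick);
    }

    /// Overwrite a slot and record when it happened.
    fn set(&mut self, index: usize, value: T, tick: u32) {
        self.values[index] = value;
        self.changed_at[index] = tick;
    }

    /// Indices written after `last_run`. This visits every slot, so the cost is
    /// proportional to the column length, not to the number of changes.
    fn changed_since(&self, last_run: u32) -> Vec<usize> {
        self.changed_at
            .iter()
            .enumerate()
            .filter(|(_, tick)| **tick > last_run)
            .map(|(index, _)| index)
            .collect()
    }
}
```

The alternative of moving entities in and out of a "changed" group on every write avoids the scan but pays on every mutation instead, which is the trade-off described above.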

Veserv · on March 4, 2024

If you are okay with single-threaded Linux native as a debug platform (i.e. you have a build that you reproduce bugs on) then you can probably use rr. undo.io has also been in the field for a long time. I hear they can also do multithreaded Linux native in some capacity as well. One of the people from undo frequently pops into time travel debugging threads when they appear, so they could give you more info if they drop by.

If you are on Windows, Microsoft has some form of time travel debugging, but I am pretty sure they use an instrumented emulator, which is a 10-20x slowdown approach. I do not know of anything else on Windows.

The only efficient multithreaded time travel debugging I am aware of is all in the embedded field, so unlikely to be applicable. Most of the “multithreading” solutions otherwise available work by serializing your execution to a single thread, so they do not really count. Maybe there is something else out there, but not really sure.

roca · on March 5, 2024

rr and Undo are about the same here: they support multiple threads but run all threads on a single core.

Veserv · on March 5, 2024

Hm, thought they did more than standard replay. Do you know anybody outside of embedded that can do true multithreaded time travel debugging? I do not keep too much up to date on Linux native solutions. Most of the new ones I see are either just wrappers around rr or really hacky replay-based and should really just be wrappers around rr.

roca · on March 11, 2024

Microsoft's TTD handles multiple threads on multiple cores, because they instrument reads. But that's high overhead.

I'd like to know more about what the embedded tools do. I see SourcePoint supports multicore x86, but the data volume looks high so I suspect they can't handle workloads that run for minutes or hours. Also it's not clear to me which of these tools can reproduce the state of memory at every point in the past.

Basically I don't think you can do low-overhead record and replay without hardware support like Intel's QuickRec project. "Capture all memory traffic" is not going to scale.

LarsDu88 · on March 4, 2024

This kind of reminds me of the article: https://spacetimedb.com/blog/databases-and-data-oriented-des...

Where basically the ECS boils down to what is essentially a relational database, and here it looks like that's being leveraged to do snapshotting and point-in-time queries!

teh_cmc · on March 4, 2024

Oh for sure, there's a lot of overlap between traditional relational databases and ECS designs. As always, in the end the hard part is to match the performance requirements.

If you squint enough, most ECS out there are pretty much very specialized relational databases that trade off flexibility in favor of performance for common gamedev use cases (very wide joins, very deep hierarchies (e.g. transform trees), full-table filters, etc).

Rerun's ECS goes one step further and makes time a first-class citizen, allowing for efficient joins across different components across different timestamps.

This is what makes it possible to only log diffs in Revy (we only snapshot the components that were modified during the last frame), rather than having to take full snapshots every frame, which would be prohibitively expensive (both time and space). Rerun then stitches back everything together during visualization, in real-time!
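A toy illustration of the "log diffs, stitch at query time" idea, under stated assumptions: `DeltaStore`, `log` and `latest_at` are hypothetical names, values are plain `f32`s, and this is not how Rerun's store is actually implemented. Only changed values are written; a read for any frame returns the most recent value at or before that frame.

```rust
use std::collections::BTreeMap;

/// (entity, component) -> frame -> value. Only changed values are ever stored.
#[derive(Default)]
struct DeltaStore {
    rows: BTreeMap<(String, String), BTreeMap<u64, f32>>,
}

impl DeltaStore {
    /// Record that `component` of `entity` changed to `value` during `frame`.
    fn log(&mut self, entity: &str, component: &str, frame: u64, value: f32) {
        self.rows
            .entry((entity.to_string(), component.to_string()))
            .or_default()
            .insert(frame, value);
    }

    /// The most recent value logged at or before `frame`, if there is one.
    fn latest_at(&self, entity: &str, component: &str, frame: u64) -> Option<f32> {
        self.rows
            .get(&(entity.to_string(), component.to_string()))?
            .range(..=frame)
            .next_back()
            .map(|(_, value)| *value)
    }
}
```

Reconstructing an entity at frame N is then one such lookup per component, so components that change at different rates can still be joined at any timestamp without a full snapshot ever having been stored.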

tbillington · on March 5, 2024

This is another excellent read https://ajmmertens.medium.com/why-it-is-time-to-start-thinki...

teh_cmc · on March 4, 2024

Author here; we had some fun building this last week.

Feel free to ask me anything!

ordinaryradical · on March 4, 2024

Given that Bevy’s systems scheduler is nondeterministic (for everything not explicitly ordered), do you foresee issues coming from that? Or does this approach sidestep that as an issue?

teh_cmc · on March 4, 2024

Revy is frame-based: it runs as the last system at the end of the frame, with exclusive access to the `World`, and synchronizes the state of the Bevy database with the state of the Rerun database at that point in time (it keeps track of 3 timelines during that process: the wall-clock time given by the OS, and the frame number and simulation time given by Bevy itself).

So non-deterministic scheduling is just not an issue by default.

You could of course access the Revy logger from any system (it's just a `Resource` after all) and log arbitrary data to Rerun from there (the resource is basically a handle to the Rerun SDK). This still wouldn't be a problem. The data would once again be logged to the 3 same timelines (wall-clock, frame number and sim_time) and you would be able to visualize in which order the different systems doing the logging were scheduled during each frame.
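A sketch of what stamping every log entry with the three timelines could look like. The types here (`TimePoint`, `Row`, `Logger`) are invented for illustration and are not the Revy or Rerun SDK types; the frame number and simulation time are passed in by the caller, standing in for what Bevy would provide.

```rust
use std::time::{Duration, SystemTime};

/// The three timelines attached to every logged row.
#[derive(Debug, Clone, Copy)]
struct TimePoint {
    wall_clock: SystemTime, // from the OS
    frame: u64,             // frame number (would come from Bevy)
    sim_time: Duration,     // simulation time (would come from Bevy)
}

struct Row {
    time: TimePoint,
    system: String,
    message: String,
}

/// Stand-in for the logging resource that any system could write to.
#[derive(Default)]
struct Logger {
    rows: Vec<Row>,
}

impl Logger {
    fn log(&mut self, frame: u64, sim_time: Duration, system: &str, message: &str) {
        let time = TimePoint { wall_clock: SystemTime::now(), frame, sim_time };
        self.rows.push(Row { time, system: system.to_string(), message: message.to_string() });
    }

    /// Systems that logged during `frame`, in the order their entries arrived.
    fn systems_in_frame(&self, frame: u64) -> Vec<&str> {
        self.rows
            .iter()
            .filter(|row| row.time.frame == frame)
            .map(|row| row.system.as_str())
            .collect()
    }
}
```

Because the arrival order within a frame is preserved, the recording shows how the systems happened to be scheduled that frame instead of depending on it.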

tbillington · on March 5, 2024

Basing the recording off its 24.6s runtime and 3.58MB download size, it comes out to about 145KB/s, which is honestly really decent.

What format is it stored as (eg protobuf etc?) Is Rerun doing compression on the "raw" game data to achieve that?

Also had a good laugh at the cranky job in bacon.toml, might steal that :D

I also noticed the elements in the recording were all clickable, is that a Rerun feature, and did you have to manually reconstruct all the Bevy elements in a Rerun specific format?

Just curious how long this took you to make?

Super cool demo btw :) do you have more demos/PoC/examples listed somewhere I could peruse? Cheers

EDIT: Can your twitter post this so I can retweet :D

Tycho87 · on March 5, 2024

We posted it from the Rerun account today: https://twitter.com/rerundotio/status/1765031236492259751

mysterydip · on March 4, 2024

With Bevy being in early development still, are you worried about frequent maintenance to fix breaking changes?

teh_cmc · on March 4, 2024

As mentioned in the README, Revy is not meant to be a polished / properly maintained project -- it's just a proof-of-concept. I've talked more about how and why it came to exist in the first place in this thread [1], if you're interested.

That being said, I do intend to publish updates when new versions of either Rerun or Bevy land; if only to experiment with new APIs as they come online.

Now, to answer your question, I've been using Bevy since the 0.1 release and, in my experience, keeping up with the changes upstream has always been pretty painless. Their organization and release process is top-notch, with some of the highest-quality changelogs and migration guides I've ever seen in any project, and releases are rare enough (about once a quarter) to just not be an issue.

The community maintains compatibility matrices such as this one [2], and things generally just work :tm:.

[1] https://www.reddit.com/r/rust/comments/1b6bqv1/revy_proofofc...

[2] https://github.com/rerun-io/revy?tab=readme-ov-file#compatib...

indigochill · on March 4, 2024

I'd guess from "It is not a full-fledged, properly maintained thing" in the README, probably not.

diggan · on March 4, 2024

This is really awesome! I recently picked up Bevy and Rust to resume my attempt at making games and hopefully publishing something worthwhile. This is something that I felt was missing since day 2 of learning Bevy.

My own personal workaround has been to dump "user actions" to an ndjson file, which I can load at runtime when I want a "replay", but it's obviously missing the ability to move forward/backwards; it just plays the actions.
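For reference, the workaround amounts to something like the following sketch. The `Action` shape and the function names are made up, and the hand-rolled formatting and parsing only handle this one fixed schema with no escaping; a real project would reach for a JSON library instead.

```rust
/// One recorded user action (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Action {
    frame: u64,
    name: String,
}

/// Serialize one action as a single NDJSON line (no trailing newline).
/// Assumes `name` contains no quotes or backslashes.
fn to_ndjson_line(action: &Action) -> String {
    format!("{{\"frame\":{},\"action\":\"{}\"}}", action.frame, action.name)
}

/// Parse a line written by `to_ndjson_line`; anything else yields `None`.
fn parse_line(line: &str) -> Option<Action> {
    let rest = line.trim().strip_prefix("{\"frame\":")?;
    let (frame, rest) = rest.split_once(",\"action\":\"")?;
    let name = rest.strip_suffix("\"}")?;
    Some(Action { frame: frame.parse().ok()?, name: name.to_string() })
}

/// Load every action from an NDJSON dump, in recording order.
fn replay(ndjson: &str) -> Vec<Action> {
    ndjson
        .lines()
        .filter(|line| !line.trim().is_empty())
        .filter_map(parse_line)
        .collect()
}
```

Feeding the result of `replay` back into the input layer frame by frame gives forward-only playback; stepping backwards needs stored state, which is the gap described here.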

Would love to see it working with bevy_xpbd, although I'm not sure how deterministic it is and if that gets in the way (I assume so?). It does have an `enhanced-determinism` flag that says "Enables increased determinism", but the lack of "complete/full determinism" terms doesn't give me a lot of hope.

teh_cmc · on March 4, 2024

Whether the physics engine is deterministic or not doesn't matter here -- Revy (and more importantly, Rerun) doesn't replay anything: it just stores state, every single frame, and then visualizes that state at every timestamp available.

Check out the live demo of the breakout example [1], for instance: if you click on the paddle and then go to its parent node, you'll see that we just store that node's final transform (i.e. post-physics) every frame.

Happy gamedev!

[1] https://app.rerun.io/version/0.14.1/index.html?url=https://s...

3836293648 · on March 5, 2024

Well, floating point isn't deterministic in its rounding on a per-op scale. And doing tonnes and tonnes of floating point operations isn't going to help that. The only way to get deterministic floating point is to start tracking state at boot and not have a scheduler switch away and mess with FPU state while your program isn't in control.

anthk · on March 4, 2024

'Time travel'. Ah, capturing the state and rolling back. Something a Z-Machine interpreter had 40 years ago I think, if not more, with the 'undo' command at the prompt :D.

One day the OS' shells will have an undo command for everything, but they will waste tons of CPU cycles. And not by virtualizing. Although if you run your OS under a light hypervisor such as Xen, that functionality might be callable from userland through some kernel driver/hardware hook. Who knows.

mathteddybear · on March 4, 2024

Bill Lewis, I presume, called it more or less like that

https://arxiv.org/abs/cs/0310016

https://www.youtube.com/watch?v=xpI8hIgOyko

Soon after, "debugging backwards in time" morphed into "time-travel debugging"

Veserv · on March 4, 2024

No, time travel debugging is almost certainly a root, but comes from a different lineage.

https://jakob.engbloms.se/archives/1564

The Green Hills Software Time Machine product for time travel debugging was commercially available by September 2003 [1] which is at least contemporaneous with that paper by Bill Lewis (i.e. terminology could not have been derived from it).

Given the alternative terminology frequently used for the technology up to and after that point such as bidirectional, reverse, reversible, omniscient, replay, record-replay, etc. time-travel debugging as a term almost certainly originates/was popularized by Time Machine as the first successful time travel product (yes, I see the Lauterbach CTS is listed as existing first, but it was not commercially distinguished and successful and obviously has no terminology lineage).

[1] https://www.ghs.com/news/20030930_best_of_show.html

roca · on March 5, 2024

There's a clear difference between "omniscient" debugging and "time travel" debugging. The latter almost always refers to debuggers that let you move backwards in time but only let you access state at the "current point in time", and moving forward or backward has a noticeable cost. Omniscient debuggers (e.g. Pernosco or Lewis' work) give you approximately instant access to states at different points in time.

Veserv · on March 5, 2024

No, what you call “omniscient debugging” is what Time Machine does and has always done. Instant temporal random access, call stacks over time with click to seek, variable graphing over time, etc. Given that is almost certainly where “time-travel debugging” came from, that seems more like people copying it poorly and watering down the phrase.

Unless you mean something really narrow like clicking a variable and it showing you every previous value of it in one screen? I am not sure if it has that built-in. But really that is just visualization layer stuff. The recorded log has everything you need to reconstruct every past state, so you can get anything missing from the built-ins by just querying the log directly.

cmrdporcupine · on March 4, 2024

Does Revy depend just on the ECS crate, or does it bring in other parts of Bevy? I see a blanket dep onto bevy, but is it really using more than the ECS? I like the idea. I might try it out.

I've been playing with Bevy the last couple weeks, and in general from my first impression I'd have to say that the bevy_ecs crate seems more mature than the rest of it. It's not a bad ECS framework, and actually quite useful independent of Bevy itself. I'd like it if they cleaned up their crates deps a bit, but it's pretty good standalone and not just for games, but for any concurrent data driven application.

ECS has weird nomenclature when viewed outside of the games industry. What it really has if you pan out, is queries and binary relations/tables/facts/properties, but calls them 'systems' and 'components'. "Components" outside of games & ECS usually means something else, so it's a bit of a head scratcher at first.
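To show the correspondence, here is a toy version with no ECS library at all (`World`, `moving` and `integrate` are invented names, and this is not bevy_ecs's API): components are tables keyed by entity id, a query over two components is an inner join on that id, and a system is a function over the joined rows.

```rust
use std::collections::BTreeMap;

type Entity = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Position(f32, f32);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Velocity(f32, f32);

/// Each component type is a table keyed by entity id.
#[derive(Default)]
struct World {
    positions: BTreeMap<Entity, Position>,
    velocities: BTreeMap<Entity, Velocity>,
}

impl World {
    /// A query over (Position, Velocity), expressed as an inner join on entity id.
    fn moving(&self) -> Vec<(Entity, Position, Velocity)> {
        self.positions
            .iter()
            .filter_map(|(entity, pos)| {
                self.velocities.get(entity).map(|vel| (*entity, *pos, *vel))
            })
            .collect()
    }
}

/// A "system" is then a function that runs over the joined rows.
fn integrate(world: &mut World, dt: f32) {
    for (entity, pos, vel) in world.moving() {
        world
            .positions
            .insert(entity, Position(pos.0 + vel.0 * dt, pos.1 + vel.1 * dt));
    }
}
```

Entities without a `Velocity` row simply drop out of the join, which is the same behaviour an ECS query gives you.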

I think if you dig past the surface what you actually have is a high performance version of what we used to call "tuple spaces", a good model for managing state in parallel data-driven applications, esp where there's lots and lots of bits of state (e.g. vehicle autonomy with vision detection, or robotics, etc.)

rcxdude · on March 5, 2024

I think that impression reflects on the priorities/ordering that the Bevy team has. They seem to have focused on building a very solid foundation of an ECS framework before focusing elsewhere, so the lower-level stuff is going to be more polished than the higher-level stuff (which I think they have now moved their focus onto)

tbillington · on March 5, 2024

> cleaned up their crates deps a bit

What specifically do you mean by that, as in it includes unnecessary dependencies?

Flecs is also a great library to check out if you want more relations/power.

jasonjmcghee · on March 4, 2024

So. Freaking. Cool.

Awesome stuff.

teh_cmc · on March 4, 2024

Thanks!