SimpleVisor – Intel VT-x hypervisor in 500 lines of C code (ionescu007.github.io)
179 points by belter on May 20, 2021 | hide | past | favorite | 21 comments


A V86 hypervisor is relatively simple, and a lot of DOS utilities from the late '80s through the '90s took advantage of V86 mode, like the (in)famous EMM386 and debuggers like 386SWAT and CUP386. It is a little-known (or perhaps known but little-appreciated) fact that Windows 3.x Enhanced Mode and Win9x are actually architecturally based on a V86 hypervisor and "library OS" concept, rather than being a traditional OS like the various Unices, Linux, and the NT line. They also perform "hyperjacking" of the DOS environment at boot time.

I do wish Intel had simply extended V86 mode to a "V386" mode, as this interesting discussion suggests, instead of adding completely new and different instructions and data structures: http://www.os2museum.com/wp/an-old-idea-x86-hardware-virtual...

all while containing the ability to run on every recent version of 64-bit Windows

The presence of that one little extra word "on" changes everything...


The article also suggests that the idea was, in summary, a bit more akin to VT-x than to vm86, but saving state in the TSS instead of in a new VMCS. A fully straightforward extension of vm86 to "vm386" wouldn't work, because vm86 mostly got away with its simplicity thanks to the fact that 16-bit real-mode x86 did not know about segment descriptors, TSSs, and all that jazz.

And in the end, long mode finally got rid of most of the segmentation, "hardware" tasks including TSSs, and other cruft in x86 that had long been abandoned (the latter at least already being regarded as a bad idea during the design of the 386). So it's probably a good thing that some "vm386" mode didn't require keeping more of that around.


I agree it was good that the designers of the 386 got rid of some cruft when designing 386 protected mode. It's unfortunate they didn't lean harder into BIOS calls to hide implementation details (such as allowing only the BIOS to manipulate page tables, and enforcing this by having the BIOS be the only ring0 component), effectively making the BIOS a hypervisor that only supported one guest kernel at a time. This is what the DEC Alpha did with its PALcode, and it would have given a smaller interface that needed to be maintained in the future.

It would have made writing a BIOS more complicated, but would have made future kernel and hardware work simpler. It would also have meant all 32-bit and 64-bit kernels would have been paravirtualized from day 1, greatly reducing the need for hardware emulation in multi-guest hypervisors.

Of course, hindsight is 20/20, and it's a business risk to depend so much on BIOS implementors unless you provide them a robust, liberally licensed reference implementation to tailor to their hardware.


Would the perf cost of such an abstraction have been acceptable on early 386 hardware? I lean towards no but that is just a feeling without data…


Yeah, it sounds like KMag essentially wanted the BIOS to be a sort of microkernel, with other kernels being implemented under it. Beyond performance concerns, given the PC's history I'm also not sure whether e.g. Linux developers (which initially was, well, Linus) would have been happy with whatever microkernel and API had been given there.


Oh, I actually meant that long mode, i.e. 64-bit x86, got rid of a lot of 386 (and 286) cruft.


Kudos, VT-x is full of funny little corner cases. The surface is just immense.

But if you read this having planned to write a hypervisor yourself, don't be discouraged: IIRC a lot of that complexity and those corner cases aren't that important if you don't focus too much on performance at first (which is easy to do while getting something working, and addressable later), or if you just don't implement some features at all (which still gets you quite far).


I would guess that this implementation probably also suffers from some of these corner cases. The Intel manual is simply too big :P


The licence reference in the README points to a 2016 copyright date - marking a decade since Rusty Russell released his proof of concept hypervisor 'lguest', which was quickly re-dubbed the RustyVisor.

http://lhype.ozlabs.org/


Note, a possible reason for the low line count:

> SimpleVisor does not do any such error checking, validation, and exception handling. It is not robust software designed for production use, but rather a reference code base.


I don't know by what measure the 500 lines were calculated; even excluding comments and empty lines, it seems to have much more code than that.

This one says 1.1k: http://line-count.herokuapp.com/ionescu007/SimpleVisor

And this says 2.3k: https://codetabs.com/count-loc/count-loc-online.html


Would a hypervisor like this allow running 2+ operating systems simultaneously? Or am I misunderstanding the premise? (I see the target of this is mainly security testing.)


Probably not, I think that would require a lot of complexity that I assume this doesn't implement (e.g. resource partitioning and time sharing).

I think the premise is just that there aren't a lot of simple hypervisors available for learning. VT-x is on its own a lot to comprehend, so this lets you not worry about code complexity and focus on understanding the workings of VT-x itself.


Not in practice. Running multiple OSes requires virtualizing the hardware, which is where most of the complexity is.

You can run multiple OSes with a rather dumb hypervisor if, say, you dedicate specific bits of hardware and CPU cores to each. You don't even need a hypervisor for that. But that's a pretty limited use case: at that point it's not a VM, it's just partitioning existing hardware between OSes. How practical this is depends on things like whether the IRQ controllers are per-core; anything behind shared buses and IRQ controllers that aren't transparent is going to be hard to share.


...I thought I knew what a hypervisor was, but I guess I don't? What is it, then, if hardware virtualization is actually something else, and two OSes can share a single machine without one?


If two OSes use strictly disjoint sets of physical hardware directly, and different CPU cores, then they can run simultaneously on the same machine without a hypervisor. Think about it: if they share absolutely no resources other than bus bandwidth, they don't care about the other OS. Of course, whether this is possible, and to what extent, depends on the system design; some systems will only have one of a critical resource (e.g. an interrupt controller), and that limits what you can do with this approach.

This is already a common-ish thing to do on embedded systems with Linux. You can tell Linux to only use a subset of the available CPU cores, and then run your own bare-metal code on the remaining ones. This can be useful to, say, perform hard real-time tasks that are not amenable to running under Linux. For all intents and purposes, your code (which might as well count as an OS, since it is bare-metal code) and Linux are then sharing one machine.
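On Linux, the usual knobs for this kind of carving-up are kernel boot parameters; for example (core numbers here are purely illustrative):

```
# Keep cores 2-3 out of the general scheduler, timer ticks, and IRQ
# routing, while Linux still technically manages them (good for
# pinned hard-real-time threads):
isolcpus=2,3 nohz_full=2,3 irqaffinity=0,1

# Or, more drastically, never bring up cores beyond the first two at
# all, leaving the rest free for bare-metal code started by firmware
# or your own loader:
maxcpus=2
```

These are standard parameters documented in the kernel's kernel-parameters.txt; the exact mechanism for handing the leftover cores to your own code is platform-specific.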

Similarly, you'll find that many systems are designed with multiple CPU cores sharing memory for different tasks. For example, on the Wii and Wii U, the main CPU (that games run on) and the IOP (security/IO CPU) share RAM, but run entirely separate OSes and each accesses a subset of available hardware. The CPUs aren't even of the same architecture. On some phones, the mobile baseband processor and main processor are also on a shared bus, but also run completely separate OSes. Sometimes both OSes are even Linux! And even on regular PCs, you could argue that certain add-on hardware with a coprocessor that has bus/DMA access is, effectively, a slice of the machine running another OS - say, for example, you might think of your NVMe controller this way.

The job of a hypervisor is to virtualize the hardware indeed, but what I meant with that is virtualizing things other than the CPU (e.g. peripheral devices). At the bare minimum, a hypervisor has to virtualize the CPU, which in practice means it runs the guest at a privilege level lower than itself.

In real life, hypervisors can range from "almost nothing, really" to a full blown virtual machine (which is what we normally think of, e.g. Qemu/KVM, VMWare, etc.). SimpleVisor is closer to the first - it gives you a platform to do things to the guest OS, but it's missing a lot that would be required to, say, actually share the screen between two OSes.

I'm building a thin hypervisor for the Apple M1, for debugging and reverse engineering purposes - the ultimate goal is to run macOS on it, so we can learn how it uses the hardware and then write Linux drivers for it.

The way virtualization on ARM works, you can turn on VM features progressively. In the beginning, all my hypervisor did was execute the guest at EL1 (guest OS level) instead of EL2 (hypervisor level). There was literally nothing other than a few instructions to switch to EL1 and jump to the guest code. It was still enough to boot my own loader and then load Linux as a guest, and I used it to test that we supported running Linux as a guest correctly (since there are a few subtleties in the interrupt handling there). Does that count as a "hypervisor"?

Then I started adding features: I added proper exception handlers (so I can perform actions when the guest does certain things), enabled traps on certain guest operations like accessing certain CPU registers, eventually set up page tables and virtual memory, then added code that can trap and emulate peripheral devices (which also involves emulating a tiny subset of the ARM instruction set in software). It's slowly getting debugging features too (notably missing is the ability to interrupt the guest on manual request, which is coming ~next, as well as virtual serial port support so we can get debugging output over USB instead of needing a custom serial cable).

It's still a minimal thing that can't run two OSes at once (there is no context switching) and only supports one CPU core, but it can virtualize some hardware and let me debug and inspect the guest OS. At what point did it become a "real" hypervisor? That's up to you :)
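For the curious, those "few instructions to switch to EL1" can be sketched roughly like this (a hypothetical, untested fragment: it only makes sense when executed at EL2 on AArch64, and the function name, SPSR value, and entry argument are illustrative, not taken from any real codebase):

```c
/* Hypothetical sketch: drop from EL2 to EL1 and start the guest.
 * Only meaningful when already running at EL2 on AArch64. */
static void enter_guest_at_el1(unsigned long guest_entry)
{
    asm volatile(
        "msr spsr_el2, %0\n"  /* target state: EL1h with DAIF masked */
        "msr elr_el2, %1\n"   /* address the guest starts executing at */
        "eret\n"              /* "exception return" lands us in EL1 */
        :
        : "r"(0x3c5UL),       /* 0x3c5 = DAIF set, M[3:0]=0b0101 (EL1h) */
          "r"(guest_entry));
}
```

Everything else (traps, stage-2 translation, device emulation) is opt-in on top of this via EL2 control registers like HCR_EL2, which is what makes the progressive approach described above possible.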


Yes! Thanks so much! (I'm now stalking your other comments and learning a lot...)

The Wii example of non-matching architectures looking at the RAM together is especially wild to me.

If I wanted to run my x299 Hackintosh + Windows (+ Linux?), would you be able to recommend an existing hypervisor that could do it? I'd be okay dedicating cores/ram/storage directly, but thinking it through I guess to be "useful" (in the sense of making testing things/switching more convenient) at least some IO would need to be shared/handled by the hypervisor and passed along; and perhaps the complexity of setting it up would never outpace the finite number of reboots I could do instead. (And once the higher-perf M chips arrive, I'll probably stop booting the hackintosh side anyway; and linux in a standard VM is fine for my purposes.)

If your hypervisor is available publicly (now, or later) I'd love to take a peek; sounds like very impressive work.


Ahh, found it/a starting point: https://www.youtube.com/watch?v=rdhD1tinF8c


Thank you for the amazing answer, that makes much more sense!


Why in the world would being Windows-oriented be a desirable feature in a hypervisor? People appear to have an occasional need to run Windows apps, but those run fine in Windows itself running as a guest. It seems like supporting Wine running virtualized on Linux could be interesting.


All those Linux and BSD VMs on Azure happen to run on top of Hyper-V.



