When I first started using AWS a few years ago, having known generally what it was for far longer, I was flabbergasted at how slow it was to get an instance booted. I expected it to take much less time, thinking about things from first principles, even if you're literally talking about cold booting a physical machine via IPMI. But it seemed like everyone accepted that as the way it was, and now I do too. So I'm glad people are still interested in making things fast.
Right now I'm doing Postgres stuff (RDS) and dealing with it taking 10+ minutes to boot a fresh instance. I'm tempted to try out fly.io and their Postgres clusters but I'm afraid I'd be spoiled and hate my life after (my job has me stuck in AWS for the interminable future).
I would be interested to know where all that time is being spent on the AWS side. To be a fly on the wall seeing their full, unfiltered logging and metrics.
EC2 has historically not focused much on instance boot time. We did for GCE and drove it down pretty heavily. The post here from fly has a good set of sequence diagrams for "what are the various phases of creating an instance from scratch" that are generally applicable.
I'll note though that different users have different targets. Some people care about "time from request to first instruction ticks over" while others only care about "time from request to ssh'able from the public internet". There's an interesting middle ground of "time from request to being able to talk to other services like GCS or S3".
It's not clear to me what the networking / discovery story is for a Fly Machine that is stopped and then starts. That is, how long does fly-proxy take to update (globally? within a metro?) to add and remove the new Fly Machine? I vaguely recall that only external endpoints support IPv4, so I assume Fly is reserving and registering the internal IPv6 endpoints in the more expensive "create" step and then "start" is just about propagating liveness.
My main gig runs workloads primarily on AWS, but I work with a small company as well that is completely on GCP and I gotta say the difference is night and day in instance allocation and startup time. GCE is so much faster it's infuriating when I've gotta go back to work and sometimes have to wait more than 10 minutes in the worst case for an EC2 instance to finish booting in an EKS cluster that's already running!
> I'll note though that different users have different targets. Some people care about "time from request to first instruction ticks over" while others only care about "time from request to ssh'able from the public internet".
This is the same target: a machine (that usually has only a single app on it) shouldn't take more time to boot than a general-purpose consumer PC/laptop.
The reason it takes so darn long to start in so many cases is just how horrendously overcomplicated the whole cloud setup is, internally and externally (sometimes for good reasons, sometimes because we don't know better, and sometimes because it really is overcomplicated and overengineered).
> shouldn't take more time to boot than a general-purpose consumer PC/laptop.
That's an incredibly easy target. VMs can and should boot much faster than that - just look at the Firecracker hypervisor.
Even with KVM, if you replace systemd with something small and simple [0] (which you totally should, for single-app VMs), boot times of a couple of seconds are within reach.
I'm still sad nanokernels, or nanokernel-like systems, never took off.
I also remember clicking around Ling (Erlang on Xen, sadly no longer active [1]) where the whole VM could boot up, service the request, and shut down in less time than it takes a cloud to start spinning up an instance :)
> We did for GCE and drove it down pretty heavily.
As a heavy GCE user, something weird about GCE to me is that instance boot times can be extremely variable — predictable within a given instance group, but unpredictable with even very small changes to e.g. instance sizes within the same instance family. (And this isn’t the instance blocking on getting scheduled onto a hypervisor; I can recognize that point, because it’s when any quota limits hit to potentially kill the instance provision. That phase of the delay is very stable.)
The variable delays also seem to apply to “reset” of the instance (which won’t involve an even-temporary deschedule, as reset keeps NVMe state) — but not to kexec-reboot, if one opts for that.
Is GCE built on a hybrid of two different hypervisor systems with wildly differing boot-time performance characteristics, where subtle tweaks of your instance config determine which hypervisor you get? Maybe one that relies on a hardware offload for something (PXE kernel signature verification?) that the other does purely in software?
One thing that’s clear to me is that instance types introduced after a certain point (e.g. all n2d instances) are always of the fast-booting type. So I’m guessing this is just old hypervisors being stuck on some legacy config because of long-term customers partially pinning those host machines with workloads that somehow prevent live migration (workloads with NVMe or GPU), and so make those hypervisors unable to be drained for a hard-upgrade.
There's not much about how our system operates that we're unwilling to talk about. But as you're probably aware, the broad questions of how networking and discovery work are big, so I wouldn't know where to start. Feel free to shoot questions at us, though!
I'm not intimately familiar with all of the AWS infrastructure around a running EC2 instance, but I imagine a lot of time is spent on creating the associated Elastic Block Store volume, allocating an Elastic IP address, creating an Elastic Network Interface, creating a default security group, etc., then attaching all of those things to the instance, then attaching the required resources to an Internet Gateway, and so on.
I think that's an artifact of RDS specifically. It's dreadfully slow. An EC2 instance will launch and have SSH connectivity in 17-22 seconds in my testing. (I was testing this a fair bit a while back for a silly idea I had)
There's something about the tone and content of fly.io blog posts that makes it impossible for me not to root for them. (It also helps that the DX is so great.) I've only had a chance to deploy toy apps to Fly.io, nothing at scale, yet, but it checks all my boxes.
Fuck no. If you asked me whether I valued Fly.io more than my HN account, I'd have to think about it. I have, uh, an HN problem. We actually hoped not to see this post on the front page! We have a mode of writing for HN ("has to be interesting for people who will never use Fly.io") and there's a Fly Machines post in the works that fits that model. At any rate: I had very little to do with this post; if you liked it, you like Chris Nicoll, who writes for us professionally. And Kurt, of course, who wrote the original guts of this post, and also has been beating the Fly Machines drum inside of Fly.io for most of the last year.
The DX is great until you try to use their REST API (not for machines): the link Google gives you to their docs is a very, very incomplete page with even the base URL obsolete, and you're left browsing the forum to understand why your requests don't work ;)
Now they've got my attention. This is incredibly difficult to execute on. Kudos to the team there who figured it out. If fly is or can become profitable then they've got a chance at being around for a long time. I can see them as the new cloudflare.
> Fly Machines will help us ship apps that scale to zero sometime this year.
I think this is what will make Fly really exciting. Right now (if I understand right) you need to be paying for a VM 24/7 in every region you want your app available in, because it only scales down to 1. So it runs apps in regions close to users that you're willing to pay for 24/7. If they make scale-to-zero work in every region, then maybe you can just make every app global and if you have some occasional users in Australia then it can just spin up over there while you're getting requests. I think it's what will make many-regions feasible for every app.
I honestly don't understand what's going on here. I thought we turned to Docker/containers because VMs were too heavy? Now we've got VMs that run Docker? (Not trying to be dense - what is the advantage?)
IMHO with fly.io their use of containers is more for the dev experience. It's incredibly easy and popular to whip up a Dockerfile, test it locally, ship it to a registry, etc. Anyone can learn the Dockerfile syntax and be productive with it in an afternoon.
The tooling for proper VM creation, on the other hand, is comparatively stuck in the stone age--there are just a few tools like Packer, or a frankenstein of Ansible scripts, and neither is as nice or easy as Dockerfile creation.
It's a frankenstein system of provisioning though. You have to write a config file in Ruby and embed bash scripts, which may or may not do things like invoke Ansible scripts (written in YAML). The layers of complexity are immense, and that's before you have to start janitoring VirtualBox's always half-broken state. The inner loop of change Vagrant config -> see result in VM is painfully slow too, waiting for the VM to tear down and come back up again.
Docker is much, much simpler. Write a Dockerfile that's mostly just bash or shell code. Build and run in seconds to immediately see the results. Once it's working push the container image to a public registry and you can distribute it to anything.
Firecracker is a thin layer on top of KVM. It essentially implements just a handful of devices and it boots in milliseconds. Fly bakes a Docker image into the format that Firecracker expects and then boots it, alongside a bunch of anycast networking magic. You get the security guarantees of KVM with the developer experience of docker.
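To make "a handful of devices" concrete, here's a rough sketch of the small sequence of API calls a Firecracker boot involves (it's configured over a Unix socket with a few PUT requests). The kernel/rootfs paths and boot args below are illustrative, and this just builds the request payloads rather than talking to a real Firecracker process:

```python
import json

def firecracker_boot_requests(kernel, rootfs, vcpus=1, mem_mib=256):
    """Build the (method, path, body) tuples for a minimal microVM boot."""
    return [
        # Size the VM first...
        ("PUT", "/machine-config",
         {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        # ...point it at a kernel (no BIOS/UEFI phase, hence the fast boot)...
        ("PUT", "/boot-source",
         {"kernel_image_path": kernel,
          "boot_args": "console=ttyS0 reboot=k panic=1"}),
        # ...attach the root filesystem (the baked Docker image, in Fly's case)...
        ("PUT", "/drives/rootfs",
         {"drive_id": "rootfs", "path_on_host": rootfs,
          "is_root_device": True, "is_read_only": False}),
        # ...then start the instance.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]

for method, path, body in firecracker_boot_requests("vmlinux", "image.ext4"):
    print(method, path, json.dumps(body))
```

Four requests and the guest is running, which is a big part of why millisecond-scale boots are even possible.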
Not quite mentioned in other answers, but historically, the VMs we used to run were heavy. A typical qemu-kvm machine needs to actually boot up, initialise, start a number of services, etc. Firecracker is not that - it essentially gives you a kernel that already knows the environment and can do the bare minimum before executing the provided image. It's like a halfway point between unikernels and independent VMs. The VM technology itself is not necessarily heavy - it just depends on how you use it.
Everybody hypnotized themselves into believing that containers are not secure and can never be made secure so they run one container per VM. Instead of investing in making containers secure, the industry decided to invest in making VMs ligher, so VMs are now efficient enough that you can run one container per VM.
Why run Docker in VMs instead of using VM images? Because Docker's build tools are more popular than Packer-style tooling.
I don't know if hypnosis works or not, but quitting smoking is good whether or not you hypnotize yourself to do it, and so is avoiding multi-tenant Docker. The broad kernel attack surface is much too scary to expose directly to multi-tenant workloads, and there have been fairly recent kernel LPEs that would have avoided any sane system call filter you could come up with.
It's a moot point, because this is a solved problem. Use containers for single-tenant workloads; use micro-VMs, whichever flavor you like best, for multi-tenant.
What is the status of hardware acceleration for micro-VMs? I know that Firecracker doesn't have GPU support yet, are there any other options that handle this?
It's a good question. GPUs are in some ways a difficult case for the Firecracker model, which prizes a minimal, mostly memory-safe attack surface that (critically) is easy to reason about. We'd very much like to get an instance or machine type that supports GPUs, but we perceive it as a Big Project. We might not even use Firecracker to do it when we finally get it rolling.
If you're reading this and have big thoughts on how we might do GPUs at Fly.io without keeping us up at night about security, you should reach out; we're hiring.
> hypnotized themselves into believing that containers are not secure
They provide an extra layer of indirection, which helps against the usual exploit attempts, but they also introduce new attack surface. We've had exploits specifically targeting the namespaces API already.
> We've had exploits specifically targeting the namespaces API already
Well, isn't that what happens when you put a shield into place? Someone tries to break it. Why have people concluded that it can never be made properly secure?
Because the broad kernel attack surface is huge, and the shield has to reliably protect all of it, or all you've done is create a jungle gym for vulnerability researchers. The win with virtualization is that it drastically scopes down the amount of kernel code exposed to untrusted code.
Everybody hypnotized themselves into believing that daemons are not secure and can never be made secure so they run one daemon per container. Instead of investing in making daemons secure, the industry decided to invest in making daemons heavier, so each daemon can now be provided with its own hardware/OS abstraction layer.
Why run daemons in containers instead of using proper process isolation? Because containers absolve the system administrator from understanding their systems.
I don't necessarily disagree with "the linux kernel boundary is porous and probably not possible to secure in depth", but IIUC the hypervisor is part of the kernel too, so wouldn't it have the same problem?
The win would be in the attack surface area. For hypervisors there's a good layer of abstraction to pivot over, whereas with containers it's a much thinner wall.
KVM presents a very narrow, well understood surface that is much much less of a moving target and changes at a much slower rate. Qemu has traditionally been a problem which is why Amazon and google have moved away from it.
The container kernel surface is just insane by comparison.
Which is an alternative approach to providing a dedicated kernel to the VM/container (because that's basically what a hypervisor does). gVisor effectively implements a Linux kernel in user space, written in a memory-safe language: a kernel that insulates the host kernel from guest system calls by literally implementing wrappers around host system calls.
Containers are still more lightweight (tho not by a lot these days), but they are hella insecure for untrusted workloads. Plus, people like and depend on Docker workflows, hence taking a Docker container (basically just a tarball + JSON manifest) and making a VM out of it.
Isolation would be the biggest advantage so they can host multiple clients on the same machine. Fly uses a lightweight vm (someone chime in if they have better details).
I am so excited about the future. We are seeing a bunch of announcements from multiple companies that make it possible for a single developer or small team to fairly cheaply run a global service without spending a whole lot of time on ops.
I am excited to see what people will come up with.
> We're not done. You need something to run, right? Firecracker needs a root filesystem. For this, we download Docker images from a repository backed by S3. This can be done in a few seconds if you're near S3 and the image is smol.
I feel like I am missing something. If an S3 bucket is a requirement and I was interested in the isolation provided by Firecracker why wouldn't I just use AWS Fargate or Lambda which are both powered by Firecracker? If low latency was the concern, I can't imagine there being any lower latency than having my workload and storage being colocated in the same AWS Availability Zone.
That is talking about our S3 buckets, when you use these you don't know you're using S3.
Fargate and Lambda are not as consistently fast at booting VMs. Fargate, in particular, can take minutes to get a container launched.
This is not because they're bad services, it's because they make different tradeoffs than we do. When you ask for a Fargate container (or a new Lambda "instance"), AWS actually moves other containers/lambdas out of the way to get you running. Most of the wait time is their infrastructure doing orchestration magic to match your thing to their available compute.
Fly Machines don't do any of this. If you try and start a machine and there's no capacity for it, you get a very fast error response instead. This works well for our early customers. Most of them want to start a process quickly enough for a good UX. Fast errors give them a chance to do that.
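From the caller's side, the fast-error tradeoff looks something like the sketch below: because a capacity miss fails immediately instead of queueing, you can fall back to another region within the same request. `NoCapacity` and `start_machine` are hypothetical stand-ins, not Fly's actual client API:

```python
class NoCapacity(Exception):
    """Raised when a region has no room for the machine (hypothetical)."""
    pass

def start_with_fallback(start_machine, regions):
    """Try regions in order; return the first machine that starts."""
    tried = []
    for region in regions:
        try:
            return start_machine(region)
        except NoCapacity:
            tried.append(region)  # error came back fast, so just move on
    raise NoCapacity(f"no capacity in any of {tried}")
```

The same pattern works for degrading UX gracefully (queue the job, show a "busy" page) instead of making the user wait out an orchestrator.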
I was really excited when reading this, but realized the lack of a faster "warm" start makes this less ideal for my highly latency-sensitive use case on Lambda. Lambdas start much faster than 300ms when warm IME, and I'm hoping with enough sustained traffic (be it real or artificial), most requests will be warm.
I'd love to be able to supply some kind of memory snapshot in addition to the docker image to cut down on cold starts. Probably blocked on snapshot support in Firecracker according to another thread? Eagerly awaiting this since it could make Fly Machine the best of both worlds!
Not a fan of how Lambda makes me scale memory and compute in tandem, when my use case benefits so much more from compute than memory. I basically have to pay for 2+ gigs I'm never going to use to get the compute performance I want. Makes 0 sense.
> Lambdas start much faster than 300ms when warm IME.
My understanding is that Lambdas aren't ever really truly warm unless you have a completely steady traffic level. The first request in a while will hit the latency spike. But so will every increase in concurrency. So if you were serving 2 req/sec, and a 3rd concurrent visitor comes along then they will also get a cold start.
If you have a low-latency use case then fly.io's regular VMs are much better than either this or lambda. You get a permanently running VM (the smallest of which is $2/month for 256MB of RAM), which can serve more than one request all by itself and will auto-scale with traffic.
I do actually already use Fly for pretty much everything else.
For this use case though, I forgot to mention that it needs _much_ faster autoscaling than what Fly's regular VMs offer, with unbounded concurrency, and not ideal to run concurrently in a single VM due to each request being compute heavy and needing full isolation from each other since they run arbitrary customer code.
It's true that with Lambda, some amount of cold starts are probably inevitable with extreme spikes in traffic. But I'm hoping to mitigate most of that by sending artificial concurrent traffic on a schedule to keep a decent buffer of warmed up Lambdas above the current real traffic level. Still to be seen if that plan works out in practice.
Hmm... If you're willing to run the scheduler and/or something sending artificial traffic yourself (and are willing to pay for a few warm instances), then it seems like you might be willing to get something very low-latency with these Fly Machines. You could maintain a pool of a few booted and ready to go (but idle, waiting for a request to come in) at all times. When a request comes in you pass that off to an already-warm machine and boot a new one ready for a later request.
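The pool idea above can be sketched in a few lines. `boot` stands in for a "create and start a Machine" call (an assumption, not Fly's real API), and in practice you'd refill asynchronously rather than inline:

```python
import collections

class WarmPool:
    """Keep a few pre-booted machines idle so requests never pay a cold start."""

    def __init__(self, boot, size=3):
        self.boot = boot
        self.idle = collections.deque(boot() for _ in range(size))

    def acquire(self):
        """Hand out a pre-booted machine, then top the pool back up."""
        machine = self.idle.popleft() if self.idle else self.boot()
        self.idle.append(self.boot())  # refill for the next request
        return machine

pool = WarmPool(boot=lambda: object(), size=2)
machine = pool.acquire()  # served warm; no boot on the request path
```

The cost is exactly the tradeoff being discussed: you pay for `size` idle machines continuously in exchange for zero cold starts up to that burst size.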
Though I think ultimately it's going to be impossible to have all 3 of:
For my use case, the main benefit of using manually pre-warmed Lambda instances over Fly Machines is the fact that "warm" Lambdas cost practically nothing (only have to pay for the warming events that are billed for milliseconds every few minutes). :)
This is why I want to see a similar mode of operation for Fly Machines, possibly through memory snapshots, so I can manually provide a "warm" suspend state for it to unsuspend into. In fact this would be even better than the lambda model since there would be _no_ cold starts.
as far as i understand this will let me run VMs with specified Docker images?
i'm thinking of using something Fly.io to offer a dedicated hosting for my upcoming product, so when the customers sign up they get a new machine with an individual endpoint
the workload that needs to be running on those machines is quite intensive (like crawling web pages) and not very scalable when sharing resources
also can you give more details about your Nomad stack?
i was actually thinking of using Kubernetes or Docker swarm as API to deploy these workloads
This is separate from the Nomad stack. When you run `fly launch` you get a Fly App that's orchestrated by Nomad and manages something-like-fly-machines for you. Fly Machines are nearly orchestration free.
Machines are designed to work well for your customer hosting! You can install a machine for them, and then turn it on when they push a button, or have it turn on automatically when they visit a URL.
I'm happy to talk about it more. Feel free to send me/us an email!
sounds great! the question about Nomad was not related to Fly Machine
btw., i've sent a mail already (mish at ushakov), but only got an automated message
i'm already using free fly.io for wikinewsfeed.org and very happy so far
would be awesome if you could send me a tip how i could use fly.io to deploy instances for my customers?
in my use-case i want to run many instances of Chrome at the same time
running like 100x instances of Chrome on a single machine is too resource-intensive and having a bigger host machine won't do the job, so your only option is to have a dedicated vm for each Chrome instance
That looks promising, but I don't want to handle any of this myself :/
I want a service where I can start a new VM fast by posting to an API, have it run some long running JS server code, and the VM should close itself when it's done.
My use case is often CPU bound, so a small VM with a single CPU is just fine.
Do you have recommendations for stateful workloads? Would the answer always be 'connect to an external DB/API for all state'?
E.g. if I need to run a bunch of processing, would it be
A) spin up the micro-VM and pull from a queue service
B) embed SQLite
C) use some kind of in-memory store
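Option (B) is worth a sketch, since it's the least familiar of the three: the machine keeps its working state in an embedded SQLite file, so it can crash or restart without losing progress, and only final results need to leave the VM. The table layout is illustrative, and on Fly you'd point the connection at a volume-backed path instead of `:memory:`:

```python
import sqlite3

# Open the embedded database (a file path on a persistent volume in practice).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE progress (item TEXT PRIMARY KEY, done INTEGER)")

# Record work as it completes; the transaction makes each batch durable.
with db:
    for item in ("a", "b", "c"):
        db.execute("INSERT INTO progress VALUES (?, 1)", (item,))

done = db.execute("SELECT COUNT(*) FROM progress WHERE done = 1").fetchone()[0]
print(done)  # → 3
```

(A) and (C) trade that local durability for simpler machines: the queue or in-memory store holds the state instead, and the VM stays disposable.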
TBH I've been waiting for years for someone to do 'firecracker as a service'. I must have searched that exact term about once per month.
Fly can do this without the new Machines functionality, it's a big selling point for them.
You set a min and max number of VMs to run. Set the watermark for users per machine. And they automatically scale up and down depending on the number of connections. It will even figure out where in the world all the connections are coming from, and spin up the new VM in a region near to them.
So if your video encoding was requested via HTTP(S) connections then this would be trivial.
Right, I was wondering about this too. There are mentions of instances vs machines, so I'm hoping it's possible to spin up multiple instances of the same machine to run tasks concurrently, but I haven't found this explicitly confirmed anywhere yet.
What's the DB / compute break-even for this use case? I assume if your app uses 90% of CPU cycles on DB access, this is not the way to go. And if your app is 90% compute this is a nice solution.
If you mean moving (or copying) VMs to another region/host, sorta. Stateless compute is easy; our apps platform (orchestrated by Nomad) already does it. Volumes complicate things because they live on a specific host, and moving volumes between hosts is a slow and tedious process. Solving this is high on our priority list. We need super fast volume forking and host migrations like yesterday.
Ah! I've had volumes on the top of my mind all week... Firecracker supports snapshots in a dev preview but we're not using it yet. We can't really do anything like that on Nomad, which is one of the many reasons we're keen to get off it.
We do not. I'd like to, but we have a lot more infrastructure work to do before it's even possible. At the moment, we don't even migrate persistent disks or IP addresses between hosts.
Do Fly Machines support POSIX/System V shared memory? It is a giant pain in the ass for us because lambda does not implement these shared memory mechanisms which many multiprocessing libraries use to communicate. Makes it hard to utilize multiple lambda cores when running python code. You need to use multiprocessing due to the python GIL, but most of the python multiprocessing IPC uses shared memory.
We managed to hack a solution which uses pipes for IPC but it would be nice not to have to do this.
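For reference, this is the mechanism that's missing on Lambda: POSIX shared memory via the stdlib (`multiprocessing.shared_memory`, Python 3.8+). It backs onto `/dev/shm`, which a full VM - and presumably a Fly Machine, as a normal Linux guest - provides; the names and sizes here are illustrative:

```python
from multiprocessing import shared_memory

# Create a shared segment (the "writer" side of the IPC).
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    shm.buf[:5] = b"hello"
    # A second process would attach to the same segment by name
    # (done in-process here to keep the sketch self-contained).
    peer = shared_memory.SharedMemory(name=shm.name)
    data = bytes(peer.buf[:5])
    peer.close()
finally:
    shm.close()
    shm.unlink()

print(data)  # → b'hello'
```

If this runs, `multiprocessing.Pool` and friends that rely on shm should work too, without the pipe workaround.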
I haven't tried using shm on Fly, but I don't see why they'd not work.
You could already run multiple processes in a guest vm [0] or run multiple guests vms on the same host [1]. If the guest kernel Fly boots your app into doesn't have required modules, I guess you could consider requesting (nerd-sniping 'em) specifically for those, like so: https://twitter.com/dave_universetf/status/14262218974072422...
interesting, i was more pointing to the cold startup time on lambda with docker images vs this.
> Deploy App Servers Close to Your Users
is there a timeout limit to functions? This piques my interest but I can't tell if fly is a serverless function provider or some way to deploy my docker closest to my user (which is what I am looking for right now)
What guarantee is there in terms of average latency for my users? Is there a looking glass of sort where I can ping/see all the locations where my docker images will be running?
a dedicated 4-core 8gb ram is $124.00/month which is 4~6x more expensive than running on KVM vps so I want to know what I am signing up for
edit: I see the list of locations and it makes me think, aren't I already doing what fly.io is doing? I spin up a VPS instance at one of the locations that is closest to my user. It takes about 30~120 seconds. It's far far cheaper
Oh I see! Fly Machines are lower level than what most people use us for. They're designed for people building platforms.
Most people run "apps" on Fly. You control which regions they run in, we load balance to the nearest. We have guides for launching some frameworks here: https://fly.io/docs/getting-started/
The difference between us and a VPS provider is: your app runs in as many of those regions as you want. So does your database, if you're using our Postgres. And we route writes to the appropriate place: https://fly.io/blog/globally-distributed-postgres/
The other difference is probably CPU. The 4 cpu, 8GB RAM instances are 2 dedicated AMD EPYC cores + hyper threads. They're relatively expensive. You may not need them! VPS providers typically run cheaper CPUs and over provision them.
We're shipping shared CPU options with more memory soon. They should be closer to what you see from VPS providers, though still more expensive.
I see, so is it routing requests to a single instance or is it routing it to the nearest instance to the user? How will it scale if there are lot of users in a particular city?
Lambda is fundamentally message-oriented: you send an invoke request, Lambda either routes the request to an existing warm instance or boots a new one, the request is processed, and then the instance suspends itself.
Fly VMs are just VMs that can start quickly.
They don't seem to currently support a request/response-based VM lifecycle.
If you wanted to use a Fly VM in a lambda-like way, it seems like you would need some kind of proxy to coordinate the work, i.e. start the VM via the API, have your VM process start a web server, once it's booted send it an HTTP request, and once the request is finished shut down the VM via the API.
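That proxy-coordination lifecycle is simple enough to sketch. Everything on `api` below is a hypothetical stand-in for Machines API calls; the point is just the ordering (start -> wait -> forward -> stop):

```python
def handle_request(api, payload):
    """Serve one request lambda-style: boot, forward, tear down."""
    machine = api.start()                 # boot a fresh VM for this request
    try:
        api.wait_until_ready(machine)     # poll until its web server is up
        return api.forward(machine, payload)
    finally:
        api.stop(machine)                 # always tear the VM back down
```

Every request pays the boot in this scheme, which is why the suspend/resume behavior discussed below matters so much.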
Also seems like fly can't suspend a running process, your process needs to start up every time you start a fly VM.
Lambda will suspend a VM between requests, keeping the process in memory for a few minutes.
Sending a subsequent request to a warm lambda is much faster than booting a lambda from scratch, particularly for JIT-based language runtimes.
> We're not done. You need something to run, right? Firecracker needs a root filesystem. For this, we download Docker images from a repository backed by S3. This can be done in a few seconds if you're near S3 and the image is smol.
Lmao props to the team for getting this copy out unsanitized by (potentially) unchill bosses.
He and Chris Nicoll snuck it past me; I was too tied up in family business to moderate the tone and add the business-friendly grace we're so known for. But this is just a feature announcement; I don't think we really hoped this would be a front page discussion. We have some meaty stuff to say about how Fly Machines work coming up.
I know some prominent HN users work for fly.io, and they seem to be doing some interesting work, but the absolutely glowing response that every blog post gets here on HN seems a bit nepotistic.
I would hope people like the blog posts that we tend to "chart" with on HN, because we write them deliberately for HN and not to check marketing checkboxes. This post is not that; it's a straight-up feature announcement. You'll see me elsewhere on the thread, and many of us on Twitter, remarking that we weren't in a rush to see this on the front page.
We're painfully aware that we get a limited number of bites at the HN apple, and we try to spend those on things like Litestream.io, which is an open source project that benefits people who won't ever use Fly.io. Several of our last few blog posts were about stuff we've done "wrong"; so, we've also got no qualms about charting on HN with a post about how much trouble we've had with Raft, or user-mode WireGuard.
Dan Gackle has said a bunch of times that he wishes more companies got the lovey-dovey reaction we seem to get from HN. I've got the cheat codes, if you want them: write posts for the HN audience, and throw your marketing goals out the window. I'm not going to bullshit you and say that we don't benefit from those kinds of posts too, but I hope it's at least clearer why they're received more warmly than a lot of tech company product announcements: we don't write them to be product announcements. (Unlike this post!)
If there was a "No HN" meta tag we could set on our posts, this post would have had it.
Ordinarily I'd be squeamish about dragging us into metacommentary like this on one of our stories, because I'd rather argue about whether you can scale a modern full stack app entirely on SQLite than about our marketing. But, like I said, we've got no skin in how this post ranks here.
This is really really exciting! I hope it enables more products built on top of full VMs with fast UX/DX.
I just wish I knew about this earlier because from what I read, I think we at Devbook [1] built a pretty similar service for our product. We are using Docker to "describe" the VM's environment, our boot times are in similar numbers, we are using Nomad for orchestration, and we are also using Firecracker :). We basically had to build our own serverless platform for VMs. I need to compare our current pricing to Fly's.