Forking, Snapshots, and Checkpoints - Oh my!

When you check out a sandbox vendor for your shiny new agent, most of the time the vendor is going to be building on fundamental Linux technology - the Kernel-based Virtual Machine, or KVM. KVM is lovely technology: born in 2006, it's almost old enough to buy a beer, and it's everywhere. Indeed, the mighty AWS EC2 uses the Nitro Hypervisor, which is based on KVM, and AWS Lambda is famously built on Firecracker, which itself sits on top of KVM.

Developers of agents have coalesced around the need for isolation when running untrusted, agent-generated code, and the developers of those isolation systems have largely coalesced around KVM for providing that isolation at scale. Still, even within those parameters there are some questions the buyer needs to answer:

  • what kind of density do I need?
  • do I need a stateful or ephemeral VM?
  • what tools need to be installed in the VM?
  • what systems does the VM need to run on?
  • what networking controls do I need?
  • how much computational power do I need?
  • do I need shell access?
  • what kind of artifacts are produced?

That's a lot of questions but you've probably thought of some of them already. Like, do you need to run the VMs on your own infra? If "yes", the density, artifacts, compute, and statefulness are probably top of mind. If "no", you are probably primarily concerned with what your agent has to play with (tools), networking, shells and (again) compute.

In the simplest use case you only need to execute some snippets in JavaScript or Python - the world is your oyster, go forth and pick the cheapest, fastest vendor. Otherwise, you likely need a specialized toolchain installed, specific libraries available, code pre-injected, outbound internet access, potentially even inbound network access, etc, etc, etc. There really isn't a one-size-fits-all option, unfortunately, which is why there is such a proliferation of companies selling VM runtimes. The remainder of this post will focus on customizability and how we can accomplish the goal of making a VM fit your agentic use case.

Customized Artifacts

Artifacts is a vague term, so let's talk about the common ones for virtual machines. Depending on the virtualization technology you use, you will likely be dealing with kernels, ISOs (optical disk images), and/or root filesystems. So let's define each:

  • kernel - the core of the operating system, mediating between software and the hardware (CPU, memory, devices); microVM platforms like Firecracker boot a kernel image directly
  • ISO - an "image" for a virtual machine; operating system developers distribute these to install their OS into the VM so that you can get, e.g., a vanilla (blank) Ubuntu or Alpine VM up and running
  • root filesystem - for technologies like Firecracker / KVM, you create the bones of the VM you want to boot into and pair it with a kernel. While an ISO installs a vanilla operating system into the VM, which you can muck around with after booting, the root filesystem can be customized so that software and files are available on boot; this is typically an ext4 filesystem image

Why go into artifacts? Well, if you need a customized sandbox environment, you will probably need to create your own artifacts or modify existing ones to fit your agent. For instance, if you need a bunch of Python tooling, say anaconda and its friends or scipy and the required C packages, you probably want these preinstalled on the VM. Otherwise you will need a fully featured operating system that allows your agent to install its own tools - and you will want that to be stateful, so the agent doesn't run a bunch of apt install ... commands on boot every single time. Here you probably want to create a root filesystem that reflects your preferred environment. Conversely, if you have a general purpose agent that evolves with the work, you probably want a fully featured operating system that's stateful (e.g. changes you make stick around after reboot).
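To make the root filesystem idea concrete, here's a minimal sketch of building a custom ext4 image without root privileges or any mounting, by staging files in a directory and letting e2fsprogs copy them in at format time (the `-d` flag of `mke2fs`, e2fsprogs 1.43+). The image size, paths, and staged contents are illustrative assumptions; a real agent rootfs would stage a full userland (busybox or debootstrap output, your Python toolchain, etc.) instead of one script.

```python
# Sketch: build a small ext4 root filesystem image, unprivileged, by staging
# files in a directory and letting `mke2fs -d` copy them into the new image.
# Staged contents here are illustrative stand-ins for a real userland.
import pathlib
import shutil
import subprocess
import tempfile

def build_rootfs(image: str, size_mb: int = 64) -> None:
    staging = pathlib.Path(tempfile.mkdtemp(prefix="rootfs-staging-"))
    agent_dir = staging / "opt" / "agent"
    agent_dir.mkdir(parents=True)
    (agent_dir / "main.py").write_text('print("hello from the sandbox")\n')

    # preallocate the backing file, then format it with the staged tree inside
    pathlib.Path(image).write_bytes(b"\0" * size_mb * 1024 * 1024)
    mke2fs = shutil.which("mke2fs") or "/sbin/mke2fs"  # often not on PATH
    subprocess.run(
        [mke2fs, "-F", "-q", "-t", "ext4", "-d", str(staging), image],
        check=True,
    )

build_rootfs("rootfs.ext4")
# rootfs.ext4 now carries /opt/agent/main.py baked in, ready to be handed to
# a Firecracker-style microVM as its root drive, paired with a kernel image.
```

The nice property of this approach is that the whole build is just file manipulation on the host - no loop mounts, no privileged containers - which makes it easy to run inside CI every time your agent's toolchain changes.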

Concurrency at Scale

Most likely, you are going to need more than one VM for your agent. You may need one per customer, one per operation, or maybe one per subagent. This can be considered a "pets vs cattle" type of problem - it's hard to scale up if the agent has to make a bunch of discrete changes, inject the same code, and perform a bunch of standard operations every time it gets a new sandbox. A lot of companies offer the ability to create "snapshots" - capturing the state of a running VM to spawn more VMs. The snapshot itself will be one of the artifacts we talked about earlier. Think about AWS Lambda - they give you pre-built runtimes and inject your code. Most vendors will do this, and depending on the virtualization technology it can be as simple as creating files after boot or more complex, like mounting a directory at runtime.

Then there are more complex workflows, say implementing a "code factory" pattern in which an agent orchestrates subagents to make changes according to a spec and additional agents review the changes and provide feedback in a loop. Here, you need to take a specific sandbox and spawn multiple copies of it for the subagents - this is where the pets/cattle analogy fails us, since we now need to take a "pet" and turn it into "cattle". Workflow-wise, we:

  1. Start with a base image
  2. The agent makes updates inside the VM
  3. We snapshot a specific sandbox
  4. Deploy multiple copies of the sandbox, turning the subagents loose

This is the "forking" concept at work. It's also useful for parametric analysis via agents, where we start a number of similar sandboxes, each pursuing a variation of the same plan or goal.
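The four steps above can be sketched as a toy model. This deliberately assumes nothing about any particular vendor's API: a real snapshot captures disk (and optionally memory) state, while here the "VM" is just a dict of files - enough to show the lifecycle and that forks diverge independently.

```python
# Toy model of the snapshot/fork workflow: base image -> agent mutates the VM
# -> snapshot -> fork copies for subagents. Real snapshots are disk/memory
# artifacts; a dict of files stands in for VM state here.
import copy

class Sandbox:
    def __init__(self, files: dict):
        self.files = dict(files)

    def write(self, path: str, data: str) -> None:
        self.files[path] = data              # the agent mutating the VM

    def snapshot(self) -> dict:
        return copy.deepcopy(self.files)     # capture state as an artifact

    @classmethod
    def from_snapshot(cls, snap: dict) -> "Sandbox":
        return cls(snap)                     # spawn a copy from the artifact

base = Sandbox({"/etc/hostname": "base"})        # 1. start with a base image
base.write("/repo/spec.md", "build the thing")   # 2. agent updates the VM
snap = base.snapshot()                           # 3. snapshot that sandbox
workers = [Sandbox.from_snapshot(snap) for _ in range(3)]  # 4. fork copies

workers[0].write("/repo/patch.diff", "variant A")
# each fork diverges on its own; neither the base nor the other forks see it
```

The key design point the toy captures is that the snapshot is an immutable artifact: you can mint as many "cattle" from a customized "pet" as you like, and each fork's changes stay local to that fork.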

Ready, Set, Go

The sandbox is just a VM. Like any VM, it can serve a number of different roles; an isolated execution environment is just one of the things we can do with it. In a code factory pattern, it's useful to configure the VM to act as a continuous integration environment. If you are creating AI-assisted work that needs to be shared, the sandbox becomes a useful artifact in its own right - just ship the whole thing to your colleague! And if you are creating software with an agent, the VM can also serve as your distribution point - deploy the whole box, wire up networking, and you have a running app on the interwebs.

For some use cases (distribution and CI), the software running in the VM might have a long boot process. In these cases it can be useful to use checkpointing: saving the running state of a process in memory to the file system, and later restoring the process in memory to the same state. Sometimes this is called "memory snapshotting". This is how the LLM inference vendors scale their fleets - it takes so long to load model weights into the GPU that it bottlenecks the ability to autoscale, so instead we checkpoint the process and immediately restore from the checkpoint when the machine comes online.

You can take advantage of the same workflow. Sandboxes will generally have some sort of TTL on them, where the box spins down when it's idle to save resources (and increase density for the vendor, which makes the service cheaper for you), then spins back up when accessed. If you need some services inside the sandbox to stay running, or are exposing an app to the internet, checkpointing reduces the time it takes to go from "cold" to "hot".
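As a back-of-the-napkin illustration of the checkpoint/restore idea (not how CRIU or a hypervisor actually does it - those capture real process or VM memory), here's a sketch where an expensive "boot" is paid once, serialized to disk, and restored on subsequent starts. The checkpoint path and the fake boot work are assumptions made up for the example.

```python
# Toy illustration of checkpointing: pay an expensive "boot" once, persist the
# resulting state to disk, and restore it instantly on later starts. pickle
# stands in for "serialized running state" here.
import os
import pickle
import tempfile

CHECKPOINT = os.path.join(tempfile.gettempdir(), "service.ckpt")

def slow_boot() -> dict:
    # stands in for loading model weights, warming caches, migrating a DB...
    return {"weights": list(range(1000)), "ready": True}

def start_service() -> dict:
    if os.path.exists(CHECKPOINT):       # warm start: restore the checkpoint
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    state = slow_boot()                  # cold start: pay the boot cost once...
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)            # ...and checkpoint it for next time
    return state

cold = start_service()  # boots and writes the checkpoint
warm = start_service()  # restores the same state straight from disk
```

The cold/warm asymmetry is the whole point: once the checkpoint artifact exists, every subsequent wake-up skips the boot cost entirely, which is exactly what you want when a TTL keeps spinning your sandbox down.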

Obligatory Disclaimer

I am the CEO of a company that sells VMs and publishes a VM control plane (which you can use for free on your own metal, just sign up at https://app.heyo.computer/signup). We support multiple virtualization backends, P2P networking, image creation and snapshotting, forking, and checkpoints. Not to mention we automatically generate URLs and handle TLS certs if you want to run in our cloud. Feel free to reach out ✨