<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sam Currie</title>
  <subtitle>Personal website of Sam Currie, CEO of Heyo Computer.</subtitle>
  <link href="https://blog.sarocu.com/feed.xml" rel="self"/>
  <link href="https://blog.sarocu.com/"/>
  <updated>2026-04-07T00:00:00Z</updated>
  <id>https://blog.sarocu.com/</id>
  <author>
    <name>Sam Currie</name>
  </author>
  <entry>
    <title>Yet Another Sandbox Company</title>
    <link href="https://blog.sarocu.com/blog/yasc/"/>
    <updated>2026-04-07T00:00:00Z</updated>
    <id>https://blog.sarocu.com/blog/yasc/</id>
    <content type="html">&lt;p&gt;You may have noticed a couple of things in the last few months. One, I started a company that sells sandboxes. Two, that there are a bunch of new companies selling sandboxes. So why did I start yet another sandbox co?&lt;/p&gt;
&lt;p&gt;Well, let&#39;s step back for a minute. What&#39;s a sandbox and what does it provide? A sandbox is just an isolation mechanism for things you don&#39;t trust; it gives you a place to run code written by LLMs or customers, or to install things safely. How the isolation is created matters and there are a whole bunch of ways to do it - file system isolation, cgroups, namespacing, virtual machines, etc. The strictest form of isolation is going to come from a virtual machine, and there are a lot of options there as well.&lt;/p&gt;
&lt;p&gt;In the world of agents, most apps use a coding agent under the hood - the agent is generally tuned for its specific task and market, but it&#39;s still writing code and scripts and then invoking them most of the time. Naturally, we want some isolation for that agent to protect us and prevent issues from the agent spilling over and affecting other customers (noisy neighbor problems). There are sandbox companies solving for this problem specifically (actually, most of them are dialed into this problem). The core thing these companies publish is an SDK for other developers to integrate into their apps. The most common isolation technique in this space is a virtual machine monitor called Firecracker, which makes use of Linux&#39;s KVM (Kernel-based Virtual Machine) to spin up tiny, stripped-down VMs. KVM itself is really wonderful technology; it&#39;s about 20 years old now and quite well understood.&lt;/p&gt;
&lt;p&gt;I first started messing about with Firecracker several years ago at Flatfile. We had a fleet of compute for asynchronous workers that would grab jobs off of a queue. We were running the workers as containers in Kubernetes and, because autoscaling just isn&#39;t instantaneous, we were generally always running a bit overprovisioned so that some core services wouldn&#39;t get bogged down waiting for capacity to come online. Originally, we were just seeking to offload the jobs into AWS Lambdas, but controlling the invocations became complex (something has to tell the Lambda to wake up and check the queue), we couldn&#39;t just shove the service into the Lambda cleanly, and the developer experience left a lot to be desired. I experimented with building custom root filesystems for Firecracker to solve some of the issues, but there wasn&#39;t an off-the-shelf control plane at the time to manage them at scale across multiple machines, so I packed the project up and put it on the shelf.&lt;/p&gt;
&lt;p&gt;The agentic era made me grab it back off the shelf but alas, still no real control planes. More than that however, the more side projects I bootstrapped with coding agents, the more I wanted extra features that don&#39;t come out of the box with Firecracker: I wanted stateful VMs, I wanted to plug multiple agents into the VM, I wanted to use the VM locally and not in someone else&#39;s cloud, I wanted to share the VM (more accurately, I wanted to share what I built inside of it), I wanted to ship it to the internet and expose a service. I wanted to manage many VMs across multiple machines. I wanted VMs on my Mac and my Linux box; I wanted to run a service in a VM on my Linux box and hit it from my Mac.&lt;/p&gt;
&lt;p&gt;I wanted a lot.&lt;/p&gt;
&lt;p&gt;Personally, I don&#39;t think the hard part about what we&#39;re doing is the VM. We&#39;re putting a wrapper around 20-year-old Linux tech after all. Making VMs behave like containers in agentic apps is challenging, and there&#39;s value in that (indeed, try &lt;code&gt;heyvm --api&lt;/code&gt; to run it as a REST API for building with). I think, however, that if we were to stop there we would be leaving a lot on the table. I tend to think about agentic development as centering on the sandbox - it&#39;s your isolation, but it&#39;s also where all of your artifacts live. It&#39;s almost like a folder on your hard drive - create a folder for a new project and stuff everything about that project into it. The sandbox has the potential to be the folder for AI, except instead of just a container it can also power the thing you&#39;re building. Now, for that to truly be valuable, it also needs to be fairly portable. There&#39;s a reason that file sharing apps were all the rage during the cloud boom - most things we work on or create aren&#39;t done in a vacuum; they have a purpose and people they are prepared for. So what if the sandbox we&#39;re using for AI-assisted work can additionally be the folder, the compute, and the access layer all at the same time? That&#39;s the reason I started another sandbox company, and it&#39;s the reason that Heyo Computer looks a lot different from our peers (several of whom are doing awesome stuff that I&#39;d love to support as backends in &lt;code&gt;heyvm&lt;/code&gt;). When the sandbox becomes a first-class citizen we unlock a lot of potential (we also unlock a lot of hard problems, but those will be follow-up posts).&lt;/p&gt;
&lt;p&gt;PS - if you haven&#39;t already, head to &lt;a href=&quot;https://heyo.computer/&quot;&gt;https://heyo.computer&lt;/a&gt; and sign up for the mailing list to track our progress.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>How to Build Internal Software in 2026</title>
    <link href="https://blog.sarocu.com/blog/build-internal-2026/"/>
    <updated>2026-04-03T00:00:00Z</updated>
    <id>https://blog.sarocu.com/blog/build-internal-2026/</id>
    <content type="html">&lt;p&gt;The agentic era allows us to make software fast and cheap - it&#39;s not just SaaS companies that should be taking advantage of it. Internal software has always been about solving a hyper specific problem; the core issue is that it was usually prohibitively expensive and had to provide a huge amount of value in order to justify building and supporting it. I tend to think that a really targetted solution tends to beat a general solution any of day of the week, so if the cost of creating that software are close enough zero, why go buy a license from a vendor that only solves 80 or 90% of the problem? Software companies themselves are currently ditching vendors in droves and preferring to build in house solutions. As that trend accelerates and as non-software organizations gain access to the right tools, its a reasonable bet that the future of software lies in hyper specific internal software.&lt;/p&gt;
&lt;p&gt;So with that in mind, here&#39;s my top 10 for making internal software. We&#39;ll start off with what is probably a hard one if you aren&#39;t an engineer - tooling.&lt;/p&gt;
&lt;h2&gt;1. Tools: Use a coding agent, git worktrees, and a decent terminal&lt;/h2&gt;
&lt;p&gt;The real concept here is to have a cohesive setup. If you are a software engineer, you probably have strong opinions and an existing setup. If you are reading this in 2027, this is probably outdated, but all you should take away is that you need a development environment that&#39;s ergonomic, doesn&#39;t get in your way, and that you can develop muscle memory for. The rest of the points in this blog are going to be more about architecture, identity, and workflows, so if you have a setup already, skip to #2.&lt;/p&gt;
&lt;p&gt;Anyways, back to tooling.&lt;/p&gt;
&lt;p&gt;I use Claude Code; it&#39;s fairly cheap for what you get out of it. Cursor is pretty good and the agent is built into a VS Code clone, although TBH you probably aren&#39;t gonna look at the code much. There&#39;s a bunch of open source agents that let you plug in whatever provider you want - the Vibe CLI from Mistral is great for this; I use it to wrap ollama for doing work with local models.&lt;/p&gt;
&lt;p&gt;Git worktrees make working with agents a ton nicer and there are lots of tools out there for making them ergonomic. The basic premise is to create a repository and have an agent scaffold your app out, then create a worktree for each feature you want to add, run an agent in each worktree, and merge them back in as agents complete. This requirement implies that you need git - and you do, it&#39;s what engineers use for version control and it enables multiple people (or clankers) to work on the same codebase. Plus it&#39;s also how we share code. Apps like &lt;a href=&amp;quot;https://www.emdash.sh/&amp;quot;&gt;Emdash&lt;/a&gt; do a great job of abstracting the technical bits away.&lt;/p&gt;
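&lt;p&gt;To make the worktree-per-feature flow concrete, here&#39;s a minimal sketch of scripting it from Python. The branch prefix and sibling-directory layout are just assumptions, not a standard - adapt them to your repo:&lt;/p&gt;

```python
# Sketch: one worktree (and one branch) per feature, so each agent gets its
# own checkout. Naming conventions here are hypothetical.
import subprocess

def worktree_add_cmd(repo_dir, feature):
    """Build the git command that creates a sibling directory for a feature."""
    return [
        "git", "-C", repo_dir,
        "worktree", "add",
        f"../{feature}",          # the directory the agent will work in
        "-b", f"feat/{feature}",  # a fresh branch for that agent
    ]

def worktree_add(repo_dir, feature):
    # Actually run it; requires git and an existing repository.
    subprocess.run(worktree_add_cmd(repo_dir, feature), check=True)
```

&lt;p&gt;Run one agent per directory, then merge each &lt;code&gt;feat/*&lt;/code&gt; branch back in as its agent finishes.&lt;/p&gt;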
&lt;p&gt;The terminal is something you should be comfortable working in. Software tooling is usually native to the command line (e.g. a CLI, command line interface), so you will be lost if you can&#39;t work in your terminal. &lt;a href=&amp;quot;https://warp.dev/&amp;quot;&gt;Warp&lt;/a&gt; is a great one - they built an agent right into it so you can always just ask it for the command you need. It also has nifty autocomplete.&lt;/p&gt;
&lt;h2&gt;2. Multi-tenancy is hard, avoid it if possible&lt;/h2&gt;
&lt;p&gt;Multi-tenant software is predominant in SaaS because the entire idea of SaaS is to spread the cost of the application over as many customers as possible. If your app is multi-tenant, there is a whole list of complications that come up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;logical separation of data (e.g. customer A can&#39;t see customer B&#39;s data)&lt;/li&gt;
&lt;li&gt;user data structures go a level deeper - users and accounts&lt;/li&gt;
&lt;li&gt;authentication and authorization (identity) needs to be fancier&lt;/li&gt;
&lt;li&gt;database structures contain more relationships&lt;/li&gt;
&lt;li&gt;etc&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So you know, just avoid it. Internal software is kinda magical these days; it&#39;s cheap to make, and once you strip away the complexity of commercial SaaS, it&#39;s relatively easy to create something that solves your problems. As far as identity goes, you probably already have some way of verifying the user. Company email works great - just go send a magic link with a token in it. Maybe your company has an internal intranet; pop it in there and you know that the user was already authorized. If your company uses Google Workspace, just slap a Google login in front of it.&lt;/p&gt;
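&lt;p&gt;For a sense of how little code the magic-link approach needs, here&#39;s a minimal sketch: in-memory, single use, 15 minute expiry. The base URL is a made-up internal hostname and the email-sending part is left out:&lt;/p&gt;

```python
# Magic-link sketch: issue() returns a link to email to the user; verify()
# exchanges the token for the email address it was issued to.
import secrets
import time

TTL_SECONDS = 900   # links expire after 15 minutes
_pending = {}       # token -> (email, issued_at); in-memory on purpose

def issue(email, base_url="https://tools.example.internal"):
    token = secrets.token_urlsafe(32)
    _pending[token] = (email, time.time())
    return f"{base_url}/login?token={token}"

def verify(token):
    """Return the email for a valid token, or '' if unknown/expired."""
    record = _pending.pop(token, None)  # pop makes the link single use
    if record is None:
        return ""
    email, issued_at = record
    if time.time() - issued_at > TTL_SECONDS:
        return ""
    return email
```

&lt;p&gt;A real deployment would persist pending tokens if the app restarts and set a session cookie after &lt;code&gt;verify&lt;/code&gt; succeeds, but the shape stays this simple.&lt;/p&gt;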
&lt;h2&gt;3. Modular Monoliths&lt;/h2&gt;
&lt;p&gt;We call an app a monolith if a single program runs everything that your software does. The opposite is microservices where we break tasks out into dedicated programs like authentication or event publishing. The hard part of making your app out of microservices is sharing state, that is, keeping all the different parts of your app on the same page with respect to what users are doing on it.&lt;/p&gt;
&lt;p&gt;I recommend something in between these architectures - small but complete applications. Coding agents will have an easier time creating and updating a small application, and monoliths are way easier to host - it&#39;s just one app after all.&lt;/p&gt;
&lt;h2&gt;4. Use a lightweight data layer&lt;/h2&gt;
&lt;p&gt;There is probably no need for a Postgres cluster. You probably don&#39;t need Redis. High availability? Doubt that too.&lt;/p&gt;
&lt;p&gt;Most production-level, commercial software products are distributed systems - multiple physical machines are involved in delivering different aspects of the software, the services themselves likely all have multiple copies running at once (hundreds or thousands of copies even), and there are usually several pieces of software making up one &amp;quot;app&amp;quot;. This complexity introduces several things that make your life harder than it needs to be, and one of those is how to share state across everything. One way is to use a database and let everything connect to it and query for the application state. Another method is called &amp;quot;consensus&amp;quot;, where each copy of a service communicates state with the others. Both are probably more complicated than what your four-person team at work needs to generate reports on weekly orders of pizza boxes. Design the app so that it only ever expects one copy of the application running at any given time - keep your data in memory and flush it to disk in a JSON file or a CSV every now and then. Back up to S3 or similar if it&#39;s really important.&lt;/p&gt;
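&lt;p&gt;A sketch of that single-copy data layer: state lives in a dict, writes flush to a JSON file, and a temp-file-then-rename keeps a crash mid-write from corrupting the file. The file and field names are illustrative:&lt;/p&gt;

```python
# Single-process data layer: in-memory dict, flushed to a JSON file.
import json
import os
import tempfile

STATE_FILE = "orders.json"
state = {"orders": []}

def load():
    """Read state back from disk, if a previous run left a file behind."""
    global state
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            state = json.load(f)

def flush():
    # Write to a temp file, then atomically rename over the real file,
    # so a crash mid-write never leaves a half-written state file.
    fd, tmp = tempfile.mkstemp(dir=".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, STATE_FILE)

def add_order(item, qty):
    state["orders"].append({"item": item, "qty": qty})
    flush()  # or flush on a timer / atexit if writes are frequent
```

&lt;p&gt;The atomic-rename trick is the one piece worth copying even if everything else changes; &lt;code&gt;os.replace&lt;/code&gt; is atomic on the same filesystem.&lt;/p&gt;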
&lt;h2&gt;5. Use a Sandbox&lt;/h2&gt;
&lt;p&gt;Get ready for a shameless plug (&lt;a href=&quot;https://heyo.computer/&quot;&gt;Heyo Computer&lt;/a&gt; sells sandboxes, although you can create and run them on your own hardware for free).&lt;/p&gt;
&lt;p&gt;In all seriousness however, you want a sandbox for a couple of reasons and uses. First off, it&#39;s convenient. Very likely during the development of an app, you will be installing software and tooling - either as dependencies of what you&#39;re building or even just as helpers. Polluting your own machine can have downsides, and if you have multiple different projects going at once, you need a way to separate concerns. Sandboxes come in many different forms but are essentially just an isolation system for your project. A sandbox makes development easier, but choosing the right kind of sandbox also makes deployment and sharing your app easier. In a perfect world, when you need to share your app with someone else, you can just move the sandbox to the internet or send the entire thing to your coworker.&lt;/p&gt;
&lt;p&gt;Another use of a sandbox: making an agent. If you are building an agent to do some hyper specific task for you, you will want an isolation layer for the agent. Typically, an &amp;quot;agent&amp;quot; is a loop of LLM inference calls and tool calling; the LLM requests a tool call by responding in a specific format and we use traditional software to run the tool (which is just another piece of code). A lot of agents complete tasks by writing scripts and then calling the script. The sandbox isolates the LLM generated code when it runs, protecting you from side effects.&lt;/p&gt;
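&lt;p&gt;Stripped to its skeleton, that loop is surprisingly small. In this sketch &lt;code&gt;call_llm&lt;/code&gt; stands in for whatever inference API you use, the message format is invented, and &lt;code&gt;run_script&lt;/code&gt; is exactly the part you&#39;d hand to a sandbox rather than run on your own machine:&lt;/p&gt;

```python
# Minimal agent loop: inference call -> tool call -> feed output back,
# until the model says it's done or we hit a step limit.
import json
import subprocess
import sys

def run_script(code):
    # In real life this is where the sandbox goes: run the generated code
    # in an isolated VM or container, not on your own machine.
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=30)
    return proc.stdout + proc.stderr

TOOLS = {"run_script": run_script}

def agent(task, call_llm, max_steps=10):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(history)  # model answers in an agreed-upon format
        msg = json.loads(reply)
        if msg["type"] == "final":
            return msg["content"]
        # traditional software runs the tool the model asked for
        tool_output = TOOLS[msg["tool"]](msg["args"])
        history.append({"role": "tool", "content": tool_output})
    return "step limit reached"
```

&lt;p&gt;Everything interesting - retries, streaming, multiple tools, the actual sandbox - hangs off this same skeleton.&lt;/p&gt;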
&lt;h2&gt;6. Think about how much you trust your users&lt;/h2&gt;
&lt;p&gt;Is your only user you? Great, don&#39;t bother building RBAC (role-based access control, a very enterprisey feature). Is it machine to machine and you control both machines? Great, use bearer tokens. The real point is to get away with the least complicated identity system you can, ideally relying on another, more authoritative system.&lt;/p&gt;
&lt;p&gt;If you are distributing on the web or over HTTP, the simplest form of authorization can be done at the header level on requests; it&#39;s so easy that it would be weird if you did nothing else.&lt;/p&gt;
&lt;h2&gt;7. Plan Ahead for Distribution&lt;/h2&gt;
&lt;p&gt;Who needs to access the app and where do they need to do it from? Will everyone using the app have the source code?&lt;/p&gt;
&lt;p&gt;Not everything needs to be distributed over the public internet. Distributing securely on the web and doing so on an internal intranet are different ballgames - having identity built into your distribution mechanism reduces your &lt;a href=&amp;quot;https://blog.sarocu.com/blog/vendors-vendors/&amp;quot;&gt;surface area&lt;/a&gt;. A private git repo and a shared secret in a password manager can go a long way.&lt;/p&gt;
&lt;p&gt;Desktop apps generally require signing and other security measures if you want to distribute widely, and have a far different distribution mechanism than web apps - notably, that a desktop or CLI user gets a choice when they update.&lt;/p&gt;
&lt;p&gt;Web apps have the disadvantage of being, well, web apps - having public entrypoints to your software introduces its own class of problems. At a minimum you now need to have and configure a web server, web application firewall, and internet gateway. A lot of platforms will do this for you but you&#39;ll usually want to control the DNS as well.&lt;/p&gt;
&lt;h2&gt;8. Make a Pipeline&lt;/h2&gt;
&lt;p&gt;Regardless of where you distribute your app, a build and deploy pipeline will save both time and tears. That is, these steps should be automated: you push a change, the change makes it to the users. Deploys can be manually gated; e.g. someone has to click a button or run a command to kick it off, but builds shouldn&#39;t, especially if you have multiple contributors - breaking the build sucks for everyone, and the person who breaks it should be informed of the matter as soon as possible, because they are also the most likely person to fix the issue. Automated pipelines are also nice for any alerting you might want to integrate for important services - &amp;quot;this deployed successfully&amp;quot; can stay in your build system, but &amp;quot;failed to deploy&amp;quot; and &amp;quot;health check failed&amp;quot; are really helpful to know about &lt;em&gt;when they happen&lt;/em&gt;.&lt;/p&gt;
&lt;h2&gt;9. At a minimum, have an agent investigate security&lt;/h2&gt;
&lt;p&gt;The same advantages that make creating software fast and cheap in the AI era also make it fast and cheap for bad actors to create and take advantage of vulnerabilities.&lt;/p&gt;
&lt;p&gt;Access control is likely the most common security failure out there - coding agents are great at writing tests, so have one make a bunch of end-to-end tests (that is, tests that work through an entire feature or code path) that attempt to compromise access control.&lt;/p&gt;
&lt;p&gt;Then, inspect your supply chain - ideally, every time you add a package or upgrade one, something should do a sweep of your dependencies. &lt;a href=&amp;quot;https://socket.dev/&amp;quot;&gt;Socket.dev&lt;/a&gt; is great for the JavaScript ecosystem.&lt;/p&gt;
&lt;h2&gt;10. Build for updates&lt;/h2&gt;
&lt;p&gt;There&#39;s a couple of themes to tie up here - distribution and identity. Identity governs who you can trust and how well you can trust them while distribution is kind of the mechanics of getting it to a user - build it, copy the artifacts somewhere, make a container image, deploy it to production, migrate data, etc.&lt;/p&gt;
&lt;p&gt;Traditional software goes through a review and approval process, what compliance frameworks call &amp;quot;change management&amp;quot;. Reviewing code written by the clankers can be onerous, so some teams elect to have another clanker review a pull request (a bunch of related changes that we want to merge into the working branch of source control) - but be warned: clankers cannot be held accountable, only you can.&lt;/p&gt;
&lt;p&gt;After reviews and approval, you need a path to update your project. If you distribute the app via source code and its repository, you are done. Otherwise, you will need to push an update through to wherever the app is running. One aspect to be careful with is the data layer. It is very common to have to change the shape of the data you are storing (e.g. adding a column to a database); it is also very common for these data migrations to cause incidents, and rolling changes back can be difficult - if something goes wrong you can easily find yourself restoring from a backup (if it exists!) or manually doing surgery on your data. Neither is very much fun. If you have a change to your data shape to roll out, test it locally on data that looks like &amp;quot;production&amp;quot; (in a compliance-constrained environment, pulling production data onto a local machine for testing is typically a no-no). If you use a traditional database and the data is critical, you should be taking backups anyways and ensuring that you know how the write-ahead log works. Take the 2 minutes to ensure that you can recover before pushing out a data change.&lt;/p&gt;
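&lt;p&gt;For the JSON-file data layer suggested in #4, the &amp;quot;make sure you can recover&amp;quot; step can be as small as this sketch: snapshot the file, apply the shape change, and roll back if anything throws. The &lt;code&gt;status&lt;/code&gt; field is a made-up example of a shape change:&lt;/p&gt;

```python
# Migration with a built-in escape hatch: copy the data file aside first,
# and restore it if the shape change blows up partway through.
import json
import shutil

def migrate(path="orders.json"):
    backup = path + ".bak"
    shutil.copy2(path, backup)  # cheap insurance before touching data
    try:
        with open(path) as f:
            data = json.load(f)
        for order in data["orders"]:
            order.setdefault("status", "open")  # the shape change itself
        with open(path, "w") as f:
            json.dump(data, f)
    except Exception:
        shutil.copy2(backup, path)  # restore the snapshot, then re-raise
        raise
```

&lt;p&gt;The same pattern - snapshot, change, verified rollback path - is what a real database migration tool is doing for you, just with more machinery.&lt;/p&gt;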
&lt;p&gt;During deployments, it&#39;s likely that services will go down - you can architect for high availability but that adds a fair amount of complexity for internal software and you should be pragmatic about what you actually need. Give your users a heads up.&lt;/p&gt;
&lt;p&gt;Finally, have a way to verify that a fresh update is working as intended. The simplest form of this is simply opening the app and running through core features. You can add simple automation scripts or something like &lt;a href=&quot;https://playwright.dev/&quot;&gt;Playwright&lt;/a&gt; to make things more consistent (always a good idea) and basic health checks (ping test) and alerts go a long way.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>How Many Vendors does your Vendor Have?</title>
    <link href="https://blog.sarocu.com/blog/vendors-vendors/"/>
    <updated>2026-03-31T00:00:00Z</updated>
    <id>https://blog.sarocu.com/blog/vendors-vendors/</id>
    <content type="html">&lt;p&gt;If you asked me for a list of how to run a secure software or IT department, number one on that list would be &lt;code&gt;reduce the surface area&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Reducing the surface area all boils down to this: if there are fewer places to attack you, you will get attacked less and it&#39;s easier to defend what&#39;s left.&lt;/p&gt;
&lt;p&gt;That&#39;s not to say that things like encryption, intrusion detection, and RBAC aren&#39;t valuable; I&#39;m saying that the most pragmatic thing to do is to limit your vulnerabilities in the first place. Realistically, no one is going to break encryption, but someone might discover a key in a really old commit in a public repo or some credentials accidentally pasted into a public Slack used for customer support - in these cases, the breach will almost certainly look like legitimate access, in which case your encryption doesn&#39;t matter and at best your IDS will be a forensic log.&lt;/p&gt;
&lt;p&gt;One thing that matters a lot in these cases is where the compromised service lives. Is it a public multi-tenant SaaS app? Ope, that sucks. How about a cloud vendor? Dang, sucks to suck. What if instead the compromised key belongs to an internal application running inside of a private network without a public entrypoint? Nice bullet dodge. (Still tho, clean up your keys.)&lt;/p&gt;
&lt;p&gt;Keeping the surface area small is a core part of infrastructure architecture - put a load balancer on the public internet and have it control traffic to services inside of a private network. Large companies recognize it as part of how you responsibly run a business, which is why we have concepts like federated identity. Which brings us back to the topic at hand...&lt;/p&gt;
&lt;h2&gt;How many vendors do you have?&lt;/h2&gt;
&lt;p&gt;The SaaS era birthed so many domain specific tools, most folks probably don&#39;t know off the top of their head how many tools it takes to run their team or business. There are your big ones like email, documents, storage, and hosting. There are your team&#39;s specific needs like sales tools and CRMs. There are the quiet ones that run in the background like data exfiltration detection. There are boring company wide vendors helping do things like HR and payroll. Have contractors? They probably have their own stack of tools too.&lt;/p&gt;
&lt;p&gt;Now ask yourself, how many vendors does a given vendor have? Don&#39;t know? Check out a company&#39;s privacy policy, most companies are nice enough to list the major vendors that act as &amp;quot;data subprocessors&amp;quot; (that is, their vendors that they send your data to). Most at least. Here&#39;s what OpenAI says:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;To assist us in meeting business operations needs and to perform certain services and functions, we disclose Personal Data to vendors and service providers, including providers of hosting services, customer service vendors, cloud services, content delivery services, support and safety services, email communication software, web analytics services, payment and transaction processors, search and shopping providers, and information technology providers.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;If you actually wanted to know who those vendors are, you need to head on over to their trust center and fire off some requests for documents. So this is where the trouble begins - if OpenAI is a major, critical vendor for your company, your surface area is pretty vague, at least until they respond to your requests (get the SOC2 Type 2 report at a minimum).&lt;/p&gt;
&lt;p&gt;Companies selling agents tend to use a lot of vendors for delivering their services; most are a little nicer than OpenAI and will tell you in plain language where your data gets shipped off to. Take a peek at &lt;a href=&amp;quot;https://www.genspark.ai/privacy&amp;quot;&gt;Genspark&#39;s policy&lt;/a&gt;. You get Azure, Stripe, Google Workspace, and then inference providers: Anthropic, xAI, Google again, ElevenLabs, and our old friends OpenAI. So if Genspark is a vendor, that means OpenAI is a 3rd party, and now their vendors are 4th parties to you. It&#39;s conceivable that the prompts you slam into the app are now stored not only by OpenAI but by a telemetry vendor too, and all of a sudden a data breach at any of three different companies could result in sensitive info getting leaked. This is the real surface area you are dealing with.&lt;/p&gt;
&lt;h2&gt;How do you reduce the surface area?&lt;/h2&gt;
&lt;p&gt;There are two obvious paths here, both have traditionally been expensive - roll your own or self host.&lt;/p&gt;
&lt;p&gt;&amp;quot;Roll your own&amp;quot; refers to building out your own solution to a problem. The SaaS era was predicated on an app solving a class of problems and then spreading the cost of it out among as many customers as possible. Building internal software projects means there is only one company to absorb the cost of the project; that was a problem once upon a time, but now coding agents have made this cheap and fast.&lt;/p&gt;
&lt;p&gt;&amp;quot;Self host&amp;quot; refers to taking off-the-shelf software (open or closed source) and running it on your own hardware or cloud. Even if the software is open source, you will spend engineering time deploying and maintaining it. This approach is probably the most familiar to anyone who has worked on a team or sold software to a company in a regulated industry or one that deals with sensitive data. Most of the time, you see companies run infra via a cloud vendor, or hybrid with some racks that they own; access control tends to be locked down, but otherwise it&#39;s not a dramatic departure from running infra on a typical public cloud. You will still find that self hosting means maintenance, but when you aren&#39;t engineering for a public SaaS platform it makes a lot of decisions much easier - skip Kubernetes, schedule maintenance, and don&#39;t bother with high availability deployments if you don&#39;t have to. Sometimes you can even forget about the whole pets vs cattle thing and just run a VM configured with a few bash scripts - the level of sophistication should match the need, but don&#39;t overdo it.&lt;/p&gt;
&lt;p&gt;Either approach (or both at the same time) gives you the ability to limit access by design, which is rather nice! Put things in private networks, only build for a specific identity provider, design for single tenancy - all the things that make your life easier and the surface area smaller. The key takeaway is that this is a pragmatic choice; reducing the surface area can and should simplify your stack at the same time it makes you more secure.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Slack doesn&#39;t care about your &quot;Slack Killer&quot;</title>
    <link href="https://blog.sarocu.com/blog/slack-doesnt-care/"/>
    <updated>2026-03-28T00:00:00Z</updated>
    <id>https://blog.sarocu.com/blog/slack-doesnt-care/</id>
    <content type="html">&lt;p&gt;Slack isn&#39;t going to sweat it if Garry Tan invests in a couple of kids building out a Slack killer with the help of some clankers.&lt;/p&gt;
&lt;p&gt;The thing that kills Slack probably won&#39;t even be a single application or company. I&#39;m unsure if we&#39;re having a &amp;quot;can&#39;t see the forest for the trees&amp;quot; moment or what, but I think the future of software looks a lot like the past - internal software that is hyper-focused on a single company&#39;s problem. In that world, conventional SaaS isn&#39;t competing against a hot YC startup; it&#39;s competing against Claude Code making simple apps that are just good enough.&lt;/p&gt;
&lt;p&gt;Flatfile, where I ran the Infra team, started to experience this at the start of the AI transition. Formerly, many of our deals were essentially competing against Microsoft Excel, in the sense that Excel was where the workflow was currently happening and in many cases it was &amp;quot;good enough&amp;quot;. We had to show that our product provided an incredible amount of value over a home-grown solution that consisted of emailing Excel spreadsheets back and forth. That all started to change about a year and a half or so ago - instead of a deal dying because Excel was good enough, deals started dying because customers were pasting data into ChatGPT, asking it to restructure it, and then sending the results to their coworkers (this is not a commentary on the wisdom of pasting sensitive data into ChatGPT; that&#39;s for a different article). For a complicated case, Cursor or Claude Code could vibe out a script that was precisely tailored to your data, which beats a complex generalized solution all to hell.&lt;/p&gt;
&lt;p&gt;I&#39;ve used Slack for a long time, so when I started a new company, Heyo Computer, one of the first things I did was spin up a free Slack workspace. Then at some point I needed automation. Ok, well Slack is $18 a month per user for a business plan and the company is currently running on my personal credit card. One day I was on the bus going from where I live, deep in the Rocky mountains in the Arkansas Valley, to San Francisco to see my cofounder, and I had some time, so I vibe coded out a Slack clone. Now, I know that I am not going to just vibe code Slack-quality chat and business functions on my 3 hour bus ride, but at the same time, that really just wasn&#39;t necessary. I just needed my cofounder and me to be in a chat app that has some webhook functionality. So good enough for us is pretty simple: a single-tenant app that needs stupid simple auth, a KV store, a files API, and a webhook feature that uses bearer tokens. That ain&#39;t ever going to kill Slack, so Salesforce isn&#39;t going to sweat it, but they also ain&#39;t going to get $18 per month per person for the four of us in the chat.&lt;/p&gt;
&lt;p&gt;And that, I think, is the real danger to Slack and other traditional SaaS companies - and Wall Street seems to agree: it&#39;s the evolving nature of what getting to &amp;quot;good enough&amp;quot; actually costs a business.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>What does software look like when it costs nearly nothing?</title>
    <link href="https://blog.sarocu.com/blog/software-costs-nothing/"/>
    <updated>2026-03-23T00:00:00Z</updated>
    <id>https://blog.sarocu.com/blog/software-costs-nothing/</id>
    <content type="html">&lt;p&gt;There have been a lot of takes on this recently, not to mention a lot of businesses like Replit and Lovable making big bets on the outcome. Personally, I think its weird that a lot of these bets just seem to be &amp;quot;regular software but cheap&amp;quot;. Indeed, go ask an agent from one of these CO&#39;s to build you an app and it will make something that looks like normal software in 2024 — there&#39;s a full stack app with a UI, backend and a data layer, probably postgres. There&#39;s some CRUD forms and an integration or two. There&#39;s a CSS file that looks strikingly similar to all of the other ones the agent spits out.&lt;/p&gt;
&lt;p&gt;Conversely, if you were on a team making internal software a decade or so ago, you know that when the cost of a project is spread over a single company and isn&#39;t the primary revenue driver, the result looks pretty messy: spreadsheets, some JSON files for configuration, a couple of bash scripts, maybe an HTML report generated from Python, and a crontab to run it all at 7 AM every day. It&#39;s messy, but it&#39;s also hyper-specific, which is the real point.&lt;/p&gt;
&lt;p&gt;Why would some vibe-coded fix that cost $46 in credits for a hard scheduling problem only Bob in logistics deals with be any different? It doesn&#39;t need to be sophisticated - it solves the problem, and it&#39;s easy to modify by the person who knows the problem space best. What would adding an agent, Postgres, and a Telegram integration do to improve it? Probably not a lot, so Bob won&#39;t be taking that demo for your AI-native app; Bob already moved on.&lt;/p&gt;
&lt;p&gt;The SaaS era didn&#39;t arrive because a beautiful UI with a multi-tenant, microservice-based container fleet solves every problem perfectly. It was powered by the premise that an application could help solve an entire class of problems and became cost-efficient when spread out over many customers. Bob might have bought a license to your app once upon a time, but that doesn&#39;t mean he was in love with it; he just did the math: &lt;code&gt;(value of time saved + additional value - cost of license) &amp;gt; $0&lt;/code&gt;. Now in 2026 we don&#39;t just take that equation and swap the license cost for token cost; the &amp;quot;additional value&amp;quot; changes a bit as well - a hyper-targeted solution is likely going to save more time or create a better outcome, so it creates more value. That leaves &amp;quot;time saved&amp;quot; as the real variable here: if you cannot use a terminal and have no desire to learn the basics of a coding agent from YouTube, then perhaps your &amp;quot;time saved&amp;quot; is actually negative infinity, in which case a SaaS app is still really valuable. If the problem is very complicated or involves enough stakeholders, then the &amp;quot;time saved&amp;quot; may be sufficiently negative to also choose a SaaS app. For the rest of the folks out there, however, the learning curve ain&#39;t that steep.&lt;/p&gt;
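&lt;p&gt;To make that concrete with numbers I&#39;m inventing on the spot: say a tool saves Bob 10 hours a month at $50 an hour and the SaaS license runs $30 a month, so the old math was &lt;code&gt;($500 + additional value - $30) &amp;gt; $0&lt;/code&gt;, an easy yes. Swap the license for $46 in one-time tokens and a sharper fit that saves 12 hours instead, and it becomes &lt;code&gt;($600 + additional value - $46) &amp;gt; $0&lt;/code&gt;, an even easier yes. The structure of the decision is unchanged; both terms just moved in the buyer&#39;s favor.&lt;/p&gt;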
&lt;p&gt;So that&#39;s my take on this one — making full-stack apps is fun, but the future of software looks a lot like a warp-speed version of internal software from 10 years ago: highly targeted and functional but messy, and everyone just gets back to working on the thing that&#39;s the actual revenue driver.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Self Publish, Syndicate Out. For the AI Era.</title>
    <link href="https://blog.sarocu.com/blog/posse/"/>
    <updated>2026-03-19T00:00:00Z</updated>
    <id>https://blog.sarocu.com/blog/posse/</id>
    <content type="html">&lt;p&gt;Known as &lt;a href=&quot;https://indieweb.org/POSSE&quot;&gt;&amp;quot;POSSE&amp;quot; on the Indieweb&lt;/a&gt; (&amp;quot;Publish own site, syndicate everywhere&amp;quot;), the concept of having your own website to distribute the things you write and talk about is an old one. The once ubiquitous language &amp;quot;PHP&amp;quot; originally stood for &amp;quot;personal home page&amp;quot;. The concept goes that self publishing on your own site establishes permalinks that give attribution and ownership to the author, not a platform.&lt;/p&gt;
&lt;p&gt;That&#39;s all well and good, but there&#39;s a reason platforms exist - the critical mass of eyeballs they have attracted makes two things really easy: publishing and discovery. You need to put something that you created on the internet with as little friction as possible, and someone else hopefully wants to find and consume what you created. Platforms have done a really good job of establishing moats to protect themselves - such a good job that providing a nice experience for their users isn&#39;t really all that important; e.g. Facebook can allow a mass of slop in your feed because your grandma is already there and your small rural town&#39;s gossip lives and dies by a Facebook group. Meta itself freely admits it makes a ton of money on scams - &lt;a href=&quot;https://www.reuters.com/investigations/meta-is-earning-fortune-deluge-fraudulent-ads-documents-show-2025-11-06/&quot;&gt;perhaps as much as 10% of revenue, which represents about $7 billion&lt;/a&gt;. Personally, I don&#39;t much buy the perspective that corporations are evil - they are more like a huge boat that takes thousands of people to crew, with a primary goal of staying afloat (making money). Whatever the captain says, it still takes thousands of people working together to change direction; it&#39;s just a big dumb ship trying not to sink. In that context, it&#39;s no surprise that Facebook is a slop fest and Twitter is a goblin-posting dumpster fire.&lt;/p&gt;
&lt;p&gt;So no surprise that current platforms are a mess, but what can anyone do about it? These things are dominant due to network effects, right?&lt;/p&gt;
&lt;p&gt;Well, that&#39;s where things might get interesting in the AI era. &lt;a href=&quot;https://www.wired.com/story/ai-bots-are-now-a-signifigant-source-of-web-traffic/&quot;&gt;Agents are quickly becoming the number one user of the internet&lt;/a&gt; — software documentation these days is primarily serving agent traffic. In a world where content is consumed and distributed via autonomous agents, the discovery mechanism of large platforms becomes much less valuable. If agents are pushing users towards content instead of towards the platforms themselves, then the network value of a platform is greatly diminished, and then what&#39;s left? Oh, it&#39;s super easy to publish on? Well geez, it&#39;s super easy to have Claude go make a website (e.g. this one), so why send my readers to an ad-tech hell hole at all?&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Wow TikTok is Still in the News</title>
    <link href="https://blog.sarocu.com/blog/tiktok/"/>
    <updated>2025-05-02T00:00:00Z</updated>
    <id>https://blog.sarocu.com/blog/tiktok/</id>
    <content type="html">&lt;p&gt;For months, every time TikTok is in the news for a potential ban or sale I complain to my wife. That complaint usually goes something like “There’s already a fix for this, I can’t believe we’re still talking about it.” For the uninitiated, the drama supposedly centers on protecting the personal information of Americans using an app created and controlled by a Chinese company. But privacy frameworks exist to different degrees in many jurisdictions and they form a well established set of legal controls for doing the very thing that US lawmakers think they are accomplishing by forcing the sale of TikTok or banning the app altogether.&lt;/p&gt;
&lt;p&gt;Now, the current administration — once hailed as the savior of TikTok — has granted another extension for the sale of the app, so it’s quite likely that the story dies down for a while before coming back ‘round after a sale goes through or another extension; we will see how this actually plays out. But the thing I would like to highlight is that this is a silly way for lawmakers to go about securing Americans’ privacy.&lt;/p&gt;
&lt;p&gt;Today, the European Union shows us the way to go — using their existing and robust privacy framework, GDPR (the General Data Protection Regulation, a legal framework that has been evolving since 1995), they slapped a $600 million fine on ByteDance for illegal data export (data pertaining to citizens of the EU is subject to data protection agreements in order to be exported to a country that does not have equivalent privacy protections). It is absolute silliness for the United States Congress to individually police tech companies (and indeed, I’m not convinced protecting Americans’ privacy is the goal of this saga), and if you think TikTok is the only platform that is harvesting and exporting personal data (quite legally at that) you are very much mistaken. The real answer to this problem is to create a national framework and then enforce it with our existing legal infrastructure.&lt;/p&gt;
&lt;p&gt;Granted, it’s true that the US simply doesn’t care very much about privacy. Sure, you can pontificate all you like about it, but since we lack a cohesive framework at the federal level we certainly aren’t doing much more than pontificating about it. Privacy is one of those things we can all agree is a “good” thing; it’s like being a “good person”, we can all agree on it but it’s vague enough where no one need do anything different. And there are a smattering of efforts and laws out there:&lt;/p&gt;
&lt;p&gt;HIPAA — protects some forms of health and personal information and requires a data processor (that is, an app) to inform HHS of a data breach&lt;/p&gt;
&lt;p&gt;FERPA — protects some forms of educational data&lt;/p&gt;
&lt;p&gt;COPPA — restricts the collection and use of children’s information for online services&lt;/p&gt;
&lt;p&gt;Dept of Commerce Privacy Framework — a good-faith program to enable data export with countries that do take privacy seriously (and thus enable US companies to export services easily)&lt;/p&gt;
&lt;p&gt;…And there are a few other industry-specific laws — but no overarching, nationwide privacy and compliance framework that applies to all businesses. Those only exist at the state level (and there are a bunch), making regulation onerous and compliance even more so, since a company’s legal obligations may vary based on the zip code of the end user. It’s kinda like Colorado not having statewide building codes, so a contractor in Denver might not be licensed to do business a few miles away in Jefferson County until they learn another set of codes, take another exam, and register their business; all of which makes it more expensive to offer their services to everyone. That’s what we’re doing with privacy protections in the United States: we make them weaker and more expensive for everyone and ask the people in Congress, who likely don’t have the slightest notion of how the internet works, to make a flashy show of it when it serves their political interests.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Another Era is Ending So Let&#39;s Break Up Big Tech</title>
    <link href="https://blog.sarocu.com/blog/breakup-big-tech/"/>
    <updated>2025-04-17T00:00:00Z</updated>
    <id>https://blog.sarocu.com/blog/breakup-big-tech/</id>
    <content type="html">&lt;p&gt;The front page of the New York Times today features Google’s legal woes —&lt;/p&gt;
&lt;h3&gt;Google Is a Monopolist in Online Advertising Tech&lt;/h3&gt;
&lt;p&gt;And don’t forget that Zuck is sitting in court all this week trying his best to prevent Meta from breaking up.&lt;/p&gt;
&lt;p&gt;But let’s party like it’s the ’90s.&lt;/p&gt;
&lt;p&gt;Who remembers the Microsoft antitrust trials? They didn’t really result in a sea change by any means; Microsoft argued that a monopoly can actually serve the consumer in the case of software - in that a larger ecosystem and larger user base lead to a better product, and theoretically cheaper prices. Microsoft had some experience fucking up monopolies as well; Word and Excel weren’t always the entrenched products they are today, and they launched into a world where there were real monopolies in these spaces. There was WordStar and then WordPerfect. There was Lotus 1-2-3. And slowly but surely Microsoft wrested away market share until it dominated. That’s not to mention Internet Explorer; engineers may have a longstanding hatred of IE8 and 9, but there was a time when Netscape dominated the market before losing their lunch to Microsoft in the browser wars. When the antitrust trials started in 1998, Microsoft controlled a vast swath of the software landscape.&lt;/p&gt;
&lt;p&gt;But.&lt;/p&gt;
&lt;p&gt;But the PC was also becoming a commodity product when the trials started. The great mobile shift was beginning, which Microsoft famously missed the boat on. A new wave of tech rose up as the old guard was mostly relegated to commodity products (Apple going from near-bankrupt in the PC era to winner of the mobile shift).&lt;/p&gt;
&lt;p&gt;It seems fitting that Google and Meta are in the hot seat now in 2025. I personally doubt the antitrust proceedings will result in anything astonishing; even if you break up Facebook, the constituent companies will still be wildly successful and do their best to keep a grip on their market share. But the vibe shift is underway anyways. AI-native companies are introducing new ways to use computers that are dramatically changing how software engineers work — it’s mostly a matter of time before the wave catches the rest of the workforce and then the consumer. In this wave, we’re seeing OpenAI capture the consumer and nontechnical market while Anthropic entrenches itself in the developer space; I doubt the old guard is catching up, which is likely why Microsoft itself is betting so heavily on OpenAI.&lt;/p&gt;
&lt;p&gt;The DOJ is a little late, just like last time. Search, ads, and social media are commodity services. LLMs might become a commodity themselves a little faster in this world, but the next decade or so is bright — just watch out for the next antitrust fight.&lt;/p&gt;
</content>
  </entry>
  <entry>
    <title>Think About Privacy Like an Engineer</title>
    <link href="https://blog.sarocu.com/blog/privacy/"/>
    <updated>2025-03-26T00:00:00Z</updated>
    <id>https://blog.sarocu.com/blog/privacy/</id>
    <content type="html">&lt;p&gt;There are lots of ways to think about privacy these days. If you happen to be a lawyer at Meta, you probably think about privacy in terms of all the bits of information that is being collected from users and is legally required to be listed in a privacy policy. If, by chance, you happen to be a member of the US Congress, privacy might mean the protected classes of data including things like medical and educational records. In the much more likely case that you are a normal user of various apps and websites, privacy probably just means that there’s a checkbox or a button that you need to click before you use the aforementioned app and/or website that says you must accept “cookies”.&lt;/p&gt;
&lt;p&gt;Alternatively, if you happen to know many software engineers (in particular, a software engineer who works in infrastructure) and ask them what privacy means, you may end up in a conversation about risk aggregation. That’s because risk is typically a currency in business, and crucially so in software businesses. You can bet that any software company with a stamp that says “SOC 2” on it has a “risk register” that they maintain (at least they update it once a year when the auditors ask about it), and the purpose is pretty simple - gauge the likelihood that a certain event will take place as well as its downside impacts to the business. By events, we generally mean things like data breaches or fraud, and the likelihood and impacts are the risk that the business takes. Take the entire register altogether and that is the aggregated risk assumed by a business in order to offer its services.&lt;/p&gt;
&lt;p&gt;Need an example? Pretend you are an executive at Equifax circa 2017. You might have a risk register that has a line item that looks like this:&lt;/p&gt;
&lt;p&gt;Risk: Personal information database breach&lt;/p&gt;
&lt;p&gt;Likelihood: Low&lt;/p&gt;
&lt;p&gt;Impacts: Reputation loss, financial impacts due to exposure to class action settlement, regulatory oversight compresses margins&lt;/p&gt;
&lt;p&gt;This risk exists because the product consists of collecting lots of personal financial data about individuals; if they don’t take this risk, they don’t have a product (arguably they have been such poor stewards of data they likely shouldn’t have this product anyways).&lt;/p&gt;
&lt;p&gt;Most modern businesses rely on information flowing cleanly between different pieces of infrastructure and vendors (subprocessors); for example, when you buy a fancy espresso from the local coffee shop and pay with a credit card, the point-of-sale system sends data to an app hosted on a cloud provider like AWS, which interacts with a vendor like Square to process the payment (that’s two vendors, or subprocessors), and then the rewards system updates your history to record the purchase (a third subprocessor). What’s your risk register look like for this transaction?&lt;/p&gt;
&lt;p&gt;Your credit card number (high impact, low likelihood of exposure)&lt;/p&gt;
&lt;p&gt;You like coffee (low impact, high likelihood of exposure via data sharing)&lt;/p&gt;
&lt;p&gt;This contrived example isn’t terribly risky, but it’s just one of hundreds of decisions we make subconsciously, and the register adds up over time — you are aggregating risk, just like Equifax as they accumulated personal financial information one person at a time, 147 million times over.&lt;/p&gt;
&lt;p&gt;The way we use devices and the internet has evolved a lot over time (duh) and data has emerged as a hard currency of the information era. Sometimes you are paying money to manage risk (buying an app that runs locally on your laptop), and sometimes you are forking over data to get some service for free (e.g. Instagram, paid for by ads) and when that happens, you are aggregating risks. This is the part of the post where I should write a “call to action” so that you hopefully exchange cash for something from me, but I haven’t built that part yet. I can tell you what I want for myself however —&lt;/p&gt;
&lt;p&gt;I want to use LLMs on my own machine&lt;/p&gt;
&lt;p&gt;When I need to do some one off task, I want to make an LLM do it for me on my own computer, and not sign up for yet another service&lt;/p&gt;
&lt;p&gt;When I save a file, I want to save it to my local machine&lt;/p&gt;
&lt;p&gt;When I send someone a file, I don’t want to save it to some cloud drive&lt;/p&gt;
&lt;p&gt;When I need to buy pants I don’t need my algorithm updated&lt;/p&gt;
&lt;p&gt;When I publish things, I don’t want companies scraping it for financial gain without permission&lt;/p&gt;
&lt;p&gt;I could really keep going on, but I think you get the point — I want to use my computer like its 2002. What are you nostalgic for in today’s computering world? How do you want the internet to change?&lt;/p&gt;
</content>
  </entry>
</feed>
