
How to Migrate Off Firestore in 2 Days

Firebase. That one word can easily cost you 10-20x what a cheaper database costs, and can constrain your database design.

So you can imagine how relieved I was when I finished migrating off Firestore in just 2 days. Well, 3 days if you count the time I spent building my initial "PostgresSynchronizer".

My initial estimate for this project was an entire month of work! But thanks to some awesome LLM prompts and a friend who taught me about CDC, I got through it significantly faster.

If you're looking for pub/sub, Firebase or Supabase might sound very tempting. After all, not many databases support pub/sub out of the box. But this simple architecture below is all you need for most small-business use cases (and should scale nicely up to about 10k-100k ops / second).

[Image showing Postgres WAL -> CDC -> Redis Pub/Sub -> Gateway -> WebSocket clients]

Before we get into the details of how the pub/sub works, though, I'm gonna go over the steps I took in order.

First, I wrote a prompt in Cursor to build a PostgresSynchronizer. This class has a function, handleWrite, that takes a Firestore path, the type of operation (create, update, delete), and the data, and upserts that data into Postgres. I then wrapped my Firestore instance so that every Firestore write also calls handleWrite.
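The actual class isn't in this post, but here's a minimal sketch of the idea in Go (to match the services later on). It assumes one table per top-level collection with a JSONB doc column; the schema, names, and error handling here are my own, not Foony's real code:

```go
package migrate

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
	"strings"
)

// Op is the kind of Firestore write being mirrored.
type Op string

const (
	OpCreate Op = "create"
	OpUpdate Op = "update"
	OpDelete Op = "delete"
)

// PostgresSynchronizer mirrors every Firestore write into Postgres.
type PostgresSynchronizer struct {
	DB *sql.DB
}

// HandleWrite takes a Firestore path like "usersPublic/abc123", the
// operation type, and the document data, and upserts (or deletes) the
// matching row. Table names come from our own code, never from user
// input, so the string concatenation below is safe in this context.
func (s *PostgresSynchronizer) HandleWrite(ctx context.Context, path string, op Op, data map[string]any) error {
	parts := strings.SplitN(path, "/", 2)
	if len(parts) != 2 {
		return fmt.Errorf("unexpected path %q", path)
	}
	table, id := parts[0], parts[1]

	if op == OpDelete {
		_, err := s.DB.ExecContext(ctx, `DELETE FROM `+table+` WHERE id = $1`, id)
		return err
	}

	doc, err := json.Marshal(data)
	if err != nil {
		return err
	}
	// Idempotent upsert keyed on the Firestore document ID.
	_, err = s.DB.ExecContext(ctx,
		`INSERT INTO `+table+` (id, doc) VALUES ($1, $2)
		 ON CONFLICT (id) DO UPDATE SET doc = EXCLUDED.doc`,
		id, doc)
	return err
}
```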

This isn't 100% perfect, of course--if the server disconnects or crashes, it's possible that we don't persist some changes. But it's good enough for Foony's use case.

Next, I created and ran a backfill that iterates over all the Firestore collections and calls handleWrite for each document, pulling all the historical data out of Firestore. For speed, I limited this to only registered users (sorry guests--you should've signed up). Because handleWrite upserts, the backfill is idempotent and safe to run multiple times.
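For illustration, a backfill over the synchronizer sketch above might look like this (the collection list is illustrative, and I've left out the registered-users filter):

```go
package migrate

import (
	"context"

	"cloud.google.com/go/firestore"
	"google.golang.org/api/iterator"
)

// backfill replays every existing Firestore document through the
// synchronizer. HandleWrite upserts, so running this twice is harmless.
func backfill(ctx context.Context, fs *firestore.Client, ps *PostgresSynchronizer) error {
	for _, coll := range []string{"usersPublic", "usersPrivate"} { // illustrative list
		iter := fs.Collection(coll).Documents(ctx)
		for {
			snap, err := iter.Next()
			if err == iterator.Done {
				break
			}
			if err != nil {
				return err
			}
			if err := ps.HandleWrite(ctx, coll+"/"+snap.Ref.ID, OpCreate, snap.Data()); err != nil {
				return err
			}
		}
	}
	return nil
}
```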

Now that I had a way to populate the data in Postgres, and keep it (mostly) in sync with Firestore, I was able to tackle the big, scary problem: pub/sub.

How the hell do you do pub/sub with Postgres? Or any SQL database for that matter?

Change Data Capture (CDC) to the rescue!

CDC is a fancy term that just means "read the changes made to the database and publish them to a pub/sub system". In a big business, you might use something like Debezium + Kafka. But Kafka is a pain to set up, we already have Redis in place, and our database only sees about 30 ops / second. Even if we were to 100x our scale to ~100,000 concurrent users, we could likely still fit this in a single Redis instance. So that's what I did.

My friend Eric taught me about CDC and open-sourced his CDC code, which reads Postgres' WAL (the Write-Ahead Log, where Postgres records every change it makes to the database) and persists it to a sink (e.g. Redis).
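Eric's code speaks the streaming replication protocol directly; as a simplified illustration of the same idea, here's a polling version built on Postgres logical decoding (the wal2json plugin) and go-redis. The slot and channel names are made up:

```go
package cdc

import (
	"context"
	"database/sql"
	"time"

	"github.com/redis/go-redis/v9"
)

// runCDC polls Postgres logical decoding output and publishes every
// change to Redis Pub/Sub. Assumes a slot created once with:
//   SELECT pg_create_logical_replication_slot('foony_cdc', 'wal2json');
// Note: pg_logical_slot_get_changes consumes the changes it returns, so
// each change is published at most once -- Pub/Sub trades durability for
// simplicity, as noted below.
func runCDC(ctx context.Context, db *sql.DB, rdb *redis.Client) error {
	for {
		rows, err := db.QueryContext(ctx,
			`SELECT data FROM pg_logical_slot_get_changes('foony_cdc', NULL, NULL)`)
		if err != nil {
			return err
		}
		for rows.Next() {
			var change string
			if err := rows.Scan(&change); err != nil {
				rows.Close()
				return err
			}
			// Fan out to all gateway instances; they filter per client.
			if err := rdb.Publish(ctx, "cdc:changes", change).Err(); err != nil {
				rows.Close()
				return err
			}
		}
		rows.Close()
		time.Sleep(250 * time.Millisecond) // simple poll interval
	}
}
```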

With a simple, detailed prompt in Cursor, I was able to mostly one-shot the realtime CDC and gateway code (plus some minor tweaks and a thorough code review). The result is two very simple services:

  • a singleton CDC service that reads the WAL and publishes it to Redis Pub/Sub (Redis Streams would add durability, at the cost of complexity), and
  • a horizontally autoscaling gateway service that handles JWT auth and WebSocket connections from web clients.

Both services are written in Go, and I especially love the autoscaling gateway. The web client subscribes the same way it used to with Firestore, and gets back data in the same format. The client sees "usersPublic/" and "usersPrivate/" as two separate collections with separate permissions; the gateway is responsible for translating each request to the underlying Postgres tables (in this case, users) and validating that the client has permission to read that data.
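The gateway's internals aren't shown in this post, so the following is only how I'd sketch the translation layer in Go: a route table mapping client-facing collections to backing tables plus a permission check, and a single Redis subscription fanned out to clients. Every name here (routes, jwtClaims, deliver) is a hypothetical stand-in:

```go
package gateway

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// jwtClaims is a hypothetical stand-in for the claims the gateway
// extracts from a verified JWT.
type jwtClaims struct {
	UserID string
}

// route maps a client-facing "collection" to its backing table plus a
// permission check. Both usersPublic and usersPrivate read from the
// users table, but expose different columns to different callers.
type route struct {
	table   string
	columns []string
	canRead func(c jwtClaims, docID string) bool
}

var routes = map[string]route{
	"usersPublic": {
		table:   "users",
		columns: []string{"id", "display_name", "avatar"},
		canRead: func(jwtClaims, string) bool { return true },
	},
	"usersPrivate": {
		table:   "users",
		columns: []string{"id", "email", "settings"},
		// Private docs are only readable by their owner.
		canRead: func(c jwtClaims, docID string) bool { return c.UserID == docID },
	},
}

// runFanout holds the gateway's single Redis subscription and hands each
// change to deliver, a placeholder for the real per-client permission
// filtering and websocket writes.
func runFanout(ctx context.Context, rdb *redis.Client, deliver func(payload string)) {
	sub := rdb.Subscribe(ctx, "cdc:changes")
	defer sub.Close()
	for msg := range sub.Channel() {
		deliver(msg.Payload)
	}
}
```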

This works shockingly well. The entire pub/sub system only costs us about $0.50 / month in compute + egress costs, and doesn't have many moving parts that can break (the CDC, gateway, and client code are all simple with no third-party dependencies outside of Redis / Postgres).

From a DevEx perspective, the new system is arguably simpler, too: developers no longer need to know how to model data in Firestore or write Firestore security rules. As long as they understand SQL, they can quickly edit our schema.sql, add or modify a route in all.go, and they're all set. The documentation is also strong enough that LLMs have an easy time following it and making changes. From a security perspective, this is a win as well: we're no longer exposed to a denial-of-wallet attack (an attacker deliberately running up our usage-based bill), since our new infra is fixed-cost and drastically cheaper.

In all, including the Postgres instance and daily S3 backups, we managed to go from $550 / month to just $40 / month in database infra costs. We also reduced the client bundle size by ~100KB, which is nice.

Now that I had the new system, I tested it locally to make sure everything worked as expected, then deployed it live. If any issues come up, it's easy to roll back the client, since Firestore is still the source of truth at this point.

Future Work

Now that the pub/sub is working, the final scary step is to deploy a backend change that swaps out Firestore writes for direct Postgres writes. I'll give it a few more days to see if any unexpected issues arise; if none do, Postgres becomes our source of truth. After that, we can delete the PostgresSynchronizer and the Firestore -> Postgres backfill logic, which will be bittersweet. I really like this code, but rules are rules: if the code is unused, it gets deleted. Besides, it'll always be in the git history if I really need it.

In the future, I'd also like to support patches in the gateway. Right now, the gateway sends back the entire document as JSON for every update. It's a bit wasteful, but our egress is effectively unlimited thanks to being on Hetzner. I'd implement this now, but I can't justify the added complexity quite yet.
