Posted Jun 29, 3:36 AM · 6 min read

Self-hosting n8n in production: the ops tax the pricing page doesn't show you

n8n is "free if you self-host." After running 10 workflows on one box for months, here are the sharp edges that free actually buys you — and the ones worth budgeting for.

The headline number for n8n is that the Community edition is free: self-host it and you get unlimited workflows and unlimited executions for the price of a server. That part is true. We run our entire content pipeline on it — 10 workflows, around 209 nodes, on a single small 4-vCPU box, in Docker next to a Postgres database — and the software license cost is exactly zero.

But "free if you self-host" hides a second invoice that nobody itemizes: the operational one. The license bill isn't where n8n costs you; the operations are. After months of running it unattended in production, here's the self-host tax: the sharp edges that don't appear on any pricing or marketing page, with the workarounds we settled on.

What "free" actually buys you

Standing n8n up is genuinely easy: a single Docker command plus a Postgres database, and you're on the canvas in about half an hour. The build loop is pleasant — drop a node, run it in isolation, inspect its output, then wire the next one. None of that is the tax.

The tax is everything that comes after "it works on my canvas": you now own the host, the database, the upgrades, the backups, and the monitoring that a hosted tool quietly absorbs into its price. On a single box that's cheap in money and not-free in attention. Below are the specific edges that cost us an afternoon each the first time.

Edge 1: upgrades can break things, so treat them like deploys

This is the real self-host tax in one sentence. A major-version upgrade is not a no-op. There's a well-documented community case where upgrading a self-hosted instance to v2.x in queue mode pushed the execution failure rate from 1–2% up to 5–20% with no infrastructure change — traced to a removed retry variable. Whether or not that exact regression is patched today, the lesson is permanent: major-version upgrades deserve a staging test before they touch production.
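One way to make "test it in staging" concrete is to compare failure rates before and after the upgrade. The sketch below is only an illustration: it assumes each execution record carries a `status` field, and the `failureRate` and `isRegression` helpers and the two-point tolerance are hypothetical choices, not anything n8n ships.

```javascript
// Hypothetical execution records: only a `status` field is assumed here.
function failureRate(executions) {
  if (executions.length === 0) return 0;
  const failed = executions.filter((e) => e.status !== "success").length;
  return failed / executions.length;
}

// Returns true when the candidate version fails noticeably more often than
// the baseline. `maxIncrease` is an absolute tolerance (0.02 = two points).
function isRegression(baseline, candidate, maxIncrease = 0.02) {
  return failureRate(candidate) - failureRate(baseline) > maxIncrease;
}

// Example: baseline 1 failure in 100 runs, staging 10 failures in 100 runs.
const makeRuns = (total, failed) =>
  Array.from({ length: total }, (_, i) => ({
    status: i < failed ? "error" : "success",
  }));

console.log(isRegression(makeRuns(100, 1), makeRuns(100, 10))); // true
```

A jump like the 1–2% to 5–20% case above would trip this check long before it reached production.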

What makes this survivable instead of catastrophic is the single best property of n8n for an ops-minded team:

Edge 2 (the upside): your workflows are JSON, so they live in Git

Every workflow exports to a plain JSON file. That one fact changes how you operate n8n entirely. Our automations live in version control next to the rest of our code — each workflow committed, changes reviewed in a pull request, deploys done by a script instead of clicking around a dashboard.
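Reviewing workflow JSON in a pull request is easier when the export is stable between saves. Here is a minimal sketch of a normalizer you could run before committing. The `VOLATILE_FIELDS` list is an assumption about which fields change on every save; check it against your own exports.

```javascript
// Fields assumed to change on every save without changing behaviour.
// This list is a guess; adjust it to what your exports actually contain.
const VOLATILE_FIELDS = ["updatedAt", "versionId"];

// Recursively sort object keys so two exports of the same workflow
// serialize identically regardless of key order.
function sortKeys(value) {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === "object") {
    return Object.keys(value)
      .sort()
      .reduce((acc, key) => {
        acc[key] = sortKeys(value[key]);
        return acc;
      }, {});
  }
  return value;
}

// Drop the volatile top-level fields, then serialize with sorted keys.
function normalizeWorkflow(workflow) {
  const copy = { ...workflow };
  for (const field of VOLATILE_FIELDS) delete copy[field];
  return JSON.stringify(sortKeys(copy), null, 2) + "\n";
}

const a = normalizeWorkflow({ name: "render", nodes: [], updatedAt: "2024-01-01" });
const b = normalizeWorkflow({ updatedAt: "2024-02-02", nodes: [], name: "render" });
console.log(a === b); // true
```

With something like this in the export script, a diff shows only the nodes and connections that actually changed.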

It's also your rollback. When an upgrade goes wrong, you pin the previous container image, re-import the workflow files from Git if you need to, and your automations come back exactly as they were, because their definitions were never trapped only inside the app's database. A bad upgrade becomes an inconvenience, not an outage. Zapier and Make keep your logic locked in their dashboards with no real diff and no revert; once you've worked the Git way, going back feels reckless.

Edge 3: strict mode blocks `$env` in expressions

Here's the first gotcha that the marketing pages will never mention. You'll reasonably try to read a secret into an HTTP header with an expression:

Authorization: Bearer {{ $env.RENDER_TOKEN }}
// → "access to env vars denied"

n8n's strict mode refuses it. The fix isn't to loosen a setting — it's to stop fighting the model: store the secret as a predefined credential (an httpHeaderAuth credential, in this case) and reference that from the node. n8n deliberately keeps credentials out of free-text expressions, and once you understand that, the credentials system is actually one of its better designs — set a service up once, reference it everywhere, rotate it in one place. But the first time, you lose an afternoon to a one-line error message.
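Since the workflows already live in Git as JSON, this mistake can be caught before it reaches the canvas. The following is a rough sketch of a lint step, not an n8n feature: `findEnvReferences` is a hypothetical helper, and the sample workflow is trimmed and invented for illustration.

```javascript
// Walk any JSON value and collect the paths of strings that reference `$env`.
function findEnvReferences(value, path = "", hits = []) {
  if (typeof value === "string") {
    if (/\$env\b/.test(value)) hits.push(path);
  } else if (Array.isArray(value)) {
    value.forEach((item, i) => findEnvReferences(item, `${path}[${i}]`, hits));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      findEnvReferences(child, path ? `${path}.${key}` : key, hits);
    }
  }
  return hits;
}

// Hypothetical, heavily trimmed workflow export used only for illustration.
const workflow = {
  name: "render",
  nodes: [
    {
      name: "Call API",
      parameters: { headerValue: "=Bearer {{ $env.RENDER_TOKEN }}" },
    },
    { name: "Safe node", parameters: { url: "https://example.com" } },
  ],
};

console.log(findEnvReferences(workflow));
// [ 'nodes[0].parameters.headerValue' ]
```

Failing the pull request on any hit turns the lost afternoon into a one-line review comment.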

Edge 4: the Code-node task runner has a hardcoded allowlist

Want to use a Node built-in like crypto inside a Code node? You'll find the environment variable you'd expect to control it — and discover it does nothing, because the task runner ships with a hardcoded built-in allowlist that overrides the env var. Enabling the built-in means editing a config file on the host, then restarting the runner. It's the kind of thing you only learn by hitting it, because the obvious knob is a decoy.

Edge 5: webhooks need a manual re-save after you toggle a workflow

A small one you simply learn to expect: after you toggle a workflow active or inactive, its webhook sometimes doesn't re-register until you open the editor and manually re-save it. Harmless once you know it; baffling the first time a "live" webhook returns 404 right after you re-enabled it.

Edge 6: the Postgres JSONB trap that passes every test

n8n's Postgres nodes return JSONB columns as real JavaScript objects, which is lovely — until a query that omits a JSONB column leaves a downstream Code node referencing undefined rather than an empty value. It's the classic shape that passes in testing and fails on the one production row that's different. The fix is boring discipline: have your SELECT explicitly include every JSONB column anything downstream will touch, even the ones you "don't need" on the happy path.
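A cheap complement to that discipline is a guard at the top of the downstream code that fails loudly when a column was never selected. This is a sketch under assumptions: `requireColumns` is a hypothetical helper, and `metadata` and `render_options` are made-up column names standing in for your JSONB columns.

```javascript
// Throw a descriptive error when a row lacks a column entirely (undefined),
// which is what happens when the SELECT never asked for it. A column that was
// selected but is NULL arrives as null and is allowed through.
function requireColumns(row, columns) {
  const missing = columns.filter((name) => row[name] === undefined);
  if (missing.length > 0) {
    throw new Error(`Row is missing selected columns: ${missing.join(", ")}`);
  }
  return row;
}

// Hypothetical rows: `metadata` and `render_options` stand in for JSONB columns.
const complete = { id: 1, metadata: { title: "a" }, render_options: null };
const partial = { id: 2, metadata: { title: "b" } };

requireColumns(complete, ["metadata", "render_options"]); // passes

try {
  requireColumns(partial, ["metadata", "render_options"]);
} catch (err) {
  console.log(err.message); // Row is missing selected columns: render_options
}
```

The guard does not replace the explicit SELECT; it just makes the failure name the column instead of surfacing as a vague undefined several nodes later.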

Edge 7: a single box has a ceiling, and you will meet it

Self-hosting on one server is cheap and simple right up until two heavy jobs want to run at once. On our small box, two headless-Chromium renders cannot coexist — the machine simply can't take it — so we serialize them behind a Postgres advisory lock: each render grabs a named lock before it starts and releases it when it's done, turning a crash-or-thrash situation into an orderly queue of one.

-- Serialize an expensive job across workers on one box.
-- Session-level lock: both calls must run on the same database connection.
SELECT pg_advisory_lock(hashtext('chromium_render'));
-- ... run the single heavy render ...
SELECT pg_advisory_unlock(hashtext('chromium_render'));
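If the render is driven from a script rather than from workflow nodes, the same pattern can be wrapped so the unlock always runs. This is a sketch, not how our pipeline is necessarily wired: `withAdvisoryLock` is a hypothetical helper, and `client` is assumed to be one dedicated connection with an async `query(text, values)` method in the style of node-postgres. The demo uses a stand-in client so it runs without a database.

```javascript
// Session-level advisory locks belong to the connection that took them,
// so the lock and unlock queries must go through the same `client`.
async function withAdvisoryLock(client, lockName, job) {
  await client.query("SELECT pg_advisory_lock(hashtext($1))", [lockName]);
  try {
    return await job();
  } finally {
    // Runs on success and on failure, so a failed render never keeps the lock.
    await client.query("SELECT pg_advisory_unlock(hashtext($1))", [lockName]);
  }
}

// Stand-in client that records queries instead of talking to Postgres.
const calls = [];
const fakeClient = {
  query: async (text, values) => {
    calls.push(text);
    return { rows: [] };
  },
};

const lockDemo = withAdvisoryLock(fakeClient, "chromium_render", async () => "rendered");
lockDemo.then((result) => console.log(result, calls.length)); // rendered 2
```

The `finally` block is the point of the wrapper: a render that throws still releases the lock, so the next job in line is not left waiting on it.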

The same ceiling shows up on deploys: a heavy rebuild can saturate the CPU enough to cause transient timeouts in live workflows mid-deploy, so you learn not to ship during a busy window. The real fix for outgrowing one node is queue mode — separate worker processes plus a Redis instance — which is exactly the configuration the v2.x reliability thread above shows can go sideways. n8n scales fine; the point is that the moment you pass a single box, the ops work steps up a level, and that's effort a hosted tool absorbs for you.

What keeps it all in the green

None of the above stopped n8n from being reliable for us. Pulling our live execution history from the n8n API, the last 93 retained executions came back 100% success, with no failures in the window. A big part of that is by design and cheap to set up: a single dedicated error-trigger workflow catches a failure anywhere in the pipeline and fires a Slack alert, so a problem surfaces in seconds instead of as a silent gap discovered days later. The platform hands you the hook; you decide how loud the alarm is.
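How loud the alarm is mostly comes down to what the alert says. Below is a minimal sketch of a formatter that could sit between the error trigger and the Slack node. The payload shape (`workflow.name`, `execution.lastNodeExecuted`, `execution.error.message`, `execution.url`) is an assumption about the error data, and `formatAlert` and the sample values are hypothetical.

```javascript
// Build a short, readable alert from an error payload. Every field is
// optional here, so a partial payload still produces a usable message.
function formatAlert(data) {
  const workflowName = data.workflow?.name ?? "unknown workflow";
  const node = data.execution?.lastNodeExecuted ?? "unknown node";
  const message = data.execution?.error?.message ?? "no error message";
  const url = data.execution?.url;
  const lines = [`Workflow "${workflowName}" failed at node "${node}": ${message}`];
  if (url) lines.push(`Execution: ${url}`);
  return lines.join("\n");
}

// Invented sample payload for illustration only.
const sampleError = {
  workflow: { id: "7", name: "Render thumbnails" },
  execution: {
    id: "231",
    url: "https://n8n.example.com/execution/231",
    lastNodeExecuted: "Render page",
    error: { message: "Timeout after 30s" },
  },
};

console.log(formatAlert(sampleError));
// Workflow "Render thumbnails" failed at node "Render page": Timeout after 30s
// Execution: https://n8n.example.com/execution/231
```

Putting the failing node and a link to the execution in the message is what lets someone act on the alert from Slack instead of hunting through the execution list.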

So is the self-host tax worth paying?

If you have technical hands, yes — overwhelmingly. The compute bill for our whole pipeline is one small cloud box, a rounding error next to a stack of hosted automation seats, and every edge above is a one-time afternoon, not a recurring wound. The free Community edition really is the full product, not a stripped trial: unlimited executions, your keys and data never leaving infrastructure you own.

But "free" is only free if your time is. If nobody on the team wants to patch a host, test an upgrade in staging, or read an execution log node by node, the honest move is to pay for n8n Cloud (from €20/mo for 2,500 executions) and let them run the servers — or use a fully hosted tool and skip the question entirely. The self-host tax is real; it's just bounded, and for a technical team it buys ownership nothing else in the category offers.

I went deeper on who n8n is and isn't for — the learning curve, the fair-code-vs-open-source distinction, the full reliability data, and how it stacks up against Zapier and Make — in our full n8n review. The one-line version: the bill is the cheap part; budget for the operations, and it's the best automation tool a technical team can run.