The CTO Playbook: From Best Builder to Best Bet - Part 2
A deep, opinionated, practical guide for the engineer-leader who has just been handed (or is about to be handed) the entire engineering organization. The mental models, decision frameworks, hiring tactics, board interactions, and anti-patterns that separate the CTO whose company outlearns the market from the one whose company stalls. Grounded in 2026 reality: AI-leveraged engineers, smaller teams per dollar of revenue, distributed-async by default, post-ZIRP cost discipline, and a regulatory surface that didn't exist five years ago.
If you read only one section first, read §2 Mindset, §4 The CTO/CEO Partnership, §7 Org Design, and §16 The Operating Cadence. Everything else is the implementation of those four.
Companion to
The Tech Lead Playbook: From Best IC to Multiplier (the level below; read it first if you skipped the TL years), The SaaS Template Playbook (how to build), The AI SaaS Playbook (Practical Edition) (AI overlay), The Solo-Founder Playbook: Zero to Hero (the founder context), and Building High-Quality AI Agents: A Comprehensive, Actionable Field Guide (agentic systems). This one is for the technical leader of an engineering organization of 10–250 engineers at a startup, a scale-up, or a fast division inside a larger company.
Table of Contents
- Read This First
- The CTO Mindset
- The Five CTO Archetypes
- The CTO/CEO Partnership
- The First 90 Days
- Setting Technical Strategy
- Org Design
- The Leadership Team
- Hiring at Scale
- Performance, Comp & Calibration
- Architecture at Org Scale
- The AI Strategy (2026)
- Security, Compliance & Risk
- Budget, Cost & Vendor Management
- Stakeholders: Product, GTM, Legal, Finance, People
- The Operating Cadence
- Incidents & Crisis at Exec Level
- The Board & Investors
- Communication at the CTO Level
- M&A, Acquihires & Integration
- The CTO Anti-Pattern Catalog
- The Phased Roadmap (Day 1 → Year 5)
- When to Leave, When to Stay
- Cheat Sheet & Resources
Sections 1–8: read Part 1 here: https://viblo.asia/p/the-cto-playbook-from-best-builder-to-best-bet-part-1-Nj4vg8RqJ6r
9. Hiring at Scale
You don't write all the rubrics. You don't sit on every loop. But the hiring engine is your problem and you must own its outcomes.
9.1 The hiring funnel as a system
Treat hiring like a product. Measure every stage. Iterate.
| Stage | Healthy conversion (mid–senior eng) |
|---|---|
| Sourced → recruiter screen | 25–40% |
| Recruiter screen → tech screen | 40–60% |
| Tech screen → onsite | 30–50% |
| Onsite → offer | 25–40% |
| Offer → accept | 70–90% |
If any stage is far off these, that's the bottleneck. "We're not hiring fast enough" is a useless diagnosis. "Our offer-accept rate is 50%" is actionable: comp is off, or the close is weak.
A weekly hiring scorecard:
Open roles: N
Active in pipeline: N
Recruiter screens this week: N (target N)
Onsites: N (target N)
Offers: N
Starts: N
Avg time-to-hire: D days (trend)
Top 3 funnel issues:
You read it weekly. Your VPE and recruiting lead own the actions.
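If you want the bottleneck check to be mechanical rather than a debate, here's a minimal sketch (TypeScript; the data shape and sample numbers are hypothetical, the bands are the ones from the table above):

```typescript
// Hypothetical weekly funnel snapshot: candidates entering each stage.
type FunnelCounts = {
  sourced: number;
  recruiterScreen: number;
  techScreen: number;
  onsite: number;
  offer: number;
  accept: number;
};

// Healthy conversion bands from the table above (mid–senior eng).
const HEALTHY: [keyof FunnelCounts, keyof FunnelCounts, number, number][] = [
  ["sourced", "recruiterScreen", 0.25, 0.4],
  ["recruiterScreen", "techScreen", 0.4, 0.6],
  ["techScreen", "onsite", 0.3, 0.5],
  ["onsite", "offer", 0.25, 0.4],
  ["offer", "accept", 0.7, 0.9],
];

function funnelBottlenecks(f: FunnelCounts): string[] {
  const issues: string[] = [];
  for (const [from, to, lo, hi] of HEALTHY) {
    if (f[from] === 0) continue; // no volume at this stage, nothing to diagnose
    const rate = f[to] / f[from];
    // Below the band is the classic bottleneck; far above it can mean
    // the bar at that stage is set too low. Flag both for a human look.
    if (rate < lo || rate > hi) {
      issues.push(
        `${from} → ${to} at ${(rate * 100).toFixed(0)}% (healthy ${lo * 100}–${hi * 100}%)`,
      );
    }
  }
  return issues;
}

// Example: offer → accept at 50% gets flagged: comp is off, or the close is weak.
console.log(
  funnelBottlenecks({
    sourced: 200, recruiterScreen: 60, techScreen: 30,
    onsite: 12, offer: 4, accept: 2,
  }),
);
```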
9.2 What the CTO does in hiring (vs delegates)
You do:
- Set the bar. Approve every leveling rubric, every onsite format, every interview question that goes into rotation. The bar drifts unless you watch it.
- Hire your direct reports. Personally, deeply.
- Close offers for principal/staff/director and above. A 30-min call from the CTO closes 10% more offers.
- Calibrate. Sit on a hiring debrief monthly. Read every offer-decline reason. Re-read your loop's calibration every 6 months; it drifts.
- Set the comp philosophy. (See §10.4.)
- Be the public face for hiring brand. Conferences, podcasts, your written work, candidate-facing docs.
You delegate:
- Loop ownership for non-leadership roles.
- Recruiter management.
- Day-to-day pipeline operations.
- Most reference checks.
- Written offer terms.
A CTO who's on every onsite is a CTO who's not doing the CTO's job. A CTO who's on no onsites at >50 engineers is a CTO who'll wake up in 6 months wondering why the bar dropped.
9.3 The leveling system
Every engineering org >25 engineers needs an explicit leveling rubric. Without one, comp drifts, promotions feel arbitrary, and recruiting is chaotic.
The minimum-viable rubric:
| Level | Common title | Scope | Autonomy | Influence |
|---|---|---|---|---|
| L2 | Eng I (junior) | A task | Daily guidance | Self |
| L3 | Eng II (mid) | A feature | Weekly guidance | Self + reviewers |
| L4 | Senior | A project | Goal-level guidance | Their team |
| L5 | Staff | A system or domain | Strategic alignment | Multiple teams |
| L6 | Principal | Multiple systems / org-wide capability | Co-creates strategy | The org |
| L7 | Distinguished/Fellow | Industry-grade impact | Drives strategy | Industry |
For each level, write a 1-page rubric: scope, complexity, autonomy, influence, mentoring, communication. Same rubric for IC and management at each level (with appropriate manager-track facets). Calibrate twice a year.
The leveling rubric you steal from another company without rewriting will not fit you. Spend the 2 weeks to write your own.
9.4 Hiring loops in the AI era (2026)
Today, every engineer interviews with AI assistance available. Loops written for 2019 don't work anymore. The bar moved.
Don't ask:
- "Implement linked-list reversal." (AI does this trivially. You're now selecting for typing speed.)
- "Recall the syntax of X framework." (AI knows it.)
- "Do this 4-hour algorithm puzzle." (Selects for the wrong skill.)
Do ask:
- Code-review interview. Show a 200-line PR (some good, some subtly broken). 45 minutes: walk me through what you'd accept, reject, or push back on. This is the moat right now.
- Spec-and-build interview. "Here's a fuzzy product requirement. Spec it as if you were briefing an AI agent. Then implement, with AI assistance allowed, with me observing your judgment." Score on spec quality and where they reject AI suggestions.
- System design with cost. "Design X for 100K customers. Now design it for $200/month of infra." Cost-aware design separates senior from staff today.
- Postmortem interview. "Tell me about a time something broke in production that you owned. Walk me through what you missed, what you learned, what you changed." Self-awareness is the senior signal.
- AI fluency check. "Show me your AI-augmented workflow on a real task." (Some companies still skip this; they'll regret it by 2027.)
Live coding is fine but should be calibrated to judgment, not typing: allow AI, observe how they use it, what they reject, when they read documentation, and when they ask clarifying questions.
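To make "subtly broken" concrete, here's the flavor of snippet (hypothetical, TypeScript) worth putting in a code-review interview: it compiles, looks tidy, and quietly misbehaves:

```typescript
// Interview artifact: what would you accept, reject, or push back on?
async function chargeCustomers(customerIds: string[]): Promise<void> {
  customerIds.forEach(async (id) => {
    // Bug: forEach ignores the returned promises, so this function resolves
    // before any charge finishes, and rejections go unhandled.
    await chargeOne(id);
  });
  // This logs "done" while charges are still in flight.
  console.log("done charging", customerIds.length, "customers");
}

// A strong candidate rewrites it with explicit awaiting and error surfacing:
async function chargeCustomersFixed(customerIds: string[]): Promise<void> {
  const results = await Promise.allSettled(customerIds.map(chargeOne));
  const failed = results.filter((r) => r.status === "rejected");
  if (failed.length > 0) throw new Error(`${failed.length} charges failed`);
}

declare function chargeOne(id: string): Promise<void>; // stub for the exercise
```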
9.5 The closing playbook
Once you decide yes, call the candidate within 24 hours. Top candidates are in 2–3 loops. The slow process loses every time.
A standard close call:
- Lead with enthusiasm. Specific. "Your design-doc thinking in the system design round was the strongest we've seen this year."
- Walk the offer. Verbally; don't just send an email. Numbers, equity, vesting, sign-on, comp ladder context.
- Ask what would make this a yes for them. "What's the hardest decision in this for you?"
- Address it. Not always with money β sometimes with team match, project, location flexibility.
- Set a decision date. Realistic, not pressured.
- Stay in light contact. Send the team's deck, a relevant blog post, an offer to chat with their potential teammate.
Negotiate honestly. If your bands are real, defend them. If they're flexible, be transparent. Candidates remember the posture of the negotiation more than the dollars; you're hiring someone who will negotiate inside the company for years.
9.6 Hiring brand: the multi-year compound
Your hiring brand is what candidates think of you before they apply. Built over years; lost in months.
Levers:
- Engineering blog with real content. Not marketing fluff. Real technical posts from real engineers. 1/month minimum.
- Open-source contributions, even small, even from individual engineers.
- Conference talks, internal and external, by your engineers (not just you).
- Glassdoor / Levels.fyi management. Don't game; respond honestly.
- Alumni relationships. People you let go gracefully are your best long-term recruiters.
- Candidate experience. A clean rejection letter beats a slow ghost. A detailed onsite debrief beats a cold "you weren't a fit."
The CTO who treats hiring brand as a slow-compounding asset will out-hire competitors with deeper pockets in 24 months. The one who treats it as a marketing problem will spend 5x and hire half as well.
9.7 Hiring across regions
Most companies now hire across at least 2–3 regions. You'll wrestle with:
- Comp parity vs locality. No clean answer. Most healthy companies pick "leveled global comp with adjusted bands": the same level maps to the same band everywhere, with regional cost-of-living tiers applied.
- Time-zone overlap norms. Aim for 4 hours of overlap between any pair of collaborating regions. Hire with this constraint explicit.
- Cultural translation. A "senior engineer" carries different norms in different regions. Calibrate carefully; don't import bias.
- Tax & legal complexity. Use an EOR for the first few hires per country; move to an in-house entity at ~10 employees per region.
- Travel budgets. A team that never meets in person degrades. Plan 2×/year offsites for fully-distributed teams; budget for it from day 1.
Async-first culture (see §16.5) is non-negotiable for cross-region orgs. Companies that are async-second and time-zone biased lose international talent in 12 months.
9.8 Onboarding
Hiring is 60% of the bet. Onboarding is the other 40%. Most engineering orgs underinvest in onboarding by an order of magnitude.
A real onboarding plan, by week:
- Week 1: environment, access, intro 1:1s with 6+ people, read strategy doc + last 3 design docs + last 3 postmortems. Ship 1 trivial PR. No expectation of feature output.
- Weeks 2–4: owned but small task. Daily standups. 1:1 with EM. 1:1 with onboarding buddy. Read deeper into one system.
- Month 2: owned medium task. Lead 1 design discussion of their own work. Write 1 doc that updates the codebase's collective knowledge.
- Month 3: owned project end-to-end. By end of month 3, fully-functional team member.
- Month 6: stretch project. By month 6 you should be able to write a clear performance note that says either "exceeds expectations" or "needs intervention."
Each new hire has a written 30-60-90 plan signed by them, their EM, and their buddy. Reviewed at each milestone. Most hires that struggle at month 6 had a bad month 1 nobody caught.
9.9 The CTO as recruiter
You will be in active recruiting conversations every week, forever. Treat it as part of the job, not a tax:
- 1 candidate dinner per week (or a coffee, or a video call) with a senior or leadership candidate.
- 2–3 "alumni catch-ups" per quarter: the people you used to work with, loosely staying in touch.
- 1 conference / event presence per quarter where you might meet candidates.
- Your written work and public profile are part of the funnel; treat them accordingly.
The CTO who recruits 2 hours/week wins the talent war over years. The one who only recruits when there's an open role hires from a worse pool every time.
10. Performance, Comp & Calibration
The calendar of consequence. Twice a year, sometimes four times, the whole org's compensation, leveling, and performance are decided. Most CTOs underweight how much of their leadership credibility is built or lost in these cycles.
10.1 The performance review philosophy
Your written performance philosophy, in a paragraph, posted internally:
"We give specific, written, evidence-based feedback. We give it twice a year formally and continuously informally. We never let an annual review surprise an engineer about their performance. We compensate at the top of our band for top-of-band performance, mid for mid, and have hard conversations early β not at review time."
Then live by it. The single most corrosive thing in an engineering culture is a leader who says "we give continuous feedback" and then drops a "you're underperforming" review on someone in November.
10.2 The cadence
A standard cycle that works:
| When | What |
|---|---|
| Continuous | 1:1 feedback, in the moment, every week |
| Quarterly | Lightweight check-in: am I on track for review? Any course-correct? |
| Twice a year | Full review: written self-assessment, peer feedback, manager assessment, calibration |
| Annually | Comp change tied to review; equity refresh; promotions |
If you're at <50 engineers, run lighter (1× annually) but never skip the calibration.
10.3 Calibration: where leadership earns its money
Calibration is the 2-day cycle every 6 months where directors and EMs come together with you and the VPE to calibrate ratings, promotions, and comp. This is where your leveling system either holds or collapses.
The format that works:
- Each manager prepares written assessments + level proposals for their team.
- Pre-read circulated 48 hours ahead.
- Day 1 (4 hours): IC track calibration. Each "edge" case (proposed promo, proposed exceeds-expectations, proposed below-bar) gets 5–10 minutes. Group decides.
- Day 2 (3 hours): manager track + comp. Promo decisions for managers; comp adjustments.
- Final ratifications by you + VPE that evening.
The room norm: "We're calibrating against the rubric, not against personal advocacy. The strongest written case wins, not the loudest voice." Repeat at the start of every session.
Write down every contested decision and why it landed where it did. The calibration record is the artifact for next cycle and for any disputed review.
10.4 Comp philosophy
You need a 1-page written comp philosophy, ratified by the CEO and CFO. Without it, every comp conversation is an ad-hoc negotiation and bias creeps in.
The minimum-viable version:
COMP PHILOSOPHY
We pay at the 65th percentile of [target market] for our stage.
Our bands are:
L3: $X–$Y base / $Z equity over 4y
...
Annual increases are tied to performance ratings.
Refresh equity is granted at year 2 for "meeting" or above.
Promotions move you to the new band's midpoint.
We do not counter-offer for retention; we re-set bands annually.
Bonuses are formula-based, not discretionary.
Decide each line deliberately. The "we do not counter-offer" rule especially: counter-offers are short-term wins and long-term cultural toxins.
10.5 Promotion mechanics
Three rules:
- Promote by evidence, not advocacy. A documented track record of operating at the next level for ≥6 months. Not "they're ready." They have already been doing the job.
- Promote at level boundaries, not annually for everyone. Most engineers don't get promoted in any given year; that's correct.
- Communicate the gap, not the negative. Engineers usually miss a promotion not because they're bad but because the gap to the next level isn't yet closed. Frame it as a growth path, not a deficiency.
The promo packet:
- Scope (now vs 12 months ago)
- Impact (specific, dated, quantified)
- Influence (mentorship, design leadership, cross-team work)
- Examples (3–5)
- Gaps that closed since last cycle
- Recommendation
Save evidence year-round. Promo cycle is not the time to scramble for examples.
10.6 The "regrettable attrition" metric
Track who quits and bucket them:
- Regrettable: strong or top performers leaving for a competitor or growth move.
- Neutral: mid performer moving on for life reasons.
- Welcome: a person whose performance was always going to result in a transition.
Regrettable attrition rate is your most important talent metric. >10% annual is a fire; >15% is a four-alarm fire and the CEO should know. Below 5% is great; below 2% suggests stagnation (people aren't growing into their next opportunity).
The most predictive leading indicator: comp drift. When your bands are 1+ years out of date, you're paying 15% under market and your best engineers are taking calls. By the time the resignation hits, it's months too late.
10.7 Performance issues: the gradient
Same gradient as in techlead_playbook.md §15.4, scaled up:
| Severity | Signal | CTO response |
|---|---|---|
| Soft | Off-week | Trust the EM; you don't need to know |
| Pattern | 4+ weeks below bar | EM addresses; you're informed; written notes start |
| Hard | Multi-month underperformance | EM + People partner formal plan; you ratify |
| Leader-grade | An EM/director failing | You handle directly. Don't delegate. |
The CTO failure: getting drawn into "soft" and "pattern" cases instead of trusting your EM layer. If you're 1:1ing with a struggling IC, your EM has either failed or you've taken the work from them. Both are wrong.
10.8 The retention conversation
When you sense someone might be considering leaving (energy drop, vague answers, sudden interest in random recruiters):
- Have the conversation early. "I want to make sure you're in the right role for the next year. What does that look like for you?"
- Listen for: scope, learning, comp, manager, mission alignment, life. Most attrition is one or two of these.
- Be honest about what you can and can't change.
- Don't make a counter-offer at the resignation moment. Make the right offer six months earlier.
- If they leave, leave the door open. They might come back; they will refer.
A CTO who runs explicit retention conversations 2× a year with their top 10–20% retains them. The one who waits for the resignation has already lost.
11. Architecture at Org Scale
Architecture stops being "what's the right design for this feature" and becomes "what's the system of constraints that lets 50 engineers ship without colliding with each other."
11.1 The architecture function: who owns it
Three patterns that work:
- CTO + lieutenants. You and 2–3 principals/staff own architecture. Works at <80 engineers.
- Architecture Review Board (ARB). You + 4–6 principal-level engineers from across the org meet biweekly to review designs above a threshold. Works at 80–250.
- Chief Architect role. A dedicated principal-level role partners with you. Works at 250+.
The pattern that doesn't work: no one owns architecture, every team decides their own. By month 18 the system is a Frankenstein.
11.2 The architecture review ritual
The biweekly architecture review is one of the highest-leverage rituals in a tech org. Format:
Cadence: every 2 weeks, 90 min, leadership-level reviewers
Threshold to bring: any design that
- touches >1 service or team
- changes a public API
- introduces a new vendor or datastore category
- estimated >2 weeks of work
- is irreversible
Pre-read: 1-page proposal at least 48h ahead
In session:
- 5 min: author presents the *trade-off space*, not the solution
- 15 min: questions + critique
- 5 min: decision (approve / revise / kill / spike)
- Written decision recorded same day
The room norm: "We are looking for the strongest argument we have not yet heard, not for consensus." Repeat at the start of every session.
The architecture review is also the single best leadership-development venue for senior ICs. Watching a principal eng push back well on a director's proposal teaches every junior in the room more than 5 books.
11.3 Standards vs guidelines vs forbidden
Three buckets, made explicit:
- Standards (you must use these unless you have a written exemption): the language(s), the database, the cloud, the auth provider, the observability stack, the coding style.
- Guidelines (default; deviate if you have a reason and write it down): library choices, framework patterns, testing patterns, deployment patterns.
- Forbidden (don't use without CTO approval): a new datastore category, a new language, a new auth provider, anything that creates a new compliance surface.
Publish the list. Re-ratify yearly. Without it, every team picks their own and your platform team weeps.
11.4 Build vs buy vs partner
The single most consequential architectural decision pattern after Series A. The framework:
| Factor | Build | Buy | Partner |
|---|---|---|---|
| Core to differentiation | ✅ | ❌ | ❌ |
| Commodity (everyone has one) | ❌ | ✅ | maybe |
| Available, mature vendors | ❌ | ✅ | ✅ |
| Team has expertise | ✅ | ❌ | maybe |
| Compliance / security blocking | maybe | maybe | ❌ |
| 5-year cost favors build | ✅ | ❌ | maybe |
| Speed-to-market is critical | ❌ | ✅ | ✅ |
The default for a startup CTO today: buy 80%, build 20%, partner the rest. Most companies build 50% and spend 30% of engineering capacity rebuilding things that have $50/month vendors.
The exceptions where you build:
- The thing is your unique value prop.
- The vendors are expensive enough that build pays back in <18 months at your scale.
- Compliance constrains where data can live.
- A vendor outage takes down your business and there's no failover.
When in doubt, buy and revisit in 2 years. A wrong "buy" is reversible; a wrong "build" sucks 5% of your team forever.
11.5 The "boring tech" rule
Choose Boring Technology, by Dan McKinley, is one of the most CTO-relevant essays in the industry. The summary, applied:
- You get a fixed number of "innovation tokens." Spend them carefully.
- Most of your stack should be technology that is 5+ years old, well documented, and easy to staff for.
- The places to spend tokens are where your unique technical advantage lives.
A 2026 stack for a default SaaS startup:
- Language: TypeScript and/or Go and/or Python (pick 1β2).
- Database: Postgres. Always.
- Cache/queue: Redis.
- Compute: Cloud Run, Fly, Render, or AWS ECS Fargate.
- Frontend: React + Vite.
- Auth: Vendor (Clerk, WorkOS, Auth0, Stytch).
- Observability: Vendor (Datadog, Honeycomb, Grafana Cloud).
- CI: GitHub Actions or Buildkite.
- AI: Anthropic, OpenAI, AWS Bedrock, behind a model-agnostic abstraction layer.
If your stack deviates from this default in 3+ places, every deviation needs a written justification. Most don't have one; the CTO just inherited the choices.
11.6 The migration pattern
You will run major migrations. Database, cloud, language, framework, vendor. Most of them go badly because they're under-scoped.
The migration playbook:
1. Strategy memo: why we're migrating, what we expect, exit criteria, kill criteria.
2. Phase the migration; never big-bang. Strangler pattern is the default.
3. Dual-write or dual-read first. Validate against the old system.
4. Migrate non-critical workloads first. Get reps.
5. Migrate the critical workload.
6. Run both systems for ≥30 days.
7. Decommission with a deprecation date and a written all-clear.
8. Postmortem the migration. What did we learn? What broke?
A migration estimated at 1 quarter usually takes 2. Plan for it. Communicate the expanded estimate to the CEO before the slip happens, not after.
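A minimal sketch of step 3's dual-write/dual-read idea (TypeScript; the store interfaces are hypothetical): the old system stays the source of truth while mismatches feed a metric:

```typescript
// Dual-write: the old store remains the source of truth; the new store
// receives a shadow write, and shadow reads are compared to build confidence.
interface UserStore {
  put(id: string, user: object): Promise<void>;
  get(id: string): Promise<object | null>;
}

class DualWriteStore implements UserStore {
  constructor(
    private oldStore: UserStore, // source of truth during the migration
    private newStore: UserStore, // shadow system being validated
    private onMismatch: (id: string) => void, // feeds a mismatch-rate metric
  ) {}

  async put(id: string, user: object): Promise<void> {
    await this.oldStore.put(id, user); // must succeed, customer-facing
    // Shadow write: failures are logged, never surfaced to the customer.
    this.newStore
      .put(id, user)
      .catch((e) => console.error("shadow write failed", id, e));
  }

  async get(id: string): Promise<object | null> {
    const truth = await this.oldStore.get(id);
    // Shadow read comparison happens off the request path.
    this.newStore
      .get(id)
      .then((candidate) => {
        if (JSON.stringify(candidate) !== JSON.stringify(truth)) {
          this.onMismatch(id);
        }
      })
      .catch(() => this.onMismatch(id));
    return truth;
  }
}
// Cut over only when the mismatch rate holds at ~0 through the ≥30-day window.
```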
11.7 The "every system has 1 systemic risk" exercise
Every quarter, list the top 3 systemic risks across the org. Examples:
- "Auth depends on a single vendor with no failover. Outage = full downtime."
- "Our primary database has no read replica."
- "Our deploy pipeline depends on one engineer's knowledge."
- "We have no kill-switch for a runaway AI cost."
- "Our backup strategy was last tested 18 months ago."
Pick 1 to fix this quarter. Track in your scorecard. The CTO who fixes one quietly per quarter for two years has eliminated 8 silent killers; the one who waits will eat them all in a single bad week.
11.8 Documentation as architecture
A subtly important call: documentation quality is part of architecture quality. A perfectly-designed system nobody can reason about without the original author is worse than a moderately-designed system every engineer can reason about. This matters double now β AI agents work better on well-documented codebases.
The minimum bar:
- Every service has a 1-page README: what it does, why it exists, who owns it, how to run it locally, key contacts.
- Every public API has machine-readable docs (OpenAPI, gRPC, etc.).
- ADRs in /docs/adr/ per service, plus a central org-wide ADR repo.
- A CLAUDE.md (or equivalent) at root and per major package; see saas_template_playbook.md.
- A monthly "stale doc" sweep: find docs that contradict the code and either fix or delete them.
12. The AI Strategy (2026)
Every CTO playbook written before 2024 is partially obsolete on this dimension. Companies whose CTO got the AI strategy right in 2024β2025 are now meaningfully ahead. Companies whose CTO didn't are pricing in the gap.
12.1 The two AI questions every CTO answers
There are two distinct questions, often conflated:
- AI for our customers: what AI capabilities do our customers want from our product? What do we build in, what do we partner for, what do we wait on?
- AI for our engineers: how do we use AI internally to ship faster, run cheaper, hire smarter?
You need a written stance on each. They overlap (the codebase you build for AI customers is also a codebase that AI agents work on), but the strategies, vendors, costs, and risks are different.
12.2 AI for customers: the strategic stance
The CTO + CPO co-write a 2-page AI product strategy. Sample structure:
# AI Product Strategy – Q[N] 2026
## Customer thesis
Who wants what AI capability, with what willingness to pay,
within what regulatory/data constraints.
## Our position
- Be: the AI-native [billing|reporting|workflow] platform for [segment]
- Avoid: building general-purpose AI; building model providers; building a chatbot if customers don't want one
## What we'll build
- Capability A: leverages our unique data
- Capability B: automates a workflow our customers do daily
- Capability C: lowers the cost of customer-support workload
## What we'll buy
- Foundation models: we use [Anthropic/OpenAI/Bedrock] via an abstraction layer
- Embeddings & vector store: vendor X
- Orchestration framework: vendor Y, or an in-house thin layer
## What we won't do this year
- Train our own foundation model
- Build a fully autonomous agent product
- Add AI to features customers don't ask for
## Risks
- Hallucination in regulated workflows
- Cost spiraling on a popular feature
- Vendor pricing changes
- Data governance (customer data, model providers)
## Success metrics
- Adoption (X% of accounts using feature Y)
- Retention lift in the AI-feature cohort
- Cost per AI call (declining)
The structure is more important than the specifics. Without it, your team builds 5 random AI features in parallel and ships 0 useful ones.
12.3 The build/buy/wait decision for each capability
For each AI capability your product might include, decide:
| Decision | When |
|---|---|
| Build | Capability is core differentiator AND we have unique data AND build cost recovers in <18 months |
| Buy / wrap | A vendor solves it; you wrap their capability with your data + UX |
| Wait | Capability isn't mature enough; building now means rebuilding in 12 months at higher cost |
The most common 2024–2025 mistake: building capabilities that vendors caught up to in 6 months. Today's mistake: waiting too long on capabilities that are now table stakes.
12.4 The model abstraction layer
Build (or use) a thin internal layer that lets your code switch between model providers without rewriting. Key reasons:
- Pricing volatility. Models drop in price every 6 months; you want to take advantage.
- Capability shift. Best model for use case X changes quarterly.
- Vendor risk. A single-vendor outage is now a customer-impacting event.
- Compliance variation. Some customers require specific vendors or regions.
Don't over-engineer this layer. A 200-line wrapper around the SDK calls is enough at most stages.
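What that layer can look like, as a minimal sketch (TypeScript; the interface shape and provider stubs are illustrative, not any vendor's real SDK):

```typescript
// Thin provider-agnostic layer: one internal interface, one adapter per vendor.
interface CompletionRequest {
  prompt: string;
  maxTokens: number;
  useCase: "summarize" | "extract" | "draft"; // route by use case, not vendor
}

interface ModelProvider {
  name: string;
  complete(req: CompletionRequest): Promise<{ text: string; costUsd: number }>;
}

// Adapters wrap each vendor SDK behind ModelProvider; stubbed here.
declare const cheapProvider: ModelProvider;    // price-sensitive, high volume
declare const strongProvider: ModelProvider;   // quality-sensitive, low volume
declare const fallbackProvider: ModelProvider; // a different vendor than primary

// Routing lives in one table, so swapping vendors is a one-line config change.
const routes: Record<CompletionRequest["useCase"], ModelProvider> = {
  summarize: cheapProvider,
  extract: cheapProvider,
  draft: strongProvider,
};

async function complete(req: CompletionRequest) {
  const primary = routes[req.useCase];
  try {
    return await primary.complete(req);
  } catch {
    // Vendor risk: degrade to a fallback rather than fail the customer request.
    return fallbackProvider.complete(req);
  }
}
```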
12.5 AI for engineers: the internal stance
Engineers without effective AI workflows are now 30–50% less productive than those with them. The CTO must own the internal AI tooling stance.
Decisions you must make:
- Approved IDE assistants. Claude Code, Cursor, Copilot, etc.: pick 1–2, license for everyone.
- Approved agentic tools. Which agents are allowed, in what scopes, with what guardrails.
- Approved models for code generation. Often distinct from product models for licensing/data reasons.
- Data hygiene rules. No customer data in prompts. No secrets in prompts. No proprietary code into consumer-tier endpoints. Written policy, signed by every engineer.
- AI-generated code review bar. Same as human code, no free pass. The engineer who shipped it owns it.
- Mandatory AI fluency. Hire for it; coach to it. An engineer at >L4 today should be visibly AI-fluent.
A standard package: an IDE assistant for everyone (~$30/eng/mo), an agentic tool license for senior+ (~$100–500/eng/mo for premium tiers), a written policy, a quarterly tooling review. Total cost for a 50-person org: ~$50K–$250K/year, a tiny fraction of the productivity it returns when used well.
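Parts of the data-hygiene rules above can be enforced in tooling, not just policy. A minimal pre-flight check (the patterns and wiring are illustrative and deliberately non-exhaustive):

```typescript
// Pre-flight check before any prompt leaves the network: block obvious secret
// shapes. Not a substitute for policy, but it catches careless pastes.
const SECRET_PATTERNS: [string, RegExp][] = [
  ["AWS access key", /AKIA[0-9A-Z]{16}/],
  ["private key block", /-----BEGIN [A-Z ]*PRIVATE KEY-----/],
  ["bearer token header", /Authorization:\s*Bearer\s+\S+/i],
  ["api key assignment", /api[_-]?key\s*[:=]\s*['"][A-Za-z0-9_\-]{16,}['"]/i],
];

function checkPrompt(prompt: string): { ok: boolean; hits: string[] } {
  const hits = SECRET_PATTERNS.filter(([, re]) => re.test(prompt)).map(
    ([name]) => name,
  );
  return { ok: hits.length === 0, hits };
}

// Wire into your internal AI gateway: refuse and log, don't silently scrub.
declare const userPrompt: string; // whatever the engineer or feature sends
const result = checkPrompt(userPrompt);
if (!result.ok) throw new Error(`prompt blocked: ${result.hits.join(", ")}`);
```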
12.6 Coding agents at the org level
Beyond IDE assistants, coding agents (autonomous or semi-autonomous: Claude Code, Codex CLI, Cline, Aider, etc.) are now production engineering tools. The CTO call:
- Where they run. Local-only, sandboxed, or in a managed cloud. Pick a default.
- What they can touch. Read-only on master; can branch but not merge; can merge with human review; can merge autonomously (rare; usually only for tightly-scoped tasks). Write the policy.
- Cost ceilings. Hard caps per engineer per day. Per-task budgets.
- Audit trail. Every agent run logged, attributable to a human.
- Failure modes. What does the team do when an agent makes a bad commit? Revert pattern? Postmortem threshold?
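Writing that policy as data rather than prose makes it enforceable by the agent harness and CI alike. A hypothetical sketch (all names and values are illustrative):

```typescript
// Org-level coding-agent policy, expressed as data so the agent harness and
// CI can both enforce it. Every value here is illustrative, not prescriptive.
type MergeRight =
  | "read-only"
  | "branch-only"
  | "merge-with-review"
  | "merge-autonomous";

interface AgentPolicy {
  runtime: "local" | "sandboxed" | "managed-cloud"; // where agents run
  mergeRight: MergeRight;                           // what they can touch
  dailyCostCapUsd: number;                          // hard cap per engineer/day
  perTaskBudgetUsd: number;
  auditLog: boolean; // every run logged, attributable to a human
}

const defaultPolicy: AgentPolicy = {
  runtime: "sandboxed",
  mergeRight: "merge-with-review",
  dailyCostCapUsd: 50,
  perTaskBudgetUsd: 5,
  auditLog: true,
};

// Narrow, explicit exceptions beat informal ones: autonomous merge only for
// tightly-scoped tasks such as dependency bumps.
const depBumpPolicy: AgentPolicy = {
  ...defaultPolicy,
  mergeRight: "merge-autonomous",
  perTaskBudgetUsd: 1,
};
```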
A surprising number of CTOs still treat agents as a tinkering thing. The companies whose CTO institutionalized them in 2025 are now shipping 1.5–2× the work per engineer.
See building_high_quality_ai_agents.md for the deep dive on agent architecture and claude_code_zero_to_hero.md for tactical use of one specific agent.
12.7 The AI cost problem
AI costs scale unpredictably. A $200/month feature can become a $20K/month feature in a viral week. CTOs in 2024–2025 got bitten repeatedly by this.
Defenses:
- Per-customer cost telemetry from day 1. You must know cost-per-call, cost-per-customer, gross margin per AI feature.
- Hard limits. Per-customer daily limits. Per-feature monthly limits. Auto-shutoff thresholds.
- Caching aggressively. Prompt caching, embedding caching, response caching. Often the difference between 30% and 80% gross margin.
- Model tiering. Cheap model for 80% of calls; expensive only for the 20% that need it.
- Customer-paid AI. Some features are billed-through; the customer pays your AI cost plus margin. Worth designing for.
- Quarterly cost-of-AI review. Same cadence as cloud cost review.
A CTO who can't answer "what's our gross margin on AI features?" within 5 minutes is a CTO whose CFO is about to surprise them.
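A minimal sketch of the hard-limit defense (TypeScript; names and thresholds are illustrative), sitting in the same gateway that records the telemetry:

```typescript
// Per-customer budget check, evaluated in the AI gateway before every call.
interface CostLedger {
  spentTodayUsd(customerId: string): Promise<number>; // telemetry from day 1
  record(customerId: string, feature: string, usd: number): Promise<void>;
}

const DAILY_LIMIT_USD = 5; // illustrative; in practice, set per plan tier

async function guardedAiCall(
  ledger: CostLedger,
  customerId: string,
  feature: string,
  call: () => Promise<{ text: string; costUsd: number }>,
) {
  const spent = await ledger.spentTodayUsd(customerId);
  if (spent >= DAILY_LIMIT_USD) {
    // Degrade gracefully (cached or cheaper-tier response) instead of eating
    // a surprise invoice; alert so someone sees the pattern, not just the cap.
    throw new Error(`AI budget exhausted for ${customerId} on ${feature}`);
  }
  const result = await call();
  await ledger.record(customerId, feature, result.costUsd);
  return result;
}
```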
12.8 Hiring for the AI era (recap)
From §9.4: spec-and-design > implementation, code-review > algorithm puzzles, AI fluency required, judgment over typing. Go re-read it.
12.9 What changes when AI is real
Things you didn't have to think about before that you have to think about now:
- Compliance for AI (EU AI Act, sectoral rules, US state laws). See §13.
- Data governance. What customer data is allowed where. PII into prompts is now a board-level risk.
- Model deprecation cycles. A model retires; your customer integrations break. Plan for it.
- The "vibe coding" risk. Junior engineers shipping plausibly-correct AI-generated code that subtly fails. Review bar must rise.
- Retention risk for non-AI engineers. Senior engineers who refuse to adopt AI tooling become career risks. Coach hard.
- Hiring brand. Companies with mature AI tooling for their engineers attract better engineers. Companies that don't lose them.
12.10 The CTO's own AI fluency
You can't lead what you don't use. Block 2 hours/week on AI tooling β your own. A competent CTO is now fluent at:
- Drafting strategy memos with AI assistance.
- Generating decision option-trees for hard calls.
- Reviewing PRs with AI summarization on unfamiliar code.
- Using AI agents for code review and small refactors.
- Reading AI-generated code skeptically.
A CTO who can't open Claude Code and ship a small change today is a CTO whose technical credibility is on a 6-month decay curve. Practice in private; demonstrate in public when relevant.
13. Security, Compliance & Risk
The thing that's not urgent until it's the only thing. By the time most CTOs take security seriously, they have 6 months of debt to pay down.
13.1 The security maturity curve
| Stage | Engineers | Security stance |
|---|---|---|
| Stage 0 | <10 | "We use 1Password and Cloudflare." Mostly true. Mostly fine. |
| Stage 1 | 10–30 | First security policy doc, MDM, basic SSO, password rotation: minimum viable hygiene |
| Stage 2 | 30–80 | First dedicated security owner (often part-time or fractional), SOC2 Type 1, vendor reviews |
| Stage 3 | 80–200 | Dedicated security engineer/team, SOC2 Type 2, ISO 27001 if international, formal incident response |
| Stage 4 | 200+ | CISO or head-of-security, security org, mature program, threat modeling, red team |
Most CTOs are 1 stage behind where they should be. The cost of the gap shows up either as a customer asking for SOC2 you can't deliver, or a breach you weren't ready for.
13.2 The compliance reality (2026)
The standard SaaS company today juggles:
- SOC2 Type 2: table stakes for B2B SaaS.
- ISO 27001: table stakes if you sell to Europe at scale.
- GDPR: required for any EU data subject.
- HIPAA: if healthcare-adjacent.
- PCI DSS: if you touch payment data directly.
- EU AI Act: required if your product uses AI in the EU market; tiered by risk class.
- State privacy laws (CCPA, CDPA, etc.): patchwork US compliance.
- Sectoral rules: financial (SEC, FINRA), education (FERPA), public sector (FedRAMP).
Most sub-300-person companies need SOC2 Type 2 + GDPR + (one industry-specific) + (EU AI Act if applicable). Don't chase certifications you don't need; each one costs 0.5–1 FTE-year ongoing.
13.3 The CTO's compliance posture
You don't run compliance. Your head of security or fractional CISO does. But you own the posture:
- Compliance is a checkbox, not the goal. The goal is being secure; the checkbox is documentation that you are.
- SOC2 = engineering hygiene. Most controls (access reviews, deploy approvals, vuln management, incident response) are things you should do anyway. The framework just forces them.
- Treat audits as code. Continuous compliance tooling (Vanta, Drata, Secureframe) reduces auditor cost and forces real controls.
- Audit your auditor. A bad auditor is worse than no audit; they sign off on broken controls and you discover the gap during a breach.
13.4 The "what would a breach cost us?" exercise
Once a year, the CTO + head of security + GC + CFO sit down and answer:
- What's our most likely breach scenario? (Phishing, credential leak, vendor compromise, malicious insider.)
- What's the dollar cost? (Direct: legal, notification, remediation, customer credits, regulatory. Indirect: customer churn, hiring damage, sales pipeline.)
- What's the contractual obligation? (SLA credits, breach notification deadlines, customer-by-customer.)
- What's the regulatory obligation? (GDPR fines up to 4% of revenue. CCPA penalties. Sectoral.)
- What's our preparedness for each? (Run a tabletop exercise. Honestly.)
The answer terrifies most CTOs the first time they do it. That's the point. The honesty drives the security investment that no one funds otherwise.
13.5 The vendor security review
Every new vendor that touches code, data, or production gets a written review:
- Data the vendor will receive (categories, volume, sensitivity).
- Their certifications (SOC2 report on file, age <12 months).
- Their breach history (Google them; check incident archives).
- Their data retention and deletion policies.
- Their subprocessors (where does your data flow downstream).
- Contractual provisions (DPA, SCC, breach notification SLA).
A standard vendor with a current SOC2 Type 2 = quick approval. A vendor who can't produce a SOC2 = thorough manual review. A vendor who flinches at security questions = no.
13.6 The incident response runbook
A separate doc, kept current, drilled twice a year. The minimum:
INCIDENT RESPONSE β abbreviated
1. Detect (alert, customer report, vuln scan)
2. Triage (severity, scope): paged people defined per severity
3. Contain (isolate, disable credentials, block traffic)
4. Eradicate (remove threat, patch)
5. Recover (validate, re-enable)
6. Communicate (per playbook: customers, regulators, board)
7. Postmortem (within 5 days)
People:
Incident commander rotation: [list]
Communications lead: [name]
Legal lead: [name]
Customer lead: [name]
CEO/CTO escalation: [name + paged threshold]
Severity:
Sev-0: Active breach with confirmed data exfiltration. Page CEO immediately.
Sev-1: Suspected breach OR confirmed unauthorized access. Page CTO + Legal.
Sev-2: Vulnerability exploited but no confirmed data access.
Sev-3: Vulnerability discovered, no exploit yet.
Drill it. Twice a year. Tabletop with the leadership team. Most companies have a runbook that works on paper and falls apart in practice.
13.7 The security hire
When and who:
- <30 engineers: part-time security lead among your engineers (with budget for tools + a fractional CISO advisor).
- 30–80 engineers: first full-time security engineer. Wide brief: tooling, policies, audits, incident response.
- 80–200 engineers: small security team (2–4) led by a head of security.
- 200+: dedicated CISO or head of security with a real org.
The first security hire is hard; security people range wildly in shape. You want a generalist with engineering depth, not a paper-policy person. They should be able to read code and write tooling, not just write policies.
13.8 The data protection posture
Above and beyond compliance, the CTO sets the company's stance on data:
- What's collected (legally, ethically, operationally).
- Where it lives (regions, vendors, replication).
- How long it's kept (retention policy per category).
- Who can access (role-based, audited, time-bounded).
- What's encrypted (at rest, in transit, in use).
- What's deleted on customer request (the right-to-be-forgotten workflow).
A 1-page data classification doc: public, internal, confidential, restricted. Each engineer should be able to articulate which category their feature touches and what the rules are. Most engineers can't, which means their CTO never enforced the framework.
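That classification translates naturally into a shared constant engineers can actually import. A hypothetical sketch:

```typescript
// Data classification as one importable constant, so "which category does my
// feature touch, and what are the rules?" has a single answer. The rules
// below are illustrative; yours come from your own 1-page doc.
type DataClass = "public" | "internal" | "confidential" | "restricted";

interface HandlingRules {
  encryptAtRest: boolean;
  allowedInPrompts: boolean; // may it be sent to model providers?
  accessAudited: boolean;
  retentionDays: number | "indefinite";
}

const RULES: Record<DataClass, HandlingRules> = {
  public:       { encryptAtRest: false, allowedInPrompts: true,  accessAudited: false, retentionDays: "indefinite" },
  internal:     { encryptAtRest: true,  allowedInPrompts: true,  accessAudited: false, retentionDays: "indefinite" },
  confidential: { encryptAtRest: true,  allowedInPrompts: false, accessAudited: true,  retentionDays: 730 },
  restricted:   { encryptAtRest: true,  allowedInPrompts: false, accessAudited: true,  retentionDays: 365 },
};

// Every new field in every schema declares its class; reviews check it.
const userEmailClass: DataClass = "confidential";
console.log(RULES[userEmailClass].allowedInPrompts); // false
```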
13.9 The 2026 AI security overlay
Specific to AI:
- No customer PII to consumer-tier model endpoints. Use enterprise tiers with no-training contracts.
- No code or secrets in prompts. Coach engineers; enforce in tooling where possible.
- Prompt injection threat modeling. Especially for agent-style features.
- Data egress monitoring. What's leaving your network into model providers.
- AI usage logs. Who, what, when. Auditable.
The breach class of 2026–2027 will be heavily prompt-injection and data-exfiltration-via-agent. CTOs who think about it now will look prescient; the rest will learn the hard way.
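One containment pattern worth sketching (hypothetical harness; names are illustrative): once untrusted content enters an agent's context, side-effecting tool calls lose autonomy:

```typescript
// Prompt-injection containment: after untrusted content (web pages, emails,
// retrieved docs) enters the agent's context, any side-effecting tool call
// requires human approval. Delimiting is a mitigation, not a guarantee.
interface Tool {
  name: string;
  sideEffects: boolean; // sends email, writes data, spends money, ...
  run(args: unknown): Promise<string>;
}

class AgentSession {
  private taintedContext = false;

  ingestUntrusted(content: string): string {
    this.taintedContext = true;
    // Delimit so downstream prompts can treat this span as data, not orders.
    return `<untrusted>\n${content}\n</untrusted>`;
  }

  async callTool(tool: Tool, args: unknown, approve: () => Promise<boolean>) {
    if (tool.sideEffects && this.taintedContext && !(await approve())) {
      throw new Error(`blocked: ${tool.name} after untrusted input`);
    }
    return tool.run(args);
  }
}
```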
(...to be continued...) Read Part 3 here: https://viblo.asia/p/the-cto-playbook-from-best-builder-to-best-bet-part-3-kNLr3DPqVgA
This playbook is a living document. The 2026 reality (AI-augmented engineering, distributed-async, post-ZIRP cost discipline, the rising bar on technical writing, regulatory complexity, model-vendor dynamics) keeps shifting. Update yours. Argue with mine. Ship the company that makes the next CTO playbook unnecessary.
If you found this helpful, let me know by leaving an upvote or a comment! And if you think this post could help someone, feel free to share it. Thank you very much!
All Rights Reserved