My Mac crashed yesterday.
Not a spinning beachball. Not a frozen app. A full reboot — forced by the OS — while I was mid-session across three active Claude Code instances. After the dust settled I pulled the system logs, traced the exact sequence, and documented it. The short version: three simultaneous Claude Code sessions consumed roughly 1–1.5GB of RAM, the machine was already under memory pressure from a full software stack, Zellij spawned four new shell processes simultaneously, and macOS JETSAM cascaded into a forced reboot at 15:27.
That crash turned out to be one of the more instructive debugging sessions I’ve had in a while — not because of what broke, but because of what it forced me to think through. Specifically: what is actually the right number of Claude Code sessions to run at once?
The answer splits into two very different questions. "Can" is a systems question — hardware, API limits, spend caps. "Should" is a human question — cognitive load, flow state, and how well any of us can actually direct AI at the speed it operates.
The “Can” Side: What Will Actually Stop You
1. Your RAM (the one that bit me)
Claude Code is a Node.js process. It allocates memory the moment it starts — roughly 150–200MB at idle, growing to 300–500MB per session as context accumulates, tools run, and conversations grow. That’s before you account for the MCP servers it spawns (Playwright, Perplexity, file servers, etc.), each of which is its own process.
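If you're curious what your own sessions are holding, a rough check is to sum the resident memory of everything Claude-related. A minimal sketch, assuming the processes are matchable by the name "claude" (on some setups they show up as node instead, so adjust the pattern):

```bash
# Sum resident set size (RSS, reported in KB by macOS ps) across every
# process whose command line mentions "claude", including MCP servers.
# Name matching is an approximation; adjust the pattern for your setup.
pgrep -f claude | xargs -I{} ps -o rss= -p {} \
  | awk '{sum += $1} END {printf "%.1f MB across %d processes\n", sum / 1024, NR}'
```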
On a machine with 16GB RAM and a full software stack — a browser, Slack, background sync agents, your editor — you’re often starting with 8–10GB already consumed before you open a terminal. Three Claude sessions add another 1–1.5GB. Add Zellij and a few build processes and you’re at the edge.
macOS handles this through JETSAM — its memory pressure management system. When available memory drops below a threshold, it starts killing background processes. When it can’t recover enough that way, it can force a system reboot. That’s what happened in my case: not a graceful shutdown, a JETSAM cascade triggered by the combination of existing pressure and a burst of new process spawning.
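You don't have to wait for a crash to see this coming. Two checks I now reach for, sketched under the assumption of a recent macOS (the sysctl name and log message text can vary across versions):

```bash
# Current kernel memory-pressure level: 1 = normal, 2 = warning,
# 4 = critical. The sysctl name comes from Apple's memorystatus
# interface; treat it as an assumption on older macOS versions.
sysctl kern.memorystatus_vm_pressure_level

# After an incident, pull jetsam activity out of the unified log.
# The message text varies by macOS version, so this predicate is a
# starting point, not a guarantee.
log show --last 1h --predicate 'eventMessage CONTAINS[c] "jetsam"'
```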
Practical ceiling: 2 sessions on a 16GB machine with a full software stack. 3 is the cliff.
2. API Rate Limits
Anthropic enforces rate limits at the organization level: requests per minute, tokens per minute, tokens per day — tiered by how much you’ve spent. Running multiple sessions simultaneously multiplies your token consumption in real time. Three active sessions each processing a large file means three simultaneous API calls, each consuming tokens.
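If you want to know where you stand, the API reports your limits back in its response headers. A sketch, assuming the header names from Anthropic's docs at the time of writing and a placeholder model ID; verify both against the current documentation:

```bash
# Fire a minimal request and print only the rate-limit response headers.
# Header names and the model ID are assumptions; check the current docs.
curl -s -D - -o /dev/null https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5","max_tokens":1,"messages":[{"role":"user","content":"ping"}]}' \
  | grep -i anthropic-ratelimit
```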
If you’re routing through a cloud provider, limits are per-region and per-model. Running multiple heavy sessions can exhaust your regional token quota mid-task, causing one or more sessions to stall mid-execution with a throttling error rather than a graceful pause.
3. Spend Caps (the one most people will hit first)
This is the practical ceiling for most power users before RAM or rate limits become relevant.
Claude Code on the Max plan ($100/month, 5× Pro) has weekly usage quotas. Reports from late 2025 showed Max users burning through their weekly Opus quota in 3–4 hours on complex codebases — not 40. Running three simultaneous sessions doesn’t triple your output; it triples your token burn rate while your actual reviewed output is limited by how fast you can read and respond.
If you’re on direct API access, you’re spending against a monthly budget rather than a quota. Three sessions running in parallel on a frontier model at full context can burn $10–20/hour combined. That adds up fast.
The spend cap is usually the first limit you’ll hit in practice, not RAM — unless you do what I did and trigger a crash first.
The “Should” Side: The Human Bottleneck
The systems question has a reasonably clean answer: your hardware and your API tier set a hard ceiling. The human question is murkier, and more interesting.
What the research says about context switching
UC Irvine research puts the average time to regain full focus after an interruption at 23 minutes and 15 seconds. Separate studies show that even a 5-second interruption can triple error rates on the current task. Knowledge workers who toggle between apps frequently lose an estimated 4 hours per week just in micro-recovery time.
This is the baseline before AI enters the picture. Now add the reality of running multiple Claude Code sessions: each session is a stream of tool calls, file edits, approval requests, and output that requires your review. Each time you switch tabs to check what session 2 is doing, you’re paying a context switch tax on session 1 — and vice versa.
The cognitive overhead compounds with the number of sessions. At two sessions you’re context switching. At three you’re triaging. At four you’re mostly watching logs and hoping.
The real question: are you flying the plane or reading the flight report?
Here’s the distinction that changed how I think about this:
Synchronous sessions — where you’re actively directing the agent, reviewing its output, approving tool calls, providing feedback mid-task — have a hard human ceiling of 1–2. Not because of RAM. Because you cannot simultaneously read two streams of reasoning, assess their quality, and make good decisions about both. You will either slow each session down to the point of defeating the purpose, or you’ll approve things you haven’t actually read.
Asynchronous sessions — where you dispatch an agent on a clearly scoped independent task, let it run to completion, and review the result when it surfaces — have a different limit. The ceiling isn’t cognitive load during execution; it’s review capacity afterward. You can queue 4–5 async agents if each is working on a genuinely independent task (separate repos, separate features, no shared files). But you still can’t review 4 substantial PRs simultaneously at quality.
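The async mode is easiest to keep honest when the dispatch itself is explicit. A minimal sketch using Claude Code's headless print mode (claude -p); the repo paths and task prompts are placeholders:

```bash
# Dispatch two independent, clearly scoped tasks headlessly, one per
# repo, and review both logs in one sitting afterward. Paths and
# prompts are illustrative placeholders.
(cd ~/code/repo-a && claude -p "fix the flaky retry test in pkg/client" >/tmp/agent-a.log 2>&1) &
(cd ~/code/repo-b && claude -p "add pagination to the /users endpoint" >/tmp/agent-b.log 2>&1) &
wait  # block until both finish before reviewing /tmp/agent-*.log
```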
The failure mode I see most often — and that I fell into — is conflating the two. You start sessions intending to be asynchronous (“I’ll check in on each one”) but end up synchronous (“I keep watching them because I can’t tell if they’re on track”). The result is that you’re partially attending to all of them and fully attending to none.
The right number, practically
- Active synchronous direction: 1, occasionally 2 if the tasks are truly independent and one is in a long-running tool call
- Background async on independent tasks: 2–3, reviewed in batches when complete, not monitored in real time
- Total open sessions: keep it at 2–3 running concurrently regardless of mode; exit sessions when done rather than leaving them idling
The mental model that works: tabs in a browser. You can have 20 open, but you’re reading one at a time. The others are just memory consumption and the illusion of multitasking.
What I Changed After the Crash
After tracing the incident I made two changes to my setup:
At launch: My multi-session startup command now checks memory pressure before opening Zellij. If available memory is below 20%, it blocks the launch entirely. Between 20% and 35%, it warns and asks for confirmation. Above 35%, it proceeds.
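Here's roughly what that gate looks like. A minimal sketch, assuming the "System-wide memory free percentage" line that macOS's memory_pressure tool prints; the output format can differ across macOS versions, so treat the parsing as a starting point:

```bash
#!/bin/bash
# Pre-launch gate: block below 20% free memory, confirm between 20%
# and 35%, proceed above 35%.
free_pct=$(memory_pressure | awk -F': ' '/System-wide memory free percentage/ {gsub(/%/, "", $2); print $2}')
free_pct=${free_pct:-100}  # if parsing failed, skip the gate rather than block

if (( free_pct < 20 )); then
  echo "Free memory at ${free_pct}%: refusing to launch." >&2
  exit 1
elif (( free_pct < 35 )); then
  read -r -p "Free memory at ${free_pct}%. Launch anyway? [y/N] " reply
  [[ "$reply" == [yY] ]] || exit 1
fi

exec zellij
```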
At session start: My session alias now counts active Claude processes before starting a new one. At 2+, it shows you the count and estimated memory in use, and asks if you want to continue.
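And the session check, sketched as a wrapper function. The process-name match and the 2-session threshold are assumptions from my setup; depending on how they're launched, Claude Code sessions can show up under node instead:

```bash
# Wrapper around the claude CLI: surface the cost before adding a session.
claude() {
  local count mem_mb reply
  count=$(pgrep -x claude | wc -l | tr -d ' ')
  if (( count >= 2 )); then
    mem_mb=$(pgrep -x claude | xargs -I{} ps -o rss= -p {} \
      | awk '{s += $1} END {print int(s / 1024)}')
    echo "${count} Claude sessions already running (~${mem_mb} MB resident)."
    read -r -p "Start another? [y/N] " reply
    [[ "$reply" == [yY] ]] || return 1
  fi
  command claude "$@"
}
```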
Neither of these prevents you from running too many sessions. They just make the cost visible at the moment of decision, which is where visibility actually matters.
The Answer
Can: 2 sessions on a 16GB machine with typical background load. 3 is the practical cliff for RAM. Your spend cap will likely bite you before that if you’re on a quota-based plan running heavy tasks.
Should: 1–2 synchronous sessions where you’re actively directing. 2–3 asynchronous sessions where you’ve clearly scoped independent work and will review output in batches — not in real time.
The broader point: AI works at machine speed. Humans don’t. The value of parallel sessions isn’t that you do more things at once — it’s that you can delegate clearly scoped work and redirect your attention to where decisions actually need to be made. That requires discipline about what “clearly scoped” means and honesty about whether you’re actually reviewing or just watching.
Most people who run into trouble with parallel sessions aren’t hitting a hardware limit. They’re hitting a clarity limit — launching multiple agents without a sharp answer to “what does done look like for each of these, and when will I review it?”
That’s the question worth asking before you open the next tab.