GPT-5.4 Is Table Stakes. Routing Work Is the Edge
Frontier models are everywhere now. The edge belongs to whoever decides which expert runs on which thread, without another tab marathon.
Your firm already pays for the best models. So why does Tuesday still feel like a relay race between tabs, drafts, and half-finished threads?
March 2026 brought another wave of announcements that sound like relief: stronger reasoning, bigger context windows, and assistants that stay closer to the tools you already use. OpenAI’s introduction of GPT-5.4 and the companion business guide Inside GPT-5: our best model for work frame the family for professional and agentic work, with explicit emphasis on tool-oriented behavior and large-context reasoning tracks, and they read less like a lab report and more like a bet on knowledge work. Third-party coverage in TechCrunch and Fortune reinforced the same storyline: the frontier is not “can it write,” it is “can it own the daily loop.”
Microsoft answered in the same window. A March 2026 Tech Community post describes Copilot in Outlook moving toward more agentic experiences where the assistant asks clarifying questions and updates the mail canvas as you answer, which is a real reduction in copy-paste friction for people who live inside Exchange threads. Read Copilot in Outlook: new agentic experiences for email and calendar alongside Microsoft’s March 2026 Microsoft 365 blog and you get a coherent promise: intelligence plus trust, delivered where mail already lives.
That sounds like progress. It is progress. It also leaves a structural question on the table that release notes rarely answer.
What changed with GPT-5.4 for everyday knowledge work?
GPT-5.4 matters for everyday knowledge work because vendors are no longer arguing that models are too weak for serious drafting, summarization, and tool use. The public positioning is explicitly about professional quality, efficiency, and agentic workflows, with OpenAI also shipping smaller variants aimed at volume workloads. That shifts the bottleneck from “get a decent paragraph” to “decide which expert behavior should run on which artifact, with what evidence, and who owns the outcome.” In plain terms, the model got sharper; your calendar did not get longer. The competitive edge is now routing, not raw eloquence.
Why smarter drafting does not automatically remove tab switching
Here is the uncomfortable part nobody puts on a keynote slide. Broad adoption of generative AI can coexist with uneven captured value. McKinsey’s ongoing State of AI reporting has repeated that pattern: lots of experimentation, fewer organizations scaling measurable impact. One reason is structural. When every vendor ships a smarter default assistant, the work that still leaks out is the long tail: obligation extraction from a messy contract PDF, a careful read on a spearphishy invoice, diligence notes from a forwarded data room link, or a security questionnaire that has to match your buyer’s standards.
Those tasks are not equally well served by one assistant personality tuned for polite business writing. They are different jobs. Treating them like the same job is how professionals end up as human middleware, copying outputs between surfaces because nobody built a clean handoff.
Research on attention fragmentation makes the tax concrete. Gloria Mark’s program at UC Irvine has published work on how information workers switch tasks and how short many focused episodes are; her group’s materials include primary sources such as the CHI 2005 PDF on interrupted work. Managerial writing has tried to translate the same dynamics into practice. Harvard Business Review’s plan for managing constant interruptions is not about models; it is about the cost of being pulled off task. Smarter drafting can make each interruption faster. It does not, by itself, reduce how many destinations you visit to finish one decision.
How Microsoft’s agentic Outlook flows help, and where specialist workflows still leak
Microsoft’s direction is rational for a massive installed base. If Copilot can draft from thread context and reduce round trips, that is worth shipping. The Copilot release plan is the kind of artifact enterprises actually plan around.
The boundary issue is depth. Suite copilots optimize for breadth: one assistant inside one tenant, many scenarios. Professionals often need narrow depth: a fraud read that is willing to say “this is suspicious,” a contract pass that lists obligations in a table your partner can skim, a verification pass that separates what is asserted from what is proven. When those outcomes require different prompts, different guardrails, and different review habits, “one copilot personality” becomes a ceiling, not a floor.
That is not an attack on Outlook. It is an honest map of where work escapes.
One assistant versus many small experts
The alternative model is almost boring, which is why it is easy to underestimate.
Instead of trying to train everyone to steer one mega-assistant through every edge case, you route work to small experts invoked where mail already lives. You send the thread, the PDF, or the pasted clause to a narrow agent behavior, get a structured reply, and keep the conversation in the same channel your firm already uses when something goes wrong.
That pattern shows up when teams compare “one interface versus tool sprawl” in the abstract. For a longer take on why cognitive load spikes when AI surfaces multiply, see AI brain fry is real: why one interface beats a dozen tools. For marketers living the copy-paste version of the same story, 91% of marketers use AI in email. Workflow is the bottleneck. lands in the same neighborhood: capability arrived; coordination did not.
What you can change this week if you refuse to be human middleware
You do not need a manifesto. You need a ledger.
Pick five recurring tasks that still force you out of mail into improvisation: security reads, contract passes, policy explanations, executive summaries of long threads, vendor claim verification. For each one, write down what artifact you start with, what artifact you need at the end, and which step currently happens in a different app because your default assistant cannot hold the frame.
Then test one narrow workflow against that ledger. If the job is “turn this rollout plan into language stakeholders will accept,” Frame AI Adoption is built for that translation job: email frame.ai.adoption@via.email with the messy details and stakeholder worries in the body. If the job is “this email asserts facts that will get forwarded,” Verify Email Claims verify.email.claims@via.email is the kind of narrow check that belongs in the loop before someone hits send. If the job is “this message smells wrong,” Spot Email Scams spot.email.scams@via.email is a specialist verdict channel, not a generic rewrite.
Those are examples of routing, not a claim that mail solves every governance problem. They are also how via.email, an email-based AI agents platform, stays out of your way: specialized agents at unique addresses, no new dashboard to babysit.
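The ledger-plus-routing idea above can be sketched in a few lines. This is an illustration only, not a via.email API: the agent addresses come from the examples in this article, while the task labels and the `route` helper are hypothetical names invented for the sketch. The useful property is the explicit failure case, because work with no route is exactly the work currently leaking into copy-paste improvisation.

```python
# A minimal sketch of routing recurring tasks to specialist agent addresses.
# Addresses are the ones named in the article; task labels and this helper
# are hypothetical, chosen for illustration.

ROUTES = {
    "adoption-framing": "frame.ai.adoption@via.email",      # rollout plan -> stakeholder language
    "claim-verification": "verify.email.claims@via.email",  # asserted facts, checked before forwarding
    "scam-screening": "spot.email.scams@via.email",         # suspicious message -> specialist verdict
}

def route(task: str) -> str:
    """Return the specialist address for a task, or fail loudly.

    The failure is the point: a task with no route belongs on the
    ledger as a workflow you have not yet captured.
    """
    try:
        return ROUTES[task]
    except KeyError:
        raise LookupError(f"No specialist route for {task!r}; add it to the ledger")

print(route("scam-screening"))  # resolves to the scam-screening address
```

In practice the "send" step is just mail: the routing table only decides which address the thread, PDF, or pasted clause goes to, which is why no new dashboard is involved.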
The bookmark test
The question is not whether GPT-5.4 is impressive. Capability is becoming table stakes. The question is whether your week gets calmer when models improve, or whether improvement only raises the expected speed of response.
If your answer is the second one, you are not behind on technology. You are ahead on honesty. The next productivity gain is not a hotter model. It is fewer round trips between places where nobody remembers the context.
The future of frontier models is not “one assistant that learns every job.” It is routing work to expert behavior without inventing another daily destination. Sometimes the least glamorous interface is the one everybody already has open.