AI CAPABILITY • LANDSCAPE

AI Landscape Navigator

The landscape is exploding. The gap between movers and waiters is widening.

More capability ships quarterly than used to ship in years. New models, tools, and platforms appear weekly. Competitive advantage is materialising now – not “someday.”

This living reference helps you understand what's out there, what matters, and where we stand. Not exhaustive – navigational.

A Living Reference

This page is updated as the landscape evolves. It reflects our current understanding and experience, not comprehensive market research. We include tools we've used, evaluated, or tracked closely. Last updated: May 2026.

MAY 2026 LANDSCAPE SHIFT

The useful question is no longer: which model is best?

For solo operators and small teams, the practical landscape question is now: which combination of tools gives reliable work at a sensible cost, with enough privacy, portability, and interaction quality for the work you actually do?

Capacity shows up as limits

Compute scarcity appears as rate limits, latency, pricing changes, outages, and tool-routing decisions.

Interaction changes adoption

Voice, screen context, interruption, and live correction make AI easier to use in real working situations.

Portability beats loyalty

The best small-business setup is rarely one model forever. It is a simple, portable harness that can route work well.

AT A GLANCE

The whole landscape in one view

Ten categories, grouped by where they sit in your day-to-day work. Click any card to open the detail below.

Models & Providers

1 category

Building & Creating

2 categories

Working & Integrating

7 categories

The AI models powering everything — who builds them, what they’re good at, and how they compare.

ANTHROPIC

Claude models

Anthropic is the clearest work-AI story of 2026. In a single week in late May the picture shifted again: Andre Karpathy joined the pre-training team to use Claude to accelerate AI research itself; Anthropic reported its first profitable quarter (the first for any foundation lab) on a $44B annualised run-rate; and the SpaceX compute partnership deepened with a $45B three-year contract for Colossus 1 and Colossus 2 capacity (~$1.25B/month, ramping May-June). Capability, harness quality, and compute supply now move together, and compute access is a literal balance-sheet item rather than analyst commentary.

Examples: Claude Opus 4.7, Claude Sonnet 4.6, Claude Haiku 4.5

Strengths:

  • +Frontier reasoning, vision, and document handling (Opus 4.7)
  • +Practitioner-grade tools: Claude Code, Claude Design, Routines, Skills, MCPs
  • +1M token context window; xhigh effort tier; Task Budgets (beta)
  • +SpaceX / Colossus 1 + Colossus 2 partnership ($45B over 3 years) expands inference capacity through 2029
  • +First profitable quarter of any foundation lab (Q2 2026: $10.9B revenue, ~$559M operating profit)
  • +Karpathy joining pre-training team to use Claude to accelerate AI research

Considerations:

  • April 1 2026 supply-chain incident (Claude Code auto-update shipped a hostile package for 3 hours)
  • Premium pricing for frontier models
  • Even after the SpaceX deal, token demand continues to outstrip available supply
  • Profitability is partly a function of being supply-constrained, not just demand strength

Our view: Our primary platform. We run Claude Code as our business operations hub, orchestrating strategy, research, delivery, and knowledge management daily. The April product wave made Claude a practitioner-grade work platform; the May SpaceX deal turned compute capacity into product strategy; the late-May Karpathy hire and profitability disclosure shifted market expectations more than any single product release this year.

OPENAI

GPT models & ChatGPT

OpenAI pioneered the current AI era and continues its strategic pivot to work AI. GPT-5.5 is broadly comparable to Opus 4.7 on most everyday tasks and slightly cheaper. May strengthened the interaction layer: GPT Realtime 2, Realtime Translate, and Realtime Whisper make voice and live context transfer more central, while Codex added long-running goal loops and browser access. OpenAI is also formalising enterprise deployment through Deploy Co, a signal that model access alone is not enough.

Examples: GPT-5.5, GPT Realtime 2, Codex, GPT Image 2, ChatGPT Plus

Strengths:

  • +Largest consumer ecosystem and integrations
  • +GPT-5.5: comparable to Opus 4.7 on everyday tasks, slightly cheaper
  • +GPT Image 2: legible text in images (signage, slide labels, packaging)
  • +Realtime 2, Realtime Translate, Realtime Whisper, Codex /goal, and browser extension

Considerations:

  • Anthropic now ahead on ARR ($30B vs OpenAI $25B); positioning is reversed
  • Sora shutdown signalled compute scarcity trade-offs
  • Microsoft dependency remains a concentration risk

Our view: The ecosystem leader on consumer reach and a serious work-AI competitor. GPT-5.5's price-performance makes multi-model routing compelling, while Codex and Realtime push OpenAI towards persistent, interactive work systems rather than chat alone.

GOOGLE

Gemini models + a sprawling AI product line

Google brings deep AI research heritage, the broadest distribution surface in technology, and (after Google I/O 2026) the most sprawling AI product line of any provider. The Gemini app jumped from 400M monthly active users (May 2025) to 900M (April 2026); monthly tokens processed across Google surfaces went from 480 trillion to 3.2 quadrillion in the same window. The scale advantage is real. The product clarity is not: a small-business operator now has to choose between Gemini, Gemini Advanced (AI Pro), Gemini Business (Workspace), AI Ultra, Spark, Anti Gravity 2.0, AI Studio, Jules, Flow, Veo, Omni, Nano Banana Pro, Google Pics, NotebookLM, and AI Mode in search — many overlapping, several launched without release dates, all evolving fast.

Examples: Gemini 3.5 Flash, Anti Gravity 2.0, Spark, Omni, Nano Banana Pro, AI Studio, Jules, Flow, Veo, Google Pics, NotebookLM, AI Mode (search)

Strengths:

  • +Largest distribution surface in consumer AI (900M Gemini app MAU; 3.2 quadrillion tokens/month)
  • +TPU compute moat — now externalised as a business line, not just internal capacity
  • +Native multimodality and very long context windows
  • +Omni (announced May 2026): editing-first multimodal model — a "Nano Banana for video"
  • +Anti Gravity 2.0: agent-first standalone desktop app with multi-agent teams and scheduled tasks (parity with Claude Code / Codex, not yet leadership)
  • +NotebookLM remains a category-defining product for research, study, and synthesis

Considerations:

  • Product sprawl is the dominant problem — the I/O 2026 lineup is genuinely hard to navigate, even for AI-fluent users
  • Gemini 3.5 Flash benchmarks well on Terminal Bench 2.0 (76.2%) and is state-of-the-art on OS World, but pricing has shifted (3x cost of previous Flash, 20x cost of 2.0 Flash) — speed is no longer paired with low cost
  • Spark and several other I/O launches announced without release dates
  • Google Ultra plan (May 2026) now uses compute-based usage limits; agentic tools (Anti Gravity, Flow) on usage-limit model — the subsidy era is ending here too
  • Strategic uncertainty: an internal split between Hassabis (world-models / robotics / continual learning) and a coding-agent-led RSI direction means priorities may shift again

Our view: Google may win consumer AI by sheer distribution: it already touches consumers everywhere, and Gemini scale numbers are remarkable. For solo and small-business operators, however, the product sprawl is the unmet need. Choosing what to use for what is now harder than using it. This is the single clearest argument all year for an AI-navigator role — a guide who can map the landscape rather than build everything inside it.

THINKING MACHINES

Interaction models

Thinking Machines Lab introduced a distinct model category in May 2026: interaction models trained from scratch for continuous, time-aware exchange rather than turn-based chat. The architecture pairs a foreground interaction model with a background model doing longer reasoning, browsing, and agentic work. The important signal is not raw benchmark performance; it is the shift from "prompt in, answer out" to AI that can notice, interrupt, translate, correct, and keep working while the human keeps talking.

Examples: TML Interaction Small, real-time video + speech, background model pairing

Strengths:

  • +Real-time audio and visual proactivity
  • +200ms micro-turns rather than conventional turn-based chat
  • +Foreground interaction plus background reasoning architecture
  • +Strong fit for meetings, training, education, coaching, and live collaboration

Considerations:

  • Early-stage lab, not yet a general platform choice
  • Frontier labs may copy the abstraction quickly
  • Commercial deployment path still unclear

Our view: A category signal more than a vendor recommendation today. Interaction is becoming capability, not interface polish. This belongs in the landscape because it changes what practitioners can expect from future harnesses.

XAI

Grok models

xAI has shifted from pure model challenger to infrastructure signal. Grok remains integrated with X, but the May 2026 Anthropic / SpaceX partnership reframed the story: xAI / SpaceX has enormous compute capacity, while Anthropic has stronger model and harness demand. Elon has also indicated xAI will be dissolved as a separate company into SpaceX AI. Treat xAI less as a dependable frontier model platform and more as a window into AI compute infrastructure.

Examples: Grok 4, Colossus 1, Colossus 2, SpaceX AI

Strengths:

  • +Real-time information from X
  • +Colossus 1 and Colossus 2 make SpaceX a meaningful compute actor
  • +Potential path towards orbital and vertically integrated AI compute
  • +Less guardrails on topics

Considerations:

  • Limited enterprise features
  • Tied to X ecosystem
  • Significant organisational fragility (9/11 co-founders departed)
  • Grok has not kept pace with the strongest model + harness combinations

Our view: Do not depend on Grok as a core work platform. Do monitor SpaceX AI as infrastructure: compute supply is now a strategic lever in the AI race, and Elon may be more consequential as a compute operator than as a model builder.

META

Llama models (open-source)

Meta's open-source approach has democratised access to capable models. Llama can be run locally or on private infrastructure, offering control and privacy that hosted APIs cannot.

Examples: Llama 4 Maverick, Llama 4 Scout, Llama 4 Behemoth (preview)

Strengths:

  • +Open source and customisable
  • +Can run locally/privately
  • +No per-token API costs
  • +Growing ecosystem

Considerations:

  • Requires technical expertise to deploy
  • Smaller models than frontier APIs
  • Self-managed infrastructure

Our view: Important for privacy-sensitive deployments and organisations with technical capability.

MISTRAL

European AI models

European-founded AI company offering competitive models with strong performance-to-cost ratios. Open-weight models available for self-hosting, with API access for convenience. Le Chat Pro is one of the privacy-first tools commonly used on the "private side" of a Public/Private wall for solo regulated practices.

Examples: Mistral Large 3, Mistral Medium 3, Mistral Small 3.1, Le Chat Pro

Strengths:

  • +European data sovereignty option
  • +Strong price/performance
  • +Open-weight models available
  • +Le Chat Pro: privacy-first option for client-confidential work

Considerations:

  • Smaller ecosystem than US providers
  • Enterprise features still developing

Our view: Good option for European data sovereignty requirements. Le Chat Pro features prominently on the private side of the Public/Private wall pattern (see /ai/foundation).

APPLE

On-device AI + Private Cloud Compute

Apple announced its CEO succession in April 2026: hardware VP John Ternus replaces Tim Cook (rather than software-side or COO Jeff Williams). The signal is structural: Apple is betting on on-device silicon plus Private Cloud Compute, not the frontier-lab race. Apple Foundation Models run inside the device for most tasks; harder workloads route to Apple's own private cloud with verifiable guarantees that data stays out of training. For solo practitioners handling protected client data, this is one of the most consequential strategic signals of 2026.

Examples: Apple Foundation Models, Apple Intelligence, Private Cloud Compute

Strengths:

  • +On-device by default for most tasks (privacy by architecture)
  • +Private Cloud Compute with verifiable hardware-rooted guarantees
  • +Tight integration across Apple ecosystem (iOS, macOS, iCloud)
  • +Hardware-first strategic positioning under Ternus

Considerations:

  • Apple ecosystem only
  • Less raw frontier capability than dedicated lab models
  • Still relatively new vs. Anthropic / OpenAI / Google

Our view: Watch closely. The on-device AI path becomes more compelling every quarter, particularly for solo regulated practitioners and personal-life users where privacy and offline capability matter. Apple Intelligence is a natural complement to Lumo, Mistral Le Chat, and Maple on the private side of a Public/Private wall.

DEEPSEEK

Chinese frontier AI at fraction of the cost

DeepSeek shook the AI industry by producing frontier-competitive models at a fraction of US lab costs. DeepSeek V4 shipped on 27 April 2026 in Pro and Flash variants, priced at less than one-seventh the cost of Opus 4.6 for roughly one-generation-behind capability. R1 (reasoning) matches earlier o1 performance; V3 rivals GPT-4o. The arithmetic is now unambiguous: for routine tasks where "good enough" is genuinely good enough, DeepSeek changes the cost calculus.

Examples: DeepSeek V4 (Pro and Flash), V3, R1

Strengths:

  • +V4 ships at <1/7th the cost of Opus 4.6 for one-generation-behind capability
  • +Open-weight models available (R1, V3)
  • +Efficiency breakthroughs in training methodology
  • +Strong coding and mathematical reasoning; natively multimodal

Considerations:

  • Chinese company — data sovereignty concerns for some organisations
  • API reliability and availability can vary
  • Censorship on certain topics (Chinese regulatory compliance)
  • Rapidly evolving — model versions shift fast

Our view: The biggest disruption in AI economics since GPT-3. DeepSeek proved that frontier capability doesn't require frontier budgets. Essential for multi-model strategy — particularly for cost-sensitive workloads where R1 or V3 can match more expensive alternatives.

How We Navigate This

With so many options, how do you choose? Here's our approach.

Start with the Problem

Don't start with “what AI should we use?” Start with “what problem are we solving?” The tool follows from the task, not the other way around.

Favour Simplicity

The simplest tool that solves the problem is usually the right choice. Complexity has ongoing costs. Start simple; add sophistication when you hit limits.

Build for Portability

The landscape changes fast. Avoid deep lock-in where you can. Use standards (MCP, OpenAI-compatible APIs) that let you switch if better options emerge.

Test with Real Work

Demos impress; production reveals. Before committing, test tools on your actual tasks. What works in a demo may struggle with your specific context.

What's Not Here

Comprehensive Coverage

This isn't a complete market survey. We focus on tools we've used or seriously evaluated. Many good options aren't listed because we haven't worked with them.

Full Enterprise Stack

We cover M365 Copilot and Graph, but not the full enterprise AI stack (Copilot Studio, Power Platform AI, Salesforce Einstein, ServiceNow, etc.). These require enterprise-specific context.

Infrastructure Deep Dives

We now track AI compute infrastructure because it explains limits, pricing, and reliability. We do not attempt a full survey of GPU providers, cloud infrastructure, power markets, or on-premise deployment. Those choices need infrastructure-specific advice.

Pricing Details

Pricing changes frequently. We mention pricing considerations but don't list specific prices. Check provider websites for current rates.

Why Timing Matters

The landscape is not just moving faster. Capacity, pricing, interaction, and deployment support now change what small teams can actually do with AI.

Clock Speed Reality

Features ship faster than conferences can announce them. The useful habit is not memorising every launch, but spotting which changes alter real work: better context transfer, cheaper execution, safer privacy, or more reliable delegation.

Leaders Pulling Ahead

The gap between organisations that get AI and those still experimenting is widening. Not because technology is inaccessible, but because execution speed is separating leaders from laggards. The pattern starts with documentation, research, and workflow support before it reaches core professional judgement.

Model Commoditisation

The models themselves are increasingly commoditised. Your advantage is less about choosing one winner and more about building a portable way of working: saved context, reusable instructions, clear routing, and enough fluency to move between tools when cost, limits, or quality shift.

Labs Eating the App Layer

AI labs are moving down-stack into code review, security scanning, meetings, design, and deployment support. For small teams, the question is practical: build on tools that are useful now, but keep your context, documents, and working method portable enough that a bundled feature does not strand you.

Need Help Navigating?

The landscape is overwhelming. We've been navigating it daily. Let's talk about what makes sense for your situation.