The AI Adoption Window Is Closed. The Optimization Window Is Open.

Over the past several months, I've had the same conversation in three different forms: across the table from clients rethinking their tech stack, on conference floors comparing notes on software spend, and in team standups tracking what we're actually burning on AI tools each month.

The conversation used to be about adoption.

"Should we be using AI?"

"Where do we start?"

"What's the ROI?"

That conversation is over.

The new one is harder. It's about cost, dependency, and whether the way your business consumes AI today is actually sustainable into next year.

The next competitive advantage isn't who uses AI the most. It's who gets the most value per dollar — and knows when not to use a model at all.

The pricing floor just moved

A few data points worth sitting with.

OpenAI's GPT-5.5 is listed at $5.00 per million input tokens and $30.00 per million output tokens, roughly double the cost of GPT-5.4 according to pricing analysis from The Register and OpenAI's own materials. Google followed with Gemini 3.5 Flash, which public analysis puts at three to six times more expensive than prior Flash-tier models. Microsoft is raising Microsoft 365 commercial prices effective July 1, 2026. GitHub Copilot is switching to token-metered GitHub AI Credits starting June 1, 2026.

One developer ran the numbers on his own Copilot account and projected his bill moving from roughly $67 to more than $960 in a single billing cycle. Edge case. But edge cases become budget conversations once enough people start using the tools heavily.

Anthropic introduced weekly usage caps on Claude Pro and Max plans in August 2025. Not throttles on bad actors — guardrails on normal, engaged users paying $20 to $200 a month.

Gartner's May 2026 forecast puts worldwide AI spending at $2.59 trillion in 2026, up 47% year over year. The compute required to run these models is extraordinarily expensive, and providers are no longer absorbing that cost at adoption-phase rates.

The pricing floor has moved. It will keep moving.

This does not mean AI is becoming unaffordable. It means AI is becoming a managed resource — like cloud compute or paid media. The companies that win won't be the ones using the most AI. They'll be the ones that know which work deserves expensive intelligence, which work can run on something cheaper, and which work should skip the model entirely.

The Uber warning

The clearest example of where this leads without governance: Uber. In April and May 2026, CTO Praveen Neppalli Naga said the company had exhausted its full-year 2026 AI budget in roughly four months.

Claude Code adoption across Uber's 5,000-person engineering team reportedly jumped from 32% to 84% in about three months. Sounds like a success story. But individual engineers were burning hundreds of dollars per month, and internal leaderboards tracked who was using the most tokens — which created an organizational incentive to keep burning. A budget designed to last twelve months was gone by April.

Uber is not a small company operating on thin margins. That is the point.

Ramp's data tells the same story from a different angle. Average monthly AI token spend across its customer base has increased 13x since January 2025, and finance teams are still learning how to forecast usage-based costs. If that's true at companies with dedicated financial infrastructure, I genuinely wonder what it looks like at mid-market ecommerce operators still running AI on a credit card and a hope.

Why ecommerce operators should care

AI is no longer sitting off to the side in experiments. It shows up in product recommendations, customer service automation, content generation, merchandising workflows, fraud review, search, and personalization.

Those aren't novelty use cases. They sit close to conversion, margin, and customer experience.

If AI is writing product descriptions, enriching catalog data, powering chat, handling service tickets, routing fraud signals, or assisting your developers, then AI spend isn't a generic software cost anymore. It's part of your operating model. Same discipline applies as you'd give to cloud infrastructure, paid media, inventory, and labor.

What smart operators are doing right now

Three moves produce real results without requiring a platform migration or a six-month project.

1. Turn on caching and audit your prompts

Both Anthropic and Google offer significant discounts on cached input tokens. Gemini 2.5 and later models can receive a 90% input-token discount when referencing an existing context cache. Anthropic's prompt caching is designed for repeated prefixes — tools, system instructions, examples, long background context. OpenAI also supports prompt caching for eligible prompts.

The catch: caching only works when your prompt structure cooperates. If your system prompt includes a timestamp, session ID, or anything else that changes per request, you're probably invalidating the cache on every call and paying full price every time.

This is the highest-ROI fix in many implementations, and it costs a few hours of engineering time. Move static content to the prefix. Put variable content at the end. Audit your prompts the way you'd audit SQL queries.

2. Route by task, not by habit

Most teams default to their flagship model for everything because it's the one they trust. But a lot of what ecommerce AI does doesn't require frontier-level intelligence.

Classification, extraction, tagging, review summarization, basic support triage, catalog enrichment, and structured rewriting can often run on cheaper models or non-LLM systems. If 70% of your workload is classification and you're running all of it through your most expensive model, you're overpaying.

Tools like LiteLLM, Portkey, and OpenRouter let you configure routing rules as a change to your gateway config — not a rewrite of your application. Use the cheapest model that gets the job done reliably. Escalate only when the task actually requires it.

3. Move deterministic work off the LLM entirely

This one takes the most organizational honesty.

Somewhere in your stack, you're probably using an AI API to do something a regular script could do better, faster, and for free.

SKU parsing. Email validation. Address normalization. Currency conversion. Date math. Structured extraction from predictable HTML. These are not reasoning problems. Regex, standard libraries, spaCy, scikit-learn, sentence-transformers, and basic rule engines can handle a meaningful share of ecommerce workloads with higher speed, lower cost, and more predictable behavior.

A 2026 benchmark comparing LLMs to deterministic code for routine function-calling tasks found the deterministic approach cost roughly 40 times less at scale, with comparable task completion. Using the right tool for the job is not a regression. It's basic engineering.

The resilience problem nobody planned for

Cost is only half the issue. The other half is dependency.

On October 20, 2025, an AWS US-EAST-1 failure caused significant disruption across AWS-dependent services — DNS resolution issues with regional DynamoDB endpoints, followed by cascading recovery problems. On November 18, 2025, Cloudflare suffered a major outage triggered by a bug in its Bot Management feature file, with impacts showing up across ChatGPT, X, Spotify, Canva, and others.

These were not AI provider outages in isolation. They exposed something more fundamental: the AI dependency stack is highly concentrated. Most businesses aren't just dependent on one model provider. They're dependent on that provider's cloud infrastructure, that provider's network path, that provider's API gateway — and often the same handful of internet infrastructure companies that everyone else uses.

If AI is in your customer service path, your product recommendation engine, or your conversational search, a three-to-six hour outage is a three-to-six hour revenue drag. That's a business continuity problem.

The practical response has two layers.

First: a multi-provider gateway. Tools like LiteLLM and Portkey let you define failover rules so that if one provider returns an error, your system retries against another within milliseconds. This is a configuration change, not a project.

Second: a local fallback. For businesses where AI touches the customer experience directly, a self-hosted model on Ollama or llama.cpp can provide a degraded-mode backup that operates independently of cloud AI provider status. A local Llama, Mistral, or Qwen model won't match a frontier model for everything. But it can answer a basic product question, classify an incoming ticket, or keep a customer-facing flow alive while the frontier model is down.

Uptime beats quality in a crisis.

A CFO wouldn't put 100% of the company's cash in one bank. The same logic applies here.

Where this is all going

IDC has warned that Global 1,000 companies will underestimate AI infrastructure costs by 30% through 2027. The FinOps Foundation's 2026 State of FinOps report found that 98% of teams now manage AI spend, up from 31% two years earlier. Gartner's framing from its 2026 analysis is the one I keep returning to: organizations are increasingly prioritizing proven outcomes over speculative potential.

That's not a retreat from AI. It's what happens when a technology actually sticks.

The organizations getting this right treat AI spend the way they treat cloud infrastructure: observability, routing rules, cost governance, fallback architecture, clear accountability for outcomes. The speculative phase is behind us. The question shifts from how much AI you're consuming to what you're producing per dollar spent.

That's the conversation I'm having at conferences and with clients. Every ecommerce operator needs to be having it with their own team — before the next price hike lands, before an outage takes a chunk out of revenue, before AI spend shows up as a line item in a quarterly review that nobody can explain.

The adoption window is closed.

The optimization window is open.

Research note

This post draws on provider pricing pages and announcements, public AI pricing analysis, technology press coverage, incident reports, and FinOps research, including:

OpenAI API pricing and prompt caching documentation: OpenAI API pricing and OpenAI prompt caching
GPT-5.5 pricing analysis: The Register
Gemini 3.5 Flash pricing analysis: Simon Willison
GitHub Copilot AI Credits and usage-based billing: GitHub Blog
Microsoft 365 July 2026 commercial pricing updates: Microsoft licensing announcement
Claude Pro and Max weekly rate limits: TechCrunch
Gartner May 2026 AI spending forecast: Gartner press release
FinOps Foundation 2026 State of FinOps: FinOps Foundation report
Uber AI budget reporting: Forbes and AI Magazine
Ramp AI token spend reporting: The New Stack and Ramp Spring 2026 Business Spending Report
AWS October 20, 2025 outage: AWS update and ThousandEyes outage analysis
Cloudflare November 18, 2025 outage: Cloudflare incident report and The Verge coverage
Anthropic prompt caching: Claude API documentation
Google Gemini context caching: Google Cloud documentation