Most organizations are setting themselves up for failure by adopting AI the same way they buy software: pick a vendor, standardize, and roll it out everywhere.
The assumption is that one model will solve every problem. But a model that excels at code generation might struggle with security analysis, while a frontier model that’s perfect for prototyping may not meet your data residency requirements.
Solving this mismatch requires flexibility in how you deploy AI models. Some teams need advanced, large-scale models for cutting-edge reasoning, while others need specialized models for domain-specific work. And, critically, you need the ability to mix and match based on the task at hand.
The AI Paradox: Coding is just one piece of the puzzle
The current wave of AI adoption focuses almost entirely on accelerating code generation. But coding represents a fraction of what developers actually do. According to GitLab’s 2025 Global DevSecOps Survey, developers spend only about 15% of their time writing code. The rest goes to planning, reviewing code, testing, debugging, managing dependencies, coordinating with teammates, and navigating compliance requirements.
This creates an AI paradox: AI is accelerating coding, but disconnected toolchains and manual coordination overhead have slowed overall productivity so much that it’s costing nearly a full workday per developer each week.
To solve this, AI needs to work across the entire development lifecycle, not just code generation. The challenge is that different activities across the software lifecycle have fundamentally different performance requirements:
- Speed-critical tasks like auto-completing code or suggesting fixes during active development need sub-second response times, which might favor smaller, locally hosted models.
- Quality-critical tasks like architectural planning or security analysis justify the cost of frontier models with superior reasoning.
- Cost-sensitive tasks at high volume, like running tests or updating dependencies across hundreds of repositories, require economical options to stay viable.
A single model can’t optimize for all three simultaneously. The organizations that gain the most from AI are the ones building systems flexible enough to route each task to the model that best fits its performance, quality, and cost profile.
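As a rough illustration, that routing decision can be sketched as picking the cheapest model that satisfies a task's latency, quality, and cost constraints. The model names, tiers, and prices below are hypothetical, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    """Requirements a task places on a model."""
    max_latency_ms: int      # how fast a response must arrive
    min_quality: int         # 1 = routine, 3 = frontier-level reasoning
    max_cost_per_1k: float   # budget ceiling per 1k tokens, in dollars

# Hypothetical model catalog: (latency_ms, quality tier, $ per 1k tokens)
MODELS = {
    "local-small":    (200,  1, 0.0001),
    "hosted-medium":  (900,  2, 0.002),
    "frontier-large": (3000, 3, 0.03),
}

def route(task: TaskProfile) -> str:
    """Return the cheapest model that satisfies all three constraints."""
    candidates = [
        (cost, name)
        for name, (latency, quality, cost) in MODELS.items()
        if latency <= task.max_latency_ms
        and quality >= task.min_quality
        and cost <= task.max_cost_per_1k
    ]
    if not candidates:
        raise ValueError("no model satisfies this task's constraints")
    return min(candidates)[1]

# Speed-critical completion lands on the small local model; a deep
# architectural review clears the bar only for the frontier model.
print(route(TaskProfile(max_latency_ms=500, min_quality=1, max_cost_per_1k=0.001)))   # local-small
print(route(TaskProfile(max_latency_ms=5000, min_quality=3, max_cost_per_1k=0.05)))   # frontier-large
```

The point is not the specific thresholds but the shape of the decision: each task carries its own requirements, and no single entry in the catalog wins on all three axes.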
Premium models for every task will break your budget
Price is a major factor influencing which model teams should choose for each job. A practical cost-to-value strategy might look like this:
- For high-volume, routine work like writing commit messages, summarizing log files, or writing test cases, teams lean toward cheaper and faster options, including open-source models where feasible.
- For tasks that demand complex reasoning, like code generation, teams pay for more capability.
- For specialized tasks like infrastructure-as-code generation or high-accuracy data transformation, teams might pay a premium for more deterministic, purpose-built models.
Optionality — the ability to choose between different models based on the task — is a hedge against model performance differences, pricing swings, and the reality that providers may sunset products or go out of business altogether.
But where does that optionality come from? There are essentially three sources:
- Commercial frontier models (Anthropic, OpenAI, Google) deliver cutting-edge performance and are constantly improving, but you’re dependent on vendor roadmaps and pricing.
- Self-hosted commercial or open-source models give you control over data residency, costs, and availability, but require infrastructure management, and open-source models may lag behind frontier models on complex agentic workflows.
- Domain-specific models you’ve trained can outperform general models on narrow, high-stakes tasks where you have unique data and clear success criteria, but such work requires specialist expertise and can be operationally very expensive.
Each approach has trade-offs. The key is building systems that let you use all three strategically.
Managing AI like cloud infrastructure
Model flexibility only creates value if you can manage the economics behind it. The price gap between models is substantial: complex reasoning models can cost 500% more per request than general-purpose models that work fine for routine tasks.
This is where model routing — the ability to define which models get used for which tasks — becomes critical. A code review might route to a frontier model, while commit message generation uses a faster, cheaper option.
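In configuration terms, that policy can be as simple as a task-type lookup with a safe default. The task types and model names here are hypothetical placeholders:

```python
# Hypothetical routing policy: each task type maps to the model tier
# that matches its quality and cost profile.
ROUTING_POLICY = {
    "code_review": "frontier-large",     # quality-critical: pay for reasoning
    "commit_message": "local-small",     # routine: fast and cheap wins
    "test_generation": "hosted-medium",  # middle ground
}

def model_for(task_type: str) -> str:
    # Fall back to a mid-tier default for task types without an explicit rule.
    return ROUTING_POLICY.get(task_type, "hosted-medium")

print(model_for("code_review"))     # frontier-large
print(model_for("commit_message"))  # local-small
```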
But routing alone isn’t enough. Enterprises need the same financial controls they have for cloud infrastructure: quotas to prevent runaway spending, limits to enforce budget discipline, and chargeback models that allocate costs to the departments actually using AI. Without these guardrails, AI adoption becomes financially unsustainable.
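A minimal sketch of those guardrails, assuming hypothetical per-department quotas and per-token prices in dollars:

```python
from collections import defaultdict

class AIBudget:
    """Track per-department AI spend against a monthly quota (dollars)."""

    def __init__(self, quotas):
        self.quotas = quotas                # department -> monthly quota
        self.spend = defaultdict(float)     # department -> spend to date

    def record(self, department, tokens, price_per_1k):
        """Charge a request back to a department; refuse it if the quota is exhausted."""
        cost = tokens / 1000 * price_per_1k
        if self.spend[department] + cost > self.quotas.get(department, 0.0):
            raise RuntimeError(f"{department} has exhausted its AI quota")
        self.spend[department] += cost

    def chargeback_report(self):
        """Spend per department, for allocation back to cost centers."""
        return dict(self.spend)

budget = AIBudget({"platform": 100.0, "mobile": 25.0})
budget.record("platform", tokens=50_000, price_per_1k=0.03)    # frontier-model review
budget.record("mobile", tokens=200_000, price_per_1k=0.0001)   # cheap routine work
print(budget.chargeback_report())  # {'platform': 1.5, 'mobile': 0.02}
```

A production system would pull real usage from provider billing APIs, but the primitives are the same ones FinOps teams already apply to cloud spend: metering, limits, and attribution.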
This is why FinOps practices are extending to AI. According to IDC, organizations will underestimate their AI infrastructure costs by 30% through 2027, and combining GenAI with FinOps processes will be essential for managing this complexity. (And that's just for GenAI — the impact of underestimating agentic AI could be considerably higher.) The organizations that treat AI spend like cloud spend — with visibility, accountability, and governance — are the ones that will scale AI successfully.
How customization delivers ROI
Model flexibility also depends on context. The information AI needs is spread across systems that were never designed to work together. A developer debugging an issue might need to reference the work backlog, pull recent discussions from Slack, and review app performance metrics in Grafana. If every system has its own AI experience and none of them connect cleanly, AI slows productivity instead of improving it.
Fortunately, recent open source developments have paved the way forward: the Model Context Protocol (MCP) gives tools a standard way to share relevant context and actions within a single workspace.
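To make the idea concrete, the sketch below stands in for MCP-style connectors with plain Python objects. The tool names and returned snippets are hypothetical; a real integration would use MCP client and server implementations rather than these stand-ins:

```python
# Hypothetical stand-ins for MCP-style connectors: each tool exposes
# context through a common interface instead of a bespoke AI experience.
class ContextSource:
    def __init__(self, name, fetch):
        self.name = name
        self.fetch = fetch  # callable returning context snippets for a query

def debugging_context(sources, issue_id):
    """Gather context for one debugging session from every connected tool."""
    return {source.name: source.fetch(issue_id) for source in sources}

sources = [
    ContextSource("backlog", lambda issue: f"issue {issue}: checkout latency regression"),
    ContextSource("chat",    lambda issue: f"#incident thread referencing issue {issue}"),
    ContextSource("metrics", lambda issue: "p99 latency up 40% since last deploy"),
]

context = debugging_context(sources, issue_id=4217)
# The assembled context can then be handed to whichever model the router selects.
for tool, snippet in context.items():
    print(f"{tool}: {snippet}")
```

The value of a shared protocol is exactly this shape: one query fans out across tools and comes back as a single, unified context, regardless of which model consumes it.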
This shared foundation of unified context is what makes meaningful customization possible. The most effective customization is layered, because each layer can encode how your organization performs work.
I think about this in three layers. Most developers will rely on pre-built agents and workflows that make AI available for common tasks without requiring expertise. Power users shape how a model operates through detailed prompting, essentially teaching it to follow their organization's playbook. And experts connect multiple agents into governed flows that mirror how humans deliver work, with strict review protocols in place.
Organizations get the most ROI by designing a system where AI is constrained by context and accountability, and where they can connect different models based on their requirements: frontier commercial models, self-hosted instances of commercial models for data residency, or specialized models they’ve already trained for domain-specific work.
Reliability comes from orchestration, not standardization
Enterprise AI adoption requires outputs that reliably hold up inside real systems, under real constraints.
The most successful organizations are building systems that support model diversity while maintaining strict governance. They treat AI spend like cloud spend, with model routing, quotas, and chargeback. And they’re focusing on orchestration to ensure AI fits naturally into daily workflows and relevant context is shared across tools.
A rigorous selection process matters here. The best platforms use under-the-hood subagents that evaluate models across quality, performance, and cost for each type of operation — and make those evaluations visible to users so teams understand why a given model is being used for a given task. That transparency creates trust. And when teams have requirements that differ from the defaults, they should be able to override model selections with their own preferences, or bring their own models entirely.
This approach lets them use frontier models where performance matters, self-hosted models where data residency is required, and specialized models where domain expertise makes the difference. The common thread is the governed control plane that maintains the same standards for reliability and security regardless of model source.
The future of enterprise AI isn’t about finding one perfect model. It’s about building systems that let you connect the models that match your requirements.
Next steps
Research Report: The Economics of Software Innovation
Learn what global C-suite executives are saying about AI-powered business growth, agentic AI adoption, upskilling, and how to demonstrate the impact of software innovation.
Key takeaways
- No single AI model can optimize for speed, quality, and cost at once. Enterprises that route tasks to the right model extract more value from AI while keeping spending under control.
- IDC projects that organizations will underestimate AI infrastructure costs by 30% through 2027. Applying FinOps practices to model spend gives enterprises the visibility and control they need to scale.
- The most resilient enterprise AI systems support model diversity within a governed control plane, enabling teams to use frontier, self-hosted, and custom models without sacrificing reliability or compliance.

