Build vs buy is one of the oldest questions in software, but AI gives it new dimensions. The stakes are higher because the choices compound: build the wrong component and you are locked into maintaining it while the ecosystem moves on. Buy the wrong component and you inherit constraints that shape every architectural decision downstream. The key insight is that build vs buy for AI is not one decision — it is a stack of decisions, one per layer of your AI infrastructure, and the right answer at each layer is often different.
Why the Decision Has to Be Made Layer by Layer
AI systems are composed of distinct infrastructure layers, each with very different build-vs-buy economics. At the foundation model layer, the answer is almost always buy — building a foundation model from scratch requires compute and research investment that is out of reach for any organisation that is not a dedicated AI lab. At the business logic layer, the answer is almost always build — your competitive advantage comes from the domain knowledge you encode, not from the infrastructure that runs it.
The complexity is in the middle layers: orchestration, data pipelines, vector databases, evaluation tooling. These layers have mature commercial options, strong open-source alternatives, and genuine trade-offs between control, cost, and maintenance overhead.
Making a single build-vs-buy decision for "the AI system" without distinguishing between layers leads to either massively over-building commodity infrastructure or under-investing in the components that differentiate the product.
What to Almost Always Buy (or Use Open-Source)
Several categories of AI infrastructure have reached sufficient maturity that building them from scratch is almost never the right call. Building these is expensive and slow, and it distracts from the work that actually creates business value.
- Foundation models (GPT-4, Claude, Gemini, Llama) — use API access or managed hosting; the compute and research costs of training are prohibitive
- Embedding models — OpenAI, Cohere, or open-source alternatives are battle-tested; the quality gap from custom training is rarely worth the cost
- Vector databases (Pinecone, Weaviate, Qdrant, pgvector) — mature products; your schema and query design are custom, the database itself is not
- Observability infrastructure (Langfuse, Helicone, Arize) — instrumenting AI systems is a solved problem; buy the tooling, focus on what you measure
- Document parsing and OCR (AWS Textract, Azure Document Intelligence) — commodity capability with known accuracy profiles
- API gateway and rate limiting — this problem was solved before AI existed; use existing solutions
What to Almost Always Build
The components you build are the components where your domain expertise creates differentiation. These are the parts of the system where a generic commercial solution will not fit your use case, because your use case is specific to your business.
- Prompts and prompt templates — these encode your domain expertise; a generic prompt is a generic output
- Output validation and business rule enforcement — what makes a good output for your use case is specific to your domain, your customers, and your risk tolerance
- Domain-specific evaluation criteria and test sets — only you know what "correct" means for your outputs
- Data schemas and transformation logic — the shape of your data is specific to your systems and your business
- Integration connectors for your internal systems — your CRM, ERP, and data stores have specific schemas and access patterns
- Human review workflows and escalation logic — the decisions about when to escalate and how to review are operational decisions that reflect your business process
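To make the list above concrete, here is a minimal sketch of the kind of component that stays in-house: a domain prompt template plus an output validation gate. Everything here — the insurance-claims scenario, the `SUMMARY_PROMPT` text, the three-sentence rule, the claim types — is an illustrative assumption, not a prescription; the point is that the business knowledge lives in the template and the rules, not in any framework.

```python
from dataclasses import dataclass

# Hypothetical domain prompt template: the encoded expertise is the text
# itself, which no off-the-shelf component can supply for your business.
SUMMARY_PROMPT = (
    "You are a support analyst for an insurance claims team.\n"
    "Summarise the ticket below in at most 3 sentences, and state the\n"
    "claim type (auto, home, or life) explicitly.\n\nTicket:\n{ticket}"
)

ALLOWED_CLAIM_TYPES = {"auto", "home", "life"}  # a business rule, not infrastructure

@dataclass
class ValidationResult:
    ok: bool
    reason: str = ""

def validate_summary(text: str) -> ValidationResult:
    """Enforce domain-specific rules on model output before it reaches users."""
    sentences = [s for s in text.split(".") if s.strip()]
    if len(sentences) > 3:
        return ValidationResult(False, "summary exceeds 3 sentences")
    if not any(ct in text.lower() for ct in ALLOWED_CLAIM_TYPES):
        return ValidationResult(False, "no recognised claim type mentioned")
    return ValidationResult(True)
```

A generic commercial guardrails product can run a check like this, but it cannot tell you what the check should be — that is the part you build.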
What Requires Careful Evaluation Before Deciding
Several categories sit in the middle, with genuine trade-offs that depend on your team, your constraints, and your long-term architecture goals.
Orchestration frameworks (LangChain, LlamaIndex, CrewAI)
These frameworks accelerate prototyping significantly — they handle common patterns like RAG, agent loops, and tool use with minimal boilerplate. The risk is that they abstract away complexity that eventually needs to be understood and controlled. Teams that reach production scale often find themselves refactoring away from framework magic toward more explicit orchestration code.
The pragmatic approach: use frameworks for prototyping and early-stage development, then evaluate whether to maintain the framework dependency or extract the orchestration logic as the system matures.
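The "more explicit orchestration code" that teams extract often looks unglamorous, and that is the point. A sketch of an explicit RAG pipeline, where `retrieve` and `generate` are placeholder callables standing in for your vector search and model call:

```python
from typing import Callable, List

def answer(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    k: int = 4,
) -> str:
    """Explicit RAG control flow: each step is plain code you can read,
    test, and change -- the logic a framework would otherwise hide."""
    passages = retrieve(question, k)            # 1. fetch context
    context = "\n---\n".join(passages)          # 2. assemble the prompt
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)                     # 3. single model call
```

Roughly fifteen lines replace a framework dependency for this pattern; whether that trade is worth it depends on how many such patterns you use and how much the framework's defaults diverge from what you need.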
Fine-tuning vs prompt engineering
Fine-tuning gives you a model that is specialised for your domain — potentially better performance at lower cost per call. It also requires training data, compute, evaluation infrastructure, and re-training processes as the world changes. Prompt engineering is more flexible, faster to iterate, and requires no training infrastructure.
The decision rule: start with prompt engineering. Move to fine-tuning when you have: a high-volume use case where per-call cost matters at scale, a stable domain where the training data will not become stale quickly, and the infrastructure to evaluate and maintain a fine-tuned model.
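The per-call cost part of that decision rule can be reduced to simple break-even arithmetic. All numbers below are illustrative inputs, not real vendor prices:

```python
def breakeven_calls(
    ft_fixed_cost: float,       # one-off training + evaluation cost
    prompt_cost_per_call: float,
    ft_cost_per_call: float,
) -> float:
    """Number of calls at which fine-tuning's fixed cost pays for itself.

    Ignores ongoing re-training and maintenance, which push the real
    break-even point higher than this estimate.
    """
    saving = prompt_cost_per_call - ft_cost_per_call
    if saving <= 0:
        return float("inf")  # fine-tuning is never cheaper per call
    return ft_fixed_cost / saving

# e.g. $5,000 of training/eval, $0.010 vs $0.004 per call:
# breakeven_calls(5000, 0.010, 0.004) -> about 833,000 calls
```

If your realistic volume is nowhere near the break-even count, the cost argument for fine-tuning disappears, and only the quality argument remains.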
AI-assisted workflow platforms (Make, Zapier AI, n8n)
Low-code automation platforms with AI integration have become genuinely capable. For use cases that fit their model — connecting SaaS products, triggering workflows from events, moving data between systems — they can deliver results in hours rather than weeks.
The constraint is predictable: when your use case requires custom business logic, multi-step AI reasoning, or integration with systems these platforms do not support, you will be fighting the platform rather than leveraging it. Evaluate whether your use case fits cleanly before committing to the platform model.
The Decision Criteria That Should Drive Every Layer
Rather than evaluating each component in isolation, apply a consistent set of criteria across every build-vs-buy decision in your AI stack.
- Differentiation: does this component contribute to your competitive advantage, or is it commodity infrastructure? Build differentiators; buy commodities.
- Maintenance burden: what is the ongoing cost of ownership if you build this? AI infrastructure has a higher maintenance burden than traditional software because it has to evolve as models and data distributions change.
- Vendor lock-in risk: how constrained will you be if the vendor changes pricing, deprecates an API, or is acquired? Design abstraction layers for high-risk dependencies.
- Team capability: does your team have the skills to build and maintain this well? A badly-built internal component is worse than a bought one — be honest about capability gaps.
- Time to value: how long will it take to build vs buy? In early-stage AI projects, time to learning (not time to perfection) is often the critical constraint.
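The abstraction-layer advice in the lock-in criterion above can be sketched as a narrow interface your application depends on, with vendor specifics confined to one adapter. The `OpenAIChat` class name and the `client.send` call are placeholders, not a real SDK surface:

```python
from typing import Protocol

class ChatProvider(Protocol):
    """The narrow surface your application depends on -- not a vendor SDK."""
    def complete(self, prompt: str) -> str: ...

class OpenAIChat:
    """Sketch of an adapter: wraps whatever SDK object you actually use."""
    def __init__(self, client):
        self._client = client

    def complete(self, prompt: str) -> str:
        return self._client.send(prompt)  # vendor-specific call stays here

def summarise(provider: ChatProvider, text: str) -> str:
    # Application code sees only the abstraction; switching vendors means
    # writing one new adapter, not touching business logic.
    return provider.complete(f"Summarise:\n{text}")
```

The abstraction costs little while the dependency is stable, and pays for itself the first time pricing, deprecation, or an acquisition forces a migration.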
Evaluating AI Vendors: What to Actually Ask
When buying AI infrastructure components, the standard vendor evaluation questions are necessary but not sufficient. The AI-specific questions below surface the constraints that will actually bite in production.
- What is the data retention and training policy — does my data contribute to model training? What are the contractual guarantees?
- What is the SLA for the inference API, and what happens to my system when you have an outage? Do you have a status page with historical uptime data?
- How do model updates get communicated, and how much notice will I have before a model that I depend on changes behaviour?
- What are the rate limits at my expected production scale, and what is the pricing for exceeding them?
- Is there a migration path if I decide to switch vendors or use a different model? What does my data portability look like?
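The questions above can be tracked as a weighted checklist so that vendor comparisons are consistent rather than anecdotal. The weights and the 0–5 rating scale here are illustrative assumptions; adjust them to your own risk profile:

```python
# Weight each question by how much it matters to you (higher = more critical).
QUESTIONS = {
    "data_retention_policy": 3,   # contractual training-data guarantees
    "inference_sla": 3,           # uptime history and outage behaviour
    "model_change_notice": 2,     # advance warning of behaviour changes
    "rate_limits_at_scale": 2,    # limits and overage pricing at your volume
    "migration_path": 2,          # data portability and exit options
}

def score_vendor(answers: dict) -> float:
    """answers maps question key -> 0..5 rating from your evaluation.
    Unanswered questions score 0. Returns a normalised 0..1 score."""
    total_weight = sum(QUESTIONS.values())
    weighted = sum(QUESTIONS[q] * answers.get(q, 0) for q in QUESTIONS)
    return weighted / (5 * total_weight)
```

A score is not a decision, but it makes the disagreement explicit: if two vendors score identically, the argument shifts to whether the weights are right, which is the argument worth having.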