Evaluation Scorecard: Technical Leaders & Procurement Teams

AI Vendor Evaluation Scorecard

Score any AI vendor across six criteria that predict whether their infrastructure will hold up in production, before you sign a contract.

Vendors who deflect operational questions during the sales process are the ones whose operational reality does not survive scrutiny.

An SLA without historical uptime data is marketing. Ask for the last 12 months of incidents and resolution times.

Data handling policies should be in writing, in the contract, before you share any production data.

A vendor's response to a technical question you cannot answer yourself reveals more about their actual support quality than any sales call.

Why This Matters

AI vendors are exceptionally good at demonstrations and exceptionally opaque about the operational details that matter in production. The vendor that impresses in a demo may be the one whose API degrades under load, whose pricing changes on 30 days' notice, and whose support team goes silent during incidents. This scorecard covers the six criteria that predict production reliability and relationship quality, along with the specific questions that surface the answers.

Reliability and SLA

Critical

Questions to Ask

01. What is the published SLA for API availability? (percentage uptime, excluding scheduled maintenance)
02. Can you provide 12 months of actual uptime data, including incident history and resolution times?
03. What is your status page URL, and does it provide historical incident data?
04. What are the geographic availability zones, and what is the failover architecture?
05. What is your communication protocol during incidents: how quickly are customers notified, through what channel, and with what level of detail?

Red Flags

SLA provided but no historical incident data available
Status page shows only current status with no history
Incident communication is described as "email" with no committed notification time
"We rarely have downtime" without data to support it

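To put a quoted SLA percentage in concrete terms, it can help to convert it into the downtime it still permits and to compare it with the uptime implied by the vendor's incident history. The following is a minimal Python sketch. The function names and the 30-day and 365-day periods are illustrative assumptions, and a real SLA may measure availability differently (for example per region, or excluding scheduled maintenance).

```python
# Sketch: translate SLA percentages and incident histories into comparable numbers.
# All names and default periods here are illustrative assumptions.
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime an uptime SLA still permits over the period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_days * MINUTES_PER_DAY * (100 - uptime_percent) / 100


def observed_uptime_percent(outage_minutes: list[float], period_days: int = 365) -> float:
    """Uptime implied by a list of incident durations, given in minutes."""
    total_minutes = period_days * MINUTES_PER_DAY
    return 100 * (total_minutes - sum(outage_minutes)) / total_minutes


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        print(f"{sla}% uptime -> {allowed_downtime_minutes(sla):.1f} min per 30 days")
```

Under these assumptions, a 99.9% SLA still permits roughly 43 minutes of downtime in a 30-day month. Feeding the durations from the vendor's 12-month incident history into `observed_uptime_percent` gives a figure to set against the published SLA.
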
Data Handling and Privacy

Critical

Questions to Ask

01. Does your service use customer data to train or improve your models? What are the contractual guarantees?
02. Where is data stored and processed geographically? Can we restrict processing to specific regions?
03. What is the data retention period? Can we request deletion of specific data, or of all data on contract termination?
04. Have you completed a SOC 2 Type II audit? Can you provide the current report?
05. What encryption is applied to data in transit and at rest? What key management approach is used?

Red Flags

Training data use described as "model improvement" without specific opt-out or contractual guarantee
Cannot confirm geographic data processing location
No data deletion guarantee on termination
SOC 2 Type I only (Type I assesses control design at a single point in time; Type II tests whether the controls operated effectively over a period of months)
Vague responses about encryption standards

Pricing Transparency and Predictability

Important

Questions to Ask

01. What is the full pricing model? Are there per-request costs, volume tiers, minimum commitments?
02. What are the rate limits at the tier we are considering, and what is the cost of exceeding them?
03. How much notice is provided before pricing changes? What is the contractual guarantee on price stability?
04. Are there any costs not captured in the published pricing? (egress, support tiers, additional features)
05. At our expected production volume, what is the estimated monthly cost? Can you model this for us?

Red Flags

Cannot provide a cost model at expected volume
Price change notice is less than 30 days
Rate limits that would be hit at expected production volume without an upgrade path
"Custom pricing" with no published starting point
Significant hidden costs surfaced only after scoping
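
Before asking the vendor to model costs, it can be useful to have a rough model of your own to compare against. The sketch below assumes graduated volume tiers (each rate applies only to the requests inside that tier), a minimum monthly commitment, and a flat fee for extras such as a support tier. The `Tier` class, the `monthly_cost` function, and every price in `EXAMPLE_TIERS` are hypothetical placeholders, not any vendor's actual pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    up_to_requests: float  # cumulative upper bound; float("inf") for the last tier
    price_per_1k: float    # price per 1,000 requests inside this tier


# Hypothetical tiers, listed in ascending order of volume.
EXAMPLE_TIERS = [
    Tier(1_000_000, 2.00),
    Tier(10_000_000, 1.50),
    Tier(float("inf"), 1.00),
]


def monthly_cost(requests: int, tiers: list[Tier],
                 minimum_commit: float = 0.0, fixed_fees: float = 0.0) -> float:
    """Estimated monthly bill under graduated pricing.

    Assumes the minimum commitment applies to usage charges only and that
    fixed fees (support tier, add-ons) are billed on top of it.
    """
    usage_cost = 0.0
    lower = 0.0
    for tier in tiers:
        if requests <= lower:
            break
        in_tier = min(requests, tier.up_to_requests) - lower
        usage_cost += in_tier / 1000 * tier.price_per_1k
        lower = tier.up_to_requests
    return max(usage_cost, minimum_commit) + fixed_fees


if __name__ == "__main__":
    for volume in (500_000, 5_000_000, 20_000_000):
        print(volume, monthly_cost(volume, EXAMPLE_TIERS, fixed_fees=500))
```

Running the model at expected, double, and ten times expected volume shows where a minimum commitment or a tier boundary changes the bill, which gives you specific figures to check against the vendor's own estimate.
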

Model and API Stability

Important

Questions to Ask

01. What is the model versioning policy? Can we pin to a specific model version?
02. How are model updates communicated? What is the minimum notice period before a model version is deprecated?
03. What is the API versioning policy? How long are deprecated API versions supported after a sunset announcement?
04. Has the model behaviour changed in the past 12 months in ways that affected customer systems? How was this handled?
05. What is the process for testing our implementation against a new model version before it becomes the default?

Red Flags

No model versioning — "latest" is the only option
Model deprecation notice shorter than 60 days
Cannot share history of model behaviour changes in the past 12 months
No sandbox or staging environment for testing against upcoming model versions
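
If the vendor does offer version pinning and a staging environment, testing against an upcoming model version can be as simple as replaying a fixed prompt set through both versions and listing what changed. The sketch below is one hedged way to do that. `regression_report`, `exact_match`, and the two callables are hypothetical: in practice each callable would wrap your own client code for the pinned and candidate versions, and exact string comparison would often be replaced by a task-specific check.

```python
from typing import Callable


def exact_match(old: str, new: str) -> bool:
    """Simplest possible comparison; swap in a task-specific check as needed."""
    return old.strip() == new.strip()


def regression_report(prompts: list[str],
                      pinned: Callable[[str], str],
                      candidate: Callable[[str], str],
                      same: Callable[[str, str], bool] = exact_match) -> dict:
    """Replay prompts through two model versions and collect the differences."""
    changed = [p for p in prompts if not same(pinned(p), candidate(p))]
    rate = len(changed) / len(prompts) if prompts else 0.0
    return {"total": len(prompts), "changed": changed, "change_rate": rate}


if __name__ == "__main__":
    # Stand-ins for real API calls to two model versions (assumptions).
    def pinned_version(prompt: str) -> str:
        return prompt.upper()

    def candidate_version(prompt: str) -> str:
        return "REFUND DENIED" if prompt == "refund policy" else prompt.upper()

    print(regression_report(["refund policy", "shipping times"],
                            pinned_version, candidate_version))
```

A vendor with a workable deprecation policy should give you enough notice to run a check like this on your own prompt set and fix what breaks before the new version becomes the default.
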

Support Quality and Technical Responsiveness

Important

Questions to Ask

01. What is the support SLA for different severity levels? (P1 incident response time, P2, P3)
02. Is there a dedicated technical account manager or solutions engineer for our account?
03. What is the escalation path for a production incident that is not being resolved at tier-1 support?
04. Can we speak to a customer at a similar scale who has had to use emergency support in the past 6 months?
05. Ask a specific technical question you do not know the answer to and observe the quality and honesty of the response.

Red Flags

Support SLA is "best effort" for any tier
No escalation path beyond submitting another ticket
Cannot provide a reference customer for support quality
Technical question receives a deflective or incorrect answer
Support is email-only with no synchronous option for P1 incidents

Exit and Data Portability

Useful

Questions to Ask

01. Can we export all of our data, including embeddings, fine-tuning data, and inference logs, in a standard format?
02. What does migration to a different vendor look like? What would we need to rebuild?
03. What happens to our data if your company is acquired or ceases operations?
04. Are there any contractual lock-in mechanisms (minimum commitment periods, data export fees, termination penalties)?
05. What is the contractual process and timeline for termination and data deletion?

Red Flags

Data cannot be exported in a portable format
Migration path described as "we can help you migrate" without specifics
No data protection clause in acquisition scenarios
Early termination penalties that make switching economically prohibitive
Data deletion process is slow (more than 30 days) or not contractually guaranteed
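
The six criteria carry different priority labels (Critical, Important, Useful), which suggests a weighted total when comparing vendors. The scorecard does not prescribe a scoring method, so the following is only one possible sketch. The 0 to 5 scale, the 3/2/1 weights, and the rule that a low score on a Critical criterion is a blocker are all assumptions to adjust for your own procurement process.

```python
# Assumed weights for the priority labels; the scorecard itself does not define them.
PRIORITY_WEIGHTS = {"Critical": 3, "Important": 2, "Useful": 1}

# Criteria and priority labels as listed in the scorecard.
CRITERIA = {
    "Reliability and SLA": "Critical",
    "Data Handling and Privacy": "Critical",
    "Pricing Transparency and Predictability": "Important",
    "Model and API Stability": "Important",
    "Support Quality and Technical Responsiveness": "Important",
    "Exit and Data Portability": "Useful",
}


def score_vendor(scores: dict[str, int], max_score: int = 5,
                 critical_floor: int = 3) -> dict:
    """Weighted percentage plus any Critical criteria scoring below the floor."""
    missing = set(CRITERIA) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    weighted = sum(PRIORITY_WEIGHTS[CRITERIA[c]] * scores[c] for c in CRITERIA)
    maximum = sum(PRIORITY_WEIGHTS[p] * max_score for p in CRITERIA.values())
    blockers = [c for c, p in CRITERIA.items()
                if p == "Critical" and scores[c] < critical_floor]
    return {"percent": round(100 * weighted / maximum, 1), "blockers": blockers}


if __name__ == "__main__":
    example = {
        "Reliability and SLA": 4,
        "Data Handling and Privacy": 2,
        "Pricing Transparency and Predictability": 5,
        "Model and API Stability": 3,
        "Support Quality and Technical Responsiveness": 4,
        "Exit and Data Portability": 3,
    }
    print(score_vendor(example))
```

Reporting blockers separately from the percentage keeps a strong overall score from hiding a weak answer on a Critical criterion such as data handling.
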
