AI Risk Assessment

How to Evaluate AI Vendor Risk in 2026

March 22, 2026

AI-powered tools have moved from experimental pilots to core business infrastructure in less than three years. Your engineering team uses AI code assistants, your marketing team uses AI content tools, your support team uses AI chatbots, and your analytics team feeds proprietary data into AI-powered platforms daily. Each of these tools introduces risk categories that traditional vendor risk assessments weren't designed to capture — data ingestion practices, model training policies, and output liability questions that didn't exist before 2023. This guide provides a practical framework for evaluating AI vendor risk that goes beyond checking a SOC 2 box.

Why AI Vendors Require Different Risk Assessment Criteria

Traditional vendor risk assessment evaluates whether a vendor protects your data — can they keep it confidential, available, and intact? AI vendors introduce a fundamentally different question: what does the vendor do with your data beyond storing it? When you upload a document to a conventional cloud storage provider, the risk model is straightforward. When you paste that same document into an AI tool, it may be tokenized, embedded, used to fine-tune a model, or surfaced in responses to other users.

This distinction breaks the standard confidentiality model. A vendor can be SOC 2 certified, encrypt data at rest and in transit, and implement strong access controls while simultaneously using your data to improve their foundation model — making derivatives of your proprietary information available to every other customer. The security controls are technically sound, but the data usage creates a business risk that security certifications weren't designed to evaluate.

AI vendors also create output liability risk that has no parallel in traditional SaaS. When an AI tool generates content, code, or recommendations based on your data, questions emerge about intellectual property ownership, accuracy liability, and downstream consequences of AI-generated outputs. Your vendor due diligence process needs to address these categories explicitly.

The Three AI Vendor Risk Categories

Data Ingestion Risk covers what happens to information you provide to an AI vendor. This includes direct inputs (prompts, uploaded files, API calls) and indirect data collection (usage patterns, metadata, interaction logs). The critical questions are whether the vendor retains your inputs, how long retention lasts, who can access retained data, and whether inputs are used for any purpose beyond serving your request. Many AI vendors have changed their data retention policies multiple times, so a point-in-time assessment isn't sufficient — you need to monitor for policy changes.

Model Training Risk is the category that generates the most concern. When a vendor uses customer data to train or fine-tune their models, your proprietary information becomes embedded in model weights and can influence outputs for other customers. Even when a vendor claims data is "anonymized" before training, research has demonstrated that large language models can memorize and reproduce training data. The key distinction is between vendors that train on customer data by default (opt-out) versus those that never train on customer data without explicit opt-in. This single policy distinction should be a primary factor in your risk assessment.

Output Liability Risk addresses who is responsible when AI-generated content causes harm. If an AI code assistant introduces a security vulnerability, if an AI contract tool produces non-compliant language, or if an AI analytics platform generates inaccurate conclusions that drive business decisions — the liability chain matters. Most AI vendors disclaim all liability for output accuracy in their terms of service, which means your organization bears the risk of acting on AI-generated outputs.

What to Look for in an AI Vendor's Data Usage Policy

The most important document in an AI vendor assessment isn't the SOC 2 report — it's the data usage policy, and specifically the sections that address model training. Look for explicit, unambiguous statements about whether customer data is used to train models. Phrases like "we may use data to improve our services" are red flags because "improve our services" can encompass model training. You want to see language like "we do not use customer data to train our models" or "customer data is never used for model training without explicit written consent."
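As a rough triage aid, and not a substitute for legal review, you can scan a policy's text for the kinds of phrases described above before sending it to counsel. A minimal sketch in Python, with illustrative (not exhaustive) phrase lists:

```python
import re

# Illustrative phrase lists only -- a triage aid, not a legal analysis.
RED_FLAGS = [
    r"improve our services",
    r"may use .{0,40}data",
    r"enhance our (models|products)",
]
GREEN_FLAGS = [
    r"do not use customer data to train",
    r"never used for model training",
    r"explicit written consent",
]

def triage_policy(text: str) -> dict:
    """Return the red-flag and green-flag phrases found in a data usage policy."""
    lowered = text.lower()
    return {
        "red": [p for p in RED_FLAGS if re.search(p, lowered)],
        "green": [p for p in GREEN_FLAGS if re.search(p, lowered)],
    }

policy = "We may use anonymized data to improve our services."
print(triage_policy(policy))
# {'red': ['improve our services', 'may use .{0,40}data'], 'green': []}
```

A red-flag hit settles nothing on its own; it tells you which sections of the policy deserve a lawyer's attention first.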

Training data commitments should be contractual, not just marketing language. A blog post saying "we don't train on your data" has no legal weight. Look for this commitment in the Data Processing Agreement (DPA), Master Service Agreement (MSA), or a dedicated AI-specific addendum. If the vendor offers an enterprise tier with different data usage terms than their self-service tier, ensure your contract explicitly references the enterprise terms.

Third-party model providers add a layer of complexity. Many AI-powered SaaS tools don't run their own models — they call APIs from OpenAI, Anthropic, Google, or other foundation model providers. When evaluating these vendors, you need to understand the data flow end-to-end. Does the vendor's API agreement with the model provider include a zero-training commitment? Some model providers offer zero-retention API tiers, but the SaaS vendor must specifically opt into them.
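One lightweight way to keep that end-to-end flow visible is to record each hop in the vendor's AI supply chain alongside its contractual commitments. A minimal sketch; the field names and schema are assumptions for illustration, not a standard:

```python
from dataclasses import dataclass

@dataclass
class ModelProviderLink:
    """One hop in a vendor's AI supply chain (illustrative schema)."""
    vendor: str
    upstream_provider: str          # the foundation model API the vendor calls
    zero_training_commitment: bool  # contractual, not just marketing language
    zero_retention_tier: bool       # has the vendor opted into a zero-retention API tier?

def end_to_end_clear(chain: list[ModelProviderLink]) -> bool:
    """The flow passes only if every hop carries a zero-training commitment."""
    return all(link.zero_training_commitment for link in chain)

chain = [ModelProviderLink("ExampleSaaS", "ExampleModelAPI", True, False)]
print(end_to_end_clear(chain))  # True, but the missing zero-retention tier still warrants follow-up
```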

Data retention and deletion policies for AI vendors should specify how long inputs and outputs are retained, whether they're stored in retrievable form versus embedded in model weights (which can't be selectively deleted), and what happens to your data when you terminate the contract. A vendor that offers a 30-day deletion policy for stored data but has already used that data for model training hasn't truly deleted your information.

Key Questions to Ask AI Vendors During Due Diligence

Structure your AI vendor due diligence around specific, answerable questions rather than general security questionnaires. Generic questionnaires designed for traditional SaaS vendors miss the AI-specific risks entirely.

Start with data handling: Do you use customer inputs or outputs to train, fine-tune, or improve your AI models? If yes, is this opt-in or opt-out? Follow up with: Do any third-party model providers you use have access to our data, and do their terms permit training on API inputs? These two questions alone will disqualify a significant number of vendors for organizations handling sensitive data.

On the infrastructure side, ask: Where is AI inference performed — on your infrastructure, on a third-party model provider's infrastructure, or on-device? The answer determines which compliance frameworks apply and where your data physically travels. For organizations with data residency requirements, this is a threshold question.

For output risk, ask: What accuracy commitments or SLAs exist for AI-generated outputs? Do you carry errors and omissions insurance that covers AI-generated content? Most vendors will answer "none" and "no" respectively, which is honest but means your organization must build its own validation layer around AI outputs.
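What that validation layer looks like depends on the output type, but the shape is consistent: no AI-generated artifact reaches production use without automated checks and a named human sign-off. A minimal sketch, with hypothetical check names:

```python
from typing import Callable, Optional

Check = Callable[[str], bool]

def not_empty(text: str) -> bool:
    return bool(text.strip())

def no_placeholder(text: str) -> bool:
    # Hypothetical check: reject outputs that still contain scaffolding.
    return "TODO" not in text

def validate_ai_output(output: str, checks: list[Check], reviewed_by: Optional[str]) -> bool:
    """Usable only if every automated check passes and a human signed off.

    Human review is mandatory here because most vendors disclaim output liability.
    """
    if reviewed_by is None:
        return False
    return all(check(output) for check in checks)

ok = validate_ai_output(
    "SELECT region, SUM(revenue) FROM orders GROUP BY region;",
    [not_empty, no_placeholder],
    reviewed_by="analyst@example.com",
)
print(ok)  # True
```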

Finally, assess governance maturity: Do you have an AI ethics board or governance committee? Have you completed an ISO 42001 assessment or certification? Can you provide your model cards or AI system documentation? These questions evaluate whether the vendor treats AI governance as a structured discipline or an afterthought. ThirdProof reports include an AI governance section that evaluates these factors automatically.
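If your team tracks due diligence programmatically, the questions above collapse into a small machine-readable checklist. The structure below is illustrative; adapt the categories and wording to your own program:

```python
# The due diligence questions from this section, grouped by risk category.
AI_VENDOR_CHECKLIST = {
    "data_handling": [
        "Do you use customer inputs or outputs to train, fine-tune, or improve your AI models?",
        "If yes, is this opt-in or opt-out?",
        "Do third-party model providers have access to our data, and do their terms permit training on API inputs?",
    ],
    "infrastructure": [
        "Where is AI inference performed: your infrastructure, a third-party provider's, or on-device?",
    ],
    "output_risk": [
        "What accuracy commitments or SLAs exist for AI-generated outputs?",
        "Do you carry errors and omissions insurance that covers AI-generated content?",
    ],
    "governance": [
        "Do you have an AI ethics board or governance committee?",
        "Have you completed an ISO 42001 assessment or certification?",
        "Can you provide model cards or AI system documentation?",
    ],
}

for category, questions in AI_VENDOR_CHECKLIST.items():
    print(f"{category}: {len(questions)} questions")
```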

How SOC 2, ISO 27001, and ISO 42001 Apply to AI Vendors

SOC 2 and ISO 27001 remain relevant for AI vendors because they validate fundamental security controls — access management, encryption, incident response, and operational security. An AI vendor without SOC 2 or ISO 27001 certification has a baseline security maturity gap regardless of their AI-specific practices. However, these certifications don't address AI-specific risks. A SOC 2 Type II report will tell you whether the vendor has proper access controls and change management, but it won't tell you whether your data is being used to train models.

ISO 42001 (AI Management Systems) is the emerging standard specifically designed for AI governance. Published in 2023, ISO 42001 provides a framework for responsible AI development and deployment, covering areas like AI system impact assessments, data governance for AI, transparency, and bias management. As of 2026, ISO 42001 certification is still relatively uncommon, but vendors that have achieved it demonstrate a structured approach to AI governance that goes beyond ad hoc policies.

When evaluating AI vendors, treat SOC 2 and ISO 27001 as necessary but not sufficient. They establish that the vendor can protect data using standard security controls. Layer AI-specific evaluation on top: data usage policies, training commitments, third-party model provider agreements, and ideally ISO 42001 certification or evidence of an equivalent internal AI governance program. Major SaaS platforms such as Datadog increasingly document their AI practices — check their ThirdProof profiles for current compliance posture.

Red Flags: Vendors That Train on Customer Data by Default

The single biggest red flag in an AI vendor assessment is a default-on training policy — meaning the vendor uses your data to train their models unless you explicitly opt out. This practice was common in early AI products and persists in many tools today, particularly in lower-tier plans. The problem with opt-out training isn't just the data usage itself; it signals a vendor that prioritizes model improvement over customer data protection as a default posture.

Watch for two patterns in particular. The first is tiered training policies, where free or standard plans include data training but enterprise plans don't. This means the vendor has the technical infrastructure to use customer data for training and only restricts it when contractually required; if your contract ever lapses or gets miscategorized, the default behavior is to train. The second is ambiguous policy language like "we may use anonymized data to improve our services" — anonymizing text data for LLM training is not straightforward, and "improve our services" is broad enough to encompass training.

Retroactive policy changes are another serious concern. If a vendor's current policy says they don't train on customer data, check whether they had a different policy when you first started using the product. Data submitted under the old policy may have already been used for training. There is no reliable way to "untrain" a model on specific data points, so historical training is permanent.

When you identify these red flags during assessment, it doesn't necessarily mean you can't use the vendor — but it means you need contractual protections specifically addressing training exclusion, you should limit the sensitivity of data you provide, and you should monitor the vendor's data usage policies for changes. ThirdProof's AI data usage risk analysis evaluates these factors as part of every vendor investigation.
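Monitoring for policy changes doesn't require anything elaborate. A minimal sketch, assuming the vendor publishes its data usage policy at a stable URL: hash the page on a schedule and alert when the hash moves. (Hashing raw HTML is noisy on dynamic pages; extracting the visible text first is more robust.)

```python
import hashlib
import urllib.request

POLICY_URL = "https://vendor.example/legal/data-usage"  # placeholder URL

def fetch_policy_hash(url: str) -> str:
    """Fetch the policy page and return a SHA-256 digest of its contents."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def policy_changed(url: str, baseline_hash: str) -> bool:
    """True when the page no longer matches the recorded baseline."""
    return fetch_policy_hash(url) != baseline_hash

# First run: record the baseline. Later runs (e.g. a daily cron job):
# if policy_changed(POLICY_URL, baseline):
#     ...notify your risk team and re-run the assessment (hypothetical hook)
```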

See this in action

ThirdProof automates vendor risk assessment across 24 intelligence sources. Investigate any vendor in under 2 minutes — no questionnaires, no vendor cooperation required.

Try ThirdProof Free →

No credit card required

Frequently asked questions

How is AI vendor risk different from regular SaaS vendor risk?
Traditional SaaS vendor risk focuses on whether the vendor can protect your data — confidentiality, integrity, availability. AI vendors introduce additional risk categories: whether your data is used to train models (making derivatives available to other customers), who bears liability for AI-generated outputs, and how third-party model providers in the vendor's supply chain handle your data. Standard security certifications like SOC 2 don't address these AI-specific concerns.
What is ISO 42001 and should I require it from AI vendors?
ISO 42001 is an international standard for AI management systems, published in 2023. It covers responsible AI development, data governance for AI, impact assessments, transparency, and bias management. While requiring ISO 42001 would eliminate most vendors today due to low adoption, asking whether a vendor is pursuing it or has equivalent internal AI governance frameworks is a useful maturity indicator during due diligence.
Can a vendor be SOC 2 compliant and still train on my data?
Yes. SOC 2 evaluates security controls like access management, encryption, and change management. It does not evaluate whether a vendor uses customer data for model training. A vendor can maintain perfect SOC 2 compliance while using your inputs to fine-tune their AI models, as long as the data is handled securely throughout the process. This is why AI-specific assessment criteria are necessary beyond traditional compliance checks.
What should I do if my AI vendor changes their data usage policy?
First, ensure your contract includes a clause requiring advance notice of material policy changes, ideally with a right to terminate if the new policy is unacceptable. Monitor vendor policy pages and subscribe to update notifications. If a vendor changes to a more permissive training policy, assess whether your contractual terms override the general policy, consider whether data already submitted is affected, and evaluate alternative vendors. ThirdProof continuously monitors vendor policies and surfaces changes in vendor reports.
How does ThirdProof evaluate AI vendor risk?
ThirdProof investigations include an AI governance section that evaluates data usage policies, training commitments, third-party model provider relationships, and the vendor's overall AI governance maturity. This assessment draws on public documentation, trust pages, and compliance evidence. Visit our [methodology page](/methodology) for details on how AI-specific risk factors are incorporated into vendor risk assessments.

Put this into practice

Investigate any vendor across 24 intelligence sources in under 2 minutes. Your first 3 investigations are free.

Start Free Investigation →

No credit card required