Foundry AI Gateway + Defender for AI Service + Foundry Guardrails
The Azure-side AI security stack on SC-500: AI Gateway in Azure API Management for Microsoft Foundry traffic, Microsoft Defender for AI Service threat protection, agent guardrails in Foundry, and the Data and AI security dashboard in Defender for Cloud.
The Azure-side AI security stack
The last three modules covered the M365 + identity surfaces (Purview DSPM, Copilot Studio governance, Entra Agent ID). This module covers the Azure-side stack: the controls that protect the Foundry models themselves and the traffic going to them.
Four moving parts:
- AI Gateway in Azure API Management (APIM): a set of APIM policies, applied with APIM acting as a reverse proxy in front of Microsoft Foundry models. It centralises authentication, rate-limits by token usage, logs prompts and completions, and gives you a single chokepoint to enforce policy across multiple AI consumers.
- Microsoft Defender for AI Service: a Defender for Cloud workload protection plan that monitors Foundry AI workloads for threats such as prompt injection, suspicious model usage, and data exfiltration via model output. Alerts surface in Defender for Cloud and Defender XDR.
- Foundry agent guardrails: content filters, blocklists, evaluations, and topic-restriction settings applied to Foundry-built agents to prevent unsafe output and contain agent behaviour.
- Data and AI security dashboard in Defender for Cloud: the executive view of AI security posture, showing which workloads have Defender for AI Service enabled, which agents have guardrails configured, the top threats detected, and compliance posture for AI-specific frameworks.
AI Gateway in Azure API Management
The AI Gateway pattern uses Azure API Management as a reverse proxy in front of Microsoft Foundry model endpoints. Applications are not the only consumers: agents that call Foundry models go through the same gateway. Key capabilities:
| AI Gateway capability | What it does | Why it matters |
|---|---|---|
| Token rate limiting | `azure-openai-token-limit` policy enforces per-key, per-IP, or per-subscription token quotas | Stops runaway clients from burning a whole tenant's Foundry quota |
| Token usage metrics | `azure-openai-emit-token-metric` emits prompt, completion, and total token counts as custom metrics to Application Insights | Gives the security and finance teams real per-consumer Foundry usage data |
| Semantic caching | `azure-openai-semantic-cache-lookup` + `-store` policies cache responses based on semantic prompt similarity (via embeddings) | Cuts cost and latency for repeated similar queries, because a cache hit is served without another model call |
| Load balancing across deployments | APIM backend pools spread requests across multiple Foundry deployments (regions, models, capacities) | High availability and quota multiplexing |
| Centralised auth | Consumers authenticate to APIM using subscription keys, OAuth tokens, or managed identity; APIM uses managed identity to authenticate to Foundry | No Foundry API keys distributed to consumers; one identity surface to govern |
| Prompt and completion logging | APIM diagnostic settings log requests/responses to Application Insights / Log Analytics / Storage | Audit trail for what's being asked and what's being answered, with retention you control |
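To make the token rate limiting row concrete, here is a minimal Python sketch of a fixed-window, tokens-per-minute limiter keyed by consumer. It only illustrates the idea; it is not how APIM implements `azure-openai-token-limit`, and the class name `TokenRateLimiter`, the consumer names, and the injected `now_seconds` clock are all hypothetical.

```python
from collections import defaultdict


class TokenRateLimiter:
    """Fixed-window tokens-per-minute limiter keyed by consumer (illustrative only)."""

    def __init__(self, tokens_per_minute: int):
        self.tokens_per_minute = tokens_per_minute
        # (consumer, minute window) -> tokens consumed in that window
        self._used = defaultdict(int)

    def allow(self, consumer: str, tokens: int, now_seconds: float) -> bool:
        """Return True and record usage if the request fits the consumer's quota."""
        window = int(now_seconds // 60)
        key = (consumer, window)
        if self._used[key] + tokens > self.tokens_per_minute:
            return False  # a gateway would answer 429 Too Many Requests here
        self._used[key] += tokens
        return True


limiter = TokenRateLimiter(tokens_per_minute=50_000)
print(limiter.allow("app-a", 30_000, now_seconds=0))   # True: first request fits
print(limiter.allow("app-a", 30_000, now_seconds=10))  # False: 60K would exceed 50K
print(limiter.allow("app-b", 30_000, now_seconds=10))  # True: separate consumer quota
print(limiter.allow("app-a", 30_000, now_seconds=61))  # True: new minute window
```

Keying the counter by consumer is what stops one runaway client from exhausting the quota that every other consumer shares.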
A reference architecture sketch
For Ravi at Maple Genomics:
    [Foundry-using app]          [Copilot Studio agent]
             │                             │
             └───────────►  APIM  ◄────────┘
                             │
                  AI Gateway policies:
                   • token rate limiting per consumer
                   • token metrics emitted
                   • semantic cache lookup/store
                   • managed-identity auth to Foundry
                   • prompt/response logging
                             │
                             ▼
                Microsoft Foundry endpoints
            (multiple deployments load-balanced)
The "consumer" can be a custom app, a Copilot Studio agent, a Foundry-built agent, a Logic App, or anything else that talks HTTP to Foundry. The gateway gives Ravi one place to enforce policy and gather telemetry across all of them.
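The semantic cache lookup/store step among the gateway policies can be pictured with a small Python sketch. It assumes prompts have already been turned into embedding vectors (the toy three-dimensional vectors below are made up) and uses cosine similarity with a hypothetical `threshold`; the threshold semantics of the real APIM policies may differ, so treat this as a conceptual illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy in-memory cache that matches prompts by embedding similarity."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # assumed: minimum similarity for a hit
        self._entries = []          # list of (embedding, response)

    def lookup(self, embedding):
        """Return the most similar cached response, or None on a miss."""
        best_score, best_response = 0.0, None
        for cached_embedding, response in self._entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, embedding, response):
        self._entries.append((list(embedding), response))


cache = SemanticCache(threshold=0.9)
cache.store([1.0, 0.0, 0.0], "cached answer")
print(cache.lookup([0.99, 0.05, 0.0]))  # near-duplicate prompt: cache hit
print(cache.lookup([0.0, 1.0, 0.0]))    # unrelated prompt: None, call the model
```

On a miss the gateway would forward the request to the model and then store the new response, which is why lookup and store appear as a pair.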
Microsoft Defender for AI Service
Defender for AI Service is one of the workload protection plans in Microsoft Defender for Cloud. Like Defender for Servers / SQL / Storage / Containers / Key Vault, it is enabled per-subscription, billed per protected resource, and surfaces alerts in Defender for Cloud and through the Defender XDR connector into Microsoft Sentinel.
What it detects
- Prompt injection: both direct (an attacker controlling the user prompt to override the system prompt) and indirect (an attacker placing injection content in grounding data the agent reads).
- Jailbreak attempts: patterns matching Microsoft-curated jailbreak prompts.
- Suspicious model interaction patterns: unusual volume, atypical access patterns, anomalous prompts from a single account.
- Data exfiltration via model output: output containing detected sensitive content patterns leaving the workload in a way that suggests exfiltration.
- Anomalous Entra Agent ID activity: correlates with Entra Agent ID signals via Defender XDR.
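As a rough intuition for the "unusual volume from a single account" category, the sketch below flags an hourly request count that sits far above an account's own baseline using a z-score. This is a generic anomaly heuristic chosen for illustration, not Defender for AI Service's detection logic; the function name, the sample counts, and the `z_threshold` default are assumptions.

```python
from statistics import mean, pstdev


def flag_unusual_volume(history, current, z_threshold=3.0):
    """Return True if `current` is far above the baseline in `history`.

    history: non-empty list of past per-hour request counts for one account.
    current: the request count for the hour being evaluated.
    """
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Perfectly flat history: any increase is treated as unusual.
        return current > baseline
    return (current - baseline) / spread > z_threshold


past_hours = [10, 12, 11, 9, 10, 13, 11]
print(flag_unusual_volume(past_hours, 12))   # False: within normal variation
print(flag_unusual_volume(past_hours, 100))  # True: far above the baseline
```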
How it's enabled
In Microsoft Defender for Cloud, navigate to Environment settings > [Subscription] > Defender plans > AI workloads > Status: On. Some agent and gateway integrations require additional configuration (e.g. AI Gateway in APIM forwarding prompts to the Defender for AI inspection endpoint, when configured).
Foundry agent guardrails
Microsoft Foundry agents have a built-in guardrails configuration that applies safety controls at the agent layer. Guardrails are conceptually distinct from APIM policies and Defender alerts: they are set at agent development time and embedded in the agent definition itself.
| Guardrail | What it does |
|---|---|
| Content filters | Classify and block input and output across categories (violence, hate, sexual, self-harm) at thresholds you set (low / medium / high) |
| Blocklists (custom) | Curated term lists you maintain; block input or output containing the terms (e.g. customer names, project codewords, regulated phrases) |
| Topic restrictions | Define the in-scope topics the agent will engage with; out-of-scope requests are politely declined or redirected |
| Evaluations | Automated quality and safety scoring on agent responses; feeds into agent improvement and compliance reporting |
| Prompt shields | Detect and mitigate direct and indirect prompt-injection attempts via dedicated Foundry classifiers |
| Grounding required | Force the agent to ground responses on configured knowledge and refuse to answer if grounding is unavailable |
Guardrails complement Defender for AI Service (runtime detection) and the AI Gateway (traffic-level policy). The three together are the defense-in-depth pattern for Foundry agents.
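The way a blocklist and a severity threshold combine can be sketched in a few lines of Python. This is an illustration under assumptions, not Foundry's API: the `Guardrails` class is hypothetical, the per-category severities are supplied by the caller rather than by a real classifier, "Project Helix" is a made-up codeword, and the sketch assumes content at or above the configured threshold is blocked.

```python
from dataclasses import dataclass, field

# Ordered severity scale; higher numbers are more severe.
SEVERITY = {"safe": 0, "low": 1, "medium": 2, "high": 3}


@dataclass
class Guardrails:
    threshold: str = "medium"                    # assumed: block at this level or above
    blocklist: set = field(default_factory=set)  # custom terms to block

    def check(self, text: str, category_severity: dict) -> tuple:
        """Return (allowed, reason) for one input or output message."""
        lowered = text.lower()
        for term in sorted(self.blocklist):      # sorted for a deterministic reason
            if term.lower() in lowered:
                return (False, f"blocklist:{term}")
        for category, severity in category_severity.items():
            if SEVERITY[severity] >= SEVERITY[self.threshold]:
                return (False, f"content_filter:{category}")
        return (True, "allowed")


rails = Guardrails(threshold="medium", blocklist={"Project Helix"})
print(rails.check("Status of project helix?", {}))         # blocked by blocklist
print(rails.check("Tell me a story", {"violence": "high"}))  # blocked by filter
print(rails.check("Tell me a story", {"violence": "low"}))   # allowed
```

Running the same check on both the prompt and the completion mirrors the table's point that filters and blocklists apply to input and output.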
Data and AI security dashboard in Defender for Cloud
The Data and AI security dashboard is the executive view in Microsoft Defender for Cloud that aggregates AI-related security posture across your subscriptions:
- AI workload inventory: which Foundry workloads exist, where, and whether Defender for AI Service is enabled on them
- Top threats in the period: most-detected Defender for AI alerts and their distribution
- AI compliance posture: alignment with AI controls in the Microsoft Cloud Security Benchmark and other frameworks
- Recommended actions: links into Defender recommendations for AI workloads
- Agent governance signals: coverage of Entra Agent ID conditional access, real-time protection coverage for Copilot Studio agents, integration with Purview DSPM signals
For SC-500, you should know where the dashboard lives (Defender for Cloud > Data and AI security dashboard) and what it aggregates (Defender for AI alerts, plan coverage, compliance posture, recommendations).
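The plan-coverage figure the dashboard reports is a simple ratio over the workload inventory. The Python sketch below computes it from a hand-written inventory; the record shape and the workload names are hypothetical and do not reflect any Defender for Cloud export format.

```python
def plan_coverage(workloads):
    """Return (coverage percent, names of unprotected workloads).

    workloads: list of dicts with 'name' and 'defender_enabled' keys.
    An empty inventory is reported as fully covered.
    """
    if not workloads:
        return 100.0, []
    unprotected = [w["name"] for w in workloads if not w["defender_enabled"]]
    covered = len(workloads) - len(unprotected)
    return round(100.0 * covered / len(workloads), 1), unprotected


inventory = [
    {"name": "foundry-us", "defender_enabled": True},
    {"name": "foundry-eu", "defender_enabled": False},
    {"name": "foundry-apac", "defender_enabled": True},
]
print(plan_coverage(inventory))  # (66.7, ['foundry-eu'])
```

The list of unprotected workloads is the actionable part: it names exactly where the plan still has to be switched on.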
Scenario: Ravi assembles the full Azure-side AI security stack
Maple Genomics has 3 Microsoft Foundry deployments serving 12 internal apps and 4 Copilot Studio agents. Ravi assembles the Azure-side controls:
- AI Gateway in APIM
  - Single APIM instance fronts all 3 Foundry deployments via backend pools (load balancing across deployments for resilience).
  - All consumers (apps + Copilot Studio agents) authenticate to APIM via managed identity; APIM authenticates to Foundry via APIM's own managed identity.
  - Token rate limit policy: 50K tokens/min per consumer; emit token metrics.
  - Semantic caching policy on the genomics-Q&A agent's endpoint: ~30% of queries are semantically similar repeats, so caching can cut Foundry token spend on that endpoint by up to roughly 30% and removes the model round trip on cache hits.
  - Diagnostic settings: log all prompts and completions to a dedicated Log Analytics workspace with 90-day retention.
- Defender for AI Service enabled on the 3 subscriptions hosting Foundry. Alerts route to the same Sentinel workspace as the rest of Maple Genomics' Defender stack.
- Foundry agent guardrails on the 4 Copilot Studio agents and on Foundry-built custom agents:
  - Content filters: medium across all four categories (violence, hate, sexual, self-harm); blocks input + output that exceeds threshold.
  - Blocklists: customer names, project codewords, regulated phrases (HIPAA categories).
  - Topic restriction: each agent has its in-scope topic list; out-of-scope queries politely declined.
  - Prompt shields: on (mitigates direct and indirect injection).
- Data and AI security dashboard in Defender for Cloud: reviewed weekly in the security ops sync. Tracks Defender for AI alert volume, plan coverage (target 100%), and AI compliance posture.
- Integration with the M365-side stack (previous modules): Purview DSPM for AI continues to monitor Copilot Studio agents; Entra Agent ID CA policies gate invocation; Microsoft 365 admin center governs agent lifecycle. Defender XDR ties M365 alerts and Azure Defender for AI alerts into one incident graph.
End result: prompts and completions are logged centrally; per-consumer quotas are enforced; semantic caching cuts cost; threats against Foundry are detected; agents have guardrails; the dashboard surfaces gaps. Defense-in-depth across identity, runtime, traffic, posture, and detection.
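The resilience that backend pools add in this scenario comes from spreading requests across deployments and skipping any that are unavailable. The Python sketch below shows round-robin selection with health-based skipping; it is a conceptual model rather than APIM's backend pool implementation, and the `BackendPool` class and deployment names are hypothetical.

```python
class BackendPool:
    """Round-robin over backends, skipping any currently marked unhealthy."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._index = 0

    def mark_unhealthy(self, backend):
        self._healthy.discard(backend)

    def mark_healthy(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def next_backend(self):
        """Return the next healthy backend, or raise if none is available."""
        for _ in range(len(self._backends)):
            candidate = self._backends[self._index % len(self._backends)]
            self._index += 1
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


pool = BackendPool(["deploy-1", "deploy-2", "deploy-3"])
print([pool.next_backend() for _ in range(4)])  # cycles 1, 2, 3, then back to 1
pool.mark_unhealthy("deploy-2")
print([pool.next_backend() for _ in range(2)])  # deploy-2 is skipped
```

Because each deployment carries its own quota, rotating across them also multiplies the total token capacity available behind the single gateway endpoint.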
Knowledge check
Ravi at Maple Genomics has 16 different applications and agents calling Microsoft Foundry. He wants ONE place to enforce per-consumer token rate limits, log every prompt and response, and ensure no Foundry API keys are distributed to consumers. Which pattern fits best?
Esme at Northwind Bank is enabling threat protection for the Foundry workloads in the bank's data subscription. Which Microsoft Defender for Cloud plan should she enable?
Asha at Aurora Health Service wants the executive view of AI security posture across all 47 subscriptions: which workloads have Defender for AI enabled, what the top threats were in the period, compliance posture against the Microsoft Cloud Security Benchmark for AI, and recommended actions. Where does she look?
Domain 3 AI security wrap-up
You've covered the four AI security modules, the genuine differentiator on SC-500 vs AZ-500:
- Discovery + posture: Microsoft Purview DSPM for AI
- M365-side governance: Copilot Studio real-time protection + Microsoft 365 admin center agent management
- Identity + access: Microsoft Entra Agent ID + Defender XDR blast radius
- Azure-side runtime + traffic: AI Gateway in APIM + Defender for AI Service + Foundry agent guardrails + Data and AI security dashboard
The next four modules in Domain 3 cover the remaining compute-security surfaces: VM and server hardening, Defender for Servers across hybrid/multicloud (via Azure Arc), Defender for Containers (AKS/ACR/ACI/Container Apps), and application platform security (App Service, Functions, Logic Apps, WAF, APIM API protection).