Foundry AI Gateway + Defender for AI Service + Foundry…

The Azure-side AI security stack

Simple explanation

The last three modules covered the M365 + identity surfaces (Purview DSPM, Copilot Studio governance, Entra Agent ID). This module covers the Azure-side stack — the controls that protect the Foundry models themselves and the traffic going to them.

Four moving parts:

AI Gateway in Azure API Management (APIM) — a set of APIM policies that act as a reverse proxy in front of Microsoft Foundry models. Centralises authentication, rate-limits by token usage, logs prompts and completions, and gives you a single chokepoint to enforce policy across multiple AI consumers.
Microsoft Defender for AI Service — a Defender for Cloud workload protection plan that monitors Foundry AI workloads for threats such as prompt injection, suspicious model usage, and data exfiltration via model output. Alerts surface in Defender for Cloud and Defender XDR.
Foundry agent guardrails — content filters, blocklists, evaluations, and topic-restriction settings applied to Foundry-built agents to prevent unsafe output and contain agent behaviour.
Data and AI security dashboard in Defender for Cloud — the executive view of AI security posture: which workloads have Defender for AI Service enabled, which agents have guardrails configured, top threats detected, compliance posture for AI-specific frameworks.

AI Gateway in Azure API Management

The AI Gateway pattern uses Azure API Management as a reverse proxy in front of Microsoft Foundry model endpoints. Custom apps are not the only consumers that benefit: agents that call Foundry models go through the same gateway. Key capabilities:

AI Gateway in Azure API Management — capabilities and what they buy you
AI Gateway capability	What it does	Why it matters
Token rate limiting	`azure-openai-token-limit` policy enforces per-key, per-IP, or per-subscription token quotas	Stops runaway clients from burning a whole tenant's Foundry quota
Token usage metrics	`azure-openai-emit-token-metric` emits prompt/completion token counts as APIM metrics	Gives the security and finance teams real per-consumer Foundry usage data
Semantic caching	`azure-openai-semantic-cache-lookup` + `-store` policies cache responses based on semantic prompt similarity (via embeddings)	Cuts cost and latency for repeated similar queries; on a cache hit the prompt (and any PII in it) is not sent to the completion model again
Load balancing across deployments	APIM backend pools spread requests across multiple Foundry deployments (regions, models, capacities)	High availability and quota multiplexing
Centralised auth	Consumers authenticate to APIM using subscription keys, OAuth tokens, or managed identity; APIM uses managed identity to authenticate to Foundry	No Foundry API keys distributed to consumers; one identity surface to govern
Prompt and completion logging	APIM diagnostic settings log requests/responses to Application Insights / Log Analytics / Storage	Audit trail for what's being asked and what's being answered, with retention you control
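To make the token rate limiting row concrete, here is a minimal Python sketch of a per-consumer tokens-per-minute counter. It is only an illustration of the idea: the class name `TokenRateLimiter`, the fixed one-minute window, and the consumer keys are assumptions for this example, not how the `azure-openai-token-limit` policy is implemented inside APIM.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TokenRateLimiter:
    """Fixed-window tokens-per-minute limiter keyed by consumer (illustrative only)."""

    tokens_per_minute: int
    # Maps (consumer key, minute window index) -> tokens consumed in that window.
    _used: dict = field(default_factory=lambda: defaultdict(int))

    def allow(self, consumer: str, tokens: int, now_seconds: float) -> bool:
        window = int(now_seconds // 60)  # which one-minute window the call falls in
        key = (consumer, window)
        if self._used[key] + tokens > self.tokens_per_minute:
            return False  # a gateway would typically reject the call at this point
        self._used[key] += tokens
        return True


limiter = TokenRateLimiter(tokens_per_minute=1000)
print(limiter.allow("app-a", 600, now_seconds=0))   # True
print(limiter.allow("app-a", 600, now_seconds=10))  # False: 1200 > 1000 in the same minute
print(limiter.allow("app-b", 600, now_seconds=10))  # True: each consumer has its own counter
print(limiter.allow("app-a", 600, now_seconds=61))  # True: a new one-minute window has started
```

The key design point is the counter key: counting per consumer (rather than globally) is what stops one runaway client from exhausting quota that other consumers depend on.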

A reference architecture sketch

For Ravi at Maple Genomics:

[Foundry-using app]              [Copilot Studio agent]
        │                                  │
        └─────────────► APIM ◄─────────────┘
                         │
                  AI Gateway policies:
                  • token rate limiting per consumer
                  • token metrics emitted
                  • semantic cache lookup/store
                  • managed-identity auth to Foundry
                  • prompt/response logging
                         │
                         ▼
              Microsoft Foundry endpoints
              (multiple deployments load-balanced)

The “consumer” can be a custom app, a Copilot Studio agent, a Foundry-built agent, a Logic App, or anything else that talks HTTP to Foundry. The gateway gives Ravi one place to enforce policy and gather telemetry across all of them.
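The semantic cache lookup/store step in the sketch above can be pictured with a small Python example. This is a toy model under stated assumptions: the embeddings are hand-written three-dimensional vectors (a real gateway would obtain them from an embeddings deployment), and the `SemanticCache` class, the cosine-similarity measure, and the `0.95` threshold are all hypothetical choices for illustration rather than APIM's actual behaviour.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold):
        self.threshold = threshold  # minimum similarity that counts as a cache hit
        self.entries = []           # list of (embedding, response) pairs

    def lookup(self, embedding):
        """Return the most similar cached response, or None if nothing is close enough."""
        best_score, best_response = 0.0, None
        for stored, response in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def store(self, embedding, response):
        self.entries.append((embedding, response))


cache = SemanticCache(threshold=0.95)
cache.store([1.0, 0.0, 0.0], "cached answer")

hit = cache.lookup([0.99, 0.05, 0.0])  # nearly the same direction as the stored vector
miss = cache.lookup([0.0, 1.0, 0.0])   # orthogonal to the stored vector
print(hit)   # cached answer
print(miss)  # None
```

On a miss the gateway would forward the request to Foundry and then call `store` with the new response, so the next similar prompt can be served from the cache.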

Microsoft Defender for AI Service

Defender for AI Service is one of the workload protection plans in Microsoft Defender for Cloud. Like Defender for Servers / SQL / Storage / Containers / Key Vault, it is enabled per-subscription, billed per protected resource, and surfaces alerts in Defender for Cloud and through the Defender XDR connector into Microsoft Sentinel.

What it detects

Prompt injection — both direct (an attacker controlling the user prompt to override the system prompt) and indirect (an attacker placing injection content in grounding data the agent reads).
Jailbreak attempts — patterns matching Microsoft-curated jailbreak prompts.
Suspicious model interaction patterns — unusual volume, atypical access patterns, anomalous prompts from a single account.
Data exfiltration via model output — output containing detected sensitive content patterns leaving the workload in a way that suggests exfiltration.
Anomalous Entra Agent ID activity — correlates with Entra Agent ID signals via Defender XDR.

How it’s enabled

In Microsoft Defender for Cloud, navigate to Environment settings → [Subscription] → Defender plans → AI workloads → Status: On. Some agent and gateway integrations require additional configuration (e.g. AI Gateway in APIM forwarding prompts to the Defender for AI inspection endpoint, when configured).
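The same toggle can in principle be scripted through the Azure Resource Manager `Microsoft.Security/pricings` resource. The sketch below only builds the request so it can be inspected; it does not send anything. Treat the plan name `AI`, the API version `2024-01-01`, and the helper name `build_enable_plan_request` as assumptions to verify against the current Defender for Cloud documentation before use.

```python
import json

ARM_ENDPOINT = "https://management.azure.com"


def build_enable_plan_request(subscription_id, plan_name="AI", api_version="2024-01-01"):
    """Build (method, url, body) for switching a Defender plan to the Standard tier.

    plan_name and api_version are assumptions for illustration; check them
    against the official pricings API reference.
    """
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/providers/Microsoft.Security/pricings/{plan_name}"
        f"?api-version={api_version}"
    )
    body = {"properties": {"pricingTier": "Standard"}}
    return "PUT", url, json.dumps(body)


# Placeholder subscription ID, not a real one.
method, url, body = build_enable_plan_request("00000000-0000-0000-0000-000000000000")
print(method, url)
print(body)
```

Sending the request would additionally need a bearer token for Azure Resource Manager and an identity with permission to change Defender plans on the subscription.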

Foundry agent guardrails

Microsoft Foundry agents have a built-in guardrails configuration that applies safety controls at the agent layer. These controls are conceptually distinct from APIM policies and Defender alerts: they are agent-developer-time controls embedded in the agent definition itself.

Foundry agent guardrails — agent-developer-time safety controls
Guardrail	What it does
Content filters	Classify and block input and output across categories (violence, hate, sexual, self-harm) at thresholds you set (low / medium / high)
Blocklists (custom)	Curated term lists you maintain — block input or output containing the terms (e.g. customer names, project codewords, regulated phrases)
Topic restrictions	Define the in-scope topics the agent will engage with; out-of-scope requests are politely declined or redirected
Evaluations	Automated quality and safety scoring on agent responses — feeds into agent improvement and compliance reporting
Prompt shields	Detect and mitigate direct and indirect prompt-injection attempts via dedicated Foundry classifiers
Grounding required	Force the agent to ground responses on configured knowledge — refuse to answer if grounding is unavailable

Guardrails complement Defender for AI Service (runtime detection) and the AI Gateway (traffic-level policy). The three together are the defense-in-depth pattern for Foundry agents.
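As a rough mental model of how a blocklist and a content filter threshold combine, consider the Python sketch below. It is not Foundry's implementation: the severity scores are passed in by hand (in practice a classifier would produce them), and the `check_text` function, the rule that a text is blocked when its severity is at or above the configured threshold, and the codeword `Project Helix` are all hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def check_text(text, category_scores, thresholds, blocklist):
    """Return (allowed, reason) for one input or output text.

    category_scores: severity per category, assumed to come from a classifier.
    thresholds: severity at or above which a category is blocked (assumed rule).
    blocklist: custom terms matched case-insensitively as substrings.
    """
    lowered = text.lower()
    for term in blocklist:
        if term.lower() in lowered:
            return False, f"blocklist term: {term}"
    for category, severity in category_scores.items():
        limit = thresholds.get(category)
        if limit is not None and severity >= limit:
            return False, f"content filter: {category}"
    return True, "allowed"


thresholds = {c: Severity.MEDIUM for c in ("violence", "hate", "sexual", "self_harm")}
blocklist = ["Project Helix"]  # hypothetical project codeword

print(check_text("What is the status of Project Helix?", {}, thresholds, blocklist))
print(check_text("Some text", {"violence": Severity.HIGH}, thresholds, blocklist))
print(check_text("Some text", {"violence": Severity.LOW}, thresholds, blocklist))
```

The blocklist check runs first because it is an exact, cheap match; the category thresholds then catch content that no fixed term list could anticipate.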

Data and AI security dashboard in Defender for Cloud

The Data and AI security dashboard is the executive view in Microsoft Defender for Cloud that aggregates AI-related security posture across your subscriptions:

AI workload inventory — which Foundry workloads exist, where, and whether Defender for AI Service is enabled on them
Top threats in the period — most-detected Defender for AI alerts and their distribution
AI compliance posture — alignment with AI controls in the Microsoft Cloud Security Benchmark and other frameworks
Recommended actions — links into Defender recommendations for AI workloads
Agent governance signals — coverage of Entra Agent ID conditional access, real-time protection coverage for Copilot Studio agents, integration with Purview DSPM signals

For SC-500, you should know where the dashboard lives (Defender for Cloud > Data and AI security dashboard) and what it aggregates (Defender for AI alerts, plan coverage, compliance posture, recommendations).

Scenario: Ravi assembles the full Azure-side AI security stack

Maple Genomics has 3 Microsoft Foundry deployments serving 12 internal apps and 4 Copilot Studio agents. Ravi assembles the Azure-side controls:

AI Gateway in APIM
- Single APIM instance fronts all 3 Foundry deployments via backend pools (load balancing across deployments for resilience).
- All consumers (apps + Copilot Studio agents) authenticate to APIM via managed identity; APIM authenticates to Foundry via APIM’s own managed identity.
- Token rate limit policy: 50K tokens/min per consumer; emit token metrics.
- Semantic caching policy on the genomics-Q&A agent’s endpoint: ~30% of queries are semantically similar repeats; cache hits skip the model call, cutting latency and Foundry token spend on those queries.
- Diagnostic settings: log all prompts and completions to a dedicated Log Analytics workspace with 90-day retention.
Defender for AI Service enabled on the 3 subscriptions hosting Foundry. Alerts route to the same Sentinel workspace as the rest of Maple Genomics’ Defender stack.
Foundry agent guardrails on the 4 Copilot Studio agents and on Foundry-built custom agents:
- Content filters: medium across all four categories (violence, hate, sexual, self-harm); block input and output that exceed the threshold.
- Blocklists: customer names, project codewords, regulated phrases (HIPAA categories).
- Topic restriction: each agent has its in-scope topic list; out-of-scope queries politely declined.
- Prompt shields: on (mitigates direct and indirect injection).
Data and AI security dashboard in Defender for Cloud: reviewed weekly in the security ops sync. Tracks Defender for AI alert volume, plan coverage (target 100%), and AI compliance posture.
Integration with the M365-side stack (previous modules): Purview DSPM for AI continues to monitor Copilot Studio agents; Entra Agent ID CA policies gate invocation; Microsoft 365 admin center governs agent lifecycle. Defender XDR ties M365 alerts and Azure Defender for AI alerts into one incident graph.

End result: prompts and completions are logged centrally; per-consumer quotas are enforced; semantic caching cuts cost; threats against Foundry are detected; agents have guardrails; the dashboard surfaces gaps. Defense-in-depth across identity, runtime, traffic, posture, and detection.
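A quick back-of-envelope check on the scenario's numbers can help when sizing the Foundry deployments. The Python sketch below uses only figures from the scenario (12 apps, 4 agents, 50K tokens/min per consumer, about 30% cache hits); the baseline of 1,000,000 tokens and the assumption that cached and uncached queries are the same size are illustrative, so the savings figure is an upper-bound estimate rather than a measured result.

```python
# Figures taken from the scenario.
apps = 12
agents = 4
per_consumer_limit = 50_000  # tokens per minute per consumer

consumers = apps + agents
aggregate_ceiling = consumers * per_consumer_limit  # worst case if every consumer hits its limit


def tokens_after_cache(baseline_tokens, hit_percent):
    """Tokens still sent to the model, assuming cache hits cost no model tokens
    and hits and misses are the same size on average (simplifying assumption)."""
    return baseline_tokens * (100 - hit_percent) // 100


print(consumers)                          # 16
print(aggregate_ceiling)                  # 800000
print(tokens_after_cache(1_000_000, 30))  # 700000
```

If the three deployments together cannot absorb the aggregate ceiling, either the per-consumer limit or the deployment capacity needs adjusting; the rate limit caps each consumer but does not by itself guarantee the backend can serve all of them at once.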