Domain 1: Plan and Manage an Azure AI Solution (Module 8 of 8)

Responsible AI: Filters, Auditing & Governance

Building AI is easy. Building AI responsibly is the hard part. Learn how to configure safety filters, implement evaluation, audit AI decisions, and govern agent behaviour with oversight controls.

Responsible AI is not optional

Simple explanation

Responsible AI is like having safety features on a car: seatbelts, airbags, speed limiters, and dashcams.

Safety filters stop the AI from saying harmful things (seatbelt). Guardrails keep agents from going off-script (speed limiter). Evaluation tools check if the AI is trustworthy (MOT inspection). Audit logs record what the AI did and why (dashcam). And governance controls decide which tools the agent is allowed to use (keys to certain rooms).

The exam objectives include four bullet points on responsible AI alone, so expect it to be heavily tested.

Safety filters and content moderation

Microsoft Foundry provides configurable content filters on every model deployment:

Filter Category | What It Catches | Severity Levels
Hate and fairness | Discriminatory or prejudiced content | Safe (annotation only), Low, Medium, High
Sexual | Sexually explicit or suggestive content | Safe (annotation only), Low, Medium, High
Violence | Violent or graphic content | Safe (annotation only), Low, Medium, High
Self-harm | Content promoting self-harm | Safe (annotation only), Low, Medium, High
Prompt shields | Jailbreak attempts and prompt injection | Enabled/Disabled
Groundedness detection | Responses not grounded in the provided data | Enabled/Disabled
Exam tip: Custom vs default content filters

Every deployment has default content filters enabled. You can create custom content filter configurations to:

  • Tighten filters for customer-facing apps (for example, also block low severity, not just the default medium and high)
  • Relax filters for internal research tools (allow clinical/medical terminology)
  • Enable prompt shields to detect and block jailbreak and prompt injection attacks

The exam tests when to customise filters. Key rule: customer-facing = stricter, internal = can be looser, healthcare/legal = needs domain-specific tuning.
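To make "block at or above a severity threshold" concrete, here is a minimal sketch of how a custom filter configuration could be applied to per-category severity annotations. The `FilterProfile` class, the two profiles and the `blocked_categories` helper are hypothetical illustrations of the exam rule of thumb, not the Foundry API; in a real deployment the service annotates severities and enforces your configuration for you.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Ordered so that >= comparisons work: SAFE < LOW < MEDIUM < HIGH
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class FilterProfile:
    # Hypothetical: for each harm category, the lowest severity that gets blocked.
    # A missing category means "annotate only" (never block).
    block_at: dict


# Hypothetical customer-facing profile: stricter, blocks from LOW upwards
CUSTOMER_FACING = FilterProfile(block_at={
    "hate": Severity.LOW, "sexual": Severity.LOW,
    "violence": Severity.LOW, "self_harm": Severity.LOW,
})

# Hypothetical internal research profile: looser, e.g. tolerates clinical descriptions
INTERNAL_RESEARCH = FilterProfile(block_at={
    "hate": Severity.MEDIUM, "sexual": Severity.MEDIUM,
    "violence": Severity.HIGH, "self_harm": Severity.HIGH,
})


def blocked_categories(profile: FilterProfile, annotations: dict) -> list:
    """Return the categories whose annotated severity meets the profile's threshold."""
    blocked = []
    for category, severity in annotations.items():
        threshold = profile.block_at.get(category)
        # SAFE content is only annotated, never blocked
        if threshold is not None and severity != Severity.SAFE and severity >= threshold:
            blocked.append(category)
    return sorted(blocked)
```

With annotations `{"violence": Severity.MEDIUM, "hate": Severity.SAFE}`, the customer-facing profile blocks `["violence"]` while the internal profile blocks nothing: the same content, two different outcomes, purely because of the configuration.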

Evaluation instrumentation

Foundry’s evaluation framework lets you measure AI quality systematically:

Evaluator | What It Measures | When to Use
Groundedness | Is the response based on the retrieved data? | RAG applications
Relevance | Does the response answer the question? | All generative apps
Coherence | Is the response well-structured and logical? | Content generation
Fluency | Is the language natural and readable? | Customer-facing output
Safety | Does the response contain harmful content? | All applications
F1 score | How much does the response's wording overlap with the expected (ground-truth) answer? | Extraction and classification
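F1 is the one evaluator in this table that needs no judge model, so it is easy to sketch. The version below is a common token-overlap formulation (lowercase, split on whitespace, count shared tokens); Foundry's built-in F1 evaluator may normalise text differently, so treat this as an illustration of the metric rather than a reproduction of it.

```python
from collections import Counter


def f1_score(response: str, ground_truth: str) -> float:
    """Token-level F1 between a model response and the expected answer."""
    pred_tokens = response.lower().split()
    truth_tokens = ground_truth.lower().split()
    if not pred_tokens or not truth_tokens:
        # Both empty counts as a perfect match; only one empty counts as no match
        return float(pred_tokens == truth_tokens)
    # Multiset intersection: a shared token counts at most min(count in each) times
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)   # share of response tokens that are correct
    recall = num_same / len(truth_tokens)     # share of expected tokens that were produced
    return 2 * precision * recall / (precision + recall)
```

For example, `f1_score("invoice total is 420 EUR", "total 420 EUR")` gives precision 3/5 and recall 3/3, so F1 is 0.75: the extra words in the response lower the score even though every expected token is present.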

Running evaluations

Method | When to Use
Manual evaluation | One-off quality checks; debugging specific issues
Automated in CI/CD | On every code change, to gate deployments on quality scores
Continuous monitoring | In production, to detect drift over time
Red teaming | Pre-launch adversarial testing to find safety gaps
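Continuous monitoring essentially compares recent scores against a baseline. A minimal sketch, assuming you already collect one evaluator score per sampled production response; the `DriftMonitor` class, the rolling window and the tolerance are hypothetical choices for illustration, not a Foundry feature.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flags drift when the rolling mean of a score falls below baseline - tolerance."""

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 50):
        self.baseline = baseline
        self.tolerance = tolerance
        # Only the most recent `window` scores are kept; older ones drop off automatically
        self.scores = deque(maxlen=window)

    def record(self, score: float) -> bool:
        """Add a new score; return True if the rolling mean indicates drift."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough data yet to judge
        return mean(self.scores) < self.baseline - self.tolerance
```

A rolling mean smooths out single bad responses, so an alert fires only when quality degrades consistently; the window size trades detection speed against false alarms.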
Real-world example: NeuralMed's safety evaluation

Before deploying their patient chatbot, NeuralMed runs three evaluation passes:

  1. Groundedness evaluation: 500 test questions, checking that every response cites its source medical articles
  2. Safety evaluation: adversarial prompts that try to extract diagnostic advice beyond the bot’s scope
  3. Red teaming: the security team attempts prompt injection, jailbreaks, and social engineering

The chatbot must score above 0.85 groundedness and pass all safety checks before going live. These evaluations run automatically in CI/CD on every model or prompt change.
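NeuralMed's go-live rule can be expressed as a small gate that a CI/CD pipeline runs after the evaluation step, failing the build when the rule is not met. The results format (a dict with a mean groundedness score and a list of named safety-check outcomes) and the `gate`/`exit_code` helpers are assumptions for illustration; adapt them to whatever your evaluation run actually emits.

```python
import sys

GROUNDEDNESS_THRESHOLD = 0.85  # NeuralMed's rule: groundedness must be ABOVE 0.85


def gate(results: dict) -> list:
    """Return the reasons the release is blocked; an empty list means it may proceed."""
    failures = []
    groundedness = results["groundedness"]
    if not groundedness > GROUNDEDNESS_THRESHOLD:
        failures.append(f"groundedness {groundedness:.2f} is not above {GROUNDEDNESS_THRESHOLD}")
    # Every safety check must pass; a single failure blocks the deployment
    for check in results["safety_checks"]:
        if not check["passed"]:
            failures.append(f"safety check failed: {check['name']}")
    return failures


def exit_code(results: dict) -> int:
    """Print any failures to stderr and return a process exit code (0 = pass)."""
    failures = gate(results)
    for failure in failures:
        print(failure, file=sys.stderr)
    return 1 if failures else 0
```

In the pipeline step, calling `sys.exit(exit_code(results))` makes a failed gate stop the deployment, which is how "run automatically on every model or prompt change" turns into an enforced rule rather than a report.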

Auditing: trace logging and provenance

Auditing Component | What It Records | Why It Matters
Trace logging | Every model call: input, output, latency, tokens used | Debug issues, track costs, investigate incidents
Provenance metadata | Source documents used for each response | Prove responses are grounded, support citations
Approval workflows | Human review before high-stakes agent actions | Prevent autonomous mistakes in critical workflows
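A minimal sketch of a log record that captures both the trace-logging fields and the provenance metadata from the table, written as one JSON line per model call. The `TraceRecord` fields and the `to_json_line`/`log_trace` helpers are hypothetical; in Foundry you would normally rely on its built-in tracing rather than hand-rolling a log format.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TraceRecord:
    # Trace logging: every model call with its input, output, latency and token usage
    model: str
    prompt: str
    response: str
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int
    # Provenance metadata: IDs of the source documents used to produce the response
    source_documents: list = field(default_factory=list)
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)


def to_json_line(record: TraceRecord) -> str:
    """Serialise one record as a single JSON line (JSON Lines format)."""
    return json.dumps(asdict(record)) + "\n"


def log_trace(record: TraceRecord, path: str) -> None:
    """Append the record to a local log file so it can be queried or audited later."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(to_json_line(record))
```

Keeping provenance in the same record as the call means a single lookup by `trace_id` answers both "what did the model say?" and "which documents was it based on?".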

Agent governance

Agents that act autonomously need boundaries. Governance controls include:

Agent oversight modes
Feature | Oversight Mode | What It Means
Full autonomy | Autonomous | Agent acts without human approval. Use for low-risk, well-tested workflows.
Human-in-the-loop | Semiautonomous | Agent proposes actions; a human approves before execution. Use for high-stakes decisions.
Report only | Advisory | Agent recommends but never acts. Use for new or untrusted agents.
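The three oversight modes differ only in what happens between "the agent decides" and "the action runs". A minimal sketch, assuming a hypothetical `ProposedAction` type and an `approve` callback standing in for the human reviewer; none of these names come from the Foundry Agent Service API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class OversightMode(Enum):
    AUTONOMOUS = "autonomous"          # act without approval
    SEMIAUTONOMOUS = "semiautonomous"  # a human approves before execution
    ADVISORY = "advisory"              # recommend only, never act


@dataclass
class ProposedAction:
    name: str
    run: Callable[[], str]  # the side-effecting operation; executed only if allowed


def handle(action: ProposedAction, mode: OversightMode,
           approve: Callable[[ProposedAction], bool]) -> str:
    """Apply the oversight mode to a proposed action and describe the outcome."""
    if mode is OversightMode.ADVISORY:
        return f"recommended: {action.name}"  # never executes
    if mode is OversightMode.SEMIAUTONOMOUS and not approve(action):
        return f"rejected by reviewer: {action.name}"
    return action.run()  # autonomous, or semiautonomous with approval
```

Because the action is passed as an unexecuted callable, nothing happens until the mode check allows it, which is the property human-in-the-loop control depends on.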

Tool-access controls

Control | What It Does | Example
Tool allowlist | Agent can only use approved tools | Compliance agent can search regulations but not modify records
Tool blocklist | Agent is explicitly blocked from certain actions | Customer service bot can look up orders but can’t issue refunds over a threshold
Rate constraints | Limit how often an agent can call a tool | Agent can create at most 10 support tickets per minute
Approval gates | Require human approval before specific tool calls | Agent must get approval before sending external emails
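The four controls can be layered into a single check that runs before every tool call. A minimal sketch, assuming a hypothetical `ToolGuard` class keyed on tool names; conditional rules such as "refunds over a threshold" would also need to inspect the call's arguments, which this sketch omits.

```python
import time
from collections import defaultdict, deque
from typing import Iterable, Optional


class ToolGuard:
    """Hypothetical policy layer an agent runtime would consult before each tool call."""

    def __init__(self, allowlist: Iterable[str], blocklist: Iterable[str] = (),
                 rate_limits: Optional[dict] = None, needs_approval: Iterable[str] = ()):
        self.allowlist = set(allowlist)
        self.blocklist = set(blocklist)
        self.rate_limits = rate_limits or {}   # tool name -> max calls per 60 seconds
        self.needs_approval = set(needs_approval)
        self._calls = defaultdict(deque)       # tool name -> timestamps of allowed calls

    def check(self, tool: str, approved: bool = False, now: Optional[float] = None) -> str:
        """Return "allow", or the reason the call is refused."""
        now = time.monotonic() if now is None else now
        if tool in self.blocklist:
            return "blocked"
        if tool not in self.allowlist:
            return "not on allowlist"
        limit = self.rate_limits.get(tool)
        if limit is not None:
            recent = self._calls[tool]
            while recent and now - recent[0] >= 60:
                recent.popleft()  # forget calls that fell out of the 60-second window
            if len(recent) >= limit:
                return "rate limit exceeded"
        if tool in self.needs_approval and not approved:
            return "needs human approval"
        if limit is not None:
            self._calls[tool].append(now)  # only calls that go through count toward the limit
        return "allow"
```

The blocklist is checked first so that an explicit block always wins, even if a tool is accidentally added to the allowlist as well.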
Real-world example: Atlas Financial's agent governance

Atlas Financial’s compliance agent operates in semiautonomous mode:

  • Autonomous: Search regulations, retrieve documents, generate compliance assessments
  • Human approval required: Flag a loan application as non-compliant, escalate to regulatory team
  • Blocked: Cannot modify loan applications, cannot communicate with external regulators directly

Every action is trace-logged. Provenance metadata links every compliance assessment to the specific regulations it cited. Monthly audit reports are generated automatically from trace logs.
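Generating a monthly audit report from trace logs can be as simple as grouping entries by month and action. A minimal sketch, assuming each log entry is a dict with hypothetical `timestamp` (ISO 8601 string), `action`, `outcome` and `cited_regulations` fields; the source does not describe Atlas Financial's actual log schema.

```python
from collections import Counter, defaultdict
from datetime import datetime


def monthly_audit_report(entries: list) -> dict:
    """Summarise trace-log entries per calendar month (keys like "2025-03")."""
    report = defaultdict(lambda: {
        "actions": Counter(),         # how often each action was attempted
        "outcomes": Counter(),        # e.g. completed / approved / blocked
        "uncited_assessments": 0,     # assessments missing provenance
    })
    for entry in entries:
        month = datetime.fromisoformat(entry["timestamp"]).strftime("%Y-%m")
        summary = report[month]
        summary["actions"][entry["action"]] += 1
        summary["outcomes"][entry["outcome"]] += 1
        # Provenance check: every assessment should link to the regulations it cited
        if entry["action"] == "generate_assessment" and not entry.get("cited_regulations"):
            summary["uncited_assessments"] += 1
    return dict(report)
```

Counting blocked outcomes and uncited assessments alongside normal activity gives auditors a quick signal of where the governance controls were exercised and where provenance is missing.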

Key terms

Question

What are prompt shields?


Answer

A safety feature that detects and blocks jailbreak attempts and prompt injection attacks. Prompt shields analyse both user inputs and any injected content (e.g., text hidden in images or documents) to prevent manipulation of the AI.


Question

What is groundedness detection?


Answer

An evaluation that checks whether a model's response is actually based on the retrieved source data. Ungrounded responses contain fabricated information not found in the provided context, which is a key quality concern in RAG applications.


Question

What is provenance metadata?


Answer

Data recorded alongside AI responses that tracks which source documents were used to generate the response. Enables auditability, citation verification, and compliance reporting.


Question

What is human-in-the-loop (semiautonomous) mode?


Answer

An agent oversight mode where the agent proposes actions but requires human approval before executing them. Used for high-stakes decisions where autonomous mistakes could be costly or harmful.


Question

What is red teaming for AI?


Answer

Adversarial testing where security experts deliberately try to break the AI with prompt injection, jailbreaks, social engineering, and edge cases. The goal is to find safety gaps before users do.


Knowledge check


NeuralMed's patient chatbot should NEVER provide specific medical diagnoses; it should only direct patients to consult their doctor. A safety evaluation reveals that the chatbot occasionally says 'Based on your symptoms, you likely have...' Which control should they implement?


Atlas Financial's compliance agent can autonomously search regulations and generate assessments. However, the team wants any decision to flag a loan as 'non-compliant' to require manager approval. Which governance control should they configure?


Which of the following is an example of provenance metadata in an AI system?