AI Agents & Automation

Module Check: Trustworthy Agents

This module asked a single organizing question: what does it mean for an agent to be trustworthy? Answering that question required building a vocabulary — failure modes, evaluation dimensions, guardrail layers, oversight patterns, observability pillars, deployment stages — and a set of engineering practices that translate that vocabulary into systems that actually work. Before the synthesis activity, review the key terms one more time. Then the quiz sequence will probe your ability to apply these ideas, not just recite them.

Module Quiz

An agent has a per-step accuracy of 85%. If the agent must complete a 6-step task entirely correctly to deliver value, approximately what is the probability of full task success?
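
The arithmetic behind this question can be sketched in a few lines. The sketch assumes every step succeeds or fails independently with the same probability, which is a simplification (real agent steps are often correlated). The function name `full_task_success` is illustrative and not part of the module.

```python
def full_task_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Independent successes multiply, so reliability decays with task length.
    return per_step_accuracy ** steps


probability = full_task_success(0.85, 6)
print(f"{probability:.1%}")  # 37.7%
```

Under the independence assumption, a step that looks reliable in isolation still leaves the full 6-step task succeeding only a little more than one time in three.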

A team wants to evaluate their agent but finds it too expensive to run every test case five times to get stable pass rates. They decide to run each case once and treat the result as ground truth. What is the most important risk of this approach?
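
One way to see the cost of single runs is to look at how noisy a one-shot pass/fail observation is. The sketch below assumes each run of a case is an independent trial with a fixed underlying pass rate. The 0.6 pass rate and both helper names are made up for illustration.

```python
import math


def pass_rate_standard_error(true_pass_rate: float, runs: int) -> float:
    """Standard error of the observed pass rate after `runs` independent runs."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return math.sqrt(true_pass_rate * (1.0 - true_pass_rate) / runs)


def chance_of_clean_pass(true_pass_rate: float, runs: int) -> float:
    """Probability that all `runs` runs pass, hiding the flakiness entirely."""
    return true_pass_rate ** runs


flaky_case = 0.6  # hypothetical case that fails 40% of the time

print(f"{pass_rate_standard_error(flaky_case, 1):.2f}")  # 0.49
print(f"{pass_rate_standard_error(flaky_case, 5):.2f}")  # 0.22
print(f"{chance_of_clean_pass(flaky_case, 1):.2f}")      # 0.60
print(f"{chance_of_clean_pass(flaky_case, 5):.2f}")      # 0.08
```

With one run, this hypothetical flaky case is recorded as a clean pass 60% of the time. With five runs, that drops to about 8%, so the flakiness is much more likely to show up in the results.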

An agent for a financial services company has the following tools: get_account_balance, transfer_funds, send_email, read_transaction_history, and update_customer_profile. The agent is deployed specifically to answer balance inquiries. Applying least privilege, which tools should it have?
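
A least-privilege grant can be enforced mechanically rather than by prompt instructions. The following is a minimal sketch, assuming that balance inquiries need only `get_account_balance`. The tool bodies are stubs, and `restrict_tools` and `call_tool` are hypothetical helpers, not part of any specific agent framework.

```python
from typing import Callable, Dict, Iterable


def restrict_tools(registry: Dict[str, Callable], allowed: Iterable[str]) -> Dict[str, Callable]:
    """Return only the tools this deployment is explicitly granted."""
    allowed_names = set(allowed)
    unknown = allowed_names - set(registry)
    if unknown:
        raise KeyError(f"Unknown tools requested: {sorted(unknown)}")
    return {name: registry[name] for name in allowed_names}


def call_tool(toolset: Dict[str, Callable], name: str, *args):
    """Dispatch a tool call, refusing anything outside the granted set."""
    if name not in toolset:
        raise PermissionError(f"Tool '{name}' is not granted to this agent")
    return toolset[name](*args)


# Stub implementations standing in for the real tools named in the question.
ALL_TOOLS: Dict[str, Callable] = {
    "get_account_balance": lambda account_id: 1250.00,
    "transfer_funds": lambda src, dst, amount: "transferred",
    "send_email": lambda to, body: "sent",
    "read_transaction_history": lambda account_id: [],
    "update_customer_profile": lambda customer_id, data: "updated",
}

# Assumption: a balance-inquiry agent needs exactly one read-only tool.
balance_agent_tools = restrict_tools(ALL_TOOLS, ["get_account_balance"])

print(call_tool(balance_agent_tools, "get_account_balance", "acct-1"))
```

The design choice here is that the agent never receives a handle to the other tools, so a prompt injection cannot talk it into calling `transfer_funds`.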

A human reviewer approved 98% of an agent's proposed actions over four months, with almost no rejections. An independent observer argues this proves the agent is highly reliable. A reliability engineer disagrees. Who is correct and why?

A monitoring system alerts that an agent's average step count per task has increased from 4.2 to 11.8 over the past 48 hours, while task success rate has dropped from 89% to 74%. Which failure mode should the reliability engineer investigate first?
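
An alert like the one in this scenario could come from a simple comparison of a recent window against a baseline. The sketch below is one possible shape for that check. The thresholds (`max_step_ratio`, `max_success_drop`) and the `WindowStats` type are assumptions chosen for illustration, not values prescribed by the module.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WindowStats:
    avg_steps_per_task: float
    success_rate: float


def drift_alerts(
    baseline: WindowStats,
    current: WindowStats,
    max_step_ratio: float = 1.5,     # assumed tolerance for step-count growth
    max_success_drop: float = 0.05,  # assumed tolerance for success-rate loss
) -> List[str]:
    """Compare a recent window against a baseline and list any drift alerts."""
    alerts: List[str] = []
    step_ratio = current.avg_steps_per_task / baseline.avg_steps_per_task
    if step_ratio > max_step_ratio:
        alerts.append(f"avg steps per task up {step_ratio:.1f}x")
    success_drop = baseline.success_rate - current.success_rate
    if success_drop > max_success_drop:
        alerts.append(f"success rate down {success_drop:.0%}")
    return alerts


baseline = WindowStats(avg_steps_per_task=4.2, success_rate=0.89)
last_48h = WindowStats(avg_steps_per_task=11.8, success_rate=0.74)

for alert in drift_alerts(baseline, last_48h):
    print(alert)
```

Both checks fire on the numbers from the scenario. The alert only says that behavior has drifted; deciding which failure mode explains it still requires reading the traces.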

A company skips shadow mode and deploys its new agent directly to 10% of production traffic, arguing that an 88% pass rate on a 300-case eval set is sufficient evidence for a live rollout. What is the most significant risk they are accepting?

Synthesis: Build a Reliability Stack

Design a Complete Reliability Stack

You have been hired as the reliability lead for a new agent deployment. The agent, called StudyCoach, is designed for high school students. Given a student's course schedule, assignment list, and self-reported study hours, StudyCoach generates a personalized weekly study plan, sends reminder messages via the school's messaging system, and updates the student's study log in the school's learning management system (LMS) as tasks are completed.

StudyCoach has four tools: read_student_profile(student_id), send_message(student_id, message_text), update_lms_record(student_id, record_type, data), and generate_study_plan(profile_data).

Your job is to design the complete reliability stack before this agent goes live. Produce a written reliability plan covering all five pillars of this module.

PILLAR 1: FAILURE MODE DEFENSE

For each of the four failure modes, describe: (a) the specific way it could manifest in StudyCoach, and (b) one concrete technical countermeasure that would prevent or detect it.

PILLAR 2: EVALUATION FRAMEWORK

Design a 12-task eval set for StudyCoach. List each task (4 typical, 3 edge, 3 adversarial, 2 regression based on plausible bugs), its success criteria, and whether success can be verified programmatically or requires human judgment. Choose your top 6 tasks for the fast suite and explain why.

PILLAR 3: GUARDRAIL STACK

Design at least four guardrails across input, output, and infrastructure layers. For each, specify the layer, what it checks or blocks, the threat it mitigates, and whether it is automated or requires a human confirmation step.

PILLAR 4: HUMAN OVERSIGHT DESIGN

StudyCoach will send up to 5 messages per student per week. Design the oversight model: which actions require explicit student (or parent) approval, which can run autonomously, and what the approval gate looks like for the highest-stakes action. Propose one countermeasure against automation bias.

PILLAR 5: DEPLOYMENT PLAN

Write a three-stage deployment plan for StudyCoach, beginning with shadow mode. For each stage, specify the scope of deployment, the actions authorized, the advancement criteria (specific metrics), the rollback trigger, and the minimum observability infrastructure required at that stage.

FINAL DELIVERABLE

Write a one-paragraph executive summary of your reliability plan: what are the three greatest risks in this deployment, and how does your plan address each one? This paragraph should be understandable to a school administrator with no technical background.
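
As one illustration of what an infrastructure-layer guardrail for Pillar 3 might look like, the weekly message limit mentioned in Pillar 4 could be enforced in code instead of being left to the model. This is a minimal in-memory sketch, not a prescribed design. The class `MessageCapGuardrail`, its `try_send` method, and the use of ISO calendar weeks are assumptions. The `send_message` callable stands in for the real tool.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Tuple

MAX_MESSAGES_PER_WEEK = 5  # limit stated in the StudyCoach brief


class MessageCapGuardrail:
    """Blocks sends once a student reaches the weekly message limit."""

    def __init__(self, weekly_limit: int = MAX_MESSAGES_PER_WEEK) -> None:
        self.weekly_limit = weekly_limit
        # Key: (student_id, ISO year, ISO week) -> messages sent so far.
        self._sent: Dict[Tuple[str, int, int], int] = defaultdict(int)

    def try_send(
        self,
        student_id: str,
        message_text: str,
        send_message: Callable[[str, str], None],
        today: date,
    ) -> bool:
        """Send if under the cap and return True; otherwise block and return False."""
        iso_year, iso_week, _ = today.isocalendar()
        key = (student_id, iso_year, iso_week)
        if self._sent[key] >= self.weekly_limit:
            return False  # a real system would also log the blocked attempt
        send_message(student_id, message_text)
        self._sent[key] += 1
        return True


outbox = []
guardrail = MessageCapGuardrail()
monday = date(2024, 1, 1)

results = [
    guardrail.try_send("s-42", f"Reminder {i}", lambda sid, text: outbox.append((sid, text)), monday)
    for i in range(6)
]
print(results)      # [True, True, True, True, True, False]
print(len(outbox))  # 5
```

Because the counter lives outside the model, the cap holds even if the agent loops or is manipulated into retrying. A production version would need persistent storage and an audit log, which this sketch omits.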