Human-in-the-Loop Controls: Escalation Design, Override Mechanisms, and AI Liability When HITL Is Bypassed

On February 14, 2024, the British Columbia Civil Resolution Tribunal issued a landmark ruling that would reverberate through every enterprise deploying customer-facing AI agents. In Moffatt v. Air Canada, the tribunal held Air Canada directly liable for erroneous information provided by its chatbot — rejecting the airline's argument that its AI system was a "separate legal entity" responsible for its own statements. The total award: approximately $812 CAD in damages and costs. The precedent: incalculable.

That same year, the EU AI Act entered into force, embedding mandatory human oversight requirements into Articles 9, 13, and 14 for high-risk AI systems — with fines reaching €15 million or 3% of global annual turnover for non-compliance. ISO 42001:2023, the first international standard for AI management systems, dedicates an entire control domain to human oversight mechanisms. And NIST AI RMF's GOVERN function makes human accountability the foundational requirement for trustworthy AI.

Yet most enterprise AI deployments treat human-in-the-loop controls as an afterthought — a checkbox rather than an architecture. This analysis examines what real HITL compliance requires: from escalation trigger design to override audit trails, from liability assignment to the precise engineering patterns that satisfy EU AI Act Article 14's "meaningful human control" standard.

Landmark Case: Moffatt v. Air Canada (2024)

Case: Moffatt v. Air Canada, 2024 BCCRT 149 | British Columbia Civil Resolution Tribunal | Decision: February 14, 2024

Facts: Jake Moffatt purchased Air Canada tickets and relied on information from the airline's chatbot, which incorrectly stated that bereavement fares could be claimed retroactively within 90 days of booking. Air Canada's actual policy required applications before travel. Air Canada denied the refund claim and argued its chatbot was a "separate legal entity" whose statements did not bind the airline.

Ruling: Tribunal Member Christopher Rivers rejected Air Canada's argument, holding that the airline is "responsible for all information on its website" including its chatbot. Air Canada was ordered to pay $812.02 CAD in damages and tribunal fees. The ruling established that AI-generated misinformation creates direct corporate liability, with no "autonomous AI" defense available.

HITL Failure: The chatbot operated without escalation to human agents when providing policy-specific information outside its verified knowledge base. No override mechanism triggered review of the incorrect bereavement fare claim. No audit trail captured the misstatement for quality review. A properly designed HITL system would have escalated the policy question to a human representative before providing financial guidance.

Liability Warning: The Air Canada ruling creates a clear doctrine: organizations cannot disclaim responsibility for AI agent outputs by treating the system as an autonomous actor. Every AI agent statement is a corporate statement. HITL bypass is not a cost-saving measure — it is a liability assumption.
  • €15M or 3% of global annual turnover: the EU AI Act's maximum fine for human oversight violations (Article 14)
  • 72% of AI incidents involve inadequate human oversight (MIT Sloan, 2024)
  • Article 14 of the EU AI Act mandates "meaningful" human oversight for high-risk AI systems

The Regulatory Landscape for Human Oversight

Human-in-the-loop requirements are no longer a design preference — they are a multi-jurisdictional legal mandate. Three frameworks define the compliance floor for enterprise AI agents in 2026.

EU AI Act: Articles 9, 13, and 14

The EU AI Act, in force since August 2024 with high-risk AI system obligations applying from August 2026, creates the most detailed mandatory HITL requirements of any regulatory framework. The relevant articles work in sequence:

  • Article 9 — Risk management system: identify, analyze, and evaluate known and foreseeable risks. HITL implication: scenarios requiring human escalation must be documented, and the risk register must include HITL bypass scenarios.
  • Article 13 — Transparency and provision of information to deployers: AI systems must be sufficiently transparent to enable meaningful human oversight. HITL implication: the system must expose confidence scores, uncertainty signals, and decision rationale to human reviewers.
  • Article 14 — Human oversight: the design must allow natural persons to understand, monitor, and intervene, with override capability mandatory for high-risk systems. HITL implication: physical override controls, escalation pathways, and audit-trail-captured human interventions are required by design.
  • Article 15 — Accuracy, robustness, and cybersecurity: systems must behave as intended, and human correction mechanisms must address behavioral drift. HITL implication: HITL controls must include mechanisms to detect and correct AI performance degradation.

Violating any of these articles carries a fine of up to €15 million or 3% of total global annual turnover, whichever is higher.

Article 14(1) specifies that high-risk AI systems "shall be designed and developed in such a way, including with appropriate human-machine interface tools, that they can be effectively overseen by natural persons." Article 14(4) goes further, requiring that designated persons can "decide not to use the high-risk AI system or to otherwise disregard, override or reverse" its output — with the organization required to document and retain records of such interventions.

ISO 42001:2023: Human Oversight Control Domain

ISO 42001, published November 2023, is the first certifiable AI management system standard. Its control framework includes a dedicated human oversight domain with three mandatory control categories:

  • Clause 6.1.2 — AI risk treatment: Organizations must determine whether identified AI risks require human control as a treatment measure, documenting the rationale for automated versus human decision pathways.
  • Clause 8.4 — AI system operation: Requires procedures for monitoring AI system behavior, including defined escalation thresholds and human intervention protocols for anomalous outputs.
  • Annex A, Control A.6.1.5 — Human oversight of AI systems: Mandates that organizations establish mechanisms enabling appropriate human oversight commensurate with the risk profile of AI system decisions.

NIST AI RMF: GOVERN Function

NIST AI RMF 1.0 (January 2023) establishes the GOVERN function as the foundation of trustworthy AI — the organizational structures, policies, and accountabilities that make all other risk management functions effective. Human oversight requirements appear throughout:

  • GOVERN 1.1: Policies, processes, and procedures for AI risk management — must explicitly address human oversight responsibilities across the AI lifecycle.
  • GOVERN 1.4: Organizational teams accountable for AI risks include operations teams responsible for monitoring and intervention. HITL responsibilities must be formally assigned, not assumed.
  • GOVERN 4.1: Organizational teams are committed to a culture that considers and communicates AI risk — requires training human reviewers on escalation criteria and override authority.
  • MANAGE 2.4: Mechanisms for detecting and responding to AI incidents include escalation pathways and post-incident review requiring human decision records.

HITL Architecture: Four Design Patterns

Human-in-the-loop is not a single pattern — it is a spectrum of control mechanisms applied at different points in an AI agent's decision and action lifecycle. Enterprise deployments typically require all four patterns operating in parallel.

Pattern 1: Pre-execution Approval

Human approval required before AI agent takes an action. Mandatory for irreversible operations: financial transactions above threshold, contract commitments, data deletion, and external communications. Highest friction, maximum control.
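
Pattern 1 can be reduced to a small predicate in front of the agent's action executor. The sketch below is illustrative only: the action names and the $500 threshold are assumptions, not values from any specific deployment.

```python
# Pattern 1 sketch: pre-execution approval gate for irreversible or
# high-value actions. Action names and threshold are hypothetical.
IRREVERSIBLE_ACTIONS = {"delete_data", "sign_contract", "send_external_email"}
FINANCIAL_THRESHOLD = 500.0  # currency units; illustrative value

def requires_preapproval(action: str, amount: float = 0.0) -> bool:
    """Queue the action for human approval before execution if it is
    irreversible or exceeds the financial threshold."""
    return action in IRREVERSIBLE_ACTIONS or amount > FINANCIAL_THRESHOLD
```

In a production agent, a `True` result would place the proposed action in a review queue rather than executing it; the agent proceeds only after an approval event is recorded.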

Pattern 2: Confidence-Threshold Escalation

AI proceeds autonomously above a confidence threshold; escalates to human review below it. Requires calibrated confidence scoring — not just softmax probability but epistemic uncertainty estimation. Most common pattern for customer-facing agents.

Pattern 3: Asynchronous Review

AI acts autonomously but logs all decisions for post-hoc human review within a defined window (e.g., 24 hours). Action can be reversed if human review finds error. Requires reversibility design as prerequisite. Suitable for lower-stakes decisions with audit requirement.

Pattern 4: Exception-Based Monitoring

AI operates fully autonomously but triggers human escalation when anomaly signals exceed thresholds (volume spikes, sentiment shifts, error rates, policy violations). Least friction, requires robust anomaly detection. Appropriate only for well-characterized low-risk domains.
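
Pattern 2's escalation gate can be sketched in a few lines. The 0.75 threshold and 0.05 ambiguity margin are illustrative defaults; the function name and score format are assumptions, not a specific product API.

```python
from typing import Dict

def should_escalate(intent_scores: Dict[str, float],
                    min_confidence: float = 0.75,
                    max_margin: float = 0.05) -> bool:
    """Escalate when the top intent score is weak OR two intents are
    nearly tied. Scores are per-intent confidences (non-empty dict)."""
    ranked = sorted(intent_scores.values(), reverse=True)
    top = ranked[0]
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    if top < min_confidence:
        return True   # confidence trigger: model is simply unsure
    if top - runner_up < max_margin:
        return True   # ambiguity trigger: near-tie between intents
    return False
```

Note that a raw softmax threshold alone misses the near-tie case: a confident-looking top score with a close runner-up is exactly the ambiguity a human should resolve.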

Escalation Trigger Design

The critical engineering challenge in HITL is defining escalation triggers with sufficient precision to be implementable and sufficient coverage to satisfy EU AI Act Article 14's "meaningful oversight" standard. Triggers fall into four categories:

  • Confidence triggers: Model confidence below threshold (e.g., intent classification confidence < 0.75); multiple intents within 0.05 of each other indicating ambiguity; out-of-distribution detection scoring above threshold.
  • Topic triggers: Hard-coded escalation for regulated domains — legal commitments, pricing exceptions, medical advice, financial guarantees, discrimination complaints, legal threats. These must never be confidence-gated; they escalate regardless of model confidence.
  • Behavioral triggers: Conversation length exceeding N turns without resolution; repeated user frustration signals (negative sentiment for 3+ consecutive turns); user explicitly requesting human agent.
  • Anomaly triggers: Actions not matching established behavioral patterns; tool calls to systems not in pre-approved scope; output containing patterns matching restricted categories (PII, credentials, prohibited content).

Design Failure Pattern — Air Canada Gap: Air Canada's chatbot lacked topic-based escalation triggers for policy-specific questions involving financial commitments. A properly designed system would have classified any query involving "refund," "bereavement fare," or "retroactive claim" as a mandatory-escalation topic, routing to a human agent before providing policy interpretation. The $812 liability arose from a missing 10-line escalation rule.

Override Mechanism Architecture

EU AI Act Article 14(4) requires that human overseers can "decide not to use" the AI system or "override or reverse" its outputs. This requires a technically distinct override architecture from the escalation layer:

Override mechanisms must operate at three levels. At the session level, an authorized human agent must be able to take over any active AI conversation, receive full context (conversation history, tool calls made, outputs generated), and continue the interaction with the AI disabled. At the action level, reversible actions taken by the AI (bookings, cancellations, data updates) must expose a reversal endpoint accessible to human reviewers within the review window. At the model level, operations teams must have controls to disable individual AI capabilities (specific tool access, specific topic handling) without taking the entire system offline — partial degradation rather than binary on/off.
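
The model-level control can be sketched as a capability gate consulted before every tool call, with each partial disable logged. All names here are illustrative, not a specific product API.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class CapabilityGate:
    """Model-level override: disable individual tools or topics without
    taking the whole agent offline (partial degradation, not on/off)."""
    disabled_tools: Set[str] = field(default_factory=set)
    disabled_topics: Set[str] = field(default_factory=set)
    audit_log: List[str] = field(default_factory=list)

    def disable_tool(self, tool: str, operator_id: str, justification: str) -> None:
        # Partial disable events are logged with operator and justification
        self.disabled_tools.add(tool)
        self.audit_log.append(f"{operator_id} disabled {tool!r}: {justification}")

    def allows(self, tool: str, topics: Set[str]) -> bool:
        """True only if the tool and all detected topics remain enabled."""
        return tool not in self.disabled_tools and not (topics & self.disabled_topics)
```

An agent runtime would check `gate.allows(...)` before dispatching any tool call, falling back to escalation when a capability has been disabled.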


Implementation: HITL State Machine

Production HITL systems require a state machine that cleanly models the lifecycle of each AI interaction from fully automated to fully human-controlled, with defined state transitions and audit-logged events at every transition point.

# Enterprise HITL State Machine with EU AI Act Article 14 compliance logging
from enum import Enum
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, List, Dict
import uuid

class HITLState(Enum):
    AUTONOMOUS        = "autonomous"        # AI operating without human oversight
    MONITORED         = "monitored"         # Human can observe; no active intervention
    PENDING_REVIEW    = "pending_review"    # Awaiting human approval before proceeding
    HUMAN_TAKEOVER    = "human_takeover"    # Human agent has control; AI disabled
    OVERRIDDEN        = "overridden"        # Human reversed AI decision; logged for Article 14
    ESCALATED         = "escalated"         # Routed to specialist; AI suspended

class EscalationReason(Enum):
    CONFIDENCE_BELOW_THRESHOLD  = "confidence_low"
    TOPIC_MANDATORY_ESCALATION  = "topic_required"
    USER_REQUESTED_HUMAN        = "user_requested"
    ANOMALY_DETECTED            = "anomaly"
    CONVERSATION_UNRESOLVED     = "unresolved"
    HIGH_RISK_ACTION            = "high_risk_action"
    POLICY_VIOLATION_DETECTED   = "policy_violation"

@dataclass
class HITLAuditEvent:
    """Every state transition creates an immutable audit event for Article 14 compliance"""
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    session_id: str = ""
    timestamp: str = field(default_factory=lambda: datetime.utcnow().isoformat())
    from_state: HITLState = HITLState.AUTONOMOUS
    to_state: HITLState = HITLState.AUTONOMOUS
    reason: Optional[EscalationReason] = None
    triggered_by: str = ""    # "system" or human_agent_id
    ai_output_at_transition: str = ""
    human_decision: Optional[str] = None   # Required if human acted
    override_rationale: Optional[str] = None  # Required for OVERRIDDEN state
    retention_required_until: str = ""  # ISO 42001 minimum 3 years for high-risk

class HITLController:
    """
    Implements EU AI Act Article 14 human oversight with full audit trail.
    Mandatory escalation topics defined in MANDATORY_ESCALATION_TOPICS must
    NEVER be confidence-gated — they always escalate regardless of model output.
    """

    MANDATORY_ESCALATION_TOPICS = {
        "refund_policy", "fare_exception", "bereavement_fare",
        "contract_commitment", "legal_threat", "discrimination_complaint",
        "financial_guarantee", "medical_advice", "safety_concern",
        "regulatory_question", "pricing_exception"
    }
    CONFIDENCE_THRESHOLD = 0.75
    MAX_AUTONOMOUS_TURNS = 12

    def evaluate_for_escalation(
        self,
        session_id: str,
        ai_output: Dict,
        detected_topics: List[str],
        turn_count: int,
        user_sentiment_score: float
    ) -> Optional[EscalationReason]:

        # 1. Mandatory topic check — cannot be overridden by confidence score
        for topic in detected_topics:
            if topic in self.MANDATORY_ESCALATION_TOPICS:
                return EscalationReason.TOPIC_MANDATORY_ESCALATION

        # 2. Confidence threshold check: a missing confidence score fails
        # safe to escalation (default 0.0), never to full autonomy
        confidence = ai_output.get("confidence", 0.0)
        if confidence < self.CONFIDENCE_THRESHOLD:
            return EscalationReason.CONFIDENCE_BELOW_THRESHOLD

        # 3. Conversation length — unresolved after max turns
        if turn_count >= self.MAX_AUTONOMOUS_TURNS:
            return EscalationReason.CONVERSATION_UNRESOLVED

        # 4. Sustained negative sentiment (score < -0.5 indicates frustration)
        if user_sentiment_score < -0.5:
            return EscalationReason.ANOMALY_DETECTED

        return None  # No escalation required

    def record_human_override(
        self,
        session_id: str,
        agent_id: str,
        ai_decision: str,
        human_decision: str,
        rationale: str
    ) -> HITLAuditEvent:
        # Article 14(4): override must be logged with rationale for compliance
        event = HITLAuditEvent(
            session_id=session_id,
            from_state=HITLState.PENDING_REVIEW,
            to_state=HITLState.OVERRIDDEN,
            reason=None,
            triggered_by=agent_id,
            ai_output_at_transition=ai_decision,
            human_decision=human_decision,
            override_rationale=rationale
        )
        self._persist_audit_event(event)
        return event

    def _persist_audit_event(self, event: HITLAuditEvent) -> None:
        """Append to the audit store. Stubbed as an in-memory list here;
        production systems write to an append-only, integrity-verified store."""
        self._audit_log: List[HITLAuditEvent] = getattr(self, "_audit_log", [])
        self._audit_log.append(event)

Liability When HITL Is Bypassed: The Air Canada Doctrine

The Moffatt v. Air Canada ruling established several liability principles that apply directly to enterprise AI agent deployments, regardless of jurisdiction — because they reflect the general common law and consumer protection principles that courts globally apply to AI misinformation cases.

The Four Liability Principles from Moffatt

Principle 1: No "Autonomous AI" Defense. The tribunal explicitly rejected Air Canada's claim that its chatbot was a "separate legal entity" whose statements did not bind the corporation. In Member Rivers' words: "Air Canada does not explain why it should not be held responsible for information provided by its agent." For liability purposes, the AI agent is the organization speaking — not an independent third party.

Principle 2: Negligent Misrepresentation Standard. The case was decided on negligent misrepresentation principles: Air Canada had a duty to ensure accurate policy information, breached that duty through its chatbot's incorrect statement, and Mr. Moffatt reasonably relied on the misrepresentation to his financial detriment. This standard applies regardless of whether the misinformation was intentional — the chatbot's confident, incorrect assertion was sufficient.

Principle 3: Constructive Knowledge. Air Canada's argument that it "could not have known" about the chatbot error was rejected because the organization was responsible for testing, monitoring, and maintaining its AI system. Lack of actual knowledge did not absolve the duty to ensure accuracy — particularly for policy questions with direct financial consequences.

Principle 4: Reasonable Reliance. A user who receives a specific, confident answer from an official corporate chatbot is entitled to rely on it. The tribunal found Mr. Moffatt's reliance entirely reasonable. Organizations cannot disclaim liability by burying a "chatbot may be inaccurate" disclaimer while deploying a system that presents information confidently and specifically.

Practical Implication: These four principles establish that HITL bypass on policy-interpretation and financial-commitment queries creates direct corporate liability for every incorrect answer the AI provides. The calculation is not "probability of error × $812 CAD" — it is "probability of regulatory action + cumulative customer reliance claims × class action multiplier." HITL investment must be evaluated against this risk baseline.

EU AI Act Liability Extension

The EU AI Act's Article 14 requirements create an additional, distinct liability pathway beyond common law negligent misrepresentation. Under the Act's enforcement regime, administered by national market surveillance authorities with EU-level coordination, a high-risk AI system deployed without adequate human oversight controls can face:

  • Regulatory fines: €15 million or 3% of total global annual turnover for violations of Articles 9, 13, or 14 — whichever is higher. For a mid-size enterprise with €500M revenue, this is €15M. For a large enterprise with €5B revenue, this is €150M.
  • Market withdrawal orders: National market surveillance authorities can require withdrawal of the AI system from the EU market until compliance is demonstrated.
  • EU AI Liability Directive (proposed): The companion AI Liability Directive would have created a rebuttable presumption of causality for AI harms where the operator violated mandatory requirements, including Article 14 human oversight, transforming HITL bypass from a compliance risk into presumptive tortious liability. The European Commission signalled the proposal's withdrawal in its 2025 work programme, but national courts can reach similar outcomes through ordinary negligence principles, as Moffatt demonstrates.

Escalation Design Patterns for Production Systems

Effective escalation design requires solving three engineering problems simultaneously: routing accuracy (getting the right query to the right human), queue management (ensuring human reviewers can actually respond within acceptable time), and context fidelity (giving human reviewers enough information to make informed decisions quickly).

Routing Architecture

Escalation routing cannot be a single queue. Production systems require at minimum a three-tier routing architecture:

  • Tier 1 — Standard escalation: Confidence-threshold triggers routed to general customer service queue. Target response time: 2 minutes during business hours. Acceptable for non-urgent customer queries where AI was uncertain.
  • Tier 2 — Policy escalation: Mandatory-topic triggers (financial commitments, policy interpretations, exceptions) routed to trained policy team with decision authority. Target response time: 5 minutes. Requires specialist knowledge and override authority.
  • Tier 3 — Compliance escalation: Legal threat, regulatory complaint, discrimination, safety concern topics routed directly to compliance or legal team. Target response time: 15 minutes with 24/7 on-call coverage. Documented decision rationale required per Article 14(4).
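
The routing rule itself can stay small if the topic registries are maintained separately. A minimal sketch using the SLA targets from the tiers above; the topic and queue names are hypothetical:

```python
from typing import List, Tuple

# Illustrative topic-to-tier registries; real lists are compliance-reviewed
TIER3_TOPICS = {"legal_threat", "discrimination_complaint",
                "safety_concern", "regulatory_question"}
TIER2_TOPICS = {"refund_policy", "fare_exception",
                "pricing_exception", "contract_commitment"}

def route_escalation(topics: List[str]) -> Tuple[str, int]:
    """Return (queue_name, sla_minutes). A session matching multiple
    tiers routes to the highest tier, checked first."""
    if any(t in TIER3_TOPICS for t in topics):
        return ("compliance_queue", 15)   # Tier 3: legal/compliance, 24/7 on-call
    if any(t in TIER2_TOPICS for t in topics):
        return ("policy_queue", 5)        # Tier 2: policy team with override authority
    return ("general_queue", 2)           # Tier 1: standard customer service
```

Checking the highest tier first matters: a query that mentions both a refund and a legal threat must land in the compliance queue, not the policy queue.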

Context Handoff Package

When an AI agent escalates to a human, the human reviewer must receive a standardized context package — not just the transcript. The package must include: full conversation history with timestamps; AI confidence scores at each turn; tools called and outputs received; the specific escalation trigger and reason; user account context (history, tier, prior escalations); and a recommended action from the AI system based on available context (even if not confident enough to execute autonomously). Human reviewers who receive complete context packages resolve escalations 3.4x faster with 67% fewer errors than those reading raw transcripts.
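
One way to make the package a contract rather than a convention is a typed structure the escalation path must populate. Field names below are illustrative:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class HandoffPackage:
    """Standardized context handed to the human reviewer on escalation."""
    session_id: str
    transcript: List[Dict]                   # turns with timestamps and per-turn confidence
    tool_calls: List[Dict]                   # tools invoked and raw outputs received
    escalation_trigger: str                  # which trigger fired, and why
    account_context: Dict                    # customer tier, history, prior escalations
    ai_recommendation: Optional[str] = None  # suggested action, even when below threshold
```

Because the dataclass has required fields, an escalation that fails to gather the full context fails loudly at handoff time instead of silently delivering a bare transcript.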

Queue Overflow Design

A critical failure mode in production HITL systems is queue overflow — when escalation volume exceeds human reviewer capacity, forcing a choice between letting queries wait (degrading customer experience) and silently routing escalations back to the AI (creating liability). The correct design pattern is explicit degraded-mode operation: when queues exceed defined depth, AI agents must transparently inform users that a human review is required and provide a specific callback commitment, rather than proceeding autonomously with uncertain output. "I need to connect you with a specialist who can confirm this policy — I'll have someone contact you within 2 hours" is a compliant HITL response. "Your bereavement fare can be claimed retroactively" without human review is the liability-creating alternative.
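
The degraded-mode check can be sketched as a single function in front of the autonomous reply path; the queue-depth threshold and callback window are illustrative values:

```python
from typing import Optional

def degraded_mode_reply(queue_depth: int, max_depth: int,
                        callback_hours: int = 2) -> Optional[str]:
    """Return a transparent hold message when reviewer queues are saturated.
    Never falls back silently to an autonomous answer."""
    if queue_depth <= max_depth:
        return None  # normal path: escalation waits for a human within SLA
    return ("I need to connect you with a specialist who can confirm this; "
            f"someone will contact you within {callback_hours} hours.")
```

Each invocation of the overflow branch should also be logged as a compliance event, per the audit requirements discussed in the next section.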


Audit Trail Requirements: Article 14(4) Compliance

EU AI Act Article 14(4) requires that human oversight mechanisms include the ability to "identify, as appropriate, situations in which the high-risk AI system may need to be updated." This implies a mandatory audit trail capturing not just that human interventions occurred, but what pattern of interventions reveals about systemic AI failures requiring model updates or system reconfiguration.

Required Audit Events

The following events must be captured with immutable audit records for Article 14 compliance:

  • Every escalation event: timestamp, session ID, escalation reason, escalation tier, queue entry time
  • Every human review decision: reviewer ID, time-to-decision, decision outcome (approve/override/escalate-higher)
  • Every override: AI output overridden, human decision, rationale, downstream action taken
  • Every human takeover: session ID, takeover timestamp, reason, resolution time, outcome
  • Every queue overflow event: when escalation was queued beyond SLA, whether autonomous fallback occurred (this must be flagged as a compliance event)
  • Every model confidence calibration event: when actual error rates in human-reviewed cases diverge from model confidence scores, triggering recalibration
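
The "immutable audit record" requirement is often implemented as a hash chain: each record stores the hash of its predecessor, so editing any earlier event invalidates everything after it. A minimal in-memory sketch; production systems would persist to append-only storage:

```python
import hashlib
import json

class HashChainedAuditLog:
    """Append-only log where each record carries the hash of its
    predecessor; tampering with any event breaks chain verification."""

    def __init__(self) -> None:
        self._records = []
        self._last_hash = "0" * 64  # genesis value

    def append(self, event: dict) -> str:
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((self._last_hash + payload).encode()).hexdigest()
        self._records.append({"event": event, "prev": self._last_hash, "hash": digest})
        self._last_hash = digest
        return digest

    def verify(self) -> bool:
        """Recompute the chain from genesis; False on any tampering."""
        prev = "0" * 64
        for rec in self._records:
            payload = json.dumps(rec["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if rec["prev"] != prev or rec["hash"] != expected:
                return False
            prev = rec["hash"]
        return True
```

Periodic verification of the chain, plus anchoring the latest hash in an external system, is a common way to demonstrate that application code cannot suppress or rewrite oversight events.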

Retention Requirements

ISO 42001 Clause 7.5 requires that documented information from AI management system operations be retained for periods appropriate to the risk level. For high-risk AI systems under the EU AI Act, national authorities' guidance consistently recommends a minimum 3-year retention period for human oversight records, aligned with standard product liability limitation periods. GDPR Article 5(1)(e)'s storage limitation principle requires that records containing personal data be pseudonymized or anonymized once retention extends beyond immediate operational necessity; audit trails should therefore separate interaction content (short retention) from oversight decision records (long retention) via a pseudonymization architecture.
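
The content/decision split can be sketched as two records sharing only a salted pseudonym. Salt management and deletion scheduling are out of scope here, and all field names are illustrative:

```python
import hashlib
from typing import Dict, Tuple

def split_for_retention(session_record: Dict, salt: str) -> Tuple[Dict, Dict]:
    """Split one session into short-retention interaction content and
    long-retention oversight records, linked only by a salted pseudonym."""
    pseudonym = hashlib.sha256(
        (salt + session_record["user_id"]).encode()).hexdigest()[:16]
    content = {"key": pseudonym,                       # short retention
               "transcript": session_record["transcript"]}
    oversight = {"key": pseudonym,                     # 3+ year retention
                 "escalations": session_record["escalations"],
                 "overrides": session_record["overrides"]}
    return content, oversight
```

Once the operational retention window closes, deleting the content record (and, if required, the salt) leaves the oversight record intact but no longer attributable to an identifiable person.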


ISO 42001 Implementation: Human Oversight Control Domain

ISO 42001 certification requires documented evidence that human oversight controls are not merely designed but operationally effective. Certification auditors look for four categories of evidence that many organizations fail to produce:

Evidence Category 1: Risk-Based Oversight Assignment

Organizations must demonstrate that the level of human oversight is proportionate to AI system risk — not uniform. A risk register must document the specific oversight controls assigned to each AI system or function, with documented rationale for why the assigned oversight level is appropriate. Auditors will look for evidence that lower-risk functions are not over-controlled (creating reviewer fatigue that degrades oversight quality for high-risk functions) and that high-risk functions are not under-controlled due to cost pressure.

Evidence Category 2: Reviewer Competency

ISO 42001 Clause 7.2 requires that persons performing human oversight functions are competent to do so. "Having a human in the loop" is not sufficient if the human reviewer lacks the knowledge to make informed decisions about AI outputs. Organizations must document: what training reviewers receive on AI system capabilities and limitations; how reviewer competency is assessed; and how competency requirements are updated when AI system capabilities change.

Evidence Category 3: Effectiveness Measurement

Clause 9.1 requires organizations to evaluate the performance of their AI management system, including human oversight effectiveness. This requires measurable metrics: escalation rates by topic and time period; human override rates (too high suggests AI confidence miscalibration; too low may suggest reviewers are rubber-stamping); time-to-decision by reviewer and tier; post-decision accuracy (where outcomes are observable); and reviewer agreement rates on escalated cases (divergence indicates unclear decision criteria).
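
The override-rate signal can be computed directly from reviewed cases. A minimal sketch treating each human review as a (model confidence, overridden?) pair; a calibrated model's override rate should roughly track its implied error rate:

```python
from typing import List, Tuple

def oversight_metrics(reviews: List[Tuple[float, bool]]) -> dict:
    """reviews: (model_confidence, human_overrode) per escalated case.
    A positive calibration_gap suggests overconfidence; a strongly
    negative gap may indicate reviewers rubber-stamping AI output."""
    n = len(reviews)
    override_rate = sum(1 for _, overrode in reviews if overrode) / n
    mean_conf = sum(conf for conf, _ in reviews) / n
    expected_error = 1.0 - mean_conf  # error rate the model's confidence implies
    return {"override_rate": round(override_rate, 3),
            "expected_error": round(expected_error, 3),
            "calibration_gap": round(override_rate - expected_error, 3)}
```

Reviewed monthly, a drifting calibration gap is exactly the Clause 9.1 evidence auditors ask for, and a trigger for the recalibration events listed in the audit requirements above.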

Evidence Category 4: Continuous Improvement Loop

Clause 10.2 requires nonconformities to trigger corrective action. For HITL systems, this means: patterns in human overrides must feed back into model fine-tuning or prompt adjustment; systematic escalation spikes in particular topic areas must trigger topic-specific AI capability review; and cases where AI operated in HITL-bypass mode due to queue overflow must be reviewed and counted as incidents against SLA.

HITL Technical Audit Checklist (EU AI Act Article 14 / ISO 42001 / NIST AI RMF)

Mandatory topic list defined and code-reviewed: Topics requiring escalation regardless of confidence score are explicitly enumerated, peer-reviewed by legal/compliance, and version-controlled alongside model deployment configuration.

Confidence calibration validated: Model confidence scores are empirically calibrated against a held-out validation set; the 0.75 confidence threshold corresponds to a <25% actual error rate, verified quarterly.

Three-tier escalation routing operational: Standard, policy, and compliance tiers exist with separate queues, SLAs, reviewer pools, and escalation-to-next-tier triggers when SLA is breached.

Override mechanism tested and accessible: Human agents can take over active AI sessions, receive full context package within 5 seconds, and disable AI autonomy for the session within 2 clicks.

Audit trail immutable and complete: Every escalation, review decision, override, and takeover event is logged to append-only audit store with cryptographic integrity; no events suppressible by application code.

Queue overflow policy explicit and compliant: When reviewer queues exceed defined depth, AI shifts to transparent hold mode rather than autonomous fallback; queue overflow events are flagged as compliance incidents.

Retention policy implemented and tested: Oversight decision records retained minimum 3 years; interaction content pseudonymized after operational retention period; deletion schedules tested and documented.

Reviewer competency training current: All human reviewers have completed AI capabilities and limitations training within past 12 months; training updated within 30 days of material AI system changes.

Effectiveness metrics dashboard operational: Escalation rates, override rates, time-to-decision, and post-decision accuracy reviewed monthly; trends documented and acted upon per ISO 42001 Clause 10.2.

Article 14(4) override documentation complete: Human override decisions include structured rationale field; override patterns are reviewed quarterly to identify AI system improvement opportunities.

Partial capability disable controls tested: Operations team can disable specific AI tool access or topic handling without full system shutdown; partial disable events are logged and require documented justification.

Reversibility architecture validated: For all AI actions subject to asynchronous review (Pattern 3), reversal endpoints are tested monthly; reversal SLA (e.g., 24-hour review window) is contractually defined and monitored.

Claire's Human-in-the-Loop Architecture

  • Mandatory escalation topic registry: Claire maintains a versioned, compliance-reviewed mandatory escalation topic list covering financial commitments, policy interpretation, legal concerns, and regulatory matters — updated with every model deployment, never confidence-gated.
  • Three-tier routing with SLA enforcement: Every Claire deployment includes configurable three-tier escalation routing with queue depth monitoring, SLA breach alerts, and explicit degraded-mode behavior that keeps users informed rather than silently falling back to AI autonomy.
  • Immutable audit trail with ISO 42001 retention: Claire's audit architecture writes all HITL events to an append-only log with cryptographic integrity verification; retention policies are configured per jurisdiction with automated pseudonymization of personal data after operational period.
  • Article 14 override documentation: Claire's reviewer interface includes a structured override rationale capture — not a free-text field but a guided decision tree — ensuring override records satisfy Article 14(4) and can be aggregated for quarterly AI improvement review.
  • Calibration feedback loop: Claire continuously compares model confidence scores against human override outcomes, surfacing calibration drift to AI operations teams before it creates systematic under-escalation — preventing the pattern that created Air Canada's liability.